2023-11-20 15:43:22,499 INFO [train_asr.py:1289] (1/4) Training started
2023-11-20 15:43:22,500 INFO [train_asr.py:1299] (1/4) Device: cuda:1
2023-11-20 15:43:22,503 INFO [train_asr.py:1311] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': '16e77b48-dirty', 'icefall-git-date': 'Mon Nov 20 11:32:19 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-6-0423201309-7c68fd68fb-qfn6b', 'IP address': '10.177.58.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 40, 'start_epoch': 15, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-20 15:43:22,503 INFO [train_asr.py:1320] (1/4) About to create model
2023-11-20 15:43:23,554 INFO [train_asr.py:1324] (1/4) Number of model parameters: 65819362
2023-11-20 15:43:23,555 INFO [checkpoint.py:112] (1/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-14.pt
2023-11-20 15:43:27,110 INFO [train_asr.py:1352] (1/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-20 15:43:32,062 INFO [train_asr.py:1361] (1/4) Using DDP
2023-11-20 15:43:32,379 INFO [train_asr.py:1384] (1/4) Loading optimizer state dict
2023-11-20 15:43:33,230 INFO [train_asr.py:1392] (1/4) Loading scheduler state dict
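
[Editor's note] The entries above show the run resuming: with 'start_epoch': 15 in the config, model, optimizer, and scheduler states are restored from epoch-14.pt (the grad scaler follows a few lines below). A minimal sketch of that resume pattern in plain PyTorch; the object names are stand-ins and the checkpoint keys ("model", "optimizer", "scheduler", "grad_scaler") are assumed from icefall's usual checkpoint layout, not read from this recipe's code.

```python
# Hedged sketch of the resume step logged above; the real logic lives in
# icefall's checkpoint.py and train_asr.py.
import torch

def load_resume_state(ckpt_path, model, optimizer, scheduler, scaler):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])          # "Loading checkpoint from ..."
    optimizer.load_state_dict(ckpt["optimizer"])  # "Loading optimizer state dict"
    scheduler.load_state_dict(ckpt["scheduler"])  # "Loading scheduler state dict"
    scaler.load_state_dict(ckpt["grad_scaler"])   # "Loading grad scaler state dict"
    # Assumed key: lets training continue counting from the saved step.
    return ckpt.get("batch_idx_train", 0)
```
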
2023-11-20 15:43:33,233 INFO [train_asr.py:1414] (1/4) Getting audioset cuts
2023-11-20 15:43:33,233 INFO [kd_datamodule.py:796] (1/4) About to get the audioset cuts.
2023-11-20 15:43:33,236 INFO [train_asr.py:1420] (1/4) Using mux to combine Librispeech with audioset
2023-11-20 15:43:33,236 INFO [train_asr.py:1430] (1/4) CutSet(len=2748469) [underlying data type: ]
2023-11-20 15:43:48,686 INFO [kd_datamodule.py:396] (1/4) Enable MUSAN
2023-11-20 15:43:48,686 INFO [kd_datamodule.py:397] (1/4) About to get Musan cuts
2023-11-20 15:43:52,249 INFO [kd_datamodule.py:427] (1/4) Enable SpecAugment
2023-11-20 15:43:52,250 INFO [kd_datamodule.py:428] (1/4) Time warp factor: 80
2023-11-20 15:43:52,250 INFO [kd_datamodule.py:438] (1/4) Num frame mask: 10
2023-11-20 15:43:52,250 INFO [kd_datamodule.py:451] (1/4) About to create train dataset
2023-11-20 15:43:52,257 INFO [kd_datamodule.py:487] (1/4) Using SimpleCutSampler
2023-11-20 15:43:52,257 INFO [kd_datamodule.py:495] (1/4) About to create train dataloader
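
[Editor's note] These entries describe the data pipeline: LibriSpeech and AudioSet cuts are interleaved lazily with mux, MUSAN noise and SpecAugment (time warp 80, 10 frame masks) are applied inside the dataset class, and SimpleCutSampler is used because 'bucketing_sampler': False in the config. A hedged sketch using public lhotse APIs; the manifest paths are illustrative, and the recipe's own kd_datamodule.py builds a custom KD dataset rather than the stock one used here.

```python
# Sketch of the pipeline logged above, under the stated assumptions.
from torch.utils.data import DataLoader
from lhotse import CutSet, load_manifest_lazy
from lhotse.dataset import K2SpeechRecognitionDataset, SimpleCutSampler

# Hypothetical manifest names under 'manifest_dir': data/fbank.
libri_cuts = load_manifest_lazy("data/fbank/librispeech_cuts_train.jsonl.gz")
audioset_cuts = load_manifest_lazy("data/fbank/audioset_cuts_unbalanced.jsonl.gz")

# "Using mux to combine Librispeech with audioset": interleave the two lazy
# streams instead of concatenating them in memory.
train_cuts = CutSet.mux(libri_cuts, audioset_cuts)

dataset = K2SpeechRecognitionDataset()  # defaults to PrecomputedFeatures, as in the config
sampler = SimpleCutSampler(
    train_cuts,
    max_duration=1000,  # 'max_duration': 1000 -> ~1000 s of audio per batch
    shuffle=True,
    drop_last=True,
)
# lhotse samplers yield whole batches, hence batch_size=None.
train_dl = DataLoader(dataset, sampler=sampler, batch_size=None, num_workers=2)
```
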
2023-11-20 15:43:52,275 INFO [kd_datamodule.py:814] (1/4) About to get the audioset eval cuts.
2023-11-20 15:43:52,277 INFO [train_asr.py:1494] (1/4) CutSet(len=20681) [underlying data type: ]
2023-11-20 15:43:52,379 INFO [kd_datamodule.py:529] (1/4) About to create dev dataset
2023-11-20 15:43:53,191 INFO [kd_datamodule.py:550] (1/4) About to create dev dataloader
2023-11-20 15:43:53,191 INFO [train_asr.py:1508] (1/4) Loading grad scaler state dict
2023-11-20 15:44:29,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0
2023-11-20 15:44:29,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5
2023-11-20 15:44:30,160 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 0, loss[loss=0.1081, simple_loss=0.1263, pruned_loss=0.02442, audio_tagging_loss=0.02051, over 15873.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1263, pruned_loss=0.02442, audio_tagging_loss=0.02051, over 15873.00 frames. ], batch size: 56, lr: 4.68e-03, grad_scale: 32.0
2023-11-20 15:44:30,160 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-20 15:44:54,991 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1806, 2.2459, 5.0578, 2.6221], device='cuda:1')
2023-11-20 15:45:06,709 INFO [train_asr.py:1253] (1/4) Epoch 15, validation: loss=0.06153, simple_loss=0.05347, pruned_loss=0.005654, audio_tagging_loss=0.02914, over 4681554.00 frames.
2023-11-20 15:45:06,710 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
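
[Editor's note] The loss columns in these batch/validation entries are numerically consistent with loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss, using the scales from the config above (0.5 and 1.0). This is inferred from the logged numbers, not read from train_asr.py; a quick check against the "batch 0" entry:

```python
# Reconstructing the logged total loss from its components (inferred formula).
simple_loss_scale, audio_tagging_loss_scale = 0.5, 1.0  # from the config dump
simple_loss, pruned_loss, audio_tagging_loss = 0.1263, 0.02442, 0.02051

loss = (simple_loss_scale * simple_loss
        + pruned_loss
        + audio_tagging_loss_scale * audio_tagging_loss)
print(round(loss, 4))  # 0.1081, matching the logged loss=0.1081
```

The same identity holds for the tot_loss columns of later batches (e.g. batch 50: 0.5*0.1009 + 0.0194 + 0.01908 = 0.08893 vs. the logged 0.08892).
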
2023-11-20 15:45:10,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5
2023-11-20 15:45:11,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1122200.0, ans=0.125
2023-11-20 15:45:12,409 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.292e+01 9.006e+01 9.945e+01 1.226e+02, threshold=1.801e+02, percent-clipped=0.0
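
[Editor's note] In these optim.py entries the five numbers are the (min, 25%, 50%, 75%, max) of recent gradient norms, and the reported threshold equals Clipping_scale times the median: 2.0 * 9.006e+01 = 1.801e+02 above (and 2.0 * 9.417e+01 = 1.883e+02 in the next such entry). A hedged sketch of that relationship; this is illustrative, and the real logic lives in icefall's optim.py (ScaledAdam):

```python
import torch

def report_clipping(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    # Quartiles of the recent gradient norms, as printed in the log.
    quartiles = [torch.quantile(recent_norms, q).item()
                 for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * quartiles[2]  # scale * median
    # Simplification: fraction of norms above threshold; the log's
    # percent-clipped is tracked over the logging interval.
    clipped = (recent_norms > threshold).float().mean().item() * 100
    print(f"grad-norm quartiles {quartiles}, threshold={threshold:.4g}, "
          f"percent-clipped={clipped:.1f}")
    return threshold
```
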
2023-11-20 15:45:19,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0
2023-11-20 15:45:32,142 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168350
2023-11-20 15:45:32,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1122266.6666666667, ans=0.0
2023-11-20 15:45:39,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1122333.3333333333, ans=0.125
2023-11-20 15:45:44,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1122333.3333333333, ans=0.035
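
[Editor's note] The frequent "ScheduledFloat: name=..., batch_count=..., ans=..." entries report the current value (ans) of a float that varies with the global batch count. A hedged, illustrative reimplementation under the assumption that the schedule interpolates piecewise-linearly between (batch_count, value) breakpoints; the real class lives in the recipe's scaling.py and may differ in detail:

```python
class ScheduledFloat:
    """Illustrative: a float scheduled on the global batch count."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. ((0.0, 0.5), (20000.0, 0.035))
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation between breakpoints
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]  # past the last breakpoint, hold the final value

skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.035))
print(skip_rate.value_at(1122333.3))  # 0.035, cf. the bypass.skip_rate entry above
```
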
2023-11-20 15:46:04,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0
2023-11-20 15:46:12,506 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 50, loss[loss=0.1003, simple_loss=0.127, pruned_loss=0.02113, audio_tagging_loss=0.01573, over 15203.00 frames. ], tot_loss[loss=0.08892, simple_loss=0.1009, pruned_loss=0.0194, audio_tagging_loss=0.01908, over 687095.34 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 32.0
2023-11-20 15:46:36,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1122600.0, ans=0.125
2023-11-20 15:46:38,034 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168400
2023-11-20 15:46:50,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1122666.6666666667, ans=0.125
2023-11-20 15:47:00,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1122733.3333333333, ans=0.0
2023-11-20 15:47:11,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1122800.0, ans=0.125
2023-11-20 15:47:20,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1122866.6666666667, ans=0.125
2023-11-20 15:47:21,449 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 100, loss[loss=0.08062, simple_loss=0.1012, pruned_loss=0.01565, audio_tagging_loss=0.01436, over 15213.00 frames. ], tot_loss[loss=0.08726, simple_loss=0.09988, pruned_loss=0.01902, audio_tagging_loss=0.01831, over 1207160.22 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:47:27,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.856e+01 9.417e+01 9.999e+01 1.434e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-20 15:47:44,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168450
2023-11-20 15:48:07,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1123066.6666666667, ans=0.125
2023-11-20 15:48:22,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1123133.3333333333, ans=0.125
2023-11-20 15:48:27,328 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 150, loss[loss=0.07988, simple_loss=0.08448, pruned_loss=0.02408, audio_tagging_loss=0.01357, over 15005.00 frames. ], tot_loss[loss=0.08482, simple_loss=0.09927, pruned_loss=0.0187, audio_tagging_loss=0.01649, over 1617806.60 frames. ], batch size: 60, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:48:40,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1123266.6666666667, ans=0.125
2023-11-20 15:48:50,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168500
2023-11-20 15:49:03,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1123333.3333333333, ans=0.025
2023-11-20 15:49:13,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1123400.0, ans=0.125
2023-11-20 15:49:21,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1123466.6666666667, ans=0.2
2023-11-20 15:49:31,866 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 200, loss[loss=0.08133, simple_loss=0.1079, pruned_loss=0.0193, audio_tagging_loss=0.008056, over 15647.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.09942, pruned_loss=0.01894, audio_tagging_loss=0.01471, over 1929667.32 frames. ], batch size: 59, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:49:38,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.283e+01 8.917e+01 9.795e+01 1.407e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-20 15:49:56,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168550
2023-11-20 15:50:14,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=15.0
2023-11-20 15:50:38,419 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 250, loss[loss=0.08868, simple_loss=0.1092, pruned_loss=0.02477, audio_tagging_loss=0.009323, over 14965.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.1001, pruned_loss=0.01922, audio_tagging_loss=0.01322, over 2177684.21 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:51:00,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168600
2023-11-20 15:51:00,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1123933.3333333333, ans=0.125
2023-11-20 15:51:20,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1124066.6666666667, ans=0.125
2023-11-20 15:51:31,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1124133.3333333333, ans=0.0
2023-11-20 15:51:43,526 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 300, loss[loss=0.0687, simple_loss=0.09189, pruned_loss=0.01441, audio_tagging_loss=0.008346, over 14643.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1001, pruned_loss=0.0194, audio_tagging_loss=0.01209, over 2375716.54 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:51:49,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.496e+01 9.198e+01 1.000e+02 1.340e+02, threshold=1.840e+02, percent-clipped=0.0
2023-11-20 15:52:05,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168650
2023-11-20 15:52:45,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1124466.6666666667, ans=0.0
2023-11-20 15:52:47,377 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 350, loss[loss=0.04973, simple_loss=0.05948, pruned_loss=0.008276, audio_tagging_loss=0.01171, over 14430.00 frames. ], tot_loss[loss=0.08082, simple_loss=0.1, pruned_loss=0.01929, audio_tagging_loss=0.0115, over 2520800.73 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:53:01,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1124600.0, ans=0.125
2023-11-20 15:53:11,757 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168700
2023-11-20 15:53:16,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1124666.6666666667, ans=0.2
2023-11-20 15:53:18,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0
2023-11-20 15:53:20,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0
2023-11-20 15:53:20,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1124666.6666666667, ans=15.0
2023-11-20 15:53:24,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1124666.6666666667, ans=0.0
2023-11-20 15:53:28,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0
2023-11-20 15:53:47,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1124800.0, ans=0.0
2023-11-20 15:53:52,673 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 400, loss[loss=0.07566, simple_loss=0.1011, pruned_loss=0.01611, audio_tagging_loss=0.008986, over 16307.00 frames. ], tot_loss[loss=0.08018, simple_loss=0.09997, pruned_loss=0.01906, audio_tagging_loss=0.01113, over 2635577.49 frames. ], batch size: 61, lr: 4.67e-03, grad_scale: 32.0
2023-11-20 15:53:54,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1124866.6666666667, ans=0.02
2023-11-20 15:53:59,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.749e+01 8.311e+01 9.076e+01 1.027e+02 1.296e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-20 15:54:15,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168750
2023-11-20 15:54:44,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1125133.3333333333, ans=0.125
2023-11-20 15:54:44,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.51 vs. limit=22.5
2023-11-20 15:54:57,580 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 450, loss[loss=0.103, simple_loss=0.1329, pruned_loss=0.02976, audio_tagging_loss=0.006831, over 16123.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1002, pruned_loss=0.01923, audio_tagging_loss=0.01081, over 2733405.37 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 32.0
2023-11-20 15:55:19,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168800
2023-11-20 15:55:24,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0
2023-11-20 15:55:44,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1125400.0, ans=0.2
2023-11-20 15:55:47,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0
2023-11-20 15:56:03,100 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 500, loss[loss=0.06516, simple_loss=0.08361, pruned_loss=0.01364, audio_tagging_loss=0.009719, over 15480.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.09952, pruned_loss=0.01902, audio_tagging_loss=0.0105, over 2799937.52 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:56:08,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1125533.3333333333, ans=0.125
2023-11-20 15:56:10,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.084e+01 8.651e+01 9.631e+01 1.244e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-20 15:56:27,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168850
2023-11-20 15:56:40,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5
2023-11-20 15:56:41,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1125666.6666666667, ans=0.125
2023-11-20 15:57:00,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0
2023-11-20 15:57:02,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1125800.0, ans=10.0
2023-11-20 15:57:06,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1125800.0, ans=0.02
2023-11-20 15:57:10,344 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 550, loss[loss=0.07418, simple_loss=0.1046, pruned_loss=0.01483, audio_tagging_loss=0.007032, over 14681.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.09948, pruned_loss=0.01912, audio_tagging_loss=0.01039, over 2850841.25 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:57:10,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1125866.6666666667, ans=10.0
2023-11-20 15:57:12,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1125866.6666666667, ans=10.0
2023-11-20 15:57:15,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1125866.6666666667, ans=0.04949747468305833
2023-11-20 15:57:22,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0
2023-11-20 15:57:29,030 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:57:30,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1125933.3333333333, ans=15.0
2023-11-20 15:57:32,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1125933.3333333333, ans=0.0
2023-11-20 15:57:34,031 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168900
2023-11-20 15:57:59,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1126066.6666666667, ans=0.1
2023-11-20 15:58:06,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1126133.3333333333, ans=0.2
2023-11-20 15:58:11,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1126133.3333333333, ans=0.0
2023-11-20 15:58:16,251 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 600, loss[loss=0.0657, simple_loss=0.07204, pruned_loss=0.01676, audio_tagging_loss=0.01292, over 14202.00 frames. ], tot_loss[loss=0.07896, simple_loss=0.09909, pruned_loss=0.01909, audio_tagging_loss=0.01033, over 2889192.46 frames. ], batch size: 60, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:58:23,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 7.985e+01 8.751e+01 9.752e+01 1.592e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-20 15:58:38,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168950
2023-11-20 15:58:59,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1126400.0, ans=0.0
2023-11-20 15:59:20,618 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 650, loss[loss=0.1008, simple_loss=0.1347, pruned_loss=0.02704, audio_tagging_loss=0.006414, over 15366.00 frames. ], tot_loss[loss=0.07944, simple_loss=0.1002, pruned_loss=0.01917, audio_tagging_loss=0.01018, over 2925964.97 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 15:59:39,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1126600.0, ans=0.125
2023-11-20 15:59:44,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169000
2023-11-20 15:59:47,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1126666.6666666667, ans=0.0
2023-11-20 15:59:52,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1126666.6666666667, ans=0.125
2023-11-20 15:59:56,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1126666.6666666667, ans=0.125
2023-11-20 16:00:00,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1126733.3333333333, ans=0.07
2023-11-20 16:00:07,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1126733.3333333333, ans=0.0
2023-11-20 16:00:14,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1126800.0, ans=0.1
2023-11-20 16:00:15,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0
2023-11-20 16:00:20,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0
2023-11-20 16:00:26,799 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 700, loss[loss=0.08157, simple_loss=0.1054, pruned_loss=0.01593, audio_tagging_loss=0.01292, over 15680.00 frames. ], tot_loss[loss=0.07986, simple_loss=0.1008, pruned_loss=0.01934, audio_tagging_loss=0.01014, over 2950689.46 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 16:00:28,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1126866.6666666667, ans=0.09899494936611666
2023-11-20 16:00:34,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.436e+01 8.027e+01 8.656e+01 9.309e+01 1.974e+02, threshold=1.731e+02, percent-clipped=1.0
2023-11-20 16:00:39,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1126933.3333333333, ans=0.125
2023-11-20 16:00:50,254 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169050
2023-11-20 16:01:10,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1127066.6666666667, ans=0.0
2023-11-20 16:01:18,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1127133.3333333333, ans=0.0
2023-11-20 16:01:25,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1127133.3333333333, ans=0.125
2023-11-20 16:01:31,574 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 750, loss[loss=0.0754, simple_loss=0.09985, pruned_loss=0.0187, audio_tagging_loss=0.006773, over 14811.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1015, pruned_loss=0.01942, audio_tagging_loss=0.01012, over 2977649.01 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 16:01:32,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0
2023-11-20 16:01:33,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1127200.0, ans=0.05
2023-11-20 16:01:35,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0
2023-11-20 16:01:35,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5
2023-11-20 16:01:44,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1127266.6666666667, ans=0.125
2023-11-20 16:01:48,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=22.5
2023-11-20 16:01:49,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1127266.6666666667, ans=0.125
2023-11-20 16:01:49,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1127266.6666666667, ans=0.0
2023-11-20 16:01:50,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1127266.6666666667, ans=0.125
2023-11-20 16:01:52,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.43 vs. limit=22.5
2023-11-20 16:01:54,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169100
2023-11-20 16:01:59,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1127333.3333333333, ans=0.1
2023-11-20 16:02:09,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1127400.0, ans=0.1
2023-11-20 16:02:11,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1127400.0, ans=0.0
2023-11-20 16:02:15,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1127400.0, ans=0.125
2023-11-20 16:02:27,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1127466.6666666667, ans=0.125
2023-11-20 16:02:30,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1127466.6666666667, ans=0.0
2023-11-20 16:02:36,480 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 800, loss[loss=0.06272, simple_loss=0.07575, pruned_loss=0.01445, audio_tagging_loss=0.01039, over 15751.00 frames. ], tot_loss[loss=0.08019, simple_loss=0.1013, pruned_loss=0.01933, audio_tagging_loss=0.01022, over 2992689.74 frames. ], batch size: 58, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 16:02:37,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=22.5
2023-11-20 16:02:43,906 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.081e+01 8.651e+01 9.218e+01 1.161e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-20 16:02:50,335 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 16:03:01,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169150
2023-11-20 16:03:24,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1127733.3333333333, ans=0.1
2023-11-20 16:03:29,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1127800.0, ans=0.125
2023-11-20 16:03:35,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0
2023-11-20 16:03:38,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1127800.0, ans=0.125
2023-11-20 16:03:41,760 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 850, loss[loss=0.08337, simple_loss=0.1074, pruned_loss=0.02048, audio_tagging_loss=0.009198, over 15493.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.09979, pruned_loss=0.01904, audio_tagging_loss=0.01033, over 3010848.32 frames. ], batch size: 58, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 16:04:05,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169200
2023-11-20 16:04:12,097 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 16:04:19,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1128000.0, ans=0.2
2023-11-20 16:04:26,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2023-11-20 16:04:40,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1128133.3333333333, ans=0.0
2023-11-20 16:04:44,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1128133.3333333333, ans=0.05
2023-11-20 16:04:48,005 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 900, loss[loss=0.06033, simple_loss=0.07662, pruned_loss=0.01166, audio_tagging_loss=0.01036, over 15738.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1011, pruned_loss=0.01913, audio_tagging_loss=0.01025, over 3025095.06 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 16:04:54,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1128200.0, ans=0.0
2023-11-20 16:04:55,436 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.576e+01 8.050e+01 8.717e+01 9.449e+01 1.348e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-20 16:05:11,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169250
2023-11-20 16:05:41,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1128466.6666666667, ans=0.125
2023-11-20 16:05:41,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0
2023-11-20 16:05:49,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1128466.6666666667, ans=0.0
2023-11-20 16:05:52,995 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 950, loss[loss=0.07506, simple_loss=0.09183, pruned_loss=0.01875, audio_tagging_loss=0.01039, over 14325.00 frames. ], tot_loss[loss=0.07966, simple_loss=0.1006, pruned_loss=0.01911, audio_tagging_loss=0.01024, over 3030944.96 frames. ], batch size: 53, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 16:05:55,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0
2023-11-20 16:06:17,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169300
2023-11-20 16:06:18,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1128666.6666666667, ans=0.05
2023-11-20 16:06:37,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0
2023-11-20 16:06:46,364 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 16:06:46,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1128800.0, ans=0.125
2023-11-20 16:06:54,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1128800.0, ans=0.125
2023-11-20 16:06:57,905 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1000, loss[loss=0.09248, simple_loss=0.1213, pruned_loss=0.02683, audio_tagging_loss=0.005015, over 15018.00 frames. ], tot_loss[loss=0.0787, simple_loss=0.09969, pruned_loss=0.01883, audio_tagging_loss=0.01002, over 3034349.78 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 16:07:06,531 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.366e+01 7.987e+01 8.758e+01 9.641e+01 1.276e+02, threshold=1.752e+02, percent-clipped=0.0
2023-11-20 16:07:14,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1128933.3333333333, ans=0.1
2023-11-20 16:07:22,267 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169350
2023-11-20 16:07:25,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5
2023-11-20 16:07:25,967 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
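
[Editor's note] These WARNING lines come from a length filter: the AudioSet cuts carry a dummy transcript that tokenizes to 24 BPE tokens, but a 1-second cut yields only 23 encoder frames after subsampling, and a transducer cannot align more output symbols than it has frames. A hedged sketch of such a check; the names are illustrative, and the actual filter is in train_asr.py (around line 1462):

```python
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # Rough front-end cost of the conv subsampling; (100 - 7) // 4 = 23,
    # matching the before/after frame counts printed in the warnings.
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

# The excluded cuts: 100 frames -> 23 after subsampling, but 24 tokens.
print(keep_cut(100, 24))  # False -> the cut is skipped with a WARNING
```
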
limit=15.0 2023-11-20 16:09:03,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1129466.6666666667, ans=0.0 2023-11-20 16:09:04,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129466.6666666667, ans=0.1 2023-11-20 16:09:07,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1129466.6666666667, ans=0.125 2023-11-20 16:09:09,316 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1100, loss[loss=0.05351, simple_loss=0.0656, pruned_loss=0.009057, audio_tagging_loss=0.01165, over 13970.00 frames. ], tot_loss[loss=0.07816, simple_loss=0.09917, pruned_loss=0.01879, audio_tagging_loss=0.009781, over 3039480.85 frames. ], batch size: 54, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 16:09:11,874 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 16:09:16,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.560e+01 8.137e+01 8.767e+01 9.510e+01 1.319e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-20 16:09:20,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1129600.0, ans=0.1 2023-11-20 16:09:33,756 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169450 2023-11-20 16:09:34,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129600.0, ans=0.1 2023-11-20 16:09:37,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1129666.6666666667, ans=0.125 2023-11-20 16:09:57,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1129733.3333333333, ans=0.125 2023-11-20 16:10:06,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1129800.0, ans=0.04949747468305833 2023-11-20 16:10:07,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1129800.0, ans=0.125 2023-11-20 16:10:13,840 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1150, loss[loss=0.08833, simple_loss=0.1206, pruned_loss=0.02011, audio_tagging_loss=0.007916, over 15105.00 frames. ], tot_loss[loss=0.0784, simple_loss=0.09959, pruned_loss=0.0188, audio_tagging_loss=0.009804, over 3039845.61 frames. 
], batch size: 54, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 16:10:15,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1129866.6666666667, ans=0.0 2023-11-20 16:10:34,011 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:10:36,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1129933.3333333333, ans=0.125 2023-11-20 16:10:38,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169500 2023-11-20 16:10:52,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1130066.6666666667, ans=0.0 2023-11-20 16:10:52,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1130066.6666666667, ans=0.0 2023-11-20 16:10:56,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1130066.6666666667, ans=0.0 2023-11-20 16:10:56,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1130066.6666666667, ans=0.125 2023-11-20 16:11:05,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1130133.3333333333, ans=0.125 2023-11-20 16:11:09,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2023-11-20 16:11:21,448 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1200, loss[loss=0.06792, simple_loss=0.0869, pruned_loss=0.01474, audio_tagging_loss=0.009725, over 15441.00 frames. ], tot_loss[loss=0.07872, simple_loss=0.1003, pruned_loss=0.01888, audio_tagging_loss=0.009686, over 3043701.68 frames. ], batch size: 60, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 16:11:30,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.644e+01 8.141e+01 8.800e+01 9.449e+01 1.223e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-20 16:11:44,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169550 2023-11-20 16:11:44,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2023-11-20 16:11:50,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1130333.3333333333, ans=0.125 2023-11-20 16:11:56,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2023-11-20 16:12:06,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1130400.0, ans=0.125 2023-11-20 16:12:11,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1130400.0, ans=0.125 2023-11-20 16:12:16,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1130466.6666666667, ans=0.1 2023-11-20 16:12:26,143 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1250, loss[loss=0.07797, simple_loss=0.09986, pruned_loss=0.0194, audio_tagging_loss=0.008641, over 14542.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.1001, pruned_loss=0.01886, audio_tagging_loss=0.009689, over 3038978.46 frames. ], batch size: 54, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 16:12:36,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1130533.3333333333, ans=0.2 2023-11-20 16:12:44,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1130600.0, ans=0.035 2023-11-20 16:12:47,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1130600.0, ans=0.125 2023-11-20 16:12:49,055 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169600 2023-11-20 16:12:53,031 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:12:55,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1130666.6666666667, ans=0.0 2023-11-20 16:13:30,636 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1300, loss[loss=0.0715, simple_loss=0.09557, pruned_loss=0.01367, audio_tagging_loss=0.01004, over 16123.00 frames. ], tot_loss[loss=0.07825, simple_loss=0.09974, pruned_loss=0.01867, audio_tagging_loss=0.009712, over 3040413.93 frames. ], batch size: 60, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 16:13:41,675 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.174e+01 8.827e+01 9.605e+01 1.804e+02, threshold=1.765e+02, percent-clipped=1.0 2023-11-20 16:13:55,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169650 2023-11-20 16:14:10,660 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:14:14,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1131066.6666666667, ans=0.0 2023-11-20 16:14:20,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131066.6666666667, ans=0.1 2023-11-20 16:14:37,373 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1350, loss[loss=0.07749, simple_loss=0.0965, pruned_loss=0.01888, audio_tagging_loss=0.01036, over 13911.00 frames. ], tot_loss[loss=0.07832, simple_loss=0.09979, pruned_loss=0.01872, audio_tagging_loss=0.009702, over 3043714.47 frames. 
], batch size: 54, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 16:14:39,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1131200.0, ans=0.125 2023-11-20 16:15:00,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169700 2023-11-20 16:15:23,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1131400.0, ans=0.0 2023-11-20 16:15:23,986 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 16:15:27,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1131400.0, ans=0.125 2023-11-20 16:15:30,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1131466.6666666667, ans=0.2 2023-11-20 16:15:35,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1131466.6666666667, ans=0.125 2023-11-20 16:15:35,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1131466.6666666667, ans=0.0 2023-11-20 16:15:42,466 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1400, loss[loss=0.08037, simple_loss=0.09747, pruned_loss=0.02131, audio_tagging_loss=0.01033, over 15785.00 frames. ], tot_loss[loss=0.07836, simple_loss=0.09947, pruned_loss=0.01879, audio_tagging_loss=0.009828, over 3040442.20 frames. ], batch size: 61, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 16:15:52,436 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 7.986e+01 8.672e+01 9.769e+01 1.289e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 16:16:05,065 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169750 2023-11-20 16:16:11,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1131666.6666666667, ans=0.125 2023-11-20 16:16:24,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1131733.3333333333, ans=0.015 2023-11-20 16:16:28,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1131733.3333333333, ans=0.025 2023-11-20 16:16:47,119 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1450, loss[loss=0.07472, simple_loss=0.1022, pruned_loss=0.01432, audio_tagging_loss=0.009308, over 14984.00 frames. ], tot_loss[loss=0.07874, simple_loss=0.09975, pruned_loss=0.01897, audio_tagging_loss=0.009899, over 3041417.40 frames. 
], batch size: 56, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 16:17:11,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169800 2023-11-20 16:17:15,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1132000.0, ans=0.2 2023-11-20 16:17:17,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1132000.0, ans=22.5 2023-11-20 16:17:20,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1132000.0, ans=0.1 2023-11-20 16:17:23,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=22.5 2023-11-20 16:17:33,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1132066.6666666667, ans=0.2 2023-11-20 16:17:38,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5 2023-11-20 16:17:47,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2023-11-20 16:17:52,597 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1500, loss[loss=0.07926, simple_loss=0.1093, pruned_loss=0.01617, audio_tagging_loss=0.008441, over 14695.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1009, pruned_loss=0.01928, audio_tagging_loss=0.009864, over 3046133.39 frames. ], batch size: 54, lr: 4.65e-03, grad_scale: 8.0 2023-11-20 16:17:52,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1132200.0, ans=0.09899494936611666 2023-11-20 16:18:04,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.236e+01 9.090e+01 9.607e+01 1.235e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-20 16:18:16,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169850 2023-11-20 16:18:25,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1132333.3333333333, ans=0.0 2023-11-20 16:18:32,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1132400.0, ans=0.1 2023-11-20 16:18:46,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1132466.6666666667, ans=0.0 2023-11-20 16:18:58,735 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1550, loss[loss=0.09703, simple_loss=0.122, pruned_loss=0.02533, audio_tagging_loss=0.01072, over 15498.00 frames. ], tot_loss[loss=0.07978, simple_loss=0.101, pruned_loss=0.01933, audio_tagging_loss=0.009955, over 3048937.50 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 8.0 2023-11-20 16:18:59,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-20 16:19:09,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.27 vs. 
limit=10.0 2023-11-20 16:19:10,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1132600.0, ans=0.2 2023-11-20 16:19:21,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169900 2023-11-20 16:20:03,526 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1600, loss[loss=0.06508, simple_loss=0.08137, pruned_loss=0.01242, audio_tagging_loss=0.01198, over 15551.00 frames. ], tot_loss[loss=0.07966, simple_loss=0.1008, pruned_loss=0.01921, audio_tagging_loss=0.01003, over 3045253.72 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:20:14,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.371e+01 8.069e+01 8.651e+01 9.390e+01 1.565e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 16:20:18,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1132933.3333333333, ans=0.1 2023-11-20 16:20:27,759 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169950 2023-11-20 16:20:36,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1133000.0, ans=0.125 2023-11-20 16:20:36,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1133000.0, ans=0.125 2023-11-20 16:21:10,160 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1650, loss[loss=0.09821, simple_loss=0.1201, pruned_loss=0.02848, audio_tagging_loss=0.009668, over 15495.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1011, pruned_loss=0.0192, audio_tagging_loss=0.01005, over 3053442.92 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:21:24,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1133266.6666666667, ans=0.05 2023-11-20 16:21:33,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170000 2023-11-20 16:21:34,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0 2023-11-20 16:21:53,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1133400.0, ans=0.09899494936611666 2023-11-20 16:22:15,872 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1700, loss[loss=0.112, simple_loss=0.1465, pruned_loss=0.03161, audio_tagging_loss=0.00709, over 14583.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1017, pruned_loss=0.01921, audio_tagging_loss=0.01008, over 3053230.79 frames. ], batch size: 54, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:22:27,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.019e+01 8.780e+01 9.587e+01 1.403e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 16:22:29,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.68 vs. 
limit=22.5 2023-11-20 16:22:35,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1133600.0, ans=0.125 2023-11-20 16:22:38,976 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170050 2023-11-20 16:22:39,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2023-11-20 16:22:50,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1133666.6666666667, ans=0.125 2023-11-20 16:23:00,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1133733.3333333333, ans=0.0 2023-11-20 16:23:21,152 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1750, loss[loss=0.06944, simple_loss=0.08596, pruned_loss=0.01678, audio_tagging_loss=0.009679, over 14915.00 frames. ], tot_loss[loss=0.07941, simple_loss=0.101, pruned_loss=0.01897, audio_tagging_loss=0.009938, over 3044110.60 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:23:44,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170100 2023-11-20 16:23:51,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1134000.0, ans=0.125 2023-11-20 16:24:01,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0 2023-11-20 16:24:04,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1134066.6666666667, ans=0.125 2023-11-20 16:24:07,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1134066.6666666667, ans=0.125 2023-11-20 16:24:10,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1134066.6666666667, ans=0.0 2023-11-20 16:24:16,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=22.5 2023-11-20 16:24:21,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1134133.3333333333, ans=0.125 2023-11-20 16:24:26,745 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1800, loss[loss=0.07731, simple_loss=0.1013, pruned_loss=0.01961, audio_tagging_loss=0.007057, over 14632.00 frames. ], tot_loss[loss=0.07929, simple_loss=0.1011, pruned_loss=0.01884, audio_tagging_loss=0.009881, over 3043152.27 frames. 
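The recurring `optim.py:476` records summarize the optimizer's gradient-norm statistics over a recent window: the five values after "grad-norm quartiles" read as min/25%/median/75%/max, and in every record so far the threshold equals Clipping_scale times the median (2.0 x 9.090e+01 = 1.818e+02 in the first record above, 2.0 x 8.651e+01 = 1.730e+02 in the next), with percent-clipped counting how often the norm exceeded it. A minimal sketch of that bookkeeping, assuming a simple median-based rule (hypothetical helper, not the actual optimizer internals):

```python
import torch

def grad_norm_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Summarize a window of recent gradient norms the way the
    optim.py log records do (illustrative only)."""
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # Clipping_scale x median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean().item()
    return q.tolist(), threshold, percent_clipped
```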
], batch size: 55, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:24:27,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1134200.0, ans=10.0 2023-11-20 16:24:39,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.001e+01 8.710e+01 9.346e+01 1.351e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 16:24:51,055 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170150 2023-11-20 16:25:00,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1134333.3333333333, ans=0.125 2023-11-20 16:25:04,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2023-11-20 16:25:12,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1134400.0, ans=0.05 2023-11-20 16:25:18,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1134466.6666666667, ans=0.0 2023-11-20 16:25:32,385 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1850, loss[loss=0.07099, simple_loss=0.09451, pruned_loss=0.01333, audio_tagging_loss=0.01041, over 16119.00 frames. ], tot_loss[loss=0.07827, simple_loss=0.09957, pruned_loss=0.01869, audio_tagging_loss=0.009797, over 3040004.34 frames. ], batch size: 62, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:25:32,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1134533.3333333333, ans=0.125 2023-11-20 16:25:41,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1134533.3333333333, ans=0.125 2023-11-20 16:25:46,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1134600.0, ans=0.125 2023-11-20 16:25:52,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-20 16:25:55,416 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170200 2023-11-20 16:26:02,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1134666.6666666667, ans=0.0 2023-11-20 16:26:11,566 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:26:30,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1134800.0, ans=0.125 2023-11-20 16:26:32,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1134800.0, ans=0.025 2023-11-20 16:26:32,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.80 vs. 
limit=12.0 2023-11-20 16:26:33,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1134800.0, ans=0.04949747468305833 2023-11-20 16:26:34,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=22.5 2023-11-20 16:26:38,595 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1900, loss[loss=0.07854, simple_loss=0.1111, pruned_loss=0.01392, audio_tagging_loss=0.00909, over 16360.00 frames. ], tot_loss[loss=0.07878, simple_loss=0.1005, pruned_loss=0.01882, audio_tagging_loss=0.009707, over 3043244.21 frames. ], batch size: 61, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:26:45,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1134866.6666666667, ans=0.05 2023-11-20 16:26:48,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1134866.6666666667, ans=0.125 2023-11-20 16:26:49,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.240e+01 7.901e+01 8.698e+01 9.524e+01 1.257e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-20 16:26:55,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1134933.3333333333, ans=0.0 2023-11-20 16:27:02,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170250 2023-11-20 16:27:14,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1135000.0, ans=0.125 2023-11-20 16:27:17,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1135066.6666666667, ans=0.0 2023-11-20 16:27:30,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-11-20 16:27:33,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1135133.3333333333, ans=0.125 2023-11-20 16:27:43,349 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 1950, loss[loss=0.06981, simple_loss=0.09083, pruned_loss=0.01457, audio_tagging_loss=0.009823, over 15000.00 frames. ], tot_loss[loss=0.07857, simple_loss=0.1002, pruned_loss=0.01873, audio_tagging_loss=0.009734, over 3043571.47 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:28:07,857 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170300 2023-11-20 16:28:17,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1135333.3333333333, ans=0.125 2023-11-20 16:28:49,887 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2000, loss[loss=0.07006, simple_loss=0.08561, pruned_loss=0.01534, audio_tagging_loss=0.01191, over 13589.00 frames. ], tot_loss[loss=0.07865, simple_loss=0.09993, pruned_loss=0.0189, audio_tagging_loss=0.009787, over 3032147.47 frames. 
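Each `tot_loss[...]` record breaks the objective into simple, pruned and audio-tagging parts, and the printed totals are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: for the batch 2000 record just above, 0.5 x 0.09993 + 0.0189 + 0.009787 = 0.07865, exactly the reported loss. A one-line check under that inferred weighting:

```python
# Weights inferred from the logged numbers, not read out of the code:
simple, pruned, audio_tagging = 0.09993, 0.0189, 0.009787
tot = 0.5 * simple + pruned + audio_tagging
print(round(tot, 5))  # 0.07865 -- matches tot_loss[loss=0.07865, ...]
```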
], batch size: 52, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 16:29:00,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 7.997e+01 8.751e+01 9.649e+01 1.208e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 16:29:12,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170350 2023-11-20 16:29:36,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1135733.3333333333, ans=0.125 2023-11-20 16:29:54,325 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2050, loss[loss=0.07143, simple_loss=0.09023, pruned_loss=0.01735, audio_tagging_loss=0.008972, over 15265.00 frames. ], tot_loss[loss=0.07821, simple_loss=0.09937, pruned_loss=0.01878, audio_tagging_loss=0.009751, over 3035155.22 frames. ], batch size: 60, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 16:30:18,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170400 2023-11-20 16:30:22,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1136000.0, ans=0.125 2023-11-20 16:30:23,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1136000.0, ans=0.125 2023-11-20 16:31:00,365 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2100, loss[loss=0.06977, simple_loss=0.08553, pruned_loss=0.01643, audio_tagging_loss=0.01057, over 14933.00 frames. ], tot_loss[loss=0.07824, simple_loss=0.09917, pruned_loss=0.01886, audio_tagging_loss=0.009792, over 3039149.44 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:31:14,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 8.301e+01 9.019e+01 9.806e+01 1.324e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 16:31:25,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170450 2023-11-20 16:31:31,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1136333.3333333333, ans=0.1 2023-11-20 16:31:41,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1136400.0, ans=6.0 2023-11-20 16:32:07,313 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2150, loss[loss=0.07933, simple_loss=0.09332, pruned_loss=0.02221, audio_tagging_loss=0.01047, over 15308.00 frames. ], tot_loss[loss=0.07806, simple_loss=0.09901, pruned_loss=0.01877, audio_tagging_loss=0.009779, over 3043629.14 frames. ], batch size: 59, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:32:07,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136533.3333333333, ans=0.1 2023-11-20 16:32:07,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1136533.3333333333, ans=0.2 2023-11-20 16:32:09,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2023-11-20 16:32:25,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. 
limit=6.0 2023-11-20 16:32:30,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170500 2023-11-20 16:32:33,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1136666.6666666667, ans=0.0 2023-11-20 16:32:46,631 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 16:32:59,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1136800.0, ans=0.05 2023-11-20 16:33:12,828 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2200, loss[loss=0.08986, simple_loss=0.1139, pruned_loss=0.02395, audio_tagging_loss=0.008973, over 16219.00 frames. ], tot_loss[loss=0.07792, simple_loss=0.0987, pruned_loss=0.01876, audio_tagging_loss=0.009807, over 3048023.60 frames. ], batch size: 61, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 16:33:16,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1136866.6666666667, ans=0.125 2023-11-20 16:33:25,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.465e+01 9.006e+01 9.663e+01 1.304e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-20 16:33:35,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136933.3333333333, ans=0.1 2023-11-20 16:33:36,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170550 2023-11-20 16:33:40,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-20 16:34:03,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137066.6666666667, ans=0.1 2023-11-20 16:34:04,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1137133.3333333333, ans=0.0 2023-11-20 16:34:18,609 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2250, loss[loss=0.08938, simple_loss=0.1083, pruned_loss=0.02565, audio_tagging_loss=0.009561, over 15148.00 frames. ], tot_loss[loss=0.07789, simple_loss=0.09859, pruned_loss=0.01872, audio_tagging_loss=0.009879, over 3045785.03 frames. 
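The WARNING above shows the length filter at work: a one-second cut with 100 feature frames keeps only 23 frames after the encoder front-end's roughly 4x subsampling, fewer than its 24 BPE tokens, so the pruned-transducer loss cannot align it and the cut is dropped. A sketch of that check, where the subsampling expression is an assumption that happens to reproduce 100 -> 23:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts a transducer cannot align: it needs at least one
    encoder frame per output token. Illustrative stand-in for the
    filter that emits the 'Exclude cut ...' warnings."""
    # Assumed convolutional-subsampling arithmetic; gives 100 -> 23,
    # matching the frame counts printed in the warning.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> excluded from training
```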
], batch size: 56, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 16:34:22,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1137200.0, ans=0.125 2023-11-20 16:34:30,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1137200.0, ans=0.025 2023-11-20 16:34:33,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137266.6666666667, ans=0.1 2023-11-20 16:34:43,202 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170600 2023-11-20 16:34:55,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1137333.3333333333, ans=0.125 2023-11-20 16:35:23,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1137466.6666666667, ans=0.125 2023-11-20 16:35:25,862 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2300, loss[loss=0.05903, simple_loss=0.06873, pruned_loss=0.009875, audio_tagging_loss=0.01479, over 14344.00 frames. ], tot_loss[loss=0.07791, simple_loss=0.09863, pruned_loss=0.01866, audio_tagging_loss=0.009944, over 3045462.53 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 16:35:32,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0 2023-11-20 16:35:38,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.000e+01 8.643e+01 9.323e+01 1.269e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 16:35:48,919 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170650 2023-11-20 16:35:58,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2023-11-20 16:36:02,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1137666.6666666667, ans=0.125 2023-11-20 16:36:03,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1137733.3333333333, ans=0.05 2023-11-20 16:36:13,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1137733.3333333333, ans=10.0 2023-11-20 16:36:23,775 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 16:36:31,382 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2350, loss[loss=0.08071, simple_loss=0.09767, pruned_loss=0.02015, audio_tagging_loss=0.01172, over 15124.00 frames. ], tot_loss[loss=0.07804, simple_loss=0.09903, pruned_loss=0.01861, audio_tagging_loss=0.009922, over 3040876.21 frames. 
], batch size: 56, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 16:36:35,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1137866.6666666667, ans=0.0 2023-11-20 16:36:35,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1137866.6666666667, ans=0.125 2023-11-20 16:36:36,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1137866.6666666667, ans=0.125 2023-11-20 16:36:54,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2023-11-20 16:36:54,736 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170700 2023-11-20 16:36:54,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1137933.3333333333, ans=0.125 2023-11-20 16:37:02,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1138000.0, ans=0.125 2023-11-20 16:37:03,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1138000.0, ans=0.0 2023-11-20 16:37:16,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1138066.6666666667, ans=10.0 2023-11-20 16:37:20,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1138066.6666666667, ans=0.025 2023-11-20 16:37:29,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1138133.3333333333, ans=0.1 2023-11-20 16:37:36,634 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2400, loss[loss=0.08229, simple_loss=0.1035, pruned_loss=0.02066, audio_tagging_loss=0.009899, over 14785.00 frames. ], tot_loss[loss=0.07756, simple_loss=0.09841, pruned_loss=0.01844, audio_tagging_loss=0.009909, over 3040525.67 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:37:41,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2023-11-20 16:37:50,906 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.992e+01 8.259e+01 8.944e+01 9.702e+01 1.216e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 16:38:01,519 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170750 2023-11-20 16:38:07,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1138333.3333333333, ans=0.2 2023-11-20 16:38:25,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1138400.0, ans=0.125 2023-11-20 16:38:43,365 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2450, loss[loss=0.0882, simple_loss=0.1089, pruned_loss=0.02032, audio_tagging_loss=0.01344, over 15960.00 frames. ], tot_loss[loss=0.07735, simple_loss=0.09795, pruned_loss=0.01834, audio_tagging_loss=0.01004, over 3046503.86 frames. 
], batch size: 61, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:38:43,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1138533.3333333333, ans=0.125 2023-11-20 16:38:56,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1138600.0, ans=0.0 2023-11-20 16:39:06,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170800 2023-11-20 16:39:11,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1138666.6666666667, ans=0.125 2023-11-20 16:39:23,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1138733.3333333333, ans=0.125 2023-11-20 16:39:35,969 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.588e-03 2023-11-20 16:39:48,711 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2500, loss[loss=0.06791, simple_loss=0.08575, pruned_loss=0.01461, audio_tagging_loss=0.01042, over 14971.00 frames. ], tot_loss[loss=0.07823, simple_loss=0.09921, pruned_loss=0.01864, audio_tagging_loss=0.009981, over 3038097.23 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:39:48,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1138866.6666666667, ans=0.0 2023-11-20 16:39:57,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1138866.6666666667, ans=0.2 2023-11-20 16:40:00,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1138933.3333333333, ans=0.1 2023-11-20 16:40:01,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.113e+01 8.897e+01 9.641e+01 1.351e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 16:40:08,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-11-20 16:40:11,314 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170850 2023-11-20 16:40:35,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2023-11-20 16:40:43,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2023-11-20 16:40:46,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-20 16:40:53,583 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2550, loss[loss=0.09253, simple_loss=0.1044, pruned_loss=0.03165, audio_tagging_loss=0.008679, over 13750.00 frames. ], tot_loss[loss=0.07859, simple_loss=0.09952, pruned_loss=0.01883, audio_tagging_loss=0.01, over 3043900.68 frames. 
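The `ScheduledFloat` records trace hyperparameters that are functions of training progress rather than constants: dropout probabilities, skip rates, bypass scale floors and whitening limits are all interpolated from `batch_count`, with `ans` giving the current value. A sketch of the underlying idea, assuming a piecewise-linear schedule over (batch_count, value) breakpoints (the real class in scaling.py carries more machinery):

```python
from bisect import bisect_right

def scheduled_float(points: list[tuple[float, float]], batch_count: float) -> float:
    """Piecewise-linear value of a scheduled hyperparameter at batch_count."""
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    i = bisect_right([x for x, _ in points], batch_count)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout decaying 0.3 -> 0.1 over the first 20k batches would have
# flattened out long before the batch counts (~1.14M) logged here:
print(scheduled_float([(0.0, 0.3), (20000.0, 0.1)], 1138600.0))  # 0.1
```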
], batch size: 53, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:41:16,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1139266.6666666667, ans=0.0 2023-11-20 16:41:17,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170900 2023-11-20 16:41:19,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1139333.3333333333, ans=0.125 2023-11-20 16:41:19,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1139333.3333333333, ans=0.025 2023-11-20 16:41:22,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1139333.3333333333, ans=0.0 2023-11-20 16:41:22,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0 2023-11-20 16:41:39,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1139400.0, ans=0.2 2023-11-20 16:41:40,696 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.558e-01 2023-11-20 16:41:41,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2023-11-20 16:41:59,389 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2600, loss[loss=0.06965, simple_loss=0.09166, pruned_loss=0.01497, audio_tagging_loss=0.008847, over 14839.00 frames. ], tot_loss[loss=0.07733, simple_loss=0.098, pruned_loss=0.01838, audio_tagging_loss=0.009957, over 3040587.49 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:42:02,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1139533.3333333333, ans=0.0 2023-11-20 16:42:05,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1139533.3333333333, ans=0.125 2023-11-20 16:42:05,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1139533.3333333333, ans=0.2 2023-11-20 16:42:13,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.062e+01 8.877e+01 9.575e+01 1.292e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 16:42:13,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1139600.0, ans=0.125 2023-11-20 16:42:23,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170950 2023-11-20 16:42:29,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1139666.6666666667, ans=0.0 2023-11-20 16:42:35,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. 
limit=15.0 2023-11-20 16:42:35,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1139666.6666666667, ans=0.125 2023-11-20 16:42:44,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1139733.3333333333, ans=0.125 2023-11-20 16:42:58,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1139800.0, ans=0.125 2023-11-20 16:43:01,130 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.535e-01 2023-11-20 16:43:03,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1139800.0, ans=0.1 2023-11-20 16:43:05,753 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2650, loss[loss=0.0886, simple_loss=0.1077, pruned_loss=0.02495, audio_tagging_loss=0.009817, over 14239.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.09903, pruned_loss=0.01872, audio_tagging_loss=0.009803, over 3049468.17 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:43:05,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1139866.6666666667, ans=0.0 2023-11-20 16:43:09,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1139866.6666666667, ans=0.125 2023-11-20 16:43:28,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171000 2023-11-20 16:44:06,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-20 16:44:11,713 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2700, loss[loss=0.06563, simple_loss=0.08864, pruned_loss=0.0101, audio_tagging_loss=0.0112, over 16472.00 frames. ], tot_loss[loss=0.07768, simple_loss=0.09869, pruned_loss=0.01859, audio_tagging_loss=0.009747, over 3051680.79 frames. ], batch size: 62, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:44:11,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1140200.0, ans=0.1 2023-11-20 16:44:16,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1140200.0, ans=0.0 2023-11-20 16:44:24,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.274e+01 8.361e+01 8.930e+01 9.610e+01 1.277e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 16:44:34,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2023-11-20 16:44:35,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171050 2023-11-20 16:44:37,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. 
limit=12.0 2023-11-20 16:44:44,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1140333.3333333333, ans=0.1 2023-11-20 16:44:58,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1140400.0, ans=0.125 2023-11-20 16:45:02,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1140466.6666666667, ans=0.2 2023-11-20 16:45:07,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1140466.6666666667, ans=0.1 2023-11-20 16:45:14,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1140533.3333333333, ans=15.0 2023-11-20 16:45:15,523 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2750, loss[loss=0.07549, simple_loss=0.08468, pruned_loss=0.02049, audio_tagging_loss=0.01266, over 15543.00 frames. ], tot_loss[loss=0.0785, simple_loss=0.09932, pruned_loss=0.01903, audio_tagging_loss=0.009809, over 3044983.57 frames. ], batch size: 58, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:45:30,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1140600.0, ans=0.125 2023-11-20 16:45:39,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1140600.0, ans=0.04949747468305833 2023-11-20 16:45:40,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171100 2023-11-20 16:45:50,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1140666.6666666667, ans=0.125 2023-11-20 16:45:55,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-20 16:46:12,582 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 16:46:13,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-20 16:46:22,411 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2800, loss[loss=0.08577, simple_loss=0.11, pruned_loss=0.01932, audio_tagging_loss=0.01147, over 15305.00 frames. ], tot_loss[loss=0.07837, simple_loss=0.09909, pruned_loss=0.01893, audio_tagging_loss=0.00989, over 3044681.87 frames. 
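The `Whitening:` records measure how far a module's activations are from having a white (equal-eigenvalue) covariance within each channel group. The metric is 1.0 for perfectly white features and grows as the covariance becomes lopsided, and the module only intervenes in the backward pass when the metric exceeds its limit, so a line like "metric=6.92 vs. limit=12.0" is a measurement, not a penalty. A rough sketch of such a metric, assuming the E[lambda^2]/E[lambda]^2 form computed via traces:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """>= 1.0, equal to 1.0 iff each group's centered covariance has
    equal eigenvalues (sketch; scaling.py's details may differ)."""
    n, c = x.shape
    g = c // num_groups
    xg = x.reshape(n, num_groups, g).transpose(0, 1)    # (groups, frames, g)
    xg = xg - xg.mean(dim=1, keepdim=True)              # center the features
    cov = xg.transpose(1, 2) @ xg / n                   # (groups, g, g)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()      # E[lambda]
    mean_eig_sq = (cov ** 2).sum() / (num_groups * g)   # E[lambda^2] = tr(C^2)/g
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)

x = torch.randn(1000, 384)                  # near-white random features
print(whitening_metric(x, num_groups=1))    # slightly above 1.0
```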
], batch size: 59, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:46:28,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1140866.6666666667, ans=0.125 2023-11-20 16:46:34,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 7.973e+01 8.545e+01 9.454e+01 1.250e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 16:46:37,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.25 vs. limit=15.0 2023-11-20 16:46:38,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1140933.3333333333, ans=0.09899494936611666 2023-11-20 16:46:43,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2023-11-20 16:46:44,788 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171150 2023-11-20 16:46:49,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2023-11-20 16:47:03,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1141066.6666666667, ans=0.2 2023-11-20 16:47:06,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2023-11-20 16:47:17,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=15.0 2023-11-20 16:47:26,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0 2023-11-20 16:47:26,739 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2850, loss[loss=0.08645, simple_loss=0.1162, pruned_loss=0.01923, audio_tagging_loss=0.009124, over 15556.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09915, pruned_loss=0.0189, audio_tagging_loss=0.009863, over 3039625.17 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:47:27,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1141200.0, ans=0.1 2023-11-20 16:47:28,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.44 vs. 
limit=22.5 2023-11-20 16:47:50,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171200 2023-11-20 16:47:54,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1141333.3333333333, ans=0.2 2023-11-20 16:48:03,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1141333.3333333333, ans=0.1 2023-11-20 16:48:24,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1141466.6666666667, ans=0.125 2023-11-20 16:48:27,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1141466.6666666667, ans=0.2 2023-11-20 16:48:29,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1141466.6666666667, ans=0.125 2023-11-20 16:48:31,822 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2900, loss[loss=0.1094, simple_loss=0.1559, pruned_loss=0.0246, audio_tagging_loss=0.006838, over 16284.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.09936, pruned_loss=0.01892, audio_tagging_loss=0.009796, over 3040975.35 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:48:37,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1141533.3333333333, ans=0.125 2023-11-20 16:48:45,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.270e+01 7.954e+01 8.776e+01 9.382e+01 1.409e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 16:48:56,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171250 2023-11-20 16:49:22,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2023-11-20 16:49:26,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.00 vs. limit=10.0 2023-11-20 16:49:37,926 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 2950, loss[loss=0.07144, simple_loss=0.0874, pruned_loss=0.01532, audio_tagging_loss=0.01242, over 15494.00 frames. ], tot_loss[loss=0.07877, simple_loss=0.09978, pruned_loss=0.01905, audio_tagging_loss=0.00983, over 3044179.49 frames. ], batch size: 60, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 16:50:01,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171300 2023-11-20 16:50:24,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1142066.6666666667, ans=0.1 2023-11-20 16:50:27,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=22.5 2023-11-20 16:50:38,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1142133.3333333333, ans=0.125 2023-11-20 16:50:43,598 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3000, loss[loss=0.08223, simple_loss=0.1024, pruned_loss=0.0206, audio_tagging_loss=0.01041, over 15100.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09939, pruned_loss=0.01901, audio_tagging_loss=0.009833, over 3046537.87 frames. 
], batch size: 56, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:50:43,599 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 16:51:23,417 INFO [train_asr.py:1253] (1/4) Epoch 15, validation: loss=0.06163, simple_loss=0.05329, pruned_loss=0.005569, audio_tagging_loss=0.02942, over 4681554.00 frames. 2023-11-20 16:51:23,418 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 16:51:23,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1142200.0, ans=0.125 2023-11-20 16:51:30,040 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:51:36,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.037e+01 8.379e+01 9.112e+01 1.041e+02 3.133e+02, threshold=1.822e+02, percent-clipped=1.0 2023-11-20 16:51:42,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2023-11-20 16:51:46,565 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171350 2023-11-20 16:51:52,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1142333.3333333333, ans=0.125 2023-11-20 16:52:28,800 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3050, loss[loss=0.07836, simple_loss=0.1082, pruned_loss=0.01672, audio_tagging_loss=0.007526, over 13659.00 frames. ], tot_loss[loss=0.07875, simple_loss=0.09974, pruned_loss=0.01903, audio_tagging_loss=0.009843, over 3048493.54 frames. ], batch size: 53, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:52:29,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1142533.3333333333, ans=0.2 2023-11-20 16:52:31,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1142533.3333333333, ans=0.0 2023-11-20 16:52:40,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1142600.0, ans=0.0 2023-11-20 16:52:51,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171400 2023-11-20 16:53:07,710 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 16:53:23,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2023-11-20 16:53:34,128 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3100, loss[loss=0.08775, simple_loss=0.1105, pruned_loss=0.02226, audio_tagging_loss=0.01025, over 15385.00 frames. ], tot_loss[loss=0.07927, simple_loss=0.1005, pruned_loss=0.01909, audio_tagging_loss=0.009931, over 3040926.23 frames. 
], batch size: 58, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:53:35,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1142866.6666666667, ans=15.0 2023-11-20 16:53:39,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1142866.6666666667, ans=0.07 2023-11-20 16:53:46,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.278e+01 8.696e+01 9.346e+01 1.301e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 16:53:50,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1142933.3333333333, ans=0.125 2023-11-20 16:53:57,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171450 2023-11-20 16:54:12,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1143066.6666666667, ans=0.125 2023-11-20 16:54:33,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1143133.3333333333, ans=0.0 2023-11-20 16:54:35,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-11-20 16:54:39,654 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3150, loss[loss=0.09115, simple_loss=0.1187, pruned_loss=0.02561, audio_tagging_loss=0.00621, over 15324.00 frames. ], tot_loss[loss=0.07917, simple_loss=0.1004, pruned_loss=0.01896, audio_tagging_loss=0.01002, over 3037723.98 frames. ], batch size: 58, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:54:42,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1143200.0, ans=0.125 2023-11-20 16:54:52,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1143266.6666666667, ans=0.025 2023-11-20 16:54:59,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1143266.6666666667, ans=0.025 2023-11-20 16:55:00,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1143266.6666666667, ans=0.0 2023-11-20 16:55:03,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171500 2023-11-20 16:55:04,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.67 vs. limit=10.0 2023-11-20 16:55:10,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1143333.3333333333, ans=0.125 2023-11-20 16:55:45,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1143533.3333333333, ans=0.07 2023-11-20 16:55:45,982 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3200, loss[loss=0.08894, simple_loss=0.1165, pruned_loss=0.02125, audio_tagging_loss=0.009427, over 15172.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1004, pruned_loss=0.01902, audio_tagging_loss=0.01009, over 3035663.70 frames. 
], batch size: 55, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:55:53,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-20 16:55:58,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.146e+01 8.709e+01 9.815e+01 1.794e+02, threshold=1.742e+02, percent-clipped=1.0 2023-11-20 16:56:09,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171550 2023-11-20 16:56:19,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1143666.6666666667, ans=0.125 2023-11-20 16:56:37,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1143800.0, ans=0.125 2023-11-20 16:56:47,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1143800.0, ans=0.0 2023-11-20 16:56:49,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1143866.6666666667, ans=0.0 2023-11-20 16:56:50,380 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3250, loss[loss=0.06629, simple_loss=0.07828, pruned_loss=0.01748, audio_tagging_loss=0.009673, over 15148.00 frames. ], tot_loss[loss=0.07945, simple_loss=0.1006, pruned_loss=0.01901, audio_tagging_loss=0.01013, over 3041125.99 frames. ], batch size: 57, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:56:54,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1143866.6666666667, ans=0.125 2023-11-20 16:56:56,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1143866.6666666667, ans=0.2 2023-11-20 16:57:14,040 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171600 2023-11-20 16:57:14,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2023-11-20 16:57:27,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1144000.0, ans=0.0 2023-11-20 16:57:38,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1144066.6666666667, ans=0.125 2023-11-20 16:57:38,957 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:57:40,010 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:57:49,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2023-11-20 16:57:50,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1144133.3333333333, ans=0.0 2023-11-20 16:57:55,239 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3300, loss[loss=0.09583, simple_loss=0.1283, pruned_loss=0.02342, audio_tagging_loss=0.008233, over 15180.00 frames. ], tot_loss[loss=0.07987, simple_loss=0.1012, pruned_loss=0.01912, audio_tagging_loss=0.01016, over 3048176.39 frames. 
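The `WithLoss:` records track auxiliary regularization losses attached directly to intermediate tensors (here attention weights) and folded into the backward pass without ever being added to the printed tot_loss; `loss-sum` is the current value of that term, and 0.000e+00 means the regularizer is inactive on this batch. A minimal sketch of the attach-a-loss mechanism via a custom autograd function (hypothetical names; illustrating the trick, not scaling.py's exact class):

```python
import torch

class AttachLoss(torch.autograd.Function):
    """Return x unchanged in forward, but route the gradient of an
    auxiliary scalar loss into the backward pass alongside x's gradient."""

    @staticmethod
    def forward(ctx, x, aux_loss):
        ctx.save_for_backward(aux_loss)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (aux_loss,) = ctx.saved_tensors
        # Gradient of 1.0 w.r.t. aux_loss: the auxiliary term joins the
        # backward pass without appearing in the main printed loss.
        return grad_out, torch.ones_like(aux_loss)

attn = torch.randn(8, 100, 100, requires_grad=True)
penalty = (attn.abs() - 5.0).clamp(min=0.0).sum()  # penalize extreme weights
attn = AttachLoss.apply(attn, penalty)
print(f"loss-sum={penalty.item():.3e}")            # what the log records report
```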
], batch size: 55, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:58:05,715 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:58:09,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 8.004e+01 8.604e+01 9.340e+01 1.280e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-20 16:58:16,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1144266.6666666667, ans=0.07 2023-11-20 16:58:20,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171650 2023-11-20 16:58:38,210 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 16:59:02,017 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3350, loss[loss=0.0645, simple_loss=0.08683, pruned_loss=0.0126, audio_tagging_loss=0.008489, over 16191.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.1002, pruned_loss=0.01882, audio_tagging_loss=0.01011, over 3055424.85 frames. ], batch size: 60, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 16:59:24,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171700 2023-11-20 16:59:34,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1144666.6666666667, ans=0.2 2023-11-20 16:59:46,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1144733.3333333333, ans=0.5 2023-11-20 16:59:54,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1144800.0, ans=0.05 2023-11-20 17:00:01,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-20 17:00:06,503 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3400, loss[loss=0.07844, simple_loss=0.09847, pruned_loss=0.0196, audio_tagging_loss=0.009599, over 13823.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.1003, pruned_loss=0.01882, audio_tagging_loss=0.01009, over 3048345.78 frames. ], batch size: 52, lr: 4.63e-03, grad_scale: 16.0 2023-11-20 17:00:11,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1144866.6666666667, ans=0.125 2023-11-20 17:00:16,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1144866.6666666667, ans=0.2 2023-11-20 17:00:20,869 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.042e+01 8.596e+01 9.172e+01 1.141e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 17:00:22,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1144933.3333333333, ans=0.0 2023-11-20 17:00:22,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. 
limit=10.0 2023-11-20 17:00:30,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171750 2023-11-20 17:00:32,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1145000.0, ans=0.0 2023-11-20 17:00:38,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1145000.0, ans=0.0 2023-11-20 17:01:12,060 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3450, loss[loss=0.08522, simple_loss=0.1064, pruned_loss=0.02314, audio_tagging_loss=0.008868, over 14774.00 frames. ], tot_loss[loss=0.07856, simple_loss=0.09953, pruned_loss=0.01871, audio_tagging_loss=0.01009, over 3048117.18 frames. ], batch size: 58, lr: 4.63e-03, grad_scale: 16.0 2023-11-20 17:01:36,306 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171800 2023-11-20 17:01:37,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2023-11-20 17:02:14,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1145466.6666666667, ans=0.125 2023-11-20 17:02:16,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1145466.6666666667, ans=0.2 2023-11-20 17:02:18,803 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3500, loss[loss=0.078, simple_loss=0.1036, pruned_loss=0.01822, audio_tagging_loss=0.007999, over 16006.00 frames. ], tot_loss[loss=0.07845, simple_loss=0.09959, pruned_loss=0.01873, audio_tagging_loss=0.009924, over 3051298.34 frames. ], batch size: 60, lr: 4.63e-03, grad_scale: 16.0 2023-11-20 17:02:33,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.582e+01 8.137e+01 8.586e+01 9.265e+01 1.254e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 17:02:41,959 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171850 2023-11-20 17:02:48,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1145666.6666666667, ans=0.95 2023-11-20 17:02:51,765 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 17:03:14,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1145800.0, ans=0.125 2023-11-20 17:03:20,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1145800.0, ans=0.125 2023-11-20 17:03:24,264 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3550, loss[loss=0.09096, simple_loss=0.1147, pruned_loss=0.02467, audio_tagging_loss=0.008946, over 15105.00 frames. ], tot_loss[loss=0.07788, simple_loss=0.0991, pruned_loss=0.01851, audio_tagging_loss=0.009824, over 3054694.10 frames. 
2023-11-20 17:03:47,519 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171900
2023-11-20 17:04:03,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1146066.6666666667, ans=0.1
2023-11-20 17:04:25,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1146133.3333333333, ans=0.1
2023-11-20 17:04:29,143 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3600, loss[loss=0.08944, simple_loss=0.1088, pruned_loss=0.02576, audio_tagging_loss=0.009272, over 15919.00 frames. ], tot_loss[loss=0.07745, simple_loss=0.09824, pruned_loss=0.01856, audio_tagging_loss=0.009768, over 3059971.63 frames. ], batch size: 59, lr: 4.63e-03, grad_scale: 32.0
2023-11-20 17:04:44,611 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.710e+01 8.052e+01 8.613e+01 9.290e+01 1.173e+02, threshold=1.723e+02, percent-clipped=0.0
2023-11-20 17:04:47,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1146266.6666666667, ans=0.95
2023-11-20 17:04:48,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1146266.6666666667, ans=0.125
2023-11-20 17:04:52,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0
2023-11-20 17:04:53,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171950
2023-11-20 17:05:17,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1146400.0, ans=0.1
2023-11-20 17:05:31,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1146466.6666666667, ans=0.125
2023-11-20 17:05:31,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1146466.6666666667, ans=0.125
2023-11-20 17:05:34,616 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3650, loss[loss=0.1037, simple_loss=0.1271, pruned_loss=0.03076, audio_tagging_loss=0.009332, over 16057.00 frames. ], tot_loss[loss=0.07812, simple_loss=0.09912, pruned_loss=0.0188, audio_tagging_loss=0.009761, over 3052525.40 frames. ], batch size: 58, lr: 4.63e-03, grad_scale: 32.0
2023-11-20 17:05:57,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172000
2023-11-20 17:06:16,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1146733.3333333333, ans=0.125
2023-11-20 17:06:33,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0
2023-11-20 17:06:42,353 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3700, loss[loss=0.06924, simple_loss=0.0825, pruned_loss=0.01696, audio_tagging_loss=0.01103, over 16355.00 frames. ], tot_loss[loss=0.0783, simple_loss=0.09959, pruned_loss=0.01883, audio_tagging_loss=0.009675, over 3059752.94 frames. ], batch size: 63, lr: 4.63e-03, grad_scale: 32.0
2023-11-20 17:06:56,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.568e+01 8.121e+01 8.690e+01 9.493e+01 1.302e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-20 17:07:05,258 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172050
2023-11-20 17:07:09,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1147000.0, ans=0.1
2023-11-20 17:07:16,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1147000.0, ans=0.5
2023-11-20 17:07:19,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1147000.0, ans=0.1
2023-11-20 17:07:19,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.33 vs. limit=12.0
2023-11-20 17:07:34,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1147133.3333333333, ans=0.0
2023-11-20 17:07:46,895 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3750, loss[loss=0.09003, simple_loss=0.1201, pruned_loss=0.0226, audio_tagging_loss=0.007373, over 16211.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.09991, pruned_loss=0.01902, audio_tagging_loss=0.009648, over 3064765.24 frames. ], batch size: 59, lr: 4.62e-03, grad_scale: 32.0
2023-11-20 17:07:55,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1147200.0, ans=0.1
2023-11-20 17:07:56,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5
2023-11-20 17:07:59,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.89 vs. limit=22.5
2023-11-20 17:08:11,565 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172100
2023-11-20 17:08:11,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1147266.6666666667, ans=0.125
2023-11-20 17:08:11,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1147266.6666666667, ans=0.125
2023-11-20 17:08:15,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1147333.3333333333, ans=0.125
2023-11-20 17:08:16,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5
2023-11-20 17:08:28,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1147400.0, ans=0.125
2023-11-20 17:08:32,669 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
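In the optim.py entries above, the reported threshold consistently equals clipping_scale times the median gradient-norm quartile (here 2.0 * 8.690e+01 = 1.738e+02), so the clipping threshold appears to track running statistics of recent gradient norms rather than a fixed constant. A minimal sketch of that idea, assuming a simple sliding window; this is not the actual optim.py implementation:

```python
# Hedged sketch: derive a clipping threshold from the running median of
# recent gradient norms, scaled by clipping_scale (2.0 in the log). This
# mirrors what the quartile/threshold lines suggest, not optim.py itself.
from collections import deque
import statistics
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        if norm > threshold:  # counted in "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```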
2023-11-20 17:08:45,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1147466.6666666667, ans=0.125
2023-11-20 17:08:49,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1147466.6666666667, ans=0.2
2023-11-20 17:08:52,189 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3800, loss[loss=0.07707, simple_loss=0.1009, pruned_loss=0.01955, audio_tagging_loss=0.007062, over 15131.00 frames. ], tot_loss[loss=0.07837, simple_loss=0.0994, pruned_loss=0.01894, audio_tagging_loss=0.009734, over 3064985.43 frames. ], batch size: 55, lr: 4.62e-03, grad_scale: 16.0
2023-11-20 17:08:52,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1147533.3333333333, ans=0.0
2023-11-20 17:08:58,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1147533.3333333333, ans=0.125
2023-11-20 17:09:08,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.118e+01 8.690e+01 9.561e+01 1.262e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-20 17:09:11,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1147600.0, ans=0.125
2023-11-20 17:09:15,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172150
2023-11-20 17:09:16,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1147600.0, ans=0.09899494936611666
2023-11-20 17:09:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1147733.3333333333, ans=0.1
2023-11-20 17:09:36,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.53 vs. limit=22.5
2023-11-20 17:09:46,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1147800.0, ans=0.0
2023-11-20 17:09:58,572 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3850, loss[loss=0.08263, simple_loss=0.1072, pruned_loss=0.01882, audio_tagging_loss=0.01021, over 15336.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.1013, pruned_loss=0.01931, audio_tagging_loss=0.009683, over 3066027.73 frames. ], batch size: 57, lr: 4.62e-03, grad_scale: 16.0
2023-11-20 17:10:21,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172200
2023-11-20 17:10:49,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1148066.6666666667, ans=0.125
2023-11-20 17:10:53,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1148133.3333333333, ans=0.0
2023-11-20 17:10:54,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0
2023-11-20 17:11:04,028 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3900, loss[loss=0.1057, simple_loss=0.1387, pruned_loss=0.03087, audio_tagging_loss=0.005499, over 15598.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1015, pruned_loss=0.01936, audio_tagging_loss=0.009745, over 3060538.82 frames. ], batch size: 57, lr: 4.62e-03, grad_scale: 16.0
2023-11-20 17:11:19,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.599e+01 8.094e+01 8.942e+01 9.624e+01 1.235e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-20 17:11:28,394 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172250
2023-11-20 17:11:43,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1148400.0, ans=0.2
2023-11-20 17:11:53,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1148400.0, ans=0.2
2023-11-20 17:11:59,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1148466.6666666667, ans=0.125
2023-11-20 17:12:05,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1148466.6666666667, ans=0.0
2023-11-20 17:12:08,856 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 3950, loss[loss=0.05991, simple_loss=0.06882, pruned_loss=0.01485, audio_tagging_loss=0.01065, over 15482.00 frames. ], tot_loss[loss=0.07908, simple_loss=0.1002, pruned_loss=0.01912, audio_tagging_loss=0.009882, over 3049795.16 frames. ], batch size: 59, lr: 4.62e-03, grad_scale: 16.0
2023-11-20 17:12:24,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1148600.0, ans=0.125
2023-11-20 17:12:25,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1148600.0, ans=0.0
2023-11-20 17:12:32,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172300
2023-11-20 17:12:36,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1148666.6666666667, ans=0.0
2023-11-20 17:12:39,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1148666.6666666667, ans=0.1
2023-11-20 17:12:41,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1148666.6666666667, ans=0.0
2023-11-20 17:12:42,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1148666.6666666667, ans=0.0
2023-11-20 17:13:14,336 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4000, loss[loss=0.08234, simple_loss=0.1, pruned_loss=0.02119, audio_tagging_loss=0.01114, over 14698.00 frames. ], tot_loss[loss=0.07916, simple_loss=0.09996, pruned_loss=0.01915, audio_tagging_loss=0.01003, over 3049656.74 frames. ], batch size: 55, lr: 4.62e-03, grad_scale: 32.0
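The ScheduledFloat entries report module hyperparameters (skip rates, dropout probabilities, balancer bounds) as a function of batch_count; by batch ~1.15M they all sit on constant final values. A sketch of a piecewise-linear schedule that would produce this behaviour; the class and the knot values are assumptions, not the icefall implementation:

```python
# Hedged sketch of a batch-count-indexed schedule like the ScheduledFloat
# values logged above (piecewise linear between (batch, value) knots,
# constant outside). Not the icefall class, just the apparent behaviour.
from bisect import bisect_right

class PiecewiseLinearSchedule:
    def __init__(self, *knots: tuple[float, float]):
        self.xs = [x for x, _ in knots]
        self.ys = [y for _, y in knots]

    def __call__(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Illustrative knots only: anneal a dropout_p from 0.3 to 0.1 by batch 20k.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(1146066.67) == 0.1   # matches the logged ans=0.1
```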
2023-11-20 17:13:20,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1148866.6666666667, ans=0.125
2023-11-20 17:13:29,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.881e+01 7.998e+01 8.708e+01 9.550e+01 1.131e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 17:13:33,534 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:13:37,131 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172350
2023-11-20 17:13:46,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=12.0
2023-11-20 17:13:54,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1149066.6666666667, ans=0.125
2023-11-20 17:14:13,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1149133.3333333333, ans=0.2
2023-11-20 17:14:18,638 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4050, loss[loss=0.08967, simple_loss=0.1074, pruned_loss=0.02467, audio_tagging_loss=0.01129, over 14564.00 frames. ], tot_loss[loss=0.07888, simple_loss=0.09935, pruned_loss=0.01907, audio_tagging_loss=0.01014, over 3047931.04 frames. ], batch size: 56, lr: 4.62e-03, grad_scale: 32.0
2023-11-20 17:14:22,394 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 17:14:25,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1149200.0, ans=0.1
2023-11-20 17:14:42,486 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172400
2023-11-20 17:14:55,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1149333.3333333333, ans=0.1
2023-11-20 17:15:15,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1149466.6666666667, ans=0.125
2023-11-20 17:15:24,176 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4100, loss[loss=0.06735, simple_loss=0.08719, pruned_loss=0.01394, audio_tagging_loss=0.009815, over 15597.00 frames. ], tot_loss[loss=0.07891, simple_loss=0.09947, pruned_loss=0.01901, audio_tagging_loss=0.01016, over 3059245.03 frames. ], batch size: 58, lr: 4.62e-03, grad_scale: 32.0
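grad_scale flips between 16.0 and 32.0 across the batches above, the signature of dynamic fp16 loss scaling: the scale is halved when a scaled gradient overflows and grown back after a run of clean steps. A minimal sketch of that loop using PyTorch's GradScaler; the init_scale and growth_interval values here are assumptions, not the trainer's settings:

```python
# Hedged sketch of the fp16 loss-scaling loop implied by the alternating
# grad_scale values (16.0 <-> 32.0). Uses torch.cuda.amp; the init_scale
# and growth_interval are assumptions.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the step and halves the scale on inf/nan
    scaler.update()          # regrows the scale after enough clean steps
    return scaler.get_scale()  # the value logged as grad_scale
```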
2023-11-20 17:15:42,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.151e+01 8.717e+01 9.713e+01 1.333e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-20 17:15:48,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172450
2023-11-20 17:15:51,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1149666.6666666667, ans=0.0
2023-11-20 17:15:52,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1149666.6666666667, ans=0.07
2023-11-20 17:16:04,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0
2023-11-20 17:16:07,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.98 vs. limit=15.0
2023-11-20 17:16:21,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1149800.0, ans=0.125
2023-11-20 17:16:30,785 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4150, loss[loss=0.06595, simple_loss=0.08937, pruned_loss=0.0148, audio_tagging_loss=0.006462, over 15417.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09911, pruned_loss=0.01888, audio_tagging_loss=0.01004, over 3056464.06 frames. ], batch size: 58, lr: 4.62e-03, grad_scale: 16.0
2023-11-20 17:16:34,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1149866.6666666667, ans=0.125
2023-11-20 17:16:45,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1149933.3333333333, ans=0.2
2023-11-20 17:16:53,830 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172500
2023-11-20 17:17:05,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1150000.0, ans=0.125
2023-11-20 17:17:07,228 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:17:10,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1150066.6666666667, ans=0.0
2023-11-20 17:17:18,211 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 17:17:23,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1150133.3333333333, ans=0.0
2023-11-20 17:17:27,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1150133.3333333333, ans=0.09899494936611666
2023-11-20 17:17:36,199 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4200, loss[loss=0.07866, simple_loss=0.1161, pruned_loss=0.0154, audio_tagging_loss=0.005223, over 15105.00 frames. ], tot_loss[loss=0.07874, simple_loss=0.09986, pruned_loss=0.01894, audio_tagging_loss=0.009867, over 3058003.94 frames. ], batch size: 57, lr: 4.62e-03, grad_scale: 16.0
], tot_loss[loss=0.07874, simple_loss=0.09986, pruned_loss=0.01894, audio_tagging_loss=0.009867, over 3058003.94 frames. ], batch size: 57, lr: 4.62e-03, grad_scale: 16.0 2023-11-20 17:17:41,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2023-11-20 17:17:46,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1150200.0, ans=0.125 2023-11-20 17:17:52,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.093e+01 8.811e+01 9.474e+01 1.183e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 17:17:59,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172550 2023-11-20 17:18:05,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1150333.3333333333, ans=0.0 2023-11-20 17:18:22,644 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 17:18:23,931 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.873e-01 2023-11-20 17:18:41,006 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4250, loss[loss=0.07121, simple_loss=0.09723, pruned_loss=0.01632, audio_tagging_loss=0.006283, over 14821.00 frames. ], tot_loss[loss=0.07871, simple_loss=0.1001, pruned_loss=0.01889, audio_tagging_loss=0.009777, over 3054443.06 frames. ], batch size: 56, lr: 4.62e-03, grad_scale: 16.0 2023-11-20 17:18:58,042 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 17:19:05,198 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172600 2023-11-20 17:19:11,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1150666.6666666667, ans=0.1 2023-11-20 17:19:23,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1150733.3333333333, ans=0.125 2023-11-20 17:19:28,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1150733.3333333333, ans=0.0 2023-11-20 17:19:35,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1150800.0, ans=0.0 2023-11-20 17:19:38,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2023-11-20 17:19:47,528 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4300, loss[loss=0.08051, simple_loss=0.1048, pruned_loss=0.02053, audio_tagging_loss=0.007555, over 15560.00 frames. ], tot_loss[loss=0.07919, simple_loss=0.1008, pruned_loss=0.01902, audio_tagging_loss=0.009739, over 3053658.12 frames. 
2023-11-20 17:20:04,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.964e+01 7.979e+01 8.573e+01 9.478e+01 1.236e+02, threshold=1.715e+02, percent-clipped=0.0
2023-11-20 17:20:10,717 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172650
2023-11-20 17:20:42,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1151133.3333333333, ans=10.0
2023-11-20 17:20:44,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1151133.3333333333, ans=0.0
2023-11-20 17:20:51,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1151200.0, ans=0.09899494936611666
2023-11-20 17:20:52,649 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4350, loss[loss=0.07759, simple_loss=0.09355, pruned_loss=0.02023, audio_tagging_loss=0.01059, over 15210.00 frames. ], tot_loss[loss=0.07841, simple_loss=0.09961, pruned_loss=0.01882, audio_tagging_loss=0.00979, over 3048581.86 frames. ], batch size: 56, lr: 4.62e-03, grad_scale: 16.0
2023-11-20 17:20:59,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0
2023-11-20 17:21:07,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1151266.6666666667, ans=0.0
2023-11-20 17:21:15,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172700
2023-11-20 17:21:21,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1151333.3333333333, ans=0.125
2023-11-20 17:21:57,248 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4400, loss[loss=0.08554, simple_loss=0.1006, pruned_loss=0.02418, audio_tagging_loss=0.01106, over 14536.00 frames. ], tot_loss[loss=0.07823, simple_loss=0.0993, pruned_loss=0.01874, audio_tagging_loss=0.009837, over 3050728.44 frames. ], batch size: 55, lr: 4.62e-03, grad_scale: 32.0
2023-11-20 17:22:00,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1151533.3333333333, ans=0.1
2023-11-20 17:22:05,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1151533.3333333333, ans=0.125
2023-11-20 17:22:07,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0
2023-11-20 17:22:16,314 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.328e+01 8.168e+01 8.779e+01 9.674e+01 1.363e+02, threshold=1.756e+02, percent-clipped=0.0
2023-11-20 17:22:21,339 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172750
2023-11-20 17:22:21,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1151600.0, ans=0.1
2023-11-20 17:22:21,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0
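The Whitening entries compare a per-module statistic of the channel covariance against a limit (e.g. metric=3.86 vs. limit=15.0 just above) and appear to be measurement-only while the metric stays below the limit. A hedged sketch of such a diagnostic, using the top-eigenvalue to mean-eigenvalue ratio as an assumed proxy for the logged metric:

```python
# Hedged sketch of a whitening-style diagnostic like the "metric vs.
# limit" lines above: measure how anisotropic the channel covariance is
# (here via a top-eigenvalue / mean-eigenvalue ratio, an assumed proxy
# for the actual metric) and only penalize when it exceeds the limit.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return eigs.max() / eigs.mean().clamp(min=1e-20)

def whitening_penalty(x: torch.Tensor, limit: float) -> torch.Tensor:
    metric = whitening_metric(x)
    # inactive (zero) while metric <= limit, mirroring the log entries
    return torch.relu(metric - limit)
```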
2023-11-20 17:22:58,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1151800.0, ans=0.2
2023-11-20 17:23:02,742 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4450, loss[loss=0.07501, simple_loss=0.09121, pruned_loss=0.01783, audio_tagging_loss=0.01158, over 16229.00 frames. ], tot_loss[loss=0.0793, simple_loss=0.1008, pruned_loss=0.0192, audio_tagging_loss=0.009679, over 3054768.99 frames. ], batch size: 61, lr: 4.62e-03, grad_scale: 16.0
2023-11-20 17:23:04,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1151866.6666666667, ans=0.0
2023-11-20 17:23:26,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172800
2023-11-20 17:23:40,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1152066.6666666667, ans=0.1
2023-11-20 17:23:58,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1152133.3333333333, ans=0.0
2023-11-20 17:24:08,396 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4500, loss[loss=0.09581, simple_loss=0.1228, pruned_loss=0.02392, audio_tagging_loss=0.01047, over 16034.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.1004, pruned_loss=0.01908, audio_tagging_loss=0.009714, over 3049762.96 frames. ], batch size: 58, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:24:18,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1152200.0, ans=0.0
2023-11-20 17:24:26,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.191e+01 8.215e+01 8.732e+01 9.444e+01 1.886e+02, threshold=1.746e+02, percent-clipped=1.0
2023-11-20 17:24:31,391 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172850
2023-11-20 17:24:36,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1152333.3333333333, ans=0.0
2023-11-20 17:24:41,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.16 vs. limit=15.0
2023-11-20 17:24:44,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1152333.3333333333, ans=0.0
2023-11-20 17:24:49,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1152400.0, ans=0.1
2023-11-20 17:24:56,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1152400.0, ans=0.125
2023-11-20 17:25:06,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1152466.6666666667, ans=0.04949747468305833
2023-11-20 17:25:12,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1152533.3333333333, ans=0.0
2023-11-20 17:25:13,103 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4550, loss[loss=0.08468, simple_loss=0.1139, pruned_loss=0.01754, audio_tagging_loss=0.01019, over 14929.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.09929, pruned_loss=0.01889, audio_tagging_loss=0.009857, over 3048526.53 frames. ], batch size: 55, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:25:31,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1152600.0, ans=0.125
2023-11-20 17:25:32,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0
2023-11-20 17:25:37,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172900
2023-11-20 17:26:02,829 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 17:26:09,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1152800.0, ans=0.0
2023-11-20 17:26:10,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1152800.0, ans=0.125
2023-11-20 17:26:18,995 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4600, loss[loss=0.06237, simple_loss=0.06791, pruned_loss=0.01585, audio_tagging_loss=0.01257, over 15483.00 frames. ], tot_loss[loss=0.07847, simple_loss=0.09944, pruned_loss=0.01887, audio_tagging_loss=0.00988, over 3051912.04 frames. ], batch size: 61, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:26:27,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1152866.6666666667, ans=0.125
2023-11-20 17:26:36,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.375e+01 8.346e+01 9.151e+01 1.010e+02 1.307e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 17:26:42,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 172950
2023-11-20 17:26:52,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1153000.0, ans=0.2
2023-11-20 17:27:21,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1153133.3333333333, ans=22.5
2023-11-20 17:27:21,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1153133.3333333333, ans=0.125
2023-11-20 17:27:24,109 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4650, loss[loss=0.1045, simple_loss=0.1384, pruned_loss=0.02618, audio_tagging_loss=0.009124, over 15092.00 frames. ], tot_loss[loss=0.07864, simple_loss=0.0996, pruned_loss=0.01897, audio_tagging_loss=0.009871, over 3046689.73 frames. ], batch size: 57, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:27:25,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1153200.0, ans=0.1
2023-11-20 17:27:29,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1153200.0, ans=0.125
2023-11-20 17:27:46,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173000
2023-11-20 17:27:52,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0
2023-11-20 17:28:20,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1153466.6666666667, ans=0.125
2023-11-20 17:28:29,538 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4700, loss[loss=0.07065, simple_loss=0.09092, pruned_loss=0.0139, audio_tagging_loss=0.01129, over 15120.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.09935, pruned_loss=0.01897, audio_tagging_loss=0.01002, over 3048111.58 frames. ], batch size: 55, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:28:36,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1153533.3333333333, ans=0.125
2023-11-20 17:28:49,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 7.947e+01 8.690e+01 9.286e+01 1.416e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-20 17:28:54,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173050
2023-11-20 17:29:13,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1153733.3333333333, ans=0.2
2023-11-20 17:29:25,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.26 vs. limit=22.5
2023-11-20 17:29:36,593 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4750, loss[loss=0.06998, simple_loss=0.08061, pruned_loss=0.02017, audio_tagging_loss=0.009511, over 16281.00 frames. ], tot_loss[loss=0.078, simple_loss=0.09836, pruned_loss=0.01872, audio_tagging_loss=0.01011, over 3046732.03 frames. ], batch size: 65, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:29:52,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0
2023-11-20 17:29:59,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173100
2023-11-20 17:30:08,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1154000.0, ans=0.2
2023-11-20 17:30:28,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1154133.3333333333, ans=0.125
2023-11-20 17:30:30,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1154133.3333333333, ans=0.0
2023-11-20 17:30:42,298 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4800, loss[loss=0.07113, simple_loss=0.09154, pruned_loss=0.01555, audio_tagging_loss=0.009815, over 16079.00 frames. ], tot_loss[loss=0.07822, simple_loss=0.09872, pruned_loss=0.01868, audio_tagging_loss=0.01018, over 3046027.26 frames. ], batch size: 63, lr: 4.61e-03, grad_scale: 32.0
2023-11-20 17:30:47,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1154200.0, ans=0.07
2023-11-20 17:30:59,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.494e+01 7.777e+01 8.468e+01 9.403e+01 1.279e+02, threshold=1.694e+02, percent-clipped=0.0
2023-11-20 17:31:00,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1154266.6666666667, ans=0.0
2023-11-20 17:31:04,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173150
2023-11-20 17:31:28,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1154400.0, ans=0.0
2023-11-20 17:31:33,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1154466.6666666667, ans=0.2
2023-11-20 17:31:34,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1154466.6666666667, ans=0.0
2023-11-20 17:31:41,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1154466.6666666667, ans=0.125
2023-11-20 17:31:47,285 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4850, loss[loss=0.09211, simple_loss=0.1237, pruned_loss=0.02171, audio_tagging_loss=0.008577, over 16369.00 frames. ], tot_loss[loss=0.07853, simple_loss=0.09914, pruned_loss=0.01868, audio_tagging_loss=0.01029, over 3050098.02 frames. ], batch size: 61, lr: 4.61e-03, grad_scale: 32.0
2023-11-20 17:32:07,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1154600.0, ans=0.2
2023-11-20 17:32:11,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173200
2023-11-20 17:32:13,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=22.5
2023-11-20 17:32:21,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1154666.6666666667, ans=0.125
2023-11-20 17:32:26,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0
2023-11-20 17:32:39,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1154800.0, ans=0.125
2023-11-20 17:32:51,435 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4900, loss[loss=0.07957, simple_loss=0.1086, pruned_loss=0.0168, audio_tagging_loss=0.008444, over 14734.00 frames. ], tot_loss[loss=0.07871, simple_loss=0.09929, pruned_loss=0.0188, audio_tagging_loss=0.01027, over 3043064.83 frames. ], batch size: 56, lr: 4.61e-03, grad_scale: 32.0
2023-11-20 17:33:09,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0
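The learning rate decays smoothly over this span (4.63e-03 at batch ~171.7k down to 4.61e-03 here), consistent with a schedule that is a slowly varying function of both batch index and epoch rather than step decay. A sketch of an Eden-style rule that lands in the right range; the exponents and the epoch bookkeeping are assumptions, so it only approximates the logged values:

```python
# Hedged sketch of an Eden-style LR rule consistent with the slow decay
# above. The exponents and epoch bookkeeping are assumptions, so this
# only approximates the logged values (~4.6e-03 near batch 172k).
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch / lr_batches) ** 2 + 1.0) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 172000, 14.5):.2e}")  # ~4.55e-03 vs. logged 4.62e-03
```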
2023-11-20 17:33:09,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.105e+01 8.801e+01 9.548e+01 1.347e+02, threshold=1.760e+02, percent-clipped=0.0
2023-11-20 17:33:13,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1154933.3333333333, ans=0.1
2023-11-20 17:33:14,825 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173250
2023-11-20 17:33:37,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1155066.6666666667, ans=0.125
2023-11-20 17:33:55,025 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 4950, loss[loss=0.0584, simple_loss=0.06695, pruned_loss=0.01409, audio_tagging_loss=0.01083, over 14900.00 frames. ], tot_loss[loss=0.0783, simple_loss=0.09909, pruned_loss=0.01866, audio_tagging_loss=0.0101, over 3042035.87 frames. ], batch size: 55, lr: 4.61e-03, grad_scale: 32.0
2023-11-20 17:33:55,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1155200.0, ans=0.125
2023-11-20 17:34:04,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1155200.0, ans=0.125
2023-11-20 17:34:16,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173300
2023-11-20 17:34:19,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1155333.3333333333, ans=0.0
2023-11-20 17:34:23,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1155333.3333333333, ans=0.125
2023-11-20 17:34:31,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5
2023-11-20 17:34:32,746 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:34:57,665 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5000, loss[loss=0.07484, simple_loss=0.103, pruned_loss=0.01496, audio_tagging_loss=0.008355, over 16055.00 frames. ], tot_loss[loss=0.0777, simple_loss=0.09823, pruned_loss=0.01855, audio_tagging_loss=0.01004, over 3038257.67 frames. ], batch size: 62, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:35:10,257 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:35:15,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1155600.0, ans=0.125
2023-11-20 17:35:16,436 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.227e+01 8.824e+01 9.880e+01 1.350e+02, threshold=1.765e+02, percent-clipped=0.0
2023-11-20 17:35:20,306 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173350
2023-11-20 17:35:35,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0
2023-11-20 17:35:54,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1155800.0, ans=0.0
2023-11-20 17:35:59,952 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5050, loss[loss=0.06612, simple_loss=0.0841, pruned_loss=0.01209, audio_tagging_loss=0.01199, over 14626.00 frames. ], tot_loss[loss=0.07763, simple_loss=0.09825, pruned_loss=0.01863, audio_tagging_loss=0.00988, over 3045807.60 frames. ], batch size: 53, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:36:19,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1155933.3333333333, ans=0.0
2023-11-20 17:36:23,936 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173400
2023-11-20 17:36:43,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.33 vs. limit=15.0
2023-11-20 17:36:56,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1156133.3333333333, ans=0.2
2023-11-20 17:37:02,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1156133.3333333333, ans=0.2
2023-11-20 17:37:04,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1156200.0, ans=0.125
2023-11-20 17:37:04,827 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5100, loss[loss=0.1089, simple_loss=0.1167, pruned_loss=0.03994, audio_tagging_loss=0.01058, over 14939.00 frames. ], tot_loss[loss=0.07706, simple_loss=0.0972, pruned_loss=0.01853, audio_tagging_loss=0.009934, over 3040566.82 frames. ], batch size: 56, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:37:11,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1156200.0, ans=0.125
2023-11-20 17:37:23,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.611e+01 8.188e+01 8.649e+01 9.367e+01 2.807e+02, threshold=1.730e+02, percent-clipped=1.0
2023-11-20 17:37:26,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173450
2023-11-20 17:37:27,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1156266.6666666667, ans=0.0
2023-11-20 17:38:02,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1156466.6666666667, ans=0.1
2023-11-20 17:38:07,894 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5150, loss[loss=0.06538, simple_loss=0.08067, pruned_loss=0.01587, audio_tagging_loss=0.009176, over 15984.00 frames. ], tot_loss[loss=0.07729, simple_loss=0.09766, pruned_loss=0.01844, audio_tagging_loss=0.01002, over 3035392.31 frames. ], batch size: 61, lr: 4.61e-03, grad_scale: 16.0
2023-11-20 17:38:08,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=22.5
2023-11-20 17:38:19,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1156600.0, ans=0.1
2023-11-20 17:38:19,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1156600.0, ans=0.2
2023-11-20 17:38:28,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1156600.0, ans=0.0
2023-11-20 17:38:30,367 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173500
2023-11-20 17:38:34,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1156666.6666666667, ans=0.1
2023-11-20 17:38:37,096 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:38:38,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1156666.6666666667, ans=0.125
2023-11-20 17:38:44,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0
2023-11-20 17:38:48,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0
2023-11-20 17:38:58,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0
2023-11-20 17:39:07,469 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:39:10,954 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5200, loss[loss=0.09286, simple_loss=0.1196, pruned_loss=0.02772, audio_tagging_loss=0.00532, over 14006.00 frames. ], tot_loss[loss=0.07759, simple_loss=0.09813, pruned_loss=0.01849, audio_tagging_loss=0.01004, over 3033173.83 frames. ], batch size: 53, lr: 4.61e-03, grad_scale: 32.0
2023-11-20 17:39:16,088 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:39:18,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1156866.6666666667, ans=0.125
2023-11-20 17:39:18,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1156866.6666666667, ans=0.0
2023-11-20 17:39:18,113 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:39:20,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1156866.6666666667, ans=0.125
2023-11-20 17:39:28,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1156933.3333333333, ans=0.125
2023-11-20 17:39:31,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.205e+01 8.715e+01 9.461e+01 1.258e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-20 17:39:35,257 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173550
2023-11-20 17:39:41,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1157000.0, ans=0.0
2023-11-20 17:39:57,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1157066.6666666667, ans=0.125
2023-11-20 17:40:15,222 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5250, loss[loss=0.07324, simple_loss=0.1006, pruned_loss=0.01647, audio_tagging_loss=0.006497, over 14986.00 frames. ], tot_loss[loss=0.07792, simple_loss=0.09875, pruned_loss=0.01862, audio_tagging_loss=0.009923, over 3040282.70 frames. ], batch size: 57, lr: 4.60e-03, grad_scale: 32.0
2023-11-20 17:40:38,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173600
2023-11-20 17:40:48,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0
2023-11-20 17:40:58,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157400.0, ans=0.1
2023-11-20 17:41:14,175 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:41:19,890 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5300, loss[loss=0.06843, simple_loss=0.08822, pruned_loss=0.01491, audio_tagging_loss=0.009411, over 14511.00 frames. ], tot_loss[loss=0.0786, simple_loss=0.09995, pruned_loss=0.01889, audio_tagging_loss=0.009738, over 3043851.84 frames. ], batch size: 54, lr: 4.60e-03, grad_scale: 32.0
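Several ScheduledFloat knobs above belong to bypass modules (bypass.skip_rate, bypass.scale_min, bypass_mid.scale_min), suggesting each encoder layer blends its input with its output through a clamped scale whose floor is scheduled over training. A hedged sketch of such a module; the details are assumptions, not the Zipformer code:

```python
# Hedged sketch of a bypass/skip connection with a learned per-channel
# scale clamped to [scale_min, 1.0], matching the bypass.scale_min /
# bypass.skip_rate knobs in the log. Details are assumptions.
import torch
import torch.nn as nn

class Bypass(nn.Module):
    def __init__(self, num_channels: int, scale_min: float = 0.2):
        super().__init__()
        self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min  # scheduled externally (e.g. ans=0.2)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: layer input, y: layer output; blend so the layer can be
        # partially "bypassed" early in training.
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + s * (y - x)
```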
2023-11-20 17:41:33,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1157600.0, ans=0.125
2023-11-20 17:41:37,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.599e+01 8.138e+01 8.869e+01 9.897e+01 2.566e+02, threshold=1.774e+02, percent-clipped=1.0
2023-11-20 17:41:41,706 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173650
2023-11-20 17:42:03,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1157733.3333333333, ans=0.125
2023-11-20 17:42:17,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1157800.0, ans=0.0
2023-11-20 17:42:22,883 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5350, loss[loss=0.07348, simple_loss=0.09128, pruned_loss=0.01699, audio_tagging_loss=0.01084, over 15441.00 frames. ], tot_loss[loss=0.0782, simple_loss=0.09968, pruned_loss=0.01865, audio_tagging_loss=0.009711, over 3046692.09 frames. ], batch size: 57, lr: 4.60e-03, grad_scale: 32.0
2023-11-20 17:42:25,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1157866.6666666667, ans=0.04949747468305833
2023-11-20 17:42:27,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157866.6666666667, ans=0.1
2023-11-20 17:42:46,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173700
2023-11-20 17:42:58,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1158000.0, ans=0.125
2023-11-20 17:43:01,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5
2023-11-20 17:43:26,669 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5400, loss[loss=0.06677, simple_loss=0.08103, pruned_loss=0.01509, audio_tagging_loss=0.01117, over 15044.00 frames. ], tot_loss[loss=0.07829, simple_loss=0.09984, pruned_loss=0.0186, audio_tagging_loss=0.009771, over 3058544.36 frames. ], batch size: 58, lr: 4.60e-03, grad_scale: 32.0
2023-11-20 17:43:27,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1158200.0, ans=0.1
2023-11-20 17:43:36,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0
2023-11-20 17:43:39,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1158266.6666666667, ans=0.125
2023-11-20 17:43:46,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.350e+01 7.982e+01 8.687e+01 9.392e+01 1.840e+02, threshold=1.737e+02, percent-clipped=1.0
2023-11-20 17:43:46,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1158266.6666666667, ans=0.0
2023-11-20 17:43:49,987 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173750
2023-11-20 17:44:02,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0
2023-11-20 17:44:21,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1158466.6666666667, ans=0.1
2023-11-20 17:44:28,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1158466.6666666667, ans=0.0
2023-11-20 17:44:30,556 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5450, loss[loss=0.07144, simple_loss=0.08885, pruned_loss=0.01801, audio_tagging_loss=0.009005, over 16025.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.1001, pruned_loss=0.01855, audio_tagging_loss=0.009748, over 3060993.19 frames. ], batch size: 61, lr: 4.60e-03, grad_scale: 32.0
2023-11-20 17:44:43,712 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 17:44:53,215 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173800
2023-11-20 17:45:16,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1158733.3333333333, ans=0.1
2023-11-20 17:45:18,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1158733.3333333333, ans=0.2
2023-11-20 17:45:21,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1158800.0, ans=0.125
2023-11-20 17:45:34,254 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5500, loss[loss=0.07041, simple_loss=0.09573, pruned_loss=0.01308, audio_tagging_loss=0.00946, over 15349.00 frames. ], tot_loss[loss=0.07804, simple_loss=0.09948, pruned_loss=0.01845, audio_tagging_loss=0.009852, over 3049347.57 frames. ], batch size: 57, lr: 4.60e-03, grad_scale: 32.0
2023-11-20 17:45:54,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.155e+01 8.582e+01 9.481e+01 1.342e+02, threshold=1.716e+02, percent-clipped=0.0
2023-11-20 17:45:57,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173850
2023-11-20 17:46:07,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1159000.0, ans=0.2
2023-11-20 17:46:07,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0
2023-11-20 17:46:13,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1159066.6666666667, ans=0.07
2023-11-20 17:46:25,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=22.5
2023-11-20 17:46:30,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1159133.3333333333, ans=0.2
2023-11-20 17:46:33,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0
2023-11-20 17:46:37,034 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5550, loss[loss=0.08079, simple_loss=0.09752, pruned_loss=0.0174, audio_tagging_loss=0.01464, over 15395.00 frames. ], tot_loss[loss=0.07806, simple_loss=0.09941, pruned_loss=0.0184, audio_tagging_loss=0.009954, over 3058217.53 frames. ], batch size: 59, lr: 4.60e-03, grad_scale: 16.0
2023-11-20 17:46:39,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1159200.0, ans=0.0
2023-11-20 17:46:44,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0
2023-11-20 17:46:46,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1159200.0, ans=0.1
2023-11-20 17:46:48,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1159266.6666666667, ans=0.125
2023-11-20 17:46:59,554 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173900
2023-11-20 17:47:13,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1159400.0, ans=0.1
2023-11-20 17:47:18,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1159400.0, ans=0.0
2023-11-20 17:47:18,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1159400.0, ans=0.07
2023-11-20 17:47:26,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1159466.6666666667, ans=0.05
2023-11-20 17:47:31,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1159466.6666666667, ans=0.2
2023-11-20 17:47:40,117 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5600, loss[loss=0.06634, simple_loss=0.08266, pruned_loss=0.01524, audio_tagging_loss=0.009772, over 16212.00 frames. ], tot_loss[loss=0.07836, simple_loss=0.09965, pruned_loss=0.01849, audio_tagging_loss=0.01004, over 3062647.65 frames. ], batch size: 59, lr: 4.60e-03, grad_scale: 32.0
], batch size: 59, lr: 4.60e-03, grad_scale: 32.0 2023-11-20 17:47:51,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1159600.0, ans=0.125 2023-11-20 17:48:00,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.883e+01 7.988e+01 8.538e+01 9.609e+01 1.592e+02, threshold=1.708e+02, percent-clipped=0.0 2023-11-20 17:48:02,743 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 173950 2023-11-20 17:48:13,642 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 17:48:22,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1159733.3333333333, ans=0.1 2023-11-20 17:48:25,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1159733.3333333333, ans=0.2 2023-11-20 17:48:26,227 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 17:48:26,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1159733.3333333333, ans=0.1 2023-11-20 17:48:28,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1159733.3333333333, ans=0.0 2023-11-20 17:48:30,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1159800.0, ans=0.125 2023-11-20 17:48:35,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1159800.0, ans=0.125 2023-11-20 17:48:44,028 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5650, loss[loss=0.08324, simple_loss=0.09694, pruned_loss=0.02355, audio_tagging_loss=0.01123, over 15647.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.09999, pruned_loss=0.01873, audio_tagging_loss=0.01009, over 3058579.72 frames. ], batch size: 60, lr: 4.60e-03, grad_scale: 16.0 2023-11-20 17:49:07,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174000 2023-11-20 17:49:12,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-20 17:49:15,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2023-11-20 17:49:48,558 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5700, loss[loss=0.06295, simple_loss=0.08563, pruned_loss=0.01123, audio_tagging_loss=0.008895, over 16029.00 frames. ], tot_loss[loss=0.07888, simple_loss=0.1001, pruned_loss=0.01876, audio_tagging_loss=0.01007, over 3061228.99 frames. 
], batch size: 60, lr: 4.60e-03, grad_scale: 16.0 2023-11-20 17:49:52,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1160200.0, ans=0.1 2023-11-20 17:49:53,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1160200.0, ans=0.2 2023-11-20 17:50:04,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1160266.6666666667, ans=0.125 2023-11-20 17:50:07,797 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 17:50:09,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1160266.6666666667, ans=0.125 2023-11-20 17:50:10,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.368e+01 9.255e+01 1.006e+02 1.511e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-20 17:50:11,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174050 2023-11-20 17:50:26,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1160400.0, ans=0.125 2023-11-20 17:50:44,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2023-11-20 17:50:52,581 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5750, loss[loss=0.05696, simple_loss=0.0688, pruned_loss=0.01092, audio_tagging_loss=0.01164, over 15674.00 frames. ], tot_loss[loss=0.07822, simple_loss=0.09887, pruned_loss=0.0187, audio_tagging_loss=0.01009, over 3054582.58 frames. ], batch size: 59, lr: 4.60e-03, grad_scale: 16.0 2023-11-20 17:51:15,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174100 2023-11-20 17:51:17,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1160666.6666666667, ans=0.04949747468305833 2023-11-20 17:51:26,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.09 vs. limit=22.5 2023-11-20 17:51:55,652 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5800, loss[loss=0.09247, simple_loss=0.1233, pruned_loss=0.02381, audio_tagging_loss=0.007035, over 15473.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.09982, pruned_loss=0.01896, audio_tagging_loss=0.009939, over 3057148.05 frames. 
], batch size: 57, lr: 4.60e-03, grad_scale: 16.0 2023-11-20 17:51:57,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1160866.6666666667, ans=0.2 2023-11-20 17:51:58,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1160866.6666666667, ans=0.125 2023-11-20 17:51:58,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1160866.6666666667, ans=0.125 2023-11-20 17:52:06,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1160866.6666666667, ans=0.125 2023-11-20 17:52:14,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1160933.3333333333, ans=0.0 2023-11-20 17:52:17,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.088e+01 8.646e+01 9.359e+01 1.315e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 17:52:19,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174150 2023-11-20 17:52:33,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1161066.6666666667, ans=0.125 2023-11-20 17:52:58,005 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 17:52:58,988 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5850, loss[loss=0.07462, simple_loss=0.09329, pruned_loss=0.01539, audio_tagging_loss=0.01259, over 15477.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.09961, pruned_loss=0.01889, audio_tagging_loss=0.009964, over 3052549.46 frames. ], batch size: 57, lr: 4.60e-03, grad_scale: 16.0 2023-11-20 17:53:10,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2023-11-20 17:53:17,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1161266.6666666667, ans=0.125 2023-11-20 17:53:19,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1161266.6666666667, ans=0.125 2023-11-20 17:53:22,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174200 2023-11-20 17:53:28,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2023-11-20 17:53:34,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1161333.3333333333, ans=0.125 2023-11-20 17:53:38,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1161400.0, ans=0.0 2023-11-20 17:53:53,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1161466.6666666667, ans=0.0 2023-11-20 17:54:02,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0 2023-11-20 17:54:03,766 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5900, loss[loss=0.09465, simple_loss=0.1225, pruned_loss=0.02267, audio_tagging_loss=0.01072, over 15734.00 frames. 
], tot_loss[loss=0.07886, simple_loss=0.1, pruned_loss=0.01897, audio_tagging_loss=0.009881, over 3055157.55 frames. ], batch size: 55, lr: 4.60e-03, grad_scale: 16.0 2023-11-20 17:54:05,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1161533.3333333333, ans=0.125 2023-11-20 17:54:24,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.417e+01 8.980e+01 9.978e+01 1.286e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-20 17:54:25,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174250 2023-11-20 17:54:32,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2023-11-20 17:54:33,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=22.5 2023-11-20 17:54:35,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1161666.6666666667, ans=0.2 2023-11-20 17:54:37,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1161666.6666666667, ans=0.125 2023-11-20 17:54:40,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1161733.3333333333, ans=0.125 2023-11-20 17:54:44,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1161733.3333333333, ans=0.125 2023-11-20 17:55:04,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1161800.0, ans=0.1 2023-11-20 17:55:06,755 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 5950, loss[loss=0.0714, simple_loss=0.09064, pruned_loss=0.01798, audio_tagging_loss=0.008103, over 16708.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.1004, pruned_loss=0.01905, audio_tagging_loss=0.009821, over 3056005.10 frames. ], batch size: 63, lr: 4.60e-03, grad_scale: 16.0 2023-11-20 17:55:12,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1161866.6666666667, ans=0.0 2023-11-20 17:55:22,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1161933.3333333333, ans=0.125 2023-11-20 17:55:22,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1161933.3333333333, ans=0.1 2023-11-20 17:55:30,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174300 2023-11-20 17:55:51,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1162066.6666666667, ans=0.2 2023-11-20 17:55:58,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2023-11-20 17:55:59,885 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 17:56:10,518 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6000, loss[loss=0.1031, simple_loss=0.1441, pruned_loss=0.02547, audio_tagging_loss=0.005573, over 15624.00 frames. 
], tot_loss[loss=0.0787, simple_loss=0.1001, pruned_loss=0.01886, audio_tagging_loss=0.009782, over 3054377.46 frames. ], batch size: 56, lr: 4.59e-03, grad_scale: 32.0 2023-11-20 17:56:10,519 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 17:56:51,551 INFO [train_asr.py:1253] (1/4) Epoch 15, validation: loss=0.06114, simple_loss=0.05327, pruned_loss=0.005599, audio_tagging_loss=0.02891, over 4681554.00 frames. 2023-11-20 17:56:51,552 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 17:57:04,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1162266.6666666667, ans=0.125 2023-11-20 17:57:05,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-20 17:57:12,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.279e+01 8.029e+01 8.706e+01 9.735e+01 1.152e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 17:57:13,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174350 2023-11-20 17:57:16,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.51 vs. limit=10.0 2023-11-20 17:57:38,267 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 17:57:39,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1162400.0, ans=0.0 2023-11-20 17:57:53,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1162466.6666666667, ans=0.0 2023-11-20 17:57:55,645 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6050, loss[loss=0.07778, simple_loss=0.09758, pruned_loss=0.01704, audio_tagging_loss=0.01195, over 14207.00 frames. ], tot_loss[loss=0.07853, simple_loss=0.09983, pruned_loss=0.01886, audio_tagging_loss=0.009753, over 3051636.25 frames. ], batch size: 53, lr: 4.59e-03, grad_scale: 32.0 2023-11-20 17:58:00,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.92 vs. limit=22.5 2023-11-20 17:58:19,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174400 2023-11-20 17:58:34,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1162733.3333333333, ans=0.125 2023-11-20 17:58:40,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1162733.3333333333, ans=0.125 2023-11-20 17:58:46,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.69 vs. 
limit=15.0 2023-11-20 17:58:47,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1162800.0, ans=0.2 2023-11-20 17:58:59,418 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6100, loss[loss=0.08487, simple_loss=0.1284, pruned_loss=0.01451, audio_tagging_loss=0.006178, over 15201.00 frames. ], tot_loss[loss=0.07767, simple_loss=0.09869, pruned_loss=0.01853, audio_tagging_loss=0.0098, over 3046629.62 frames. ], batch size: 55, lr: 4.59e-03, grad_scale: 32.0 2023-11-20 17:59:09,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1162866.6666666667, ans=0.125 2023-11-20 17:59:12,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1162933.3333333333, ans=0.125 2023-11-20 17:59:13,219 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 17:59:22,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 7.952e+01 8.424e+01 9.114e+01 1.496e+02, threshold=1.685e+02, percent-clipped=0.0 2023-11-20 17:59:23,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174450 2023-11-20 17:59:31,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2023-11-20 18:00:01,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2023-11-20 18:00:04,777 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6150, loss[loss=0.08403, simple_loss=0.1019, pruned_loss=0.02326, audio_tagging_loss=0.009802, over 14574.00 frames. ], tot_loss[loss=0.07757, simple_loss=0.09836, pruned_loss=0.01853, audio_tagging_loss=0.009862, over 3047532.81 frames. ], batch size: 54, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:00:27,088 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174500 2023-11-20 18:00:30,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0 2023-11-20 18:00:54,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1163400.0, ans=0.0 2023-11-20 18:00:58,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1163466.6666666667, ans=0.125 2023-11-20 18:01:07,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1163466.6666666667, ans=0.125 2023-11-20 18:01:09,260 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6200, loss[loss=0.07108, simple_loss=0.07941, pruned_loss=0.01852, audio_tagging_loss=0.01285, over 14914.00 frames. ], tot_loss[loss=0.0775, simple_loss=0.09804, pruned_loss=0.01843, audio_tagging_loss=0.01005, over 3049055.89 frames. 
], batch size: 58, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:01:15,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1163533.3333333333, ans=0.0 2023-11-20 18:01:21,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1163600.0, ans=0.1 2023-11-20 18:01:23,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1163600.0, ans=0.2 2023-11-20 18:01:24,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1163600.0, ans=0.0 2023-11-20 18:01:33,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.483e+01 8.002e+01 8.659e+01 9.317e+01 2.710e+02, threshold=1.732e+02, percent-clipped=1.0 2023-11-20 18:01:33,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174550 2023-11-20 18:01:37,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1163666.6666666667, ans=0.125 2023-11-20 18:01:45,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2023-11-20 18:02:08,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1163800.0, ans=0.1 2023-11-20 18:02:11,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1163800.0, ans=0.09899494936611666 2023-11-20 18:02:13,308 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6250, loss[loss=0.04575, simple_loss=0.05971, pruned_loss=0.005781, audio_tagging_loss=0.01011, over 15298.00 frames. ], tot_loss[loss=0.07746, simple_loss=0.09774, pruned_loss=0.01843, audio_tagging_loss=0.01016, over 3050800.93 frames. ], batch size: 60, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:02:37,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174600 2023-11-20 18:02:40,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1164000.0, ans=0.125 2023-11-20 18:02:42,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1164000.0, ans=0.0 2023-11-20 18:02:42,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1164000.0, ans=10.0 2023-11-20 18:02:58,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-20 18:02:58,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1164066.6666666667, ans=0.125 2023-11-20 18:03:12,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1164133.3333333333, ans=0.0 2023-11-20 18:03:18,938 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6300, loss[loss=0.09061, simple_loss=0.1181, pruned_loss=0.02159, audio_tagging_loss=0.009976, over 15233.00 frames. ], tot_loss[loss=0.07763, simple_loss=0.0979, pruned_loss=0.01842, audio_tagging_loss=0.01026, over 3048023.70 frames. 
], batch size: 58, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:03:19,191 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:03:31,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1164266.6666666667, ans=0.0 2023-11-20 18:03:41,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.283e+01 8.184e+01 8.938e+01 9.921e+01 1.399e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 18:03:41,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174650 2023-11-20 18:03:46,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1164333.3333333333, ans=0.0 2023-11-20 18:03:57,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0 2023-11-20 18:04:02,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1164400.0, ans=0.1 2023-11-20 18:04:03,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1164400.0, ans=0.1 2023-11-20 18:04:03,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1164400.0, ans=0.1 2023-11-20 18:04:07,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2023-11-20 18:04:14,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1164466.6666666667, ans=0.125 2023-11-20 18:04:23,043 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6350, loss[loss=0.0836, simple_loss=0.1187, pruned_loss=0.0167, audio_tagging_loss=0.007547, over 15306.00 frames. ], tot_loss[loss=0.07774, simple_loss=0.09806, pruned_loss=0.01846, audio_tagging_loss=0.01026, over 3044613.18 frames. ], batch size: 58, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:04:25,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1164533.3333333333, ans=0.0 2023-11-20 18:04:33,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1164533.3333333333, ans=0.125 2023-11-20 18:04:46,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174700 2023-11-20 18:04:52,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1164666.6666666667, ans=0.04949747468305833 2023-11-20 18:05:05,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1164733.3333333333, ans=0.0 2023-11-20 18:05:09,910 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:05:10,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.65 vs. 
limit=22.5 2023-11-20 18:05:11,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1164733.3333333333, ans=0.0 2023-11-20 18:05:15,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1164800.0, ans=0.2 2023-11-20 18:05:19,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1164800.0, ans=0.125 2023-11-20 18:05:26,447 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6400, loss[loss=0.09307, simple_loss=0.1222, pruned_loss=0.0233, audio_tagging_loss=0.008683, over 14845.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.0983, pruned_loss=0.01858, audio_tagging_loss=0.0103, over 3038003.88 frames. ], batch size: 55, lr: 4.59e-03, grad_scale: 32.0 2023-11-20 18:05:28,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1164866.6666666667, ans=0.95 2023-11-20 18:05:48,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1164933.3333333333, ans=0.125 2023-11-20 18:05:50,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.152e+01 8.050e+01 8.682e+01 9.459e+01 1.192e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 18:05:50,874 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174750 2023-11-20 18:06:04,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1165066.6666666667, ans=0.125 2023-11-20 18:06:13,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1165066.6666666667, ans=0.125 2023-11-20 18:06:31,318 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6450, loss[loss=0.07328, simple_loss=0.09348, pruned_loss=0.01431, audio_tagging_loss=0.01223, over 15878.00 frames. ], tot_loss[loss=0.07853, simple_loss=0.09904, pruned_loss=0.0187, audio_tagging_loss=0.0103, over 3041544.32 frames. ], batch size: 60, lr: 4.59e-03, grad_scale: 32.0 2023-11-20 18:06:40,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1165200.0, ans=10.0 2023-11-20 18:06:41,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1165200.0, ans=0.0 2023-11-20 18:06:54,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174800 2023-11-20 18:07:09,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1165400.0, ans=0.1 2023-11-20 18:07:26,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1165466.6666666667, ans=0.125 2023-11-20 18:07:36,633 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6500, loss[loss=0.0636, simple_loss=0.08265, pruned_loss=0.01173, audio_tagging_loss=0.01054, over 14878.00 frames. ], tot_loss[loss=0.07798, simple_loss=0.09835, pruned_loss=0.01858, audio_tagging_loss=0.01023, over 3039029.70 frames. 
], batch size: 54, lr: 4.59e-03, grad_scale: 32.0 2023-11-20 18:07:46,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1165533.3333333333, ans=0.0 2023-11-20 18:07:55,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1165600.0, ans=0.0 2023-11-20 18:07:58,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174850 2023-11-20 18:07:59,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1165600.0, ans=0.125 2023-11-20 18:07:59,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2023-11-20 18:08:00,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.390e+01 7.879e+01 8.698e+01 9.570e+01 1.330e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-20 18:08:20,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1165733.3333333333, ans=0.0 2023-11-20 18:08:30,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0 2023-11-20 18:08:39,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1165866.6666666667, ans=0.0 2023-11-20 18:08:40,132 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6550, loss[loss=0.09836, simple_loss=0.1243, pruned_loss=0.02689, audio_tagging_loss=0.009345, over 14883.00 frames. ], tot_loss[loss=0.07798, simple_loss=0.09858, pruned_loss=0.01869, audio_tagging_loss=0.01, over 3031557.99 frames. ], batch size: 56, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:08:42,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1165866.6666666667, ans=0.05 2023-11-20 18:08:51,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1165866.6666666667, ans=0.0 2023-11-20 18:09:03,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174900 2023-11-20 18:09:44,716 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6600, loss[loss=0.09877, simple_loss=0.1286, pruned_loss=0.02516, audio_tagging_loss=0.00934, over 16397.00 frames. ], tot_loss[loss=0.0779, simple_loss=0.09872, pruned_loss=0.01856, audio_tagging_loss=0.009988, over 3036636.76 frames. ], batch size: 61, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:10:08,425 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 174950 2023-11-20 18:10:09,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.156e+01 8.016e+01 8.893e+01 9.402e+01 1.270e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 18:10:34,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=22.5 2023-11-20 18:10:48,689 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6650, loss[loss=0.07929, simple_loss=0.0999, pruned_loss=0.0175, audio_tagging_loss=0.01184, over 15736.00 frames. ], tot_loss[loss=0.07909, simple_loss=0.1006, pruned_loss=0.01896, audio_tagging_loss=0.009833, over 3039564.73 frames. 
], batch size: 58, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:10:51,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1166533.3333333333, ans=0.05 2023-11-20 18:10:59,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1166533.3333333333, ans=0.0 2023-11-20 18:11:11,332 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175000 2023-11-20 18:11:16,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1166666.6666666667, ans=0.125 2023-11-20 18:11:18,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1166666.6666666667, ans=0.125 2023-11-20 18:11:39,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1166800.0, ans=0.0 2023-11-20 18:11:40,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2023-11-20 18:11:46,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1166800.0, ans=0.125 2023-11-20 18:11:51,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.73 vs. limit=10.0 2023-11-20 18:11:53,094 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6700, loss[loss=0.08777, simple_loss=0.1182, pruned_loss=0.02057, audio_tagging_loss=0.008107, over 15474.00 frames. ], tot_loss[loss=0.07863, simple_loss=0.1001, pruned_loss=0.01884, audio_tagging_loss=0.009749, over 3042027.51 frames. ], batch size: 56, lr: 4.59e-03, grad_scale: 16.0 2023-11-20 18:12:01,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1166866.6666666667, ans=0.2 2023-11-20 18:12:06,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1166933.3333333333, ans=0.0 2023-11-20 18:12:14,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1166933.3333333333, ans=0.0 2023-11-20 18:12:15,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175050 2023-11-20 18:12:17,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.109e+01 8.688e+01 9.322e+01 1.481e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-20 18:12:28,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-11-20 18:12:31,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1167066.6666666667, ans=0.125 2023-11-20 18:12:38,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1167066.6666666667, ans=0.125 2023-11-20 18:12:57,099 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6750, loss[loss=0.07071, simple_loss=0.08632, pruned_loss=0.01677, audio_tagging_loss=0.01077, over 15301.00 frames. 
], tot_loss[loss=0.07854, simple_loss=0.1001, pruned_loss=0.01874, audio_tagging_loss=0.00975, over 3038491.51 frames. ], batch size: 57, lr: 4.58e-03, grad_scale: 16.0 2023-11-20 18:12:58,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1167200.0, ans=0.2 2023-11-20 18:12:59,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1167200.0, ans=0.125 2023-11-20 18:13:10,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1167266.6666666667, ans=0.1 2023-11-20 18:13:10,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1167266.6666666667, ans=0.125 2023-11-20 18:13:20,213 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175100 2023-11-20 18:14:01,474 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6800, loss[loss=0.06027, simple_loss=0.0717, pruned_loss=0.01274, audio_tagging_loss=0.01168, over 15024.00 frames. ], tot_loss[loss=0.07845, simple_loss=0.09995, pruned_loss=0.01874, audio_tagging_loss=0.009741, over 3037226.67 frames. ], batch size: 57, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:14:12,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2023-11-20 18:14:13,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1167600.0, ans=0.125 2023-11-20 18:14:24,171 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175150 2023-11-20 18:14:25,237 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.227e+01 9.120e+01 1.003e+02 1.352e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-20 18:14:30,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1167666.6666666667, ans=0.0 2023-11-20 18:15:04,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1167866.6666666667, ans=0.1 2023-11-20 18:15:05,529 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6850, loss[loss=0.05131, simple_loss=0.05913, pruned_loss=0.009698, audio_tagging_loss=0.01205, over 14657.00 frames. ], tot_loss[loss=0.07827, simple_loss=0.09979, pruned_loss=0.01864, audio_tagging_loss=0.009738, over 3039114.67 frames. 
], batch size: 57, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:15:09,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1167866.6666666667, ans=0.09899494936611666 2023-11-20 18:15:10,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1167866.6666666667, ans=0.125 2023-11-20 18:15:15,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1167866.6666666667, ans=0.2 2023-11-20 18:15:28,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175200 2023-11-20 18:15:48,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1168066.6666666667, ans=0.125 2023-11-20 18:15:52,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1168066.6666666667, ans=0.95 2023-11-20 18:15:53,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1168066.6666666667, ans=0.125 2023-11-20 18:16:03,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1168133.3333333333, ans=0.125 2023-11-20 18:16:10,199 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6900, loss[loss=0.08838, simple_loss=0.1167, pruned_loss=0.02073, audio_tagging_loss=0.00928, over 16214.00 frames. ], tot_loss[loss=0.07806, simple_loss=0.0994, pruned_loss=0.01858, audio_tagging_loss=0.00978, over 3043514.25 frames. ], batch size: 56, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:16:10,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1168200.0, ans=0.125 2023-11-20 18:16:12,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1168200.0, ans=0.0 2023-11-20 18:16:16,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1168200.0, ans=0.125 2023-11-20 18:16:32,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1168266.6666666667, ans=0.0 2023-11-20 18:16:33,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175250 2023-11-20 18:16:34,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.085e+01 8.884e+01 9.597e+01 1.266e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 18:16:34,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1168333.3333333333, ans=0.125 2023-11-20 18:16:54,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1168400.0, ans=0.1 2023-11-20 18:17:00,873 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 18:17:14,800 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 6950, loss[loss=0.06696, simple_loss=0.08647, pruned_loss=0.01363, audio_tagging_loss=0.01009, over 16089.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09895, pruned_loss=0.01872, audio_tagging_loss=0.009849, over 3049205.74 frames. ], batch size: 62, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:17:19,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1168533.3333333333, ans=0.125 2023-11-20 18:17:31,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1168600.0, ans=0.125 2023-11-20 18:17:37,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175300 2023-11-20 18:17:58,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1168733.3333333333, ans=0.125 2023-11-20 18:17:59,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1168733.3333333333, ans=0.125 2023-11-20 18:18:08,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1168800.0, ans=0.125 2023-11-20 18:18:09,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1168800.0, ans=0.125 2023-11-20 18:18:18,358 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7000, loss[loss=0.07527, simple_loss=0.09975, pruned_loss=0.01683, audio_tagging_loss=0.008567, over 14671.00 frames. ], tot_loss[loss=0.07697, simple_loss=0.09748, pruned_loss=0.01828, audio_tagging_loss=0.009957, over 3041315.91 frames. ], batch size: 58, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:18:41,613 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175350 2023-11-20 18:18:42,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.669e+01 7.873e+01 8.691e+01 9.185e+01 1.165e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-20 18:18:56,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1169066.6666666667, ans=22.5 2023-11-20 18:18:59,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1169066.6666666667, ans=0.07 2023-11-20 18:19:07,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1169066.6666666667, ans=0.0 2023-11-20 18:19:11,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1169133.3333333333, ans=0.035 2023-11-20 18:19:14,854 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:19:18,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1169133.3333333333, ans=0.125 2023-11-20 18:19:19,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. 
limit=10.0 2023-11-20 18:19:21,840 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7050, loss[loss=0.09125, simple_loss=0.1141, pruned_loss=0.0253, audio_tagging_loss=0.008912, over 15717.00 frames. ], tot_loss[loss=0.07737, simple_loss=0.09785, pruned_loss=0.01849, audio_tagging_loss=0.009961, over 3039620.82 frames. ], batch size: 60, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:19:27,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1169200.0, ans=0.2 2023-11-20 18:19:45,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175400 2023-11-20 18:19:50,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1169333.3333333333, ans=0.0 2023-11-20 18:20:22,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1169466.6666666667, ans=0.05 2023-11-20 18:20:26,369 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7100, loss[loss=0.08025, simple_loss=0.1049, pruned_loss=0.01635, audio_tagging_loss=0.01146, over 15339.00 frames. ], tot_loss[loss=0.07769, simple_loss=0.09818, pruned_loss=0.01858, audio_tagging_loss=0.01002, over 3037022.20 frames. ], batch size: 58, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:20:32,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1169533.3333333333, ans=0.125 2023-11-20 18:20:48,702 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175450 2023-11-20 18:20:49,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.001e+01 8.747e+01 9.712e+01 1.225e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 18:20:52,474 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:20:52,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1169666.6666666667, ans=0.0 2023-11-20 18:21:27,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1169800.0, ans=0.0 2023-11-20 18:21:29,404 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7150, loss[loss=0.07507, simple_loss=0.09701, pruned_loss=0.01579, audio_tagging_loss=0.01078, over 15903.00 frames. ], tot_loss[loss=0.07786, simple_loss=0.09848, pruned_loss=0.01858, audio_tagging_loss=0.01004, over 3038632.13 frames. 
], batch size: 60, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:21:30,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169866.6666666667, ans=0.1 2023-11-20 18:21:43,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1169933.3333333333, ans=0.125 2023-11-20 18:21:48,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169933.3333333333, ans=0.1 2023-11-20 18:21:53,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175500 2023-11-20 18:22:01,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1170000.0, ans=0.125 2023-11-20 18:22:11,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1170066.6666666667, ans=0.0 2023-11-20 18:22:19,525 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:22:28,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1170133.3333333333, ans=0.125 2023-11-20 18:22:32,636 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7200, loss[loss=0.09264, simple_loss=0.1205, pruned_loss=0.02331, audio_tagging_loss=0.009091, over 15305.00 frames. ], tot_loss[loss=0.07779, simple_loss=0.09825, pruned_loss=0.01854, audio_tagging_loss=0.01012, over 3037037.57 frames. ], batch size: 55, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:22:39,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1170200.0, ans=0.09899494936611666 2023-11-20 18:22:56,686 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175550 2023-11-20 18:22:57,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.535e+01 8.191e+01 8.742e+01 9.688e+01 2.740e+02, threshold=1.748e+02, percent-clipped=1.0 2023-11-20 18:23:10,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2023-11-20 18:23:11,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1170400.0, ans=0.125 2023-11-20 18:23:37,157 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7250, loss[loss=0.08514, simple_loss=0.1159, pruned_loss=0.01807, audio_tagging_loss=0.00913, over 14478.00 frames. ], tot_loss[loss=0.07748, simple_loss=0.0977, pruned_loss=0.01834, audio_tagging_loss=0.01029, over 3044731.88 frames. ], batch size: 53, lr: 4.58e-03, grad_scale: 32.0 2023-11-20 18:23:47,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.22 vs. 
limit=15.0 2023-11-20 18:23:58,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1170600.0, ans=0.2 2023-11-20 18:23:59,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175600 2023-11-20 18:24:00,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1170600.0, ans=0.1 2023-11-20 18:24:13,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-20 18:24:18,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1170733.3333333333, ans=0.125 2023-11-20 18:24:22,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1170733.3333333333, ans=0.0 2023-11-20 18:24:24,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1170733.3333333333, ans=0.125 2023-11-20 18:24:27,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-20 18:24:30,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1170800.0, ans=0.1 2023-11-20 18:24:37,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1170800.0, ans=0.0 2023-11-20 18:24:41,088 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7300, loss[loss=0.05643, simple_loss=0.06323, pruned_loss=0.01211, audio_tagging_loss=0.0127, over 14043.00 frames. ], tot_loss[loss=0.0776, simple_loss=0.09832, pruned_loss=0.01829, audio_tagging_loss=0.01015, over 3043864.65 frames. ], batch size: 55, lr: 4.58e-03, grad_scale: 16.0 2023-11-20 18:24:47,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1170866.6666666667, ans=0.1 2023-11-20 18:24:48,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1170866.6666666667, ans=0.125 2023-11-20 18:24:51,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1170866.6666666667, ans=0.05 2023-11-20 18:24:57,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1170933.3333333333, ans=0.5 2023-11-20 18:25:03,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175650 2023-11-20 18:25:05,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1171000.0, ans=0.2 2023-11-20 18:25:06,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 7.796e+01 8.638e+01 9.366e+01 1.171e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-20 18:25:15,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1171000.0, ans=0.125 2023-11-20 18:25:43,771 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7350, loss[loss=0.09211, simple_loss=0.119, pruned_loss=0.02589, audio_tagging_loss=0.006708, over 14972.00 frames. 
], tot_loss[loss=0.07688, simple_loss=0.09761, pruned_loss=0.01808, audio_tagging_loss=0.01, over 3048995.03 frames. ], batch size: 56, lr: 4.58e-03, grad_scale: 16.0 2023-11-20 18:26:06,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1171266.6666666667, ans=0.0 2023-11-20 18:26:07,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175700 2023-11-20 18:26:08,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1171333.3333333333, ans=0.125 2023-11-20 18:26:27,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2023-11-20 18:26:29,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1171400.0, ans=0.125 2023-11-20 18:26:47,851 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7400, loss[loss=0.0469, simple_loss=0.0545, pruned_loss=0.009174, audio_tagging_loss=0.01048, over 13790.00 frames. ], tot_loss[loss=0.07612, simple_loss=0.09659, pruned_loss=0.01794, audio_tagging_loss=0.009885, over 3042848.28 frames. ], batch size: 53, lr: 4.58e-03, grad_scale: 16.0 2023-11-20 18:26:54,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1171533.3333333333, ans=0.1 2023-11-20 18:27:00,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1171600.0, ans=0.125 2023-11-20 18:27:06,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1171600.0, ans=0.125 2023-11-20 18:27:10,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175750 2023-11-20 18:27:12,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.603e+01 8.125e+01 8.808e+01 9.818e+01 1.259e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 18:27:22,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-20 18:27:51,340 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7450, loss[loss=0.07997, simple_loss=0.103, pruned_loss=0.01991, audio_tagging_loss=0.008541, over 14899.00 frames. ], tot_loss[loss=0.07729, simple_loss=0.09848, pruned_loss=0.0183, audio_tagging_loss=0.009754, over 3041152.82 frames. 
], batch size: 57, lr: 4.58e-03, grad_scale: 16.0 2023-11-20 18:27:55,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1171866.6666666667, ans=0.1 2023-11-20 18:27:56,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1171866.6666666667, ans=0.125 2023-11-20 18:28:13,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175800 2023-11-20 18:28:35,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1172066.6666666667, ans=0.1 2023-11-20 18:28:35,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1172066.6666666667, ans=10.0 2023-11-20 18:28:54,516 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7500, loss[loss=0.05466, simple_loss=0.0679, pruned_loss=0.009818, audio_tagging_loss=0.01089, over 15516.00 frames. ], tot_loss[loss=0.07756, simple_loss=0.09863, pruned_loss=0.01852, audio_tagging_loss=0.009727, over 3045311.69 frames. ], batch size: 60, lr: 4.57e-03, grad_scale: 16.0 2023-11-20 18:28:59,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1172200.0, ans=0.125 2023-11-20 18:29:02,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1172200.0, ans=0.0 2023-11-20 18:29:13,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1172266.6666666667, ans=0.125 2023-11-20 18:29:19,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175850 2023-11-20 18:29:21,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.057e+01 8.747e+01 9.687e+01 1.264e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 18:29:56,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1172466.6666666667, ans=0.125 2023-11-20 18:29:59,019 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7550, loss[loss=0.06985, simple_loss=0.09452, pruned_loss=0.0132, audio_tagging_loss=0.009394, over 15261.00 frames. ], tot_loss[loss=0.07701, simple_loss=0.09776, pruned_loss=0.01844, audio_tagging_loss=0.009688, over 3046959.83 frames. ], batch size: 57, lr: 4.57e-03, grad_scale: 16.0 2023-11-20 18:30:22,062 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175900 2023-11-20 18:30:32,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2023-11-20 18:30:57,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1172800.0, ans=0.0 2023-11-20 18:30:58,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1172800.0, ans=0.0 2023-11-20 18:31:03,463 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7600, loss[loss=0.0825, simple_loss=0.1061, pruned_loss=0.02163, audio_tagging_loss=0.007809, over 15391.00 frames. ], tot_loss[loss=0.07672, simple_loss=0.0975, pruned_loss=0.01831, audio_tagging_loss=0.009659, over 3047902.49 frames. 
], batch size: 56, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:31:04,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1172866.6666666667, ans=0.2 2023-11-20 18:31:08,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1172866.6666666667, ans=0.125 2023-11-20 18:31:11,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1172866.6666666667, ans=0.0 2023-11-20 18:31:17,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1172933.3333333333, ans=0.1 2023-11-20 18:31:20,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1172933.3333333333, ans=0.125 2023-11-20 18:31:25,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 175950 2023-11-20 18:31:25,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1172933.3333333333, ans=0.0 2023-11-20 18:31:27,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.999e+01 7.835e+01 8.443e+01 9.077e+01 1.271e+02, threshold=1.689e+02, percent-clipped=0.0 2023-11-20 18:31:35,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1173000.0, ans=0.0 2023-11-20 18:31:51,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1173066.6666666667, ans=0.2 2023-11-20 18:31:58,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1173133.3333333333, ans=0.125 2023-11-20 18:32:06,906 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7650, loss[loss=0.08624, simple_loss=0.1151, pruned_loss=0.01716, audio_tagging_loss=0.01152, over 15281.00 frames. ], tot_loss[loss=0.07578, simple_loss=0.0962, pruned_loss=0.01792, audio_tagging_loss=0.009763, over 3050482.55 frames. ], batch size: 56, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:32:30,218 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176000 2023-11-20 18:32:44,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0 2023-11-20 18:32:50,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1173400.0, ans=0.125 2023-11-20 18:33:01,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1173466.6666666667, ans=0.125 2023-11-20 18:33:06,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1173466.6666666667, ans=0.0 2023-11-20 18:33:08,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1173466.6666666667, ans=0.2 2023-11-20 18:33:14,119 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7700, loss[loss=0.05042, simple_loss=0.05546, pruned_loss=0.006179, audio_tagging_loss=0.01652, over 15094.00 frames. ], tot_loss[loss=0.07667, simple_loss=0.0974, pruned_loss=0.01822, audio_tagging_loss=0.009749, over 3046392.45 frames. 
], batch size: 60, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:33:27,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1173600.0, ans=0.0 2023-11-20 18:33:27,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1173600.0, ans=0.1 2023-11-20 18:33:37,298 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176050 2023-11-20 18:33:39,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.684e+01 8.040e+01 8.573e+01 9.268e+01 1.195e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-20 18:33:42,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1173666.6666666667, ans=0.0 2023-11-20 18:33:42,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1173666.6666666667, ans=0.2 2023-11-20 18:33:57,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.97 vs. limit=10.0 2023-11-20 18:33:59,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5 2023-11-20 18:34:00,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1173733.3333333333, ans=0.1 2023-11-20 18:34:18,335 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7750, loss[loss=0.06603, simple_loss=0.07759, pruned_loss=0.01553, audio_tagging_loss=0.0117, over 16237.00 frames. ], tot_loss[loss=0.07677, simple_loss=0.09716, pruned_loss=0.01834, audio_tagging_loss=0.009844, over 3041640.73 frames. ], batch size: 64, lr: 4.57e-03, grad_scale: 16.0 2023-11-20 18:34:41,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176100 2023-11-20 18:34:53,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1174000.0, ans=0.015 2023-11-20 18:34:58,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1174066.6666666667, ans=0.0 2023-11-20 18:34:58,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1174066.6666666667, ans=0.0 2023-11-20 18:34:59,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1174066.6666666667, ans=0.125 2023-11-20 18:35:22,312 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7800, loss[loss=0.07966, simple_loss=0.1038, pruned_loss=0.01682, audio_tagging_loss=0.01092, over 15057.00 frames. ], tot_loss[loss=0.07706, simple_loss=0.09778, pruned_loss=0.01838, audio_tagging_loss=0.009788, over 3046973.23 frames. 
], batch size: 57, lr: 4.57e-03, grad_scale: 16.0 2023-11-20 18:35:42,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1174266.6666666667, ans=0.02 2023-11-20 18:35:45,481 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176150 2023-11-20 18:35:48,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.331e+01 9.045e+01 1.006e+02 1.273e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-20 18:35:58,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1174333.3333333333, ans=0.04949747468305833 2023-11-20 18:35:59,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2023-11-20 18:36:14,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1174466.6666666667, ans=0.125 2023-11-20 18:36:22,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1174466.6666666667, ans=0.0 2023-11-20 18:36:26,125 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7850, loss[loss=0.09077, simple_loss=0.1187, pruned_loss=0.0217, audio_tagging_loss=0.009702, over 15717.00 frames. ], tot_loss[loss=0.07713, simple_loss=0.09771, pruned_loss=0.01835, audio_tagging_loss=0.009925, over 3048783.30 frames. ], batch size: 57, lr: 4.57e-03, grad_scale: 16.0 2023-11-20 18:36:30,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1174533.3333333333, ans=0.2 2023-11-20 18:36:35,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1174533.3333333333, ans=0.1 2023-11-20 18:36:37,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-20 18:36:43,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1174600.0, ans=0.2 2023-11-20 18:36:50,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176200 2023-11-20 18:37:10,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1174733.3333333333, ans=0.125 2023-11-20 18:37:13,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1174733.3333333333, ans=0.125 2023-11-20 18:37:14,095 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:37:17,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1174800.0, ans=0.95 2023-11-20 18:37:23,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1174800.0, ans=0.0 2023-11-20 18:37:31,663 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7900, loss[loss=0.08568, simple_loss=0.1014, pruned_loss=0.02345, audio_tagging_loss=0.01151, over 15700.00 frames. ], tot_loss[loss=0.07733, simple_loss=0.09772, pruned_loss=0.01835, audio_tagging_loss=0.01012, over 3048921.30 frames. 
], batch size: 58, lr: 4.57e-03, grad_scale: 16.0 2023-11-20 18:37:32,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1174866.6666666667, ans=0.125 2023-11-20 18:37:39,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1174866.6666666667, ans=0.125 2023-11-20 18:37:54,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176250 2023-11-20 18:37:55,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1175000.0, ans=0.0 2023-11-20 18:37:57,984 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.028e+01 8.743e+01 9.651e+01 2.118e+02, threshold=1.749e+02, percent-clipped=1.0 2023-11-20 18:38:15,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1175066.6666666667, ans=0.125 2023-11-20 18:38:20,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1175066.6666666667, ans=0.2 2023-11-20 18:38:25,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1175133.3333333333, ans=0.2 2023-11-20 18:38:35,859 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 7950, loss[loss=0.07321, simple_loss=0.09418, pruned_loss=0.01612, audio_tagging_loss=0.01, over 16047.00 frames. ], tot_loss[loss=0.07815, simple_loss=0.09852, pruned_loss=0.01868, audio_tagging_loss=0.01021, over 3044400.73 frames. ], batch size: 61, lr: 4.57e-03, grad_scale: 16.0 2023-11-20 18:38:51,798 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 18:38:59,234 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176300 2023-11-20 18:38:59,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1175266.6666666667, ans=0.0 2023-11-20 18:39:39,801 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8000, loss[loss=0.0918, simple_loss=0.1224, pruned_loss=0.02415, audio_tagging_loss=0.006448, over 15013.00 frames. ], tot_loss[loss=0.07753, simple_loss=0.09747, pruned_loss=0.01854, audio_tagging_loss=0.01025, over 3038429.87 frames. ], batch size: 54, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:39:47,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-11-20 18:39:50,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. 
limit=15.0 2023-11-20 18:40:03,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176350 2023-11-20 18:40:07,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.314e+01 8.360e+01 9.089e+01 9.769e+01 1.308e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-20 18:40:20,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1175733.3333333333, ans=0.125 2023-11-20 18:40:20,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1175733.3333333333, ans=0.1 2023-11-20 18:40:22,620 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:40:25,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1175733.3333333333, ans=0.0 2023-11-20 18:40:44,403 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8050, loss[loss=0.07636, simple_loss=0.1055, pruned_loss=0.01331, audio_tagging_loss=0.01032, over 16230.00 frames. ], tot_loss[loss=0.07796, simple_loss=0.09801, pruned_loss=0.0187, audio_tagging_loss=0.01025, over 3038126.56 frames. ], batch size: 60, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:41:01,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1175933.3333333333, ans=0.125 2023-11-20 18:41:07,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176400 2023-11-20 18:41:15,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1176000.0, ans=0.95 2023-11-20 18:41:28,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2023-11-20 18:41:43,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1176133.3333333333, ans=0.125 2023-11-20 18:41:43,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2023-11-20 18:41:49,091 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8100, loss[loss=0.08053, simple_loss=0.1009, pruned_loss=0.02064, audio_tagging_loss=0.009427, over 14895.00 frames. ], tot_loss[loss=0.07812, simple_loss=0.09869, pruned_loss=0.01867, audio_tagging_loss=0.01011, over 3035952.59 frames. 
], batch size: 56, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:42:13,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176450 2023-11-20 18:42:16,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 7.970e+01 8.520e+01 9.468e+01 1.287e+02, threshold=1.704e+02, percent-clipped=0.0 2023-11-20 18:42:26,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1176333.3333333333, ans=0.1 2023-11-20 18:42:30,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1176400.0, ans=0.125 2023-11-20 18:42:38,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1176400.0, ans=0.5 2023-11-20 18:42:46,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2023-11-20 18:42:49,903 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:42:53,294 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8150, loss[loss=0.05667, simple_loss=0.07018, pruned_loss=0.01222, audio_tagging_loss=0.009362, over 15480.00 frames. ], tot_loss[loss=0.07852, simple_loss=0.09954, pruned_loss=0.01885, audio_tagging_loss=0.009899, over 3040543.33 frames. ], batch size: 61, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:42:59,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1176533.3333333333, ans=0.125 2023-11-20 18:43:07,039 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.150e-01 2023-11-20 18:43:09,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1176600.0, ans=0.125 2023-11-20 18:43:17,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176500 2023-11-20 18:43:28,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1176666.6666666667, ans=0.125 2023-11-20 18:43:36,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.08 vs. limit=10.0 2023-11-20 18:43:37,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1176733.3333333333, ans=10.0 2023-11-20 18:43:42,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1176733.3333333333, ans=0.0 2023-11-20 18:43:51,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1176800.0, ans=0.07 2023-11-20 18:43:57,859 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8200, loss[loss=0.05497, simple_loss=0.06278, pruned_loss=0.01425, audio_tagging_loss=0.009325, over 16189.00 frames. ], tot_loss[loss=0.07766, simple_loss=0.09854, pruned_loss=0.01853, audio_tagging_loss=0.009859, over 3040539.72 frames. ], batch size: 63, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:43:59,160 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 18:44:02,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1176866.6666666667, ans=0.0 2023-11-20 18:44:12,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1176933.3333333333, ans=0.125 2023-11-20 18:44:20,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176550 2023-11-20 18:44:24,658 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 7.959e+01 8.657e+01 9.371e+01 1.307e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 18:44:48,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2023-11-20 18:45:02,155 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8250, loss[loss=0.09732, simple_loss=0.1281, pruned_loss=0.02379, audio_tagging_loss=0.009472, over 14729.00 frames. ], tot_loss[loss=0.07837, simple_loss=0.09969, pruned_loss=0.01884, audio_tagging_loss=0.009686, over 3037902.95 frames. ], batch size: 53, lr: 4.57e-03, grad_scale: 32.0 2023-11-20 18:45:03,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1177200.0, ans=0.125 2023-11-20 18:45:25,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176600 2023-11-20 18:45:29,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1177333.3333333333, ans=0.125 2023-11-20 18:45:31,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-11-20 18:45:32,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1177333.3333333333, ans=0.125 2023-11-20 18:45:46,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1177400.0, ans=0.125 2023-11-20 18:45:49,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1177400.0, ans=0.0 2023-11-20 18:45:57,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1177466.6666666667, ans=0.05 2023-11-20 18:45:58,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1177466.6666666667, ans=0.2 2023-11-20 18:46:06,207 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8300, loss[loss=0.0583, simple_loss=0.06142, pruned_loss=0.01397, audio_tagging_loss=0.01362, over 16959.00 frames. ], tot_loss[loss=0.0779, simple_loss=0.09911, pruned_loss=0.01862, audio_tagging_loss=0.009719, over 3039011.79 frames. 
], batch size: 66, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:46:30,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176650 2023-11-20 18:46:33,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.522e+01 7.955e+01 8.600e+01 9.336e+01 1.507e+02, threshold=1.720e+02, percent-clipped=0.0 2023-11-20 18:46:39,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1177666.6666666667, ans=0.125 2023-11-20 18:46:40,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1177666.6666666667, ans=0.0 2023-11-20 18:46:44,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=22.5 2023-11-20 18:46:53,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1177733.3333333333, ans=0.09899494936611666 2023-11-20 18:47:09,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1177800.0, ans=0.125 2023-11-20 18:47:11,258 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8350, loss[loss=0.07778, simple_loss=0.1052, pruned_loss=0.01617, audio_tagging_loss=0.009004, over 15100.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09989, pruned_loss=0.01882, audio_tagging_loss=0.009566, over 3039790.07 frames. ], batch size: 58, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:47:12,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1177866.6666666667, ans=0.2 2023-11-20 18:47:16,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2023-11-20 18:47:22,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1177866.6666666667, ans=0.2 2023-11-20 18:47:33,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176700 2023-11-20 18:47:41,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2023-11-20 18:47:56,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1178066.6666666667, ans=0.0 2023-11-20 18:48:01,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1178133.3333333333, ans=0.125 2023-11-20 18:48:05,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1178133.3333333333, ans=0.1 2023-11-20 18:48:12,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1178133.3333333333, ans=0.0 2023-11-20 18:48:15,746 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8400, loss[loss=0.08022, simple_loss=0.106, pruned_loss=0.01596, audio_tagging_loss=0.01128, over 15951.00 frames. ], tot_loss[loss=0.07802, simple_loss=0.09927, pruned_loss=0.01869, audio_tagging_loss=0.009695, over 3041319.60 frames. 
], batch size: 58, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:48:19,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1178200.0, ans=0.025 2023-11-20 18:48:22,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1178200.0, ans=0.0 2023-11-20 18:48:37,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1178266.6666666667, ans=0.0 2023-11-20 18:48:38,705 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176750 2023-11-20 18:48:39,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.17 vs. limit=10.0 2023-11-20 18:48:42,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 7.973e+01 8.686e+01 9.472e+01 1.183e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 18:48:50,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2023-11-20 18:49:08,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1178466.6666666667, ans=0.125 2023-11-20 18:49:09,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1178466.6666666667, ans=0.2 2023-11-20 18:49:19,814 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8450, loss[loss=0.06451, simple_loss=0.08054, pruned_loss=0.01438, audio_tagging_loss=0.009857, over 14256.00 frames. ], tot_loss[loss=0.07784, simple_loss=0.09889, pruned_loss=0.01866, audio_tagging_loss=0.009733, over 3035818.13 frames. ], batch size: 55, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:49:30,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1178533.3333333333, ans=0.0 2023-11-20 18:49:38,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1178600.0, ans=0.125 2023-11-20 18:49:39,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1178600.0, ans=0.125 2023-11-20 18:49:43,470 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176800 2023-11-20 18:50:14,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2023-11-20 18:50:25,086 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8500, loss[loss=0.08007, simple_loss=0.1102, pruned_loss=0.01602, audio_tagging_loss=0.008959, over 15497.00 frames. ], tot_loss[loss=0.07807, simple_loss=0.09919, pruned_loss=0.01872, audio_tagging_loss=0.009754, over 3043003.55 frames. 
], batch size: 58, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:50:30,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1178866.6666666667, ans=0.025 2023-11-20 18:50:45,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1178933.3333333333, ans=0.05 2023-11-20 18:50:46,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1178933.3333333333, ans=0.0 2023-11-20 18:50:47,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176850 2023-11-20 18:50:51,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.113e+01 8.136e+01 8.962e+01 9.902e+01 1.387e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 18:50:53,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-11-20 18:51:12,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1179066.6666666667, ans=0.1 2023-11-20 18:51:19,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.00 vs. limit=10.0 2023-11-20 18:51:28,706 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8550, loss[loss=0.08687, simple_loss=0.1105, pruned_loss=0.02278, audio_tagging_loss=0.008851, over 14386.00 frames. ], tot_loss[loss=0.07837, simple_loss=0.09948, pruned_loss=0.01877, audio_tagging_loss=0.00987, over 3046698.06 frames. ], batch size: 56, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:51:28,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1179200.0, ans=0.0 2023-11-20 18:51:32,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1179200.0, ans=0.125 2023-11-20 18:51:35,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1179200.0, ans=0.125 2023-11-20 18:51:39,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1179200.0, ans=0.125 2023-11-20 18:51:51,432 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176900 2023-11-20 18:51:51,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. 
limit=15.0 2023-11-20 18:52:18,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1179400.0, ans=0.1 2023-11-20 18:52:22,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1179466.6666666667, ans=0.125 2023-11-20 18:52:31,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1179533.3333333333, ans=0.125 2023-11-20 18:52:31,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1179533.3333333333, ans=0.1 2023-11-20 18:52:32,731 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8600, loss[loss=0.0723, simple_loss=0.09361, pruned_loss=0.01629, audio_tagging_loss=0.009205, over 14955.00 frames. ], tot_loss[loss=0.0781, simple_loss=0.09923, pruned_loss=0.01856, audio_tagging_loss=0.009928, over 3050566.58 frames. ], batch size: 55, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:52:43,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-20 18:52:49,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2023-11-20 18:52:54,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-11-20 18:52:56,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 176950 2023-11-20 18:53:00,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.645e+01 8.095e+01 8.772e+01 9.480e+01 1.996e+02, threshold=1.754e+02, percent-clipped=1.0 2023-11-20 18:53:22,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1179733.3333333333, ans=0.07 2023-11-20 18:53:35,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=12.0 2023-11-20 18:53:37,649 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8650, loss[loss=0.07682, simple_loss=0.09999, pruned_loss=0.01772, audio_tagging_loss=0.009108, over 14976.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09936, pruned_loss=0.01847, audio_tagging_loss=0.009903, over 3044968.55 frames. ], batch size: 57, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:53:39,282 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 18:53:47,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.74 vs. 
limit=22.5 2023-11-20 18:53:58,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1179933.3333333333, ans=0.0 2023-11-20 18:54:01,263 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177000 2023-11-20 18:54:04,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1180000.0, ans=0.125 2023-11-20 18:54:06,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1180000.0, ans=0.125 2023-11-20 18:54:16,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1180066.6666666667, ans=0.125 2023-11-20 18:54:28,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1180066.6666666667, ans=0.0 2023-11-20 18:54:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1180133.3333333333, ans=0.125 2023-11-20 18:54:43,966 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8700, loss[loss=0.07617, simple_loss=0.09494, pruned_loss=0.01912, audio_tagging_loss=0.009578, over 15636.00 frames. ], tot_loss[loss=0.07841, simple_loss=0.09942, pruned_loss=0.01869, audio_tagging_loss=0.01001, over 3046477.10 frames. ], batch size: 57, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:54:57,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1180266.6666666667, ans=0.0 2023-11-20 18:55:06,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177050 2023-11-20 18:55:10,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.109e+01 8.720e+01 9.588e+01 1.618e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 18:55:14,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1180333.3333333333, ans=0.1 2023-11-20 18:55:15,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1180333.3333333333, ans=0.1 2023-11-20 18:55:17,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1180333.3333333333, ans=0.125 2023-11-20 18:55:20,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1180333.3333333333, ans=0.125 2023-11-20 18:55:33,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1180400.0, ans=0.125 2023-11-20 18:55:33,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1180400.0, ans=0.5 2023-11-20 18:55:35,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1180466.6666666667, ans=0.125 2023-11-20 18:55:48,026 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8750, loss[loss=0.06391, simple_loss=0.07345, pruned_loss=0.01451, audio_tagging_loss=0.01267, over 15265.00 frames. ], tot_loss[loss=0.07883, simple_loss=0.09983, pruned_loss=0.0188, audio_tagging_loss=0.01012, over 3047453.94 frames. 
], batch size: 58, lr: 4.56e-03, grad_scale: 16.0 2023-11-20 18:56:11,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177100 2023-11-20 18:56:19,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1180666.6666666667, ans=0.035 2023-11-20 18:56:52,276 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8800, loss[loss=0.1045, simple_loss=0.142, pruned_loss=0.0267, audio_tagging_loss=0.006796, over 15508.00 frames. ], tot_loss[loss=0.07974, simple_loss=0.1012, pruned_loss=0.019, audio_tagging_loss=0.01015, over 3052362.51 frames. ], batch size: 57, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:57:16,467 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177150 2023-11-20 18:57:16,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1180933.3333333333, ans=0.125 2023-11-20 18:57:21,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.135e+01 8.370e+01 9.044e+01 9.709e+01 1.257e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-20 18:57:33,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-11-20 18:57:53,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-11-20 18:57:58,613 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8850, loss[loss=0.06976, simple_loss=0.09448, pruned_loss=0.0149, audio_tagging_loss=0.007619, over 15024.00 frames. ], tot_loss[loss=0.07941, simple_loss=0.1006, pruned_loss=0.01897, audio_tagging_loss=0.01015, over 3045561.02 frames. ], batch size: 56, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:58:10,995 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 18:58:16,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1181266.6666666667, ans=0.125 2023-11-20 18:58:20,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177200 2023-11-20 18:58:25,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1181333.3333333333, ans=0.95 2023-11-20 18:58:33,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-20 18:58:48,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1181400.0, ans=0.0 2023-11-20 18:58:48,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1181400.0, ans=0.05 2023-11-20 18:59:03,526 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8900, loss[loss=0.0864, simple_loss=0.1117, pruned_loss=0.0226, audio_tagging_loss=0.007969, over 15956.00 frames. 
], tot_loss[loss=0.07908, simple_loss=0.1004, pruned_loss=0.01883, audio_tagging_loss=0.01005, over 3050070.33 frames. ], batch size: 57, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 18:59:07,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1181533.3333333333, ans=0.125 2023-11-20 18:59:27,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177250 2023-11-20 18:59:32,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 8.317e+01 8.861e+01 9.855e+01 1.345e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 18:59:59,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1181800.0, ans=0.125 2023-11-20 19:00:08,836 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 8950, loss[loss=0.1075, simple_loss=0.1447, pruned_loss=0.02796, audio_tagging_loss=0.007189, over 14995.00 frames. ], tot_loss[loss=0.0789, simple_loss=0.1004, pruned_loss=0.01878, audio_tagging_loss=0.009932, over 3054102.80 frames. ], batch size: 56, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 19:00:33,303 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177300 2023-11-20 19:01:01,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1182133.3333333333, ans=0.1 2023-11-20 19:01:13,874 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9000, loss[loss=0.06492, simple_loss=0.08252, pruned_loss=0.01261, audio_tagging_loss=0.01105, over 15412.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.0998, pruned_loss=0.01877, audio_tagging_loss=0.009868, over 3055208.86 frames. ], batch size: 60, lr: 4.56e-03, grad_scale: 32.0 2023-11-20 19:01:13,875 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 19:01:35,653 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.3515, 3.8572, 4.2020, 3.6649, 4.1291, 3.9991, 3.8483, 3.9583], device='cuda:1') 2023-11-20 19:01:50,495 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3304, 5.0059, 4.7200, 5.1752], device='cuda:1') 2023-11-20 19:01:56,962 INFO [train_asr.py:1253] (1/4) Epoch 15, validation: loss=0.0619, simple_loss=0.05318, pruned_loss=0.005552, audio_tagging_loss=0.02975, over 4681554.00 frames. 
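Note: this validation entry, like the tot_loss entries throughout, is consistent with the reported loss decomposing as 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, i.e. a 0.5 weight on the simple (linear-joiner) term and unit weights on the pruned and audio-tagging terms once past warm-up. A minimal check in Python that recomputes two of the logged totals; combined_loss is our own helper name for illustration, not a function in train_asr.py:

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  tagging_scale: float = 1.0) -> float:
    # Sketch of the apparent combination rule; the scales are inferred
    # from the logged numbers, not read out of the training code.
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Validation entry above: loss=0.0619
assert abs(combined_loss(0.05318, 0.005552, 0.02975) - 0.0619) < 5e-4
# Training entry at epoch 15, batch 7400: loss=0.07612
assert abs(combined_loss(0.09659, 0.01794, 0.009885) - 0.07612) < 5e-4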
2023-11-20 19:01:56,963 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 19:02:20,391 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177350 2023-11-20 19:02:20,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1182266.6666666667, ans=0.125 2023-11-20 19:02:25,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.224e+01 8.889e+01 9.368e+01 1.234e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 19:02:34,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1182333.3333333333, ans=0.2 2023-11-20 19:02:42,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1182400.0, ans=0.125 2023-11-20 19:02:56,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-20 19:02:59,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1182466.6666666667, ans=0.125 2023-11-20 19:02:59,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=22.5 2023-11-20 19:03:02,042 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9050, loss[loss=0.08143, simple_loss=0.1005, pruned_loss=0.02237, audio_tagging_loss=0.008799, over 15074.00 frames. ], tot_loss[loss=0.07868, simple_loss=0.1001, pruned_loss=0.01887, audio_tagging_loss=0.009751, over 3055549.18 frames. ], batch size: 58, lr: 4.55e-03, grad_scale: 32.0 2023-11-20 19:03:13,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-20 19:03:25,560 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177400 2023-11-20 19:03:30,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-11-20 19:03:40,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2023-11-20 19:04:07,292 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9100, loss[loss=0.05489, simple_loss=0.06067, pruned_loss=0.01048, audio_tagging_loss=0.01407, over 14448.00 frames. ], tot_loss[loss=0.07899, simple_loss=0.1007, pruned_loss=0.01908, audio_tagging_loss=0.009558, over 3051472.62 frames. ], batch size: 56, lr: 4.55e-03, grad_scale: 32.0 2023-11-20 19:04:17,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-11-20 19:04:19,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.62 vs. 
limit=15.0 2023-11-20 19:04:23,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1182933.3333333333, ans=0.07 2023-11-20 19:04:30,218 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177450 2023-11-20 19:04:34,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.170e+01 8.840e+01 9.659e+01 1.289e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-20 19:04:37,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1183000.0, ans=0.0 2023-11-20 19:04:49,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1183066.6666666667, ans=0.0 2023-11-20 19:04:51,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0 2023-11-20 19:04:55,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1183066.6666666667, ans=0.2 2023-11-20 19:05:12,040 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9150, loss[loss=0.0687, simple_loss=0.08765, pruned_loss=0.01559, audio_tagging_loss=0.00929, over 14612.00 frames. ], tot_loss[loss=0.07878, simple_loss=0.1002, pruned_loss=0.01902, audio_tagging_loss=0.009647, over 3045412.83 frames. ], batch size: 56, lr: 4.55e-03, grad_scale: 32.0 2023-11-20 19:05:18,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1183200.0, ans=0.1 2023-11-20 19:05:35,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177500 2023-11-20 19:05:39,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1183333.3333333333, ans=0.1 2023-11-20 19:05:47,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0 2023-11-20 19:06:04,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1183466.6666666667, ans=0.0 2023-11-20 19:06:06,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1183466.6666666667, ans=0.125 2023-11-20 19:06:15,568 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9200, loss[loss=0.05963, simple_loss=0.07629, pruned_loss=0.01076, audio_tagging_loss=0.01072, over 15677.00 frames. ], tot_loss[loss=0.07884, simple_loss=0.1001, pruned_loss=0.01909, audio_tagging_loss=0.009677, over 3049169.96 frames. 
], batch size: 57, lr: 4.55e-03, grad_scale: 32.0 2023-11-20 19:06:24,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1183533.3333333333, ans=0.0 2023-11-20 19:06:38,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1183600.0, ans=0.125 2023-11-20 19:06:39,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177550 2023-11-20 19:06:44,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.006e+01 8.354e+01 8.939e+01 9.701e+01 1.171e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 19:06:49,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1183666.6666666667, ans=0.125 2023-11-20 19:06:59,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1183733.3333333333, ans=0.125 2023-11-20 19:07:09,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2023-11-20 19:07:17,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-20 19:07:20,286 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9250, loss[loss=0.09661, simple_loss=0.1218, pruned_loss=0.02838, audio_tagging_loss=0.007327, over 16660.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.1007, pruned_loss=0.0193, audio_tagging_loss=0.009656, over 3049458.37 frames. ], batch size: 62, lr: 4.55e-03, grad_scale: 32.0 2023-11-20 19:07:38,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1183933.3333333333, ans=0.1 2023-11-20 19:07:43,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177600 2023-11-20 19:07:46,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1184000.0, ans=0.125 2023-11-20 19:08:09,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1184066.6666666667, ans=0.1 2023-11-20 19:08:24,591 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9300, loss[loss=0.1063, simple_loss=0.1436, pruned_loss=0.02588, audio_tagging_loss=0.008574, over 15068.00 frames. ], tot_loss[loss=0.07888, simple_loss=0.1005, pruned_loss=0.01892, audio_tagging_loss=0.009705, over 3056914.68 frames. 
], batch size: 57, lr: 4.55e-03, grad_scale: 32.0 2023-11-20 19:08:24,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1184200.0, ans=0.2 2023-11-20 19:08:33,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1184200.0, ans=0.0 2023-11-20 19:08:36,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1184266.6666666667, ans=0.0 2023-11-20 19:08:48,437 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177650 2023-11-20 19:08:53,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.542e+01 8.090e+01 8.777e+01 9.713e+01 1.132e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 19:09:28,669 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9350, loss[loss=0.07248, simple_loss=0.09405, pruned_loss=0.0138, audio_tagging_loss=0.01165, over 15736.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.1009, pruned_loss=0.01907, audio_tagging_loss=0.009816, over 3060740.77 frames. ], batch size: 61, lr: 4.55e-03, grad_scale: 32.0 2023-11-20 19:09:28,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1184533.3333333333, ans=0.0 2023-11-20 19:09:51,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1184600.0, ans=0.0 2023-11-20 19:09:52,462 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177700 2023-11-20 19:10:04,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1184666.6666666667, ans=0.05 2023-11-20 19:10:07,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2023-11-20 19:10:15,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1184733.3333333333, ans=0.125 2023-11-20 19:10:17,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1184733.3333333333, ans=0.0 2023-11-20 19:10:23,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1184800.0, ans=0.0 2023-11-20 19:10:25,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-20 19:10:33,259 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9400, loss[loss=0.09058, simple_loss=0.1132, pruned_loss=0.02395, audio_tagging_loss=0.01004, over 15250.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.1015, pruned_loss=0.01904, audio_tagging_loss=0.009931, over 3058689.68 frames. 
2023-11-20 19:10:45,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1184933.3333333333, ans=0.125
2023-11-20 19:10:56,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177750
2023-11-20 19:10:58,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1185000.0, ans=0.0
2023-11-20 19:11:01,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.184e+01 8.847e+01 9.642e+01 1.225e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-20 19:11:09,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1185000.0, ans=0.125
2023-11-20 19:11:30,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1185133.3333333333, ans=0.2
2023-11-20 19:11:32,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1185133.3333333333, ans=0.125
2023-11-20 19:11:35,916 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 19:11:37,125 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9450, loss[loss=0.06391, simple_loss=0.08284, pruned_loss=0.01437, audio_tagging_loss=0.008121, over 15306.00 frames. ], tot_loss[loss=0.07898, simple_loss=0.1004, pruned_loss=0.01871, audio_tagging_loss=0.01005, over 3059547.91 frames. ], batch size: 58, lr: 4.55e-03, grad_scale: 32.0
2023-11-20 19:11:42,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1185200.0, ans=0.5
2023-11-20 19:11:43,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1185200.0, ans=0.125
2023-11-20 19:12:00,844 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177800
2023-11-20 19:12:18,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1185400.0, ans=0.0
2023-11-20 19:12:23,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=22.5
2023-11-20 19:12:36,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185466.6666666667, ans=0.1
2023-11-20 19:12:41,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=12.0
2023-11-20 19:12:41,811 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9500, loss[loss=0.07518, simple_loss=0.09985, pruned_loss=0.01532, audio_tagging_loss=0.009938, over 15919.00 frames. ], tot_loss[loss=0.0784, simple_loss=0.09987, pruned_loss=0.01845, audio_tagging_loss=0.01001, over 3049413.61 frames. ], batch size: 60, lr: 4.55e-03, grad_scale: 32.0
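
The train_asr.py:1462 warnings drop AudioSet cuts whose encoder output would be too short to align with the token sequence: with the convolutional front-end's roughly 4x subsampling, 100 input frames become ((100 - 7) // 2 + 1) // 2 = 23 output frames, and a transducer cannot emit the 24 tokens of the placeholder text. A hedged sketch of such a validity check (helper name and exact formula are illustrative, not the train_asr.py code):

def is_valid_cut(num_frames: int, num_tokens: int) -> bool:
    # Approximate output length after the ~4x conv subsampling.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    # A transducer needs at least one frame per emitted token.
    return frames_after >= num_tokens

assert not is_valid_cut(100, 24)  # 23 frames < 24 tokens -> excluded, as logged
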
2023-11-20 19:12:56,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1185600.0, ans=0.2
2023-11-20 19:13:05,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177850
2023-11-20 19:13:10,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.37 vs. limit=10.0
2023-11-20 19:13:10,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.422e+01 8.141e+01 8.754e+01 9.457e+01 1.227e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-20 19:13:47,311 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9550, loss[loss=0.06028, simple_loss=0.07581, pruned_loss=0.01338, audio_tagging_loss=0.008997, over 15572.00 frames. ], tot_loss[loss=0.07809, simple_loss=0.09932, pruned_loss=0.01834, audio_tagging_loss=0.0101, over 3048613.71 frames. ], batch size: 58, lr: 4.55e-03, grad_scale: 32.0
2023-11-20 19:13:54,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1185866.6666666667, ans=0.125
2023-11-20 19:13:59,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1185933.3333333333, ans=0.0
2023-11-20 19:14:03,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1185933.3333333333, ans=0.0
2023-11-20 19:14:04,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1185933.3333333333, ans=0.125
2023-11-20 19:14:04,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1185933.3333333333, ans=0.5
2023-11-20 19:14:06,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1185933.3333333333, ans=0.125
2023-11-20 19:14:10,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177900
2023-11-20 19:14:16,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1186000.0, ans=0.125
2023-11-20 19:14:20,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0
2023-11-20 19:14:28,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0
2023-11-20 19:14:29,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1186066.6666666667, ans=0.0
2023-11-20 19:14:36,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1186066.6666666667, ans=0.125
2023-11-20 19:14:41,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=22.5
2023-11-20 19:14:51,990 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9600, loss[loss=0.08226, simple_loss=0.1059, pruned_loss=0.0203, audio_tagging_loss=0.009002, over 16570.00 frames. ], tot_loss[loss=0.07873, simple_loss=0.1001, pruned_loss=0.0186, audio_tagging_loss=0.01007, over 3054831.28 frames. ], batch size: 59, lr: 4.55e-03, grad_scale: 32.0
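
The optim.py:476 lines summarize recent gradient norms as quartiles (min / 25% / 50% / 75% / max); the clipping threshold tracks Clipping_scale times the median, e.g. 2.0 * 8.754e+01 is approximately the 1.751e+02 threshold in the entry above, and percent-clipped reports how often the norm exceeded it. A minimal sketch of that relationship (an illustration of the logged quantities, not the optim.py implementation):

import statistics

def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    # Threshold follows clipping_scale times the running median grad norm.
    return clipping_scale * statistics.median(recent_grad_norms)
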
2023-11-20 19:15:15,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0
2023-11-20 19:15:15,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 177950
2023-11-20 19:15:22,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.290e+01 9.064e+01 1.000e+02 1.775e+02, threshold=1.813e+02, percent-clipped=1.0
2023-11-20 19:15:39,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0
2023-11-20 19:15:43,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1186466.6666666667, ans=0.125
2023-11-20 19:15:56,290 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9650, loss[loss=0.07293, simple_loss=0.08731, pruned_loss=0.01786, audio_tagging_loss=0.01142, over 14770.00 frames. ], tot_loss[loss=0.07856, simple_loss=0.09978, pruned_loss=0.01861, audio_tagging_loss=0.01006, over 3053791.33 frames. ], batch size: 59, lr: 4.55e-03, grad_scale: 32.0
2023-11-20 19:16:03,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0
2023-11-20 19:16:11,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1186600.0, ans=0.125
2023-11-20 19:16:20,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178000
2023-11-20 19:16:24,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1186666.6666666667, ans=0.1
2023-11-20 19:17:02,200 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9700, loss[loss=0.06508, simple_loss=0.08129, pruned_loss=0.01347, audio_tagging_loss=0.01097, over 15453.00 frames. ], tot_loss[loss=0.07796, simple_loss=0.09907, pruned_loss=0.01848, audio_tagging_loss=0.00995, over 3050696.17 frames. ], batch size: 57, lr: 4.55e-03, grad_scale: 32.0
2023-11-20 19:17:06,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=12.0
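
The scaling.py:1022 Whitening lines compare a measured whiteness metric against a configured limit (itself scheduled; see the whitening_limit entry further below); on this reading, a corrective gradient is applied only when the metric exceeds the limit. One plausible metric, assumed here rather than taken from scaling.py, is the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue, which equals 1.0 for perfectly white features:

import torch

def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (frames, channels); larger values mean the variance is
    # concentrated in fewer directions, i.e. less "white".
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2
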
2023-11-20 19:17:13,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1186866.6666666667, ans=0.07
2023-11-20 19:17:19,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1186933.3333333333, ans=0.125
2023-11-20 19:17:19,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1186933.3333333333, ans=0.125
2023-11-20 19:17:25,266 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178050
2023-11-20 19:17:31,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.867e+01 8.180e+01 8.975e+01 9.665e+01 1.279e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-20 19:17:31,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1187000.0, ans=0.1
2023-11-20 19:17:31,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0
2023-11-20 19:17:37,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1187000.0, ans=0.04949747468305833
2023-11-20 19:17:52,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1187133.3333333333, ans=0.2
2023-11-20 19:18:00,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1187133.3333333333, ans=0.1
2023-11-20 19:18:04,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1187133.3333333333, ans=0.0
2023-11-20 19:18:06,477 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9750, loss[loss=0.08645, simple_loss=0.1128, pruned_loss=0.0212, audio_tagging_loss=0.008874, over 15483.00 frames. ], tot_loss[loss=0.07838, simple_loss=0.09998, pruned_loss=0.01858, audio_tagging_loss=0.009816, over 3050312.75 frames. ], batch size: 56, lr: 4.55e-03, grad_scale: 32.0
2023-11-20 19:18:15,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1187200.0, ans=0.125
2023-11-20 19:18:28,236 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178100
2023-11-20 19:18:30,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0
2023-11-20 19:18:41,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1187333.3333333333, ans=0.125
2023-11-20 19:18:45,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1187400.0, ans=0.0
2023-11-20 19:18:53,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1187400.0, ans=0.125
2023-11-20 19:18:57,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1187466.6666666667, ans=0.125
2023-11-20 19:18:59,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1187466.6666666667, ans=0.0
2023-11-20 19:19:09,563 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9800, loss[loss=0.08062, simple_loss=0.1023, pruned_loss=0.01937, audio_tagging_loss=0.01012, over 14472.00 frames. ], tot_loss[loss=0.07727, simple_loss=0.09831, pruned_loss=0.01836, audio_tagging_loss=0.009756, over 3045401.44 frames. ], batch size: 56, lr: 4.55e-03, grad_scale: 32.0
2023-11-20 19:19:33,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178150
2023-11-20 19:19:34,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1187666.6666666667, ans=0.125
2023-11-20 19:19:39,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.101e+01 8.620e+01 9.316e+01 1.225e+02, threshold=1.724e+02, percent-clipped=0.0
2023-11-20 19:20:05,640 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 19:20:05,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1187800.0, ans=0.1
2023-11-20 19:20:14,287 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9850, loss[loss=0.103, simple_loss=0.1358, pruned_loss=0.02743, audio_tagging_loss=0.007676, over 14736.00 frames. ], tot_loss[loss=0.07766, simple_loss=0.0988, pruned_loss=0.01859, audio_tagging_loss=0.009673, over 3045109.95 frames. ], batch size: 54, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:20:35,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1187933.3333333333, ans=0.125
2023-11-20 19:20:37,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178200
2023-11-20 19:21:02,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1188066.6666666667, ans=0.0
2023-11-20 19:21:09,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1188133.3333333333, ans=0.125
2023-11-20 19:21:19,005 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9900, loss[loss=0.1141, simple_loss=0.1487, pruned_loss=0.03326, audio_tagging_loss=0.006552, over 14918.00 frames. ], tot_loss[loss=0.07855, simple_loss=0.1, pruned_loss=0.01894, audio_tagging_loss=0.00961, over 3048586.39 frames. ], batch size: 54, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:21:41,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178250
2023-11-20 19:21:47,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.012e+01 8.821e+01 9.655e+01 1.193e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-20 19:22:01,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0
2023-11-20 19:22:22,893 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 9950, loss[loss=0.07603, simple_loss=0.09284, pruned_loss=0.01798, audio_tagging_loss=0.01164, over 13190.00 frames. ], tot_loss[loss=0.0784, simple_loss=0.1, pruned_loss=0.01874, audio_tagging_loss=0.009641, over 3052017.70 frames. ], batch size: 52, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:22:27,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0
2023-11-20 19:22:45,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178300
2023-11-20 19:22:58,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1188666.6666666667, ans=0.125
2023-11-20 19:23:03,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2023-11-20 19:23:04,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1188733.3333333333, ans=0.125
2023-11-20 19:23:06,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1188733.3333333333, ans=0.0
2023-11-20 19:23:09,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1188733.3333333333, ans=0.1
2023-11-20 19:23:19,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1188800.0, ans=0.0
2023-11-20 19:23:22,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1188800.0, ans=0.125
2023-11-20 19:23:27,669 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10000, loss[loss=0.09199, simple_loss=0.1118, pruned_loss=0.02498, audio_tagging_loss=0.01109, over 15805.00 frames. ], tot_loss[loss=0.07777, simple_loss=0.09913, pruned_loss=0.01853, audio_tagging_loss=0.009675, over 3050571.06 frames. ], batch size: 59, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:23:44,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1188933.3333333333, ans=0.2
2023-11-20 19:23:50,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178350
2023-11-20 19:23:51,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1189000.0, ans=0.0
2023-11-20 19:23:56,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 7.984e+01 8.693e+01 9.319e+01 1.323e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-20 19:24:13,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1189066.6666666667, ans=0.125
2023-11-20 19:24:17,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1189133.3333333333, ans=0.95
2023-11-20 19:24:17,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0
2023-11-20 19:24:30,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1189200.0, ans=0.95
2023-11-20 19:24:31,307 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10050, loss[loss=0.0621, simple_loss=0.08594, pruned_loss=0.009936, audio_tagging_loss=0.009192, over 14474.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.0996, pruned_loss=0.01861, audio_tagging_loss=0.009618, over 3059361.76 frames. ], batch size: 55, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:24:48,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1189266.6666666667, ans=0.125
2023-11-20 19:24:50,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1189266.6666666667, ans=0.125
2023-11-20 19:24:53,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178400
2023-11-20 19:24:58,831 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 19:25:09,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1189400.0, ans=0.0
2023-11-20 19:25:09,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1189400.0, ans=0.125
2023-11-20 19:25:10,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1189400.0, ans=0.95
2023-11-20 19:25:11,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1189400.0, ans=0.0
2023-11-20 19:25:11,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=12.0
2023-11-20 19:25:20,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1189400.0, ans=0.1
2023-11-20 19:25:22,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1189466.6666666667, ans=0.0
2023-11-20 19:25:35,270 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10100, loss[loss=0.07587, simple_loss=0.0901, pruned_loss=0.01871, audio_tagging_loss=0.01211, over 15080.00 frames. ], tot_loss[loss=0.07853, simple_loss=0.1001, pruned_loss=0.01881, audio_tagging_loss=0.009684, over 3051731.05 frames. ], batch size: 58, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:25:35,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1189533.3333333333, ans=0.0
2023-11-20 19:25:39,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1189533.3333333333, ans=0.125
2023-11-20 19:25:55,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1189600.0, ans=22.5
2023-11-20 19:25:58,383 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178450
2023-11-20 19:26:04,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.521e+01 8.112e+01 8.895e+01 9.698e+01 1.490e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 19:26:06,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1189666.6666666667, ans=0.125
2023-11-20 19:26:11,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5
2023-11-20 19:26:15,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1189733.3333333333, ans=0.0
2023-11-20 19:26:17,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1189733.3333333333, ans=0.2
2023-11-20 19:26:18,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5
2023-11-20 19:26:25,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1189800.0, ans=0.0
2023-11-20 19:26:26,335 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 19:26:38,688 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10150, loss[loss=0.08093, simple_loss=0.09573, pruned_loss=0.02123, audio_tagging_loss=0.01184, over 16100.00 frames. ], tot_loss[loss=0.07801, simple_loss=0.09947, pruned_loss=0.0185, audio_tagging_loss=0.009775, over 3054810.62 frames. ], batch size: 60, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:27:03,185 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178500
2023-11-20 19:27:09,272 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 19:27:18,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1190066.6666666667, ans=0.125
2023-11-20 19:27:43,565 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10200, loss[loss=0.09095, simple_loss=0.1179, pruned_loss=0.02285, audio_tagging_loss=0.009167, over 16174.00 frames. ], tot_loss[loss=0.07861, simple_loss=0.1001, pruned_loss=0.01876, audio_tagging_loss=0.009811, over 3060153.99 frames. ], batch size: 56, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:27:56,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1190266.6666666667, ans=0.1
2023-11-20 19:28:06,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178550
2023-11-20 19:28:08,195 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 19:28:13,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.453e+01 8.265e+01 8.993e+01 9.730e+01 1.182e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-20 19:28:26,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=22.5
2023-11-20 19:28:36,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1190466.6666666667, ans=0.125
2023-11-20 19:28:39,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1190466.6666666667, ans=0.07
2023-11-20 19:28:48,088 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10250, loss[loss=0.06284, simple_loss=0.07112, pruned_loss=0.01273, audio_tagging_loss=0.01455, over 14733.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.1005, pruned_loss=0.01897, audio_tagging_loss=0.009815, over 3062732.46 frames. ], batch size: 57, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:29:11,254 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178600
2023-11-20 19:29:17,899 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 19:29:41,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0
2023-11-20 19:29:51,530 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10300, loss[loss=0.06975, simple_loss=0.08091, pruned_loss=0.01709, audio_tagging_loss=0.01221, over 16660.00 frames. ], tot_loss[loss=0.07844, simple_loss=0.09962, pruned_loss=0.01876, audio_tagging_loss=0.009871, over 3057107.77 frames. ], batch size: 68, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:30:03,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1190933.3333333333, ans=0.035
2023-11-20 19:30:05,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1190933.3333333333, ans=0.125
2023-11-20 19:30:15,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178650
2023-11-20 19:30:20,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1191000.0, ans=0.0
2023-11-20 19:30:21,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 7.948e+01 8.606e+01 9.117e+01 1.225e+02, threshold=1.721e+02, percent-clipped=0.0
2023-11-20 19:30:44,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1191133.3333333333, ans=0.125
2023-11-20 19:30:56,003 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10350, loss[loss=0.06135, simple_loss=0.06862, pruned_loss=0.01346, audio_tagging_loss=0.01359, over 16228.00 frames. ], tot_loss[loss=0.07857, simple_loss=0.09962, pruned_loss=0.01878, audio_tagging_loss=0.009984, over 3057770.86 frames. ], batch size: 61, lr: 4.54e-03, grad_scale: 16.0
2023-11-20 19:31:06,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1191200.0, ans=0.125
2023-11-20 19:31:13,406 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 19:31:19,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178700
2023-11-20 19:31:25,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1191333.3333333333, ans=0.125
2023-11-20 19:31:36,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.10 vs. limit=22.5
2023-11-20 19:31:39,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1191400.0, ans=0.125
2023-11-20 19:31:51,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1191466.6666666667, ans=0.2
2023-11-20 19:31:56,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1191466.6666666667, ans=0.125
2023-11-20 19:31:58,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1191533.3333333333, ans=0.0
2023-11-20 19:31:59,604 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10400, loss[loss=0.08172, simple_loss=0.1022, pruned_loss=0.0188, audio_tagging_loss=0.01184, over 14438.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.09865, pruned_loss=0.01859, audio_tagging_loss=0.01012, over 3057607.75 frames. ], batch size: 54, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:32:23,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178750
2023-11-20 19:32:30,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.343e+01 9.051e+01 9.727e+01 1.230e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-20 19:32:35,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1191666.6666666667, ans=0.125
2023-11-20 19:32:49,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1191800.0, ans=0.125
2023-11-20 19:32:56,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1191800.0, ans=0.125
2023-11-20 19:33:03,403 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10450, loss[loss=0.07996, simple_loss=0.0985, pruned_loss=0.02109, audio_tagging_loss=0.009619, over 14250.00 frames. ], tot_loss[loss=0.07775, simple_loss=0.09821, pruned_loss=0.01854, audio_tagging_loss=0.0101, over 3051158.13 frames. ], batch size: 53, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:33:13,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0
2023-11-20 19:33:17,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0
2023-11-20 19:33:18,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1191933.3333333333, ans=0.0
2023-11-20 19:33:27,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178800
2023-11-20 19:33:46,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=12.0
2023-11-20 19:33:47,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1192066.6666666667, ans=0.125
2023-11-20 19:34:08,049 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10500, loss[loss=0.07137, simple_loss=0.08583, pruned_loss=0.01886, audio_tagging_loss=0.00959, over 15939.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09927, pruned_loss=0.01874, audio_tagging_loss=0.009963, over 3053264.45 frames. ], batch size: 58, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:34:09,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.20 vs. limit=10.0
2023-11-20 19:34:30,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178850
2023-11-20 19:34:34,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1192333.3333333333, ans=0.125
2023-11-20 19:34:37,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1192333.3333333333, ans=0.09899494936611666
2023-11-20 19:34:37,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.449e+01 8.858e+01 9.853e+01 1.185e+02, threshold=1.772e+02, percent-clipped=0.0
2023-11-20 19:34:39,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1192333.3333333333, ans=0.05
2023-11-20 19:34:50,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1192400.0, ans=0.1
2023-11-20 19:34:56,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5
2023-11-20 19:35:03,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1192466.6666666667, ans=0.1
2023-11-20 19:35:06,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1192466.6666666667, ans=0.125
2023-11-20 19:35:11,106 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10550, loss[loss=0.06074, simple_loss=0.07858, pruned_loss=0.01385, audio_tagging_loss=0.007607, over 14202.00 frames. ], tot_loss[loss=0.07843, simple_loss=0.09964, pruned_loss=0.01872, audio_tagging_loss=0.009894, over 3057386.64 frames. ], batch size: 54, lr: 4.54e-03, grad_scale: 32.0
2023-11-20 19:35:24,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1192600.0, ans=0.0
2023-11-20 19:35:32,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1192600.0, ans=10.0
2023-11-20 19:35:33,720 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178900
2023-11-20 19:36:14,475 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10600, loss[loss=0.1005, simple_loss=0.1388, pruned_loss=0.02326, audio_tagging_loss=0.007791, over 15830.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.1005, pruned_loss=0.01898, audio_tagging_loss=0.009777, over 3061419.82 frames. ], batch size: 57, lr: 4.54e-03, grad_scale: 16.0
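
Note the grad_scale dropping from 32.0 to 16.0 around batch 10350 and in the entry just above (it recovers to 32.0 near batch 11200): this is the usual fp16 loss-scale dynamic, where the scale is halved after an overflowing step and grown back after a run of clean steps. A sketch using torch's GradScaler; the init/growth values here are illustrative, not the experiment's settings:

import torch

# Halve the scale on overflow, double it again after `growth_interval`
# consecutive non-overflowing optimizer steps.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0,
    backoff_factor=0.5, growth_interval=2000,
)
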
2023-11-20 19:36:21,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1192866.6666666667, ans=0.0
2023-11-20 19:36:30,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1192933.3333333333, ans=10.0
2023-11-20 19:36:30,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1192933.3333333333, ans=0.0
2023-11-20 19:36:38,431 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 178950
2023-11-20 19:36:43,556 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 19:36:46,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.146e+01 8.630e+01 9.635e+01 1.232e+02, threshold=1.726e+02, percent-clipped=0.0
2023-11-20 19:36:58,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.28 vs. limit=10.0
2023-11-20 19:37:18,815 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10650, loss[loss=0.07153, simple_loss=0.09871, pruned_loss=0.01252, audio_tagging_loss=0.009661, over 14216.00 frames. ], tot_loss[loss=0.07766, simple_loss=0.09875, pruned_loss=0.01849, audio_tagging_loss=0.009796, over 3050695.18 frames. ], batch size: 53, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:37:23,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0
2023-11-20 19:37:25,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1193200.0, ans=0.125
2023-11-20 19:37:32,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1193266.6666666667, ans=0.0
2023-11-20 19:37:35,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1193266.6666666667, ans=0.125
2023-11-20 19:37:41,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179000
2023-11-20 19:37:41,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0
2023-11-20 19:37:53,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1193333.3333333333, ans=0.1
2023-11-20 19:38:02,831 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 19:38:05,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1193400.0, ans=0.125
2023-11-20 19:38:22,584 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10700, loss[loss=0.08738, simple_loss=0.1207, pruned_loss=0.01897, audio_tagging_loss=0.008076, over 15757.00 frames. ], tot_loss[loss=0.07725, simple_loss=0.09839, pruned_loss=0.01831, audio_tagging_loss=0.00974, over 3051800.40 frames. ], batch size: 59, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:38:35,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1193600.0, ans=0.125
2023-11-20 19:38:45,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179050
2023-11-20 19:38:49,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1193666.6666666667, ans=0.1
2023-11-20 19:38:54,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.600e+01 8.026e+01 8.604e+01 9.471e+01 1.335e+02, threshold=1.721e+02, percent-clipped=0.0
2023-11-20 19:39:06,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0
2023-11-20 19:39:14,490 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 19:39:19,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0
2023-11-20 19:39:25,044 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10750, loss[loss=0.08981, simple_loss=0.1117, pruned_loss=0.02335, audio_tagging_loss=0.01059, over 17093.00 frames. ], tot_loss[loss=0.07651, simple_loss=0.09744, pruned_loss=0.01805, audio_tagging_loss=0.009739, over 3050120.29 frames. ], batch size: 63, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:39:39,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1193933.3333333333, ans=0.0
2023-11-20 19:39:47,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179100
2023-11-20 19:40:28,465 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10800, loss[loss=0.07271, simple_loss=0.09436, pruned_loss=0.01549, audio_tagging_loss=0.01004, over 14517.00 frames. ], tot_loss[loss=0.07699, simple_loss=0.09823, pruned_loss=0.01819, audio_tagging_loss=0.009691, over 3054355.56 frames. ], batch size: 56, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:40:50,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179150
2023-11-20 19:41:00,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 8.337e+01 8.834e+01 9.923e+01 1.321e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-20 19:41:01,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1194333.3333333333, ans=0.0
2023-11-20 19:41:03,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0
2023-11-20 19:41:07,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1194400.0, ans=0.125
2023-11-20 19:41:20,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1194466.6666666667, ans=0.1
2023-11-20 19:41:31,481 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10850, loss[loss=0.08151, simple_loss=0.1087, pruned_loss=0.0195, audio_tagging_loss=0.007665, over 15564.00 frames. ], tot_loss[loss=0.07704, simple_loss=0.09817, pruned_loss=0.01827, audio_tagging_loss=0.009686, over 3048665.60 frames. ], batch size: 57, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:41:34,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0
2023-11-20 19:41:45,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1194600.0, ans=0.0
2023-11-20 19:41:53,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179200
2023-11-20 19:41:54,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0
2023-11-20 19:42:15,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1194733.3333333333, ans=0.1
2023-11-20 19:42:16,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1194733.3333333333, ans=0.04949747468305833
2023-11-20 19:42:30,663 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 19:42:34,252 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10900, loss[loss=0.06572, simple_loss=0.08615, pruned_loss=0.01141, audio_tagging_loss=0.01124, over 15033.00 frames. ], tot_loss[loss=0.0772, simple_loss=0.09817, pruned_loss=0.01838, audio_tagging_loss=0.00974, over 3048023.76 frames. ], batch size: 56, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:42:57,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0
2023-11-20 19:42:58,127 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179250
2023-11-20 19:43:08,257 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.081e+01 8.823e+01 9.357e+01 1.279e+02, threshold=1.765e+02, percent-clipped=0.0
2023-11-20 19:43:17,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1195066.6666666667, ans=0.125
2023-11-20 19:43:20,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1195066.6666666667, ans=0.125
2023-11-20 19:43:27,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1195133.3333333333, ans=0.0
2023-11-20 19:43:38,313 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 10950, loss[loss=0.06115, simple_loss=0.07545, pruned_loss=0.01211, audio_tagging_loss=0.01132, over 15324.00 frames. ], tot_loss[loss=0.07679, simple_loss=0.09752, pruned_loss=0.01825, audio_tagging_loss=0.009785, over 3045244.43 frames. ], batch size: 56, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:43:59,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1195266.6666666667, ans=0.0
2023-11-20 19:44:01,803 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179300
2023-11-20 19:44:03,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0
2023-11-20 19:44:11,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1195333.3333333333, ans=0.125
2023-11-20 19:44:18,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1195400.0, ans=0.95
2023-11-20 19:44:26,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1195400.0, ans=0.0
2023-11-20 19:44:42,834 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11000, loss[loss=0.08738, simple_loss=0.1163, pruned_loss=0.01949, audio_tagging_loss=0.009734, over 14994.00 frames. ], tot_loss[loss=0.07708, simple_loss=0.09797, pruned_loss=0.01838, audio_tagging_loss=0.009722, over 3049983.99 frames. ], batch size: 55, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:44:51,298 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 19:44:58,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1195600.0, ans=0.1
2023-11-20 19:45:03,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0
2023-11-20 19:45:04,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179350
2023-11-20 19:45:15,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.295e+01 8.189e+01 8.881e+01 9.768e+01 1.271e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 19:45:15,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1195666.6666666667, ans=0.125
2023-11-20 19:45:28,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1195733.3333333333, ans=0.125
2023-11-20 19:45:45,447 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11050, loss[loss=0.09301, simple_loss=0.1227, pruned_loss=0.02555, audio_tagging_loss=0.006107, over 15461.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09961, pruned_loss=0.01885, audio_tagging_loss=0.009687, over 3056381.34 frames. ], batch size: 57, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:45:54,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1195866.6666666667, ans=0.2
2023-11-20 19:46:04,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1195933.3333333333, ans=0.125
2023-11-20 19:46:07,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179400
2023-11-20 19:46:31,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1196066.6666666667, ans=0.125
2023-11-20 19:46:39,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0
2023-11-20 19:46:45,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1196133.3333333333, ans=0.2
2023-11-20 19:46:48,009 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11100, loss[loss=0.07421, simple_loss=0.09114, pruned_loss=0.01789, audio_tagging_loss=0.01075, over 14467.00 frames. ], tot_loss[loss=0.07798, simple_loss=0.09893, pruned_loss=0.01858, audio_tagging_loss=0.009937, over 3044393.27 frames. ], batch size: 54, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:46:51,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0
2023-11-20 19:47:10,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1196266.6666666667, ans=0.125
2023-11-20 19:47:11,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179450
2023-11-20 19:47:21,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.315e+01 8.689e+01 9.550e+01 1.347e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-20 19:47:42,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1196466.6666666667, ans=0.125
2023-11-20 19:47:52,015 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11150, loss[loss=0.06575, simple_loss=0.08239, pruned_loss=0.01414, audio_tagging_loss=0.01042, over 15244.00 frames. ], tot_loss[loss=0.07744, simple_loss=0.09818, pruned_loss=0.01833, audio_tagging_loss=0.01002, over 3044740.52 frames. ], batch size: 59, lr: 4.53e-03, grad_scale: 16.0
2023-11-20 19:47:53,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0
2023-11-20 19:48:13,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179500
2023-11-20 19:48:54,078 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11200, loss[loss=0.073, simple_loss=0.0992, pruned_loss=0.01293, audio_tagging_loss=0.01047, over 15604.00 frames. ], tot_loss[loss=0.07757, simple_loss=0.09847, pruned_loss=0.01829, audio_tagging_loss=0.01004, over 3047859.67 frames. ], batch size: 58, lr: 4.53e-03, grad_scale: 32.0
2023-11-20 19:49:06,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1196933.3333333333, ans=0.125
2023-11-20 19:49:16,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179550
2023-11-20 19:49:23,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.92 vs. limit=10.0
2023-11-20 19:49:26,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.291e+01 7.963e+01 8.549e+01 9.018e+01 1.277e+02, threshold=1.710e+02, percent-clipped=0.0
2023-11-20 19:49:37,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1197066.6666666667, ans=0.1
2023-11-20 19:49:38,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1197066.6666666667, ans=0.0
2023-11-20 19:49:42,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1197066.6666666667, ans=0.2
2023-11-20 19:49:52,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1197133.3333333333, ans=0.125
2023-11-20 19:49:56,602 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11250, loss[loss=0.07008, simple_loss=0.08619, pruned_loss=0.0176, audio_tagging_loss=0.009386, over 16858.00 frames. ], tot_loss[loss=0.07706, simple_loss=0.09759, pruned_loss=0.01816, audio_tagging_loss=0.01011, over 3049149.42 frames. ], batch size: 63, lr: 4.53e-03, grad_scale: 32.0
2023-11-20 19:50:11,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1197266.6666666667, ans=0.2
2023-11-20 19:50:16,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1197266.6666666667, ans=0.1
2023-11-20 19:50:20,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179600
2023-11-20 19:50:28,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1197333.3333333333, ans=0.2
2023-11-20 19:50:47,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1197466.6666666667, ans=0.0
2023-11-20 19:50:52,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1197466.6666666667, ans=0.1
2023-11-20 19:50:59,725 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11300, loss[loss=0.07118, simple_loss=0.09216, pruned_loss=0.01508, audio_tagging_loss=0.01001, over 14930.00 frames. ], tot_loss[loss=0.07708, simple_loss=0.09784, pruned_loss=0.01822, audio_tagging_loss=0.009934, over 3043982.71 frames. ], batch size: 56, lr: 4.53e-03, grad_scale: 32.0
2023-11-20 19:51:01,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1197533.3333333333, ans=0.125
2023-11-20 19:51:04,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1197533.3333333333, ans=0.125
2023-11-20 19:51:13,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1197600.0, ans=0.0
2023-11-20 19:51:17,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1197600.0, ans=0.1
2023-11-20 19:51:22,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179650
2023-11-20 19:51:24,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1197666.6666666667, ans=0.07
2023-11-20 19:51:32,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.320e+01 8.034e+01 8.460e+01 9.162e+01 1.233e+02, threshold=1.692e+02, percent-clipped=0.0
2023-11-20 19:51:55,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1197800.0, ans=0.0
2023-11-20 19:51:57,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1197800.0, ans=0.1
2023-11-20 19:52:03,238 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11350, loss[loss=0.06652, simple_loss=0.08114, pruned_loss=0.01641, audio_tagging_loss=0.009542, over 15147.00 frames. ], tot_loss[loss=0.07673, simple_loss=0.09745, pruned_loss=0.01822, audio_tagging_loss=0.00978, over 3038099.40 frames. ], batch size: 57, lr: 4.53e-03, grad_scale: 32.0
2023-11-20 19:52:05,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.42 vs. limit=10.0
2023-11-20 19:52:24,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1197933.3333333333, ans=0.1
2023-11-20 19:52:25,662 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179700
2023-11-20 19:52:59,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1198133.3333333333, ans=0.0
2023-11-20 19:53:04,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1198200.0, ans=0.0
2023-11-20 19:53:04,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1198200.0, ans=0.125
2023-11-20 19:53:05,833 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11400, loss[loss=0.08533, simple_loss=0.1152, pruned_loss=0.01908, audio_tagging_loss=0.008668, over 16735.00 frames. ], tot_loss[loss=0.07663, simple_loss=0.09747, pruned_loss=0.01814, audio_tagging_loss=0.00976, over 3041436.75 frames. ], batch size: 62, lr: 4.53e-03, grad_scale: 32.0
], batch size: 62, lr: 4.53e-03, grad_scale: 32.0 2023-11-20 19:53:26,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1198266.6666666667, ans=0.1 2023-11-20 19:53:29,535 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179750 2023-11-20 19:53:35,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1198333.3333333333, ans=0.125 2023-11-20 19:53:39,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.357e+01 7.998e+01 8.618e+01 9.401e+01 1.167e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-20 19:53:56,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1198466.6666666667, ans=0.125 2023-11-20 19:54:09,906 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11450, loss[loss=0.07744, simple_loss=0.1026, pruned_loss=0.01837, audio_tagging_loss=0.007758, over 15623.00 frames. ], tot_loss[loss=0.07658, simple_loss=0.09726, pruned_loss=0.01813, audio_tagging_loss=0.009816, over 3046563.18 frames. ], batch size: 59, lr: 4.52e-03, grad_scale: 32.0 2023-11-20 19:54:13,755 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 19:54:22,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1198600.0, ans=0.125 2023-11-20 19:54:24,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2023-11-20 19:54:32,759 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179800 2023-11-20 19:54:44,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1198666.6666666667, ans=0.125 2023-11-20 19:54:45,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1198666.6666666667, ans=10.0 2023-11-20 19:54:53,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0 2023-11-20 19:55:04,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-11-20 19:55:06,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1198800.0, ans=0.125 2023-11-20 19:55:13,637 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11500, loss[loss=0.06949, simple_loss=0.08196, pruned_loss=0.01847, audio_tagging_loss=0.01004, over 16091.00 frames. ], tot_loss[loss=0.07794, simple_loss=0.09892, pruned_loss=0.01867, audio_tagging_loss=0.009807, over 3050378.51 frames. 
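The frequent `ScheduledFloat` records sample module hyperparameters (dropout probabilities, skip rates, balancer probabilities and min/max limits) whose values are scheduled against the amount of data seen, logged as `batch_count`, with the current value in `ans`. A plausible implementation is piecewise-linear interpolation between (batch_count, value) breakpoints; the sketch below is an illustration, not necessarily the exact class in scaling.py:

```python
from typing import List, Tuple

def scheduled_float(batch_count: float,
                    schedule: List[Tuple[float, float]]) -> float:
    # `schedule` is a sorted list of (batch_count, value) breakpoints;
    # the value is interpolated between them and clamped outside.
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

# Illustrative schedule only: a skip rate annealed from 0.3 to 0.0 over
# the first 20k batches, then held at 0.0 for the rest of training.
assert scheduled_float(1_197_066.0, [(0.0, 0.3), (20_000.0, 0.0)]) == 0.0
```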
], batch size: 63, lr: 4.52e-03, grad_scale: 32.0 2023-11-20 19:55:31,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1198933.3333333333, ans=0.0 2023-11-20 19:55:36,227 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179850 2023-11-20 19:55:41,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1199000.0, ans=0.2 2023-11-20 19:55:42,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1199000.0, ans=0.0 2023-11-20 19:55:47,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.150e+01 8.270e+01 9.064e+01 9.843e+01 1.286e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-20 19:55:54,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1199066.6666666667, ans=0.125 2023-11-20 19:56:17,602 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11550, loss[loss=0.07689, simple_loss=0.09878, pruned_loss=0.01981, audio_tagging_loss=0.007688, over 16005.00 frames. ], tot_loss[loss=0.07846, simple_loss=0.09989, pruned_loss=0.01876, audio_tagging_loss=0.009755, over 3053504.09 frames. ], batch size: 62, lr: 4.52e-03, grad_scale: 32.0 2023-11-20 19:56:21,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1199200.0, ans=0.125 2023-11-20 19:56:27,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1199200.0, ans=0.0 2023-11-20 19:56:35,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5 2023-11-20 19:56:41,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179900 2023-11-20 19:56:44,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1199333.3333333333, ans=0.125 2023-11-20 19:56:56,110 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 19:57:03,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1199400.0, ans=0.0 2023-11-20 19:57:04,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1199400.0, ans=0.0 2023-11-20 19:57:22,583 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11600, loss[loss=0.09696, simple_loss=0.1213, pruned_loss=0.02832, audio_tagging_loss=0.007994, over 16098.00 frames. ], tot_loss[loss=0.07913, simple_loss=0.1004, pruned_loss=0.01917, audio_tagging_loss=0.009745, over 3054286.98 frames. 
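The WARNING above shows the length filter on training cuts: the excluded AudioSet clip has 100 feature frames, which become only 23 frames after the encoder's 4x subsampling, while its placeholder transcript tokenizes to 24 BPE tokens, and a transducer alignment cannot emit more symbols than it has encoder frames. A minimal sketch of that check, taking the post-subsampling frame count as given:

```python
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    # An RNN-T lattice needs at least one encoder frame per output token,
    # so 23 frames vs. 24 tokens (the logged example) fails the check.
    return frames_after_subsampling >= num_tokens

assert not keep_cut(23, 24)  # the cut excluded above
```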
], batch size: 60, lr: 4.52e-03, grad_scale: 32.0 2023-11-20 19:57:25,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1199533.3333333333, ans=0.0 2023-11-20 19:57:41,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1199600.0, ans=0.1 2023-11-20 19:57:44,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 179950 2023-11-20 19:57:55,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.862e+01 8.413e+01 9.073e+01 1.016e+02 1.405e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-20 19:57:56,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1199666.6666666667, ans=0.0 2023-11-20 19:58:12,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1199800.0, ans=0.1 2023-11-20 19:58:22,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1199800.0, ans=0.5 2023-11-20 19:58:26,090 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11650, loss[loss=0.07148, simple_loss=0.09666, pruned_loss=0.01253, audio_tagging_loss=0.01062, over 15812.00 frames. ], tot_loss[loss=0.079, simple_loss=0.1003, pruned_loss=0.01913, audio_tagging_loss=0.009714, over 3052663.53 frames. ], batch size: 58, lr: 4.52e-03, grad_scale: 32.0 2023-11-20 19:58:48,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180000 2023-11-20 19:58:50,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1200000.0, ans=0.125 2023-11-20 19:58:54,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1200000.0, ans=0.125 2023-11-20 19:58:57,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=15.0 2023-11-20 19:59:02,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1200000.0, ans=0.035 2023-11-20 19:59:04,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1200000.0, ans=0.2 2023-11-20 19:59:15,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1200066.6666666667, ans=0.125 2023-11-20 19:59:17,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1200066.6666666667, ans=0.0 2023-11-20 19:59:18,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1200133.3333333333, ans=0.0 2023-11-20 19:59:32,097 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11700, loss[loss=0.06911, simple_loss=0.08683, pruned_loss=0.01506, audio_tagging_loss=0.01063, over 14865.00 frames. ], tot_loss[loss=0.07811, simple_loss=0.09903, pruned_loss=0.01877, audio_tagging_loss=0.009823, over 3049816.42 frames. 
], batch size: 55, lr: 4.52e-03, grad_scale: 32.0 2023-11-20 19:59:38,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1200200.0, ans=0.0 2023-11-20 19:59:47,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2023-11-20 19:59:55,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180050 2023-11-20 20:00:06,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.544e+01 8.114e+01 8.755e+01 9.534e+01 1.280e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 20:00:27,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1200466.6666666667, ans=0.125 2023-11-20 20:00:35,967 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11750, loss[loss=0.06838, simple_loss=0.08393, pruned_loss=0.01149, audio_tagging_loss=0.01492, over 15195.00 frames. ], tot_loss[loss=0.0775, simple_loss=0.0981, pruned_loss=0.01849, audio_tagging_loss=0.009962, over 3048984.80 frames. ], batch size: 56, lr: 4.52e-03, grad_scale: 16.0 2023-11-20 20:00:40,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-11-20 20:00:52,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-20 20:00:58,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180100 2023-11-20 20:01:02,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1200666.6666666667, ans=0.5 2023-11-20 20:01:09,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1200666.6666666667, ans=0.125 2023-11-20 20:01:14,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.62 vs. limit=15.0 2023-11-20 20:01:29,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1200800.0, ans=10.0 2023-11-20 20:01:40,530 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11800, loss[loss=0.1014, simple_loss=0.1281, pruned_loss=0.02836, audio_tagging_loss=0.008973, over 14921.00 frames. ], tot_loss[loss=0.07832, simple_loss=0.09912, pruned_loss=0.01886, audio_tagging_loss=0.009898, over 3046170.26 frames. 
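Note `grad_scale` dropping from 32.0 to 16.0 at batch 11750: with fp16 training the loss is multiplied by a dynamic scale before backward, halved whenever a step produces inf/NaN gradients and grown back after a run of clean steps, which is why the value oscillates among 8.0, 16.0 and 32.0 in this log. The standard PyTorch mechanism looks like the sketch below; icefall wraps this in its own optimizer code, so treat the exact calls as illustrative:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5)

def fp16_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # fp16 forward pass
        loss = model(batch)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales grads; skips step on inf/NaN
    scaler.update()                   # backoff after a bad step, growth later
    return scaler.get_scale()         # the `grad_scale` printed in the log
```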
], batch size: 54, lr: 4.52e-03, grad_scale: 16.0 2023-11-20 20:02:03,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180150 2023-11-20 20:02:09,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1201000.0, ans=0.0 2023-11-20 20:02:15,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.209e+01 9.058e+01 9.835e+01 1.362e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-20 20:02:34,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1201133.3333333333, ans=0.0 2023-11-20 20:02:41,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1201133.3333333333, ans=0.1 2023-11-20 20:02:43,942 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11850, loss[loss=0.09191, simple_loss=0.1221, pruned_loss=0.02411, audio_tagging_loss=0.006774, over 15969.00 frames. ], tot_loss[loss=0.07813, simple_loss=0.09881, pruned_loss=0.01877, audio_tagging_loss=0.009953, over 3043249.62 frames. ], batch size: 57, lr: 4.52e-03, grad_scale: 16.0 2023-11-20 20:02:57,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1201266.6666666667, ans=0.125 2023-11-20 20:03:07,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180200 2023-11-20 20:03:15,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1201333.3333333333, ans=0.1 2023-11-20 20:03:17,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1201333.3333333333, ans=0.0 2023-11-20 20:03:42,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1201466.6666666667, ans=0.2 2023-11-20 20:03:48,561 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11900, loss[loss=0.06803, simple_loss=0.08912, pruned_loss=0.01382, audio_tagging_loss=0.009648, over 14920.00 frames. ], tot_loss[loss=0.07797, simple_loss=0.09871, pruned_loss=0.01857, audio_tagging_loss=0.01004, over 3043014.86 frames. ], batch size: 56, lr: 4.52e-03, grad_scale: 16.0 2023-11-20 20:03:53,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.51 vs. 
limit=15.0 2023-11-20 20:04:05,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1201600.0, ans=0.125 2023-11-20 20:04:08,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1201600.0, ans=0.0 2023-11-20 20:04:11,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180250 2023-11-20 20:04:22,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.009e+01 8.683e+01 9.562e+01 2.232e+02, threshold=1.737e+02, percent-clipped=1.0 2023-11-20 20:04:34,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1201733.3333333333, ans=0.2 2023-11-20 20:04:39,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1201800.0, ans=0.125 2023-11-20 20:04:52,298 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 11950, loss[loss=0.06476, simple_loss=0.08165, pruned_loss=0.01476, audio_tagging_loss=0.009177, over 15675.00 frames. ], tot_loss[loss=0.07766, simple_loss=0.09829, pruned_loss=0.01844, audio_tagging_loss=0.01008, over 3044777.48 frames. ], batch size: 59, lr: 4.52e-03, grad_scale: 16.0 2023-11-20 20:04:56,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1201866.6666666667, ans=0.2 2023-11-20 20:05:14,188 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180300 2023-11-20 20:05:17,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1202000.0, ans=0.125 2023-11-20 20:05:52,618 INFO [train_asr.py:1221] (1/4) Epoch 15, batch 12000, loss[loss=0.1024, simple_loss=0.1284, pruned_loss=0.03104, audio_tagging_loss=0.007172, over 16231.00 frames. ], tot_loss[loss=0.07841, simple_loss=0.09913, pruned_loss=0.01872, audio_tagging_loss=0.01013, over 3047264.35 frames. ], batch size: 56, lr: 4.52e-03, grad_scale: 32.0 2023-11-20 20:05:52,619 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 20:06:29,436 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1484, 2.3134, 5.0319, 2.5901], device='cuda:1') 2023-11-20 20:06:33,105 INFO [train_asr.py:1253] (1/4) Epoch 15, validation: loss=0.06134, simple_loss=0.05315, pruned_loss=0.00551, audio_tagging_loss=0.02926, over 4681554.00 frames. 2023-11-20 20:06:33,106 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 20:06:50,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1202266.6666666667, ans=0.125 2023-11-20 20:06:54,558 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180350 2023-11-20 20:06:56,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1202333.3333333333, ans=0.1 2023-11-20 20:07:37,553 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 0, loss[loss=0.1065, simple_loss=0.1142, pruned_loss=0.02819, audio_tagging_loss=0.02125, over 15044.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1142, pruned_loss=0.02819, audio_tagging_loss=0.02125, over 15044.00 frames. 
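Two things happen in the span above: a validation pass (a fixed held-out set of 4681554 frames, logged together with the 25607MB peak memory) and the boundary between epochs 15 and 16, after which the running averages restart; that is why at `Epoch 16, batch 0` the `tot_loss[...]` numbers are identical to the single-batch `loss[...]` numbers. A sketch of frame-weighted running totals with that behavior (the decay constant is an assumption):

```python
from typing import Dict

def update_tot_loss(tot: Dict[str, float], batch_losses: Dict[str, float],
                    num_frames: float, decay: float = 0.995) -> Dict[str, float]:
    # Frame-weighted, exponentially decayed running sums; tot[k]/tot["frames"]
    # gives the `tot_loss[... over N frames]` style averages. With an empty
    # `tot` (start of an epoch) the average equals the current batch's loss.
    for k, v in batch_losses.items():
        tot[k] = decay * tot.get(k, 0.0) + v * num_frames
    tot["frames"] = decay * tot.get("frames", 0.0) + num_frames
    return {k: tot[k] / tot["frames"] for k in batch_losses}

tot: Dict[str, float] = {}
avg = update_tot_loss(tot, {"loss": 0.1065}, num_frames=15044.0)
assert abs(avg["loss"] - 0.1065) < 1e-12  # batch 0: tot_loss == loss
```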
], batch size: 57, lr: 4.37e-03, grad_scale: 32.0 2023-11-20 20:07:37,554 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 20:08:13,621 INFO [train_asr.py:1253] (1/4) Epoch 16, validation: loss=0.06129, simple_loss=0.0532, pruned_loss=0.005566, audio_tagging_loss=0.02913, over 4681554.00 frames. 2023-11-20 20:08:13,622 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 20:08:17,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.389e+01 9.121e+01 1.002e+02 1.446e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-20 20:08:39,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-11-20 20:08:44,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1202493.3333333333, ans=0.2 2023-11-20 20:09:05,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1202626.6666666667, ans=0.0 2023-11-20 20:09:11,367 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180400 2023-11-20 20:09:14,343 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 20:09:19,013 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 50, loss[loss=0.07377, simple_loss=0.08671, pruned_loss=0.0128, audio_tagging_loss=0.01761, over 14530.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.1005, pruned_loss=0.01864, audio_tagging_loss=0.01902, over 690130.74 frames. ], batch size: 55, lr: 4.37e-03, grad_scale: 16.0 2023-11-20 20:09:39,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1202760.0, ans=0.2 2023-11-20 20:09:57,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1202893.3333333333, ans=0.2 2023-11-20 20:10:16,030 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180450 2023-11-20 20:10:23,396 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 100, loss[loss=0.0796, simple_loss=0.1006, pruned_loss=0.01539, audio_tagging_loss=0.01389, over 16223.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.09687, pruned_loss=0.01777, audio_tagging_loss=0.01851, over 1212012.30 frames. ], batch size: 60, lr: 4.37e-03, grad_scale: 16.0 2023-11-20 20:10:29,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.922e+01 9.752e+01 1.060e+02 1.405e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-20 20:11:21,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180500 2023-11-20 20:11:28,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1203360.0, ans=0.0 2023-11-20 20:11:29,084 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 150, loss[loss=0.06185, simple_loss=0.06975, pruned_loss=0.01624, audio_tagging_loss=0.01074, over 13913.00 frames. ], tot_loss[loss=0.0831, simple_loss=0.09785, pruned_loss=0.01782, audio_tagging_loss=0.01636, over 1615482.94 frames. ], batch size: 53, lr: 4.37e-03, grad_scale: 8.0 2023-11-20 20:11:42,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.18 vs. 
limit=15.0 2023-11-20 20:12:14,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1203560.0, ans=0.2 2023-11-20 20:12:25,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180550 2023-11-20 20:12:31,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1203626.6666666667, ans=0.125 2023-11-20 20:12:33,433 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 200, loss[loss=0.06037, simple_loss=0.07214, pruned_loss=0.01329, audio_tagging_loss=0.01101, over 15888.00 frames. ], tot_loss[loss=0.08186, simple_loss=0.09757, pruned_loss=0.01845, audio_tagging_loss=0.01463, over 1928653.48 frames. ], batch size: 59, lr: 4.37e-03, grad_scale: 8.0 2023-11-20 20:12:39,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.158e+01 8.825e+01 9.398e+01 1.176e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 20:12:52,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1203760.0, ans=0.1 2023-11-20 20:13:12,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1203893.3333333333, ans=0.125 2023-11-20 20:13:17,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1203893.3333333333, ans=0.125 2023-11-20 20:13:22,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1203893.3333333333, ans=0.125 2023-11-20 20:13:29,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180600 2023-11-20 20:13:32,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1203960.0, ans=0.125 2023-11-20 20:13:36,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1204026.6666666667, ans=0.125 2023-11-20 20:13:37,517 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 250, loss[loss=0.06142, simple_loss=0.07075, pruned_loss=0.01538, audio_tagging_loss=0.01067, over 14660.00 frames. ], tot_loss[loss=0.08158, simple_loss=0.09939, pruned_loss=0.01882, audio_tagging_loss=0.01307, over 2173683.41 frames. ], batch size: 59, lr: 4.37e-03, grad_scale: 8.0 2023-11-20 20:13:52,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1204093.3333333333, ans=0.125 2023-11-20 20:13:52,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.27 vs. limit=10.0 2023-11-20 20:14:02,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1204160.0, ans=0.2 2023-11-20 20:14:04,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.66 vs. 
limit=22.5 2023-11-20 20:14:05,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1204160.0, ans=0.125 2023-11-20 20:14:14,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1204160.0, ans=0.0 2023-11-20 20:14:19,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1204226.6666666667, ans=0.125 2023-11-20 20:14:35,062 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180650 2023-11-20 20:14:42,348 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 300, loss[loss=0.08505, simple_loss=0.1058, pruned_loss=0.02271, audio_tagging_loss=0.009448, over 15377.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.09904, pruned_loss=0.01869, audio_tagging_loss=0.01221, over 2364122.99 frames. ], batch size: 58, lr: 4.37e-03, grad_scale: 8.0 2023-11-20 20:14:48,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.454e+01 9.235e+01 1.018e+02 1.447e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-20 20:15:08,462 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 20:15:16,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1204493.3333333333, ans=10.0 2023-11-20 20:15:18,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1204493.3333333333, ans=0.125 2023-11-20 20:15:36,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1204626.6666666667, ans=0.0 2023-11-20 20:15:39,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180700 2023-11-20 20:15:46,861 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 350, loss[loss=0.08585, simple_loss=0.1094, pruned_loss=0.02368, audio_tagging_loss=0.007461, over 15795.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1005, pruned_loss=0.01882, audio_tagging_loss=0.01131, over 2522345.67 frames. ], batch size: 59, lr: 4.37e-03, grad_scale: 8.0 2023-11-20 20:16:06,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2023-11-20 20:16:17,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2023-11-20 20:16:26,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2023-11-20 20:16:37,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1204893.3333333333, ans=0.125 2023-11-20 20:16:40,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1204960.0, ans=0.125 2023-11-20 20:16:44,048 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180750 2023-11-20 20:16:51,455 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 400, loss[loss=0.08375, simple_loss=0.1055, pruned_loss=0.0243, audio_tagging_loss=0.00671, over 16259.00 frames. 
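The learning rate steps down from 4.52e-03 late in epoch 15 to 4.37e-03 here at the start of epoch 16, then keeps decaying slowly within the epoch. That pattern is consistent with icefall's Eden schedule, lr = base_lr * ((b^2 + B^2)/B^2)^(-1/4) * ((e^2 + E^2)/E^2)^(-1/4), with b the global batch index and e the number of completed epochs. The quick check below uses the usual zipformer recipe constants (base_lr 0.045, B = 7500 batches, E = 3.5 epochs, treated here as assumptions) and ignores Eden's initial warmup factor, which is 1.0 this far into training:

```python
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style decay in both batch and epoch; `epoch` counts
    # completed epochs (sketch, not the exact scheduler class).
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around batch idx 180400, with 15 epochs completed, this reproduces
# the logged lr to display precision:
assert abs(eden_lr(0.045, 180400, 15) - 4.37e-03) < 2e-5
```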
], tot_loss[loss=0.07986, simple_loss=0.1004, pruned_loss=0.01884, audio_tagging_loss=0.01081, over 2639189.83 frames. ], batch size: 61, lr: 4.37e-03, grad_scale: 16.0 2023-11-20 20:16:54,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1205026.6666666667, ans=0.125 2023-11-20 20:16:58,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.022e+01 8.670e+01 9.315e+01 1.277e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 20:17:23,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1205160.0, ans=0.125 2023-11-20 20:17:34,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1205226.6666666667, ans=0.125 2023-11-20 20:17:34,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1205226.6666666667, ans=0.2 2023-11-20 20:17:45,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1205293.3333333333, ans=0.0 2023-11-20 20:17:48,883 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180800 2023-11-20 20:17:56,387 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 450, loss[loss=0.06831, simple_loss=0.09542, pruned_loss=0.01084, audio_tagging_loss=0.00976, over 16067.00 frames. ], tot_loss[loss=0.07883, simple_loss=0.09931, pruned_loss=0.01857, audio_tagging_loss=0.01061, over 2736372.47 frames. ], batch size: 58, lr: 4.37e-03, grad_scale: 16.0 2023-11-20 20:18:14,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2023-11-20 20:18:15,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0 2023-11-20 20:18:21,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1205493.3333333333, ans=0.125 2023-11-20 20:18:52,653 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180850 2023-11-20 20:18:59,866 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 500, loss[loss=0.07079, simple_loss=0.08607, pruned_loss=0.01607, audio_tagging_loss=0.01168, over 13789.00 frames. ], tot_loss[loss=0.07814, simple_loss=0.09851, pruned_loss=0.01844, audio_tagging_loss=0.01046, over 2800917.10 frames. ], batch size: 53, lr: 4.37e-03, grad_scale: 16.0 2023-11-20 20:19:07,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.156e+01 8.896e+01 9.725e+01 1.312e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 20:19:26,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1205826.6666666667, ans=0.125 2023-11-20 20:19:56,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180900 2023-11-20 20:19:58,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1205960.0, ans=0.1 2023-11-20 20:20:03,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. 
limit=15.0 2023-11-20 20:20:03,748 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 550, loss[loss=0.06706, simple_loss=0.08663, pruned_loss=0.01329, audio_tagging_loss=0.01045, over 15179.00 frames. ], tot_loss[loss=0.07818, simple_loss=0.09852, pruned_loss=0.01856, audio_tagging_loss=0.01036, over 2862726.22 frames. ], batch size: 62, lr: 4.37e-03, grad_scale: 16.0 2023-11-20 20:20:04,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1206026.6666666667, ans=0.0 2023-11-20 20:20:39,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1206160.0, ans=0.2 2023-11-20 20:20:43,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2023-11-20 20:20:51,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1206226.6666666667, ans=0.125 2023-11-20 20:20:53,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1206293.3333333333, ans=0.1 2023-11-20 20:20:59,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 180950 2023-11-20 20:21:06,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1206360.0, ans=0.0 2023-11-20 20:21:07,302 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 600, loss[loss=0.1279, simple_loss=0.1713, pruned_loss=0.03517, audio_tagging_loss=0.007112, over 15709.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09912, pruned_loss=0.01872, audio_tagging_loss=0.01026, over 2903166.99 frames. ], batch size: 60, lr: 4.37e-03, grad_scale: 16.0 2023-11-20 20:21:13,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.725e+01 8.009e+01 8.820e+01 9.311e+01 1.105e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 20:21:23,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1206426.6666666667, ans=0.125 2023-11-20 20:21:43,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1206560.0, ans=0.125 2023-11-20 20:21:50,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1206560.0, ans=0.125 2023-11-20 20:21:51,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1206560.0, ans=0.0 2023-11-20 20:21:58,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1206626.6666666667, ans=0.0 2023-11-20 20:22:02,623 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181000 2023-11-20 20:22:10,121 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 650, loss[loss=0.06731, simple_loss=0.08604, pruned_loss=0.01461, audio_tagging_loss=0.009674, over 15664.00 frames. ], tot_loss[loss=0.07851, simple_loss=0.09924, pruned_loss=0.01874, audio_tagging_loss=0.01014, over 2938271.16 frames. 
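The `Whitening:` records come from Whiten modules that measure how far a layer's output covariance is from a multiple of the identity: the logged `metric` is 1.0 for perfectly white activations and grows as variance concentrates in fewer directions, and `limit` is the level at which the module would start penalizing the activations (e.g. `metric=12.40 vs. limit=15.0` above). A sketch of such a whiteness metric, in the spirit of scaling.py but not necessarily identical to it:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels) activations. Returns a scalar >= 1.0
    # that is 1.0 when the pooled covariance is a multiple of the identity.
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    cpg = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)              # centered features
    covar = torch.matmul(x.transpose(1, 2), x)       # (groups, cpg, cpg)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (covar ** 2).sum() / (num_groups * cpg)
    return mean_sq / (mean_diag ** 2 + 1e-20)

x = torch.randn(1000, 288)                # i.i.d. inputs are already ~white
assert whitening_metric(x, num_groups=1) < 2.0
```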
], batch size: 59, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:22:15,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1206693.3333333333, ans=0.2 2023-11-20 20:22:16,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=22.5 2023-11-20 20:22:26,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1206760.0, ans=0.125 2023-11-20 20:22:46,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1206826.6666666667, ans=0.1 2023-11-20 20:22:47,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1206893.3333333333, ans=0.125 2023-11-20 20:23:06,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181050 2023-11-20 20:23:14,270 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 700, loss[loss=0.07982, simple_loss=0.1035, pruned_loss=0.01812, audio_tagging_loss=0.009965, over 14771.00 frames. ], tot_loss[loss=0.07797, simple_loss=0.09867, pruned_loss=0.01851, audio_tagging_loss=0.01012, over 2963113.39 frames. ], batch size: 53, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:23:20,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.218e+01 8.797e+01 9.608e+01 1.489e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 20:23:35,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1207093.3333333333, ans=0.1 2023-11-20 20:23:57,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1207226.6666666667, ans=0.1 2023-11-20 20:23:58,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1207226.6666666667, ans=0.125 2023-11-20 20:24:09,702 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181100 2023-11-20 20:24:15,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2023-11-20 20:24:17,483 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 750, loss[loss=0.09777, simple_loss=0.1281, pruned_loss=0.02591, audio_tagging_loss=0.007817, over 15399.00 frames. ], tot_loss[loss=0.07815, simple_loss=0.09906, pruned_loss=0.01851, audio_tagging_loss=0.01011, over 2988727.91 frames. 
], batch size: 55, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:24:20,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1207360.0, ans=0.125 2023-11-20 20:24:20,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1207360.0, ans=0.0 2023-11-20 20:24:24,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1207360.0, ans=10.0 2023-11-20 20:24:24,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1207360.0, ans=0.125 2023-11-20 20:24:30,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-11-20 20:24:40,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1207426.6666666667, ans=0.0 2023-11-20 20:25:13,038 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181150 2023-11-20 20:25:17,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-20 20:25:18,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-20 20:25:19,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1207693.3333333333, ans=0.0 2023-11-20 20:25:20,347 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 800, loss[loss=0.07652, simple_loss=0.09527, pruned_loss=0.01842, audio_tagging_loss=0.01047, over 15189.00 frames. ], tot_loss[loss=0.0787, simple_loss=0.09948, pruned_loss=0.01876, audio_tagging_loss=0.0102, over 3002561.12 frames. ], batch size: 55, lr: 4.36e-03, grad_scale: 32.0 2023-11-20 20:25:26,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.116e+01 8.797e+01 9.654e+01 1.312e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 20:25:36,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1207760.0, ans=0.0 2023-11-20 20:25:54,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1207826.6666666667, ans=0.125 2023-11-20 20:26:16,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181200 2023-11-20 20:26:23,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1208026.6666666667, ans=0.125 2023-11-20 20:26:23,979 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 850, loss[loss=0.07974, simple_loss=0.1004, pruned_loss=0.02249, audio_tagging_loss=0.007023, over 13369.00 frames. ], tot_loss[loss=0.07853, simple_loss=0.09944, pruned_loss=0.01863, audio_tagging_loss=0.01019, over 3009363.11 frames. 
], batch size: 52, lr: 4.36e-03, grad_scale: 32.0 2023-11-20 20:26:39,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1208093.3333333333, ans=0.125 2023-11-20 20:26:51,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0 2023-11-20 20:27:06,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1208226.6666666667, ans=0.125 2023-11-20 20:27:16,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0 2023-11-20 20:27:20,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181250 2023-11-20 20:27:28,690 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 900, loss[loss=0.06401, simple_loss=0.08103, pruned_loss=0.0115, audio_tagging_loss=0.012, over 14271.00 frames. ], tot_loss[loss=0.0782, simple_loss=0.099, pruned_loss=0.01852, audio_tagging_loss=0.01018, over 3020184.75 frames. ], batch size: 56, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:27:35,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.347e+01 8.999e+01 9.634e+01 1.332e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 20:27:53,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2023-11-20 20:28:23,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1208626.6666666667, ans=0.0 2023-11-20 20:28:24,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181300 2023-11-20 20:28:27,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2023-11-20 20:28:31,771 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 950, loss[loss=0.06966, simple_loss=0.08331, pruned_loss=0.01773, audio_tagging_loss=0.01028, over 15449.00 frames. ], tot_loss[loss=0.0786, simple_loss=0.09954, pruned_loss=0.01866, audio_tagging_loss=0.01017, over 3029940.99 frames. ], batch size: 61, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:28:32,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1208693.3333333333, ans=0.125 2023-11-20 20:28:38,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1208693.3333333333, ans=0.1 2023-11-20 20:28:49,719 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 20:28:49,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.99 vs. 
limit=12.0 2023-11-20 20:28:50,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1208760.0, ans=0.125 2023-11-20 20:29:05,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1208826.6666666667, ans=0.125 2023-11-20 20:29:12,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1208893.3333333333, ans=0.09899494936611666 2023-11-20 20:29:22,982 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 20:29:27,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181350 2023-11-20 20:29:35,684 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1000, loss[loss=0.09009, simple_loss=0.1252, pruned_loss=0.02247, audio_tagging_loss=0.004995, over 16086.00 frames. ], tot_loss[loss=0.07789, simple_loss=0.09886, pruned_loss=0.01848, audio_tagging_loss=0.009981, over 3030249.72 frames. ], batch size: 60, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:29:42,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.425e+01 8.384e+01 9.027e+01 9.924e+01 1.224e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-20 20:29:55,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1209093.3333333333, ans=0.1 2023-11-20 20:30:02,958 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 20:30:11,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1209160.0, ans=0.125 2023-11-20 20:30:19,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1209226.6666666667, ans=0.125 2023-11-20 20:30:28,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0 2023-11-20 20:30:31,933 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181400 2023-11-20 20:30:37,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1209293.3333333333, ans=0.2 2023-11-20 20:30:40,294 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1050, loss[loss=0.06655, simple_loss=0.08826, pruned_loss=0.01202, audio_tagging_loss=0.0104, over 14792.00 frames. ], tot_loss[loss=0.07745, simple_loss=0.09859, pruned_loss=0.01826, audio_tagging_loss=0.009897, over 3039693.72 frames. 
], batch size: 55, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:30:48,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1209360.0, ans=0.125 2023-11-20 20:30:51,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1209360.0, ans=0.0 2023-11-20 20:30:54,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1209426.6666666667, ans=0.125 2023-11-20 20:31:26,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1209560.0, ans=0.1 2023-11-20 20:31:33,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1209626.6666666667, ans=0.125 2023-11-20 20:31:35,700 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181450 2023-11-20 20:31:42,798 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1100, loss[loss=0.1008, simple_loss=0.1294, pruned_loss=0.02706, audio_tagging_loss=0.009051, over 15809.00 frames. ], tot_loss[loss=0.07817, simple_loss=0.09953, pruned_loss=0.01864, audio_tagging_loss=0.009764, over 3038608.37 frames. ], batch size: 58, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:31:45,252 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 20:31:45,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1209693.3333333333, ans=0.125 2023-11-20 20:31:49,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.927e+01 7.892e+01 8.572e+01 9.142e+01 1.110e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-20 20:32:20,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1209893.3333333333, ans=0.125 2023-11-20 20:32:38,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181500 2023-11-20 20:32:38,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1209960.0, ans=0.05 2023-11-20 20:32:46,231 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1150, loss[loss=0.06332, simple_loss=0.07964, pruned_loss=0.01448, audio_tagging_loss=0.009022, over 14325.00 frames. ], tot_loss[loss=0.07784, simple_loss=0.09918, pruned_loss=0.01859, audio_tagging_loss=0.009665, over 3039154.56 frames. 
], batch size: 56, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:32:51,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1210026.6666666667, ans=0.125 2023-11-20 20:33:12,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1210160.0, ans=0.2 2023-11-20 20:33:15,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1210160.0, ans=0.125 2023-11-20 20:33:18,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1210160.0, ans=0.0 2023-11-20 20:33:20,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1210160.0, ans=0.125 2023-11-20 20:33:29,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1210226.6666666667, ans=0.125 2023-11-20 20:33:34,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1210226.6666666667, ans=0.1 2023-11-20 20:33:35,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1210226.6666666667, ans=0.0 2023-11-20 20:33:37,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1210293.3333333333, ans=0.125 2023-11-20 20:33:42,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181550 2023-11-20 20:33:46,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1210293.3333333333, ans=0.0 2023-11-20 20:33:49,342 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1200, loss[loss=0.09813, simple_loss=0.1221, pruned_loss=0.02872, audio_tagging_loss=0.008374, over 14730.00 frames. ], tot_loss[loss=0.07831, simple_loss=0.09999, pruned_loss=0.01868, audio_tagging_loss=0.009633, over 3042120.87 frames. ], batch size: 54, lr: 4.36e-03, grad_scale: 32.0 2023-11-20 20:33:51,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1210360.0, ans=0.125 2023-11-20 20:33:56,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1210360.0, ans=0.0 2023-11-20 20:33:57,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.318e+01 9.034e+01 9.990e+01 1.243e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 20:33:58,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=12.0 2023-11-20 20:33:59,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1210360.0, ans=0.0 2023-11-20 20:34:01,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. 
limit=15.0 2023-11-20 20:34:27,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1210560.0, ans=0.1 2023-11-20 20:34:27,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2023-11-20 20:34:45,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181600 2023-11-20 20:34:48,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1210626.6666666667, ans=0.0 2023-11-20 20:34:54,259 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1250, loss[loss=0.07391, simple_loss=0.08775, pruned_loss=0.01818, audio_tagging_loss=0.01186, over 14699.00 frames. ], tot_loss[loss=0.07755, simple_loss=0.09897, pruned_loss=0.01837, audio_tagging_loss=0.009691, over 3040816.01 frames. ], batch size: 56, lr: 4.36e-03, grad_scale: 32.0 2023-11-20 20:34:54,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1210693.3333333333, ans=0.0 2023-11-20 20:35:05,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1210760.0, ans=0.125 2023-11-20 20:35:12,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2023-11-20 20:35:21,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.83 vs. limit=22.5 2023-11-20 20:35:23,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.25 vs. limit=10.0 2023-11-20 20:35:27,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1210826.6666666667, ans=0.0 2023-11-20 20:35:37,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1210893.3333333333, ans=0.1 2023-11-20 20:35:45,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1210960.0, ans=0.125 2023-11-20 20:35:45,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2023-11-20 20:35:49,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1210960.0, ans=0.0 2023-11-20 20:35:50,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181650 2023-11-20 20:35:53,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1210960.0, ans=0.0 2023-11-20 20:35:56,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1211026.6666666667, ans=0.125 2023-11-20 20:35:57,731 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1300, loss[loss=0.05359, simple_loss=0.05714, pruned_loss=0.01349, audio_tagging_loss=0.01154, over 14930.00 frames. ], tot_loss[loss=0.07723, simple_loss=0.09841, pruned_loss=0.01833, audio_tagging_loss=0.009697, over 3034851.00 frames. 
], batch size: 58, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:36:06,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.079e+01 9.042e+01 9.644e+01 1.533e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 20:36:07,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1211026.6666666667, ans=0.2 2023-11-20 20:36:34,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1211226.6666666667, ans=0.1 2023-11-20 20:36:44,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1211226.6666666667, ans=0.0 2023-11-20 20:36:47,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1211293.3333333333, ans=0.125 2023-11-20 20:36:49,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1211293.3333333333, ans=0.125 2023-11-20 20:36:53,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181700 2023-11-20 20:36:58,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1211293.3333333333, ans=0.2 2023-11-20 20:37:00,270 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1350, loss[loss=0.08925, simple_loss=0.1244, pruned_loss=0.02194, audio_tagging_loss=0.005085, over 15321.00 frames. ], tot_loss[loss=0.07758, simple_loss=0.09903, pruned_loss=0.01843, audio_tagging_loss=0.009638, over 3036757.33 frames. ], batch size: 57, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:37:05,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1211360.0, ans=0.0 2023-11-20 20:37:46,297 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 20:37:47,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-20 20:37:55,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1211626.6666666667, ans=0.125 2023-11-20 20:37:56,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181750 2023-11-20 20:38:04,310 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1400, loss[loss=0.1069, simple_loss=0.1389, pruned_loss=0.02946, audio_tagging_loss=0.008026, over 15455.00 frames. ], tot_loss[loss=0.07714, simple_loss=0.09827, pruned_loss=0.01823, audio_tagging_loss=0.009776, over 3038717.71 frames. 
], batch size: 56, lr: 4.36e-03, grad_scale: 16.0 2023-11-20 20:38:11,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1211693.3333333333, ans=0.015 2023-11-20 20:38:13,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.529e+01 8.254e+01 8.824e+01 9.596e+01 1.357e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 20:38:24,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1211760.0, ans=0.1 2023-11-20 20:38:25,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1211760.0, ans=0.125 2023-11-20 20:38:38,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1211826.6666666667, ans=0.125 2023-11-20 20:38:39,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.81 vs. limit=10.0 2023-11-20 20:38:46,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1211893.3333333333, ans=0.125 2023-11-20 20:38:56,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-20 20:39:01,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181800 2023-11-20 20:39:04,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1211960.0, ans=0.125 2023-11-20 20:39:06,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2023-11-20 20:39:08,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1212026.6666666667, ans=0.0 2023-11-20 20:39:09,392 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1450, loss[loss=0.08364, simple_loss=0.1052, pruned_loss=0.02048, audio_tagging_loss=0.01059, over 15042.00 frames. ], tot_loss[loss=0.07695, simple_loss=0.0977, pruned_loss=0.01822, audio_tagging_loss=0.009885, over 3042967.06 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 16.0 2023-11-20 20:39:27,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2023-11-20 20:39:34,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1212160.0, ans=0.125 2023-11-20 20:39:45,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1212160.0, ans=0.0 2023-11-20 20:39:54,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1212226.6666666667, ans=0.0 2023-11-20 20:40:01,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. 
limit=6.0 2023-11-20 20:40:05,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181850 2023-11-20 20:40:06,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1212293.3333333333, ans=0.0 2023-11-20 20:40:13,041 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1500, loss[loss=0.08753, simple_loss=0.1204, pruned_loss=0.01821, audio_tagging_loss=0.009111, over 15397.00 frames. ], tot_loss[loss=0.07789, simple_loss=0.09854, pruned_loss=0.01863, audio_tagging_loss=0.009987, over 3030814.32 frames. ], batch size: 55, lr: 4.35e-03, grad_scale: 16.0 2023-11-20 20:40:22,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.863e+01 8.058e+01 8.717e+01 9.373e+01 1.477e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 20:40:40,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1212493.3333333333, ans=0.035 2023-11-20 20:40:42,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=12.0 2023-11-20 20:40:49,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1212493.3333333333, ans=10.0 2023-11-20 20:41:08,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181900 2023-11-20 20:41:16,545 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1550, loss[loss=0.06931, simple_loss=0.08807, pruned_loss=0.01573, audio_tagging_loss=0.009552, over 15025.00 frames. ], tot_loss[loss=0.07846, simple_loss=0.09923, pruned_loss=0.01885, audio_tagging_loss=0.009999, over 3035918.31 frames. ], batch size: 57, lr: 4.35e-03, grad_scale: 16.0 2023-11-20 20:41:19,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2023-11-20 20:41:38,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1212760.0, ans=0.125 2023-11-20 20:42:13,185 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 181950 2023-11-20 20:42:20,299 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1600, loss[loss=0.0661, simple_loss=0.08066, pruned_loss=0.01464, audio_tagging_loss=0.01113, over 14825.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.09853, pruned_loss=0.01872, audio_tagging_loss=0.01005, over 3037273.04 frames. 
], batch size: 58, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:42:30,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.255e+01 8.909e+01 9.457e+01 1.581e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 20:42:43,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1213093.3333333333, ans=0.1 2023-11-20 20:42:44,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1213093.3333333333, ans=0.125 2023-11-20 20:43:12,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1213293.3333333333, ans=0.1 2023-11-20 20:43:17,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182000 2023-11-20 20:43:24,944 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1650, loss[loss=0.06575, simple_loss=0.07729, pruned_loss=0.01519, audio_tagging_loss=0.01192, over 14693.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.09823, pruned_loss=0.01879, audio_tagging_loss=0.01013, over 3039737.96 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:43:28,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1213360.0, ans=0.0 2023-11-20 20:43:47,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1213426.6666666667, ans=0.1 2023-11-20 20:44:13,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1213560.0, ans=0.125 2023-11-20 20:44:16,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1213626.6666666667, ans=0.125 2023-11-20 20:44:18,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1213626.6666666667, ans=0.0 2023-11-20 20:44:20,576 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182050 2023-11-20 20:44:28,445 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1700, loss[loss=0.08176, simple_loss=0.1042, pruned_loss=0.02056, audio_tagging_loss=0.009112, over 15567.00 frames. ], tot_loss[loss=0.07813, simple_loss=0.09878, pruned_loss=0.01873, audio_tagging_loss=0.01001, over 3053301.96 frames. ], batch size: 60, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:44:31,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1213693.3333333333, ans=0.125 2023-11-20 20:44:36,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.092e+01 8.985e+01 9.523e+01 1.287e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-20 20:44:42,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1213760.0, ans=0.125 2023-11-20 20:44:52,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1213826.6666666667, ans=0.0 2023-11-20 20:45:23,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.23 vs. 
limit=22.5 2023-11-20 20:45:23,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182100 2023-11-20 20:45:25,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1213960.0, ans=15.0 2023-11-20 20:45:30,314 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 20:45:31,262 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1750, loss[loss=0.08866, simple_loss=0.1098, pruned_loss=0.02297, audio_tagging_loss=0.01078, over 15680.00 frames. ], tot_loss[loss=0.07706, simple_loss=0.09748, pruned_loss=0.01835, audio_tagging_loss=0.009965, over 3056482.99 frames. ], batch size: 58, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:45:39,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1214026.6666666667, ans=0.2 2023-11-20 20:45:55,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1214160.0, ans=0.5 2023-11-20 20:45:59,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1214160.0, ans=0.1 2023-11-20 20:46:02,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1214160.0, ans=0.125 2023-11-20 20:46:07,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1214160.0, ans=0.0 2023-11-20 20:46:19,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1214226.6666666667, ans=0.2 2023-11-20 20:46:27,731 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182150 2023-11-20 20:46:28,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2023-11-20 20:46:35,489 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1800, loss[loss=0.06327, simple_loss=0.08874, pruned_loss=0.01175, audio_tagging_loss=0.007148, over 14384.00 frames. ], tot_loss[loss=0.07745, simple_loss=0.0983, pruned_loss=0.01852, audio_tagging_loss=0.009772, over 3055279.60 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:46:43,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.260e+01 8.761e+01 9.578e+01 1.185e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-20 20:46:57,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0 2023-11-20 20:47:03,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1214493.3333333333, ans=0.0 2023-11-20 20:47:31,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182200 2023-11-20 20:47:40,217 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1850, loss[loss=0.07616, simple_loss=0.08942, pruned_loss=0.01951, audio_tagging_loss=0.01194, over 15821.00 frames. ], tot_loss[loss=0.07687, simple_loss=0.09754, pruned_loss=0.01831, audio_tagging_loss=0.009783, over 3046623.96 frames. 
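
Each optim.py:476 entry reports quartiles of recent gradient norms plus a clipping threshold, and in every entry the threshold equals Clipping_scale times the median quartile (e.g. 2.0 x 8.761e+01 ~= 1.752e+02 just above). A minimal sketch of that bookkeeping, assuming a rolling window of recent norms; the class, method, and window size here are illustrative, not the optim.py implementation:

```python
from collections import deque
import torch

class GradNormClipper:
    """Track recent gradient norms and clip against clipping_scale * median."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # rolling history of global grad norms
        self.num_clipped = 0
        self.num_seen = 0

    def clip_(self, params: list) -> float:
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        self.num_seen += 1
        history = torch.tensor(list(self.norms), dtype=torch.float32)
        # Quartiles of the recent history; threshold = scale * median.
        q = torch.quantile(history, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)  # rescale grads down to threshold
        return norm
```

With a window of recent norms around 9e+01, this reproduces the logged thresholds near 1.75e+02 and the percent-clipped of 0.0 seen in most entries.
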
], batch size: 61, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:47:43,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1214693.3333333333, ans=0.125 2023-11-20 20:47:59,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0 2023-11-20 20:48:05,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1214826.6666666667, ans=0.1 2023-11-20 20:48:11,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1214826.6666666667, ans=0.0 2023-11-20 20:48:18,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1214893.3333333333, ans=0.0 2023-11-20 20:48:37,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182250 2023-11-20 20:48:44,177 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1900, loss[loss=0.1022, simple_loss=0.1363, pruned_loss=0.0273, audio_tagging_loss=0.006816, over 15295.00 frames. ], tot_loss[loss=0.07733, simple_loss=0.09852, pruned_loss=0.01844, audio_tagging_loss=0.009634, over 3051923.38 frames. ], batch size: 57, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:48:53,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 7.917e+01 8.580e+01 9.433e+01 1.165e+02, threshold=1.716e+02, percent-clipped=0.0 2023-11-20 20:49:01,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1215093.3333333333, ans=0.125 2023-11-20 20:49:05,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1215093.3333333333, ans=0.0 2023-11-20 20:49:10,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1215160.0, ans=0.05 2023-11-20 20:49:30,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.47 vs. limit=15.0 2023-11-20 20:49:40,663 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182300 2023-11-20 20:49:40,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1215293.3333333333, ans=0.125 2023-11-20 20:49:47,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1215360.0, ans=0.125 2023-11-20 20:49:48,454 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 1950, loss[loss=0.07034, simple_loss=0.09385, pruned_loss=0.01439, audio_tagging_loss=0.00903, over 14079.00 frames. ], tot_loss[loss=0.07643, simple_loss=0.09709, pruned_loss=0.01815, audio_tagging_loss=0.009732, over 3050484.04 frames. 
], batch size: 53, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:50:08,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1215426.6666666667, ans=0.0 2023-11-20 20:50:45,666 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182350 2023-11-20 20:50:49,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1215626.6666666667, ans=0.0 2023-11-20 20:50:53,434 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2000, loss[loss=0.08175, simple_loss=0.09631, pruned_loss=0.02109, audio_tagging_loss=0.01251, over 14642.00 frames. ], tot_loss[loss=0.07604, simple_loss=0.09664, pruned_loss=0.01787, audio_tagging_loss=0.009847, over 3044057.73 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:50:55,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2023-11-20 20:51:01,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1215693.3333333333, ans=0.125 2023-11-20 20:51:02,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 7.938e+01 8.733e+01 9.578e+01 1.409e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 20:51:07,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1215760.0, ans=0.0 2023-11-20 20:51:07,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1215760.0, ans=0.0 2023-11-20 20:51:22,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1215826.6666666667, ans=0.125 2023-11-20 20:51:30,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1215893.3333333333, ans=0.125 2023-11-20 20:51:49,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1215960.0, ans=0.05 2023-11-20 20:51:49,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1215960.0, ans=0.0 2023-11-20 20:51:50,566 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182400 2023-11-20 20:51:55,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1215960.0, ans=0.125 2023-11-20 20:51:58,156 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2050, loss[loss=0.1057, simple_loss=0.141, pruned_loss=0.02796, audio_tagging_loss=0.007212, over 15847.00 frames. ], tot_loss[loss=0.07671, simple_loss=0.09761, pruned_loss=0.01804, audio_tagging_loss=0.009863, over 3042633.04 frames. 
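
The scaling.py:213 entries print ScheduledFloat values (dropout probabilities, skip rates, bypass scale minimums) as functions of batch_count. A plausible minimal sketch of such a schedule is piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are invented for illustration and are not taken from scaling.py:

```python
class PiecewiseLinearSchedule:
    """A float that interpolates linearly between (batch_count, value) points."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # e.g. (0, 0.3), (20000, 0.1): decay 0.3 -> 0.1

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# A dropout schedule that has long since decayed at the batch counts in this log:
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(1216160.0))  # 0.1, matching the logged ans=0.1 for dropout_p above
```

This would explain why the dropout_p and skip-rate entries at batch counts above 1.2e6 all sit at small constant values (0.1, 0.0): the schedules have reached their final breakpoints.
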
], batch size: 55, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:51:59,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1216026.6666666667, ans=0.125 2023-11-20 20:52:00,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1216026.6666666667, ans=0.2 2023-11-20 20:52:04,634 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 20:52:23,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1216160.0, ans=0.1 2023-11-20 20:52:32,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1216160.0, ans=0.0 2023-11-20 20:52:44,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1216226.6666666667, ans=0.125 2023-11-20 20:52:54,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182450 2023-11-20 20:52:56,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1216293.3333333333, ans=0.125 2023-11-20 20:53:02,331 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2100, loss[loss=0.09441, simple_loss=0.1154, pruned_loss=0.02693, audio_tagging_loss=0.009796, over 14881.00 frames. ], tot_loss[loss=0.07677, simple_loss=0.09776, pruned_loss=0.01803, audio_tagging_loss=0.009856, over 3048860.12 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:53:11,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.190e+01 8.920e+01 9.750e+01 1.833e+02, threshold=1.784e+02, percent-clipped=1.0 2023-11-20 20:53:29,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=22.5 2023-11-20 20:53:37,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1216493.3333333333, ans=0.2 2023-11-20 20:53:44,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1216560.0, ans=0.1 2023-11-20 20:53:53,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1216626.6666666667, ans=0.2 2023-11-20 20:53:55,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=22.5 2023-11-20 20:53:58,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182500 2023-11-20 20:54:00,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1216626.6666666667, ans=0.0 2023-11-20 20:54:06,760 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2150, loss[loss=0.07146, simple_loss=0.09806, pruned_loss=0.01574, audio_tagging_loss=0.006689, over 14784.00 frames. ], tot_loss[loss=0.07702, simple_loss=0.09817, pruned_loss=0.01809, audio_tagging_loss=0.009837, over 3051911.47 frames. 
], batch size: 55, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:54:26,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1216760.0, ans=0.1 2023-11-20 20:54:35,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-11-20 20:54:36,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1216826.6666666667, ans=0.125 2023-11-20 20:54:43,809 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 20:54:56,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1216893.3333333333, ans=0.125 2023-11-20 20:55:03,995 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182550 2023-11-20 20:55:03,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1216960.0, ans=0.125 2023-11-20 20:55:04,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1216960.0, ans=0.0 2023-11-20 20:55:11,331 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2200, loss[loss=0.07895, simple_loss=0.1067, pruned_loss=0.01545, audio_tagging_loss=0.01015, over 15856.00 frames. ], tot_loss[loss=0.07742, simple_loss=0.0988, pruned_loss=0.01819, audio_tagging_loss=0.009823, over 3054064.63 frames. ], batch size: 60, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:55:19,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.514e+01 9.084e+01 9.823e+01 1.256e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 20:55:21,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1217026.6666666667, ans=0.125 2023-11-20 20:55:25,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1217093.3333333333, ans=0.0 2023-11-20 20:55:29,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1217093.3333333333, ans=0.125 2023-11-20 20:55:59,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1217226.6666666667, ans=0.1 2023-11-20 20:56:07,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182600 2023-11-20 20:56:16,431 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2250, loss[loss=0.05712, simple_loss=0.07022, pruned_loss=0.01267, audio_tagging_loss=0.009346, over 15260.00 frames. ], tot_loss[loss=0.07792, simple_loss=0.09928, pruned_loss=0.0184, audio_tagging_loss=0.009882, over 3047017.18 frames. ], batch size: 60, lr: 4.35e-03, grad_scale: 32.0 2023-11-20 20:56:17,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.36 vs. 
limit=22.5 2023-11-20 20:56:17,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1217360.0, ans=0.125 2023-11-20 20:56:19,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1217360.0, ans=0.1 2023-11-20 20:56:31,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1217426.6666666667, ans=0.0 2023-11-20 20:57:11,871 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182650 2023-11-20 20:57:16,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1217626.6666666667, ans=0.2 2023-11-20 20:57:19,147 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2300, loss[loss=0.08249, simple_loss=0.1024, pruned_loss=0.01964, audio_tagging_loss=0.01167, over 13757.00 frames. ], tot_loss[loss=0.07778, simple_loss=0.09922, pruned_loss=0.01831, audio_tagging_loss=0.009863, over 3046886.74 frames. ], batch size: 54, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 20:57:29,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.084e+01 8.941e+01 9.693e+01 1.110e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 20:57:34,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1217760.0, ans=0.04949747468305833 2023-11-20 20:58:06,113 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 20:58:11,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1217960.0, ans=0.5 2023-11-20 20:58:14,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1217960.0, ans=0.0 2023-11-20 20:58:15,660 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 20:58:15,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1217960.0, ans=0.125 2023-11-20 20:58:16,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1217960.0, ans=15.0 2023-11-20 20:58:16,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182700 2023-11-20 20:58:22,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1217960.0, ans=0.2 2023-11-20 20:58:24,701 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2350, loss[loss=0.06319, simple_loss=0.07717, pruned_loss=0.01213, audio_tagging_loss=0.01248, over 16424.00 frames. ], tot_loss[loss=0.07818, simple_loss=0.09983, pruned_loss=0.01842, audio_tagging_loss=0.009841, over 3045783.71 frames. 
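
The WARNING entries drop AudioSet placeholder cuts because a transducer loss cannot align 24 BPE tokens to only 23 post-subsampling frames; the logged 100 -> 23 frame reduction is consistent with an effective 4x convolutional front-end, roughly (100 - 7) // 4 = 23. A hedged sketch of such a filter (the helper name and the exact subsampling formula are assumptions, not the train_asr.py code):

```python
import logging

def keep_cut(cut, sp, subsampling_factor: int = 4) -> bool:
    """Drop cuts with fewer frames after subsampling than BPE tokens."""
    num_frames = cut.num_frames                              # before subsampling
    num_frames_sub = (num_frames - 7) // subsampling_factor  # assumed conv front-end
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if num_frames_sub < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Frames after subsampling: {num_frames_sub}. Tokens: {len(tokens)}"
        )
        return False
    return True

# Typically applied lazily over a lhotse CutSet:
#   cuts = cuts.filter(lambda c: keep_cut(c, sp))
```

Since filtering is lazy, the same one-second placeholder cuts keep resurfacing and being excluded batch after batch, which is why these warnings recur throughout the log.
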
], batch size: 63, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 20:58:31,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1218026.6666666667, ans=0.125 2023-11-20 20:58:52,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1218160.0, ans=0.0 2023-11-20 20:59:20,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-20 20:59:21,109 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182750 2023-11-20 20:59:28,319 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2400, loss[loss=0.09335, simple_loss=0.1166, pruned_loss=0.02456, audio_tagging_loss=0.0105, over 14767.00 frames. ], tot_loss[loss=0.07864, simple_loss=0.1003, pruned_loss=0.01866, audio_tagging_loss=0.009839, over 3045774.58 frames. ], batch size: 57, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 20:59:38,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.066e+01 8.791e+01 9.748e+01 1.239e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 20:59:51,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1218426.6666666667, ans=0.2 2023-11-20 20:59:56,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2023-11-20 20:59:58,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1218493.3333333333, ans=0.05 2023-11-20 21:00:07,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1218560.0, ans=0.05 2023-11-20 21:00:14,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1218560.0, ans=0.025 2023-11-20 21:00:25,001 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182800 2023-11-20 21:00:26,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1218626.6666666667, ans=0.125 2023-11-20 21:00:32,669 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2450, loss[loss=0.0573, simple_loss=0.06239, pruned_loss=0.01297, audio_tagging_loss=0.01313, over 15144.00 frames. ], tot_loss[loss=0.0786, simple_loss=0.1002, pruned_loss=0.0186, audio_tagging_loss=0.009924, over 3041272.90 frames. ], batch size: 58, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 21:00:44,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. 
limit=15.0 2023-11-20 21:01:02,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1218826.6666666667, ans=0.0 2023-11-20 21:01:09,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1218826.6666666667, ans=0.0 2023-11-20 21:01:27,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1218960.0, ans=0.0 2023-11-20 21:01:30,037 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182850 2023-11-20 21:01:37,753 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2500, loss[loss=0.04459, simple_loss=0.0571, pruned_loss=0.006848, audio_tagging_loss=0.009194, over 14720.00 frames. ], tot_loss[loss=0.0779, simple_loss=0.09953, pruned_loss=0.01827, audio_tagging_loss=0.009864, over 3048392.29 frames. ], batch size: 58, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 21:01:48,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 7.891e+01 8.728e+01 9.490e+01 1.260e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-20 21:02:04,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1219160.0, ans=0.0 2023-11-20 21:02:16,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1219226.6666666667, ans=0.125 2023-11-20 21:02:34,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2023-11-20 21:02:35,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182900 2023-11-20 21:02:38,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1219293.3333333333, ans=0.0 2023-11-20 21:02:42,992 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2550, loss[loss=0.07832, simple_loss=0.1008, pruned_loss=0.01529, audio_tagging_loss=0.01265, over 15940.00 frames. ], tot_loss[loss=0.07791, simple_loss=0.09948, pruned_loss=0.01829, audio_tagging_loss=0.009877, over 3047272.04 frames. ], batch size: 59, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 21:02:49,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1219360.0, ans=0.5 2023-11-20 21:02:53,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1219360.0, ans=0.125 2023-11-20 21:02:54,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1219426.6666666667, ans=0.0 2023-11-20 21:03:30,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1219560.0, ans=0.04949747468305833 2023-11-20 21:03:31,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs. 
limit=15.0 2023-11-20 21:03:39,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1219626.6666666667, ans=0.125 2023-11-20 21:03:40,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 182950 2023-11-20 21:03:47,486 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2600, loss[loss=0.07706, simple_loss=0.1088, pruned_loss=0.0157, audio_tagging_loss=0.006971, over 15169.00 frames. ], tot_loss[loss=0.07756, simple_loss=0.09905, pruned_loss=0.01827, audio_tagging_loss=0.009771, over 3043899.24 frames. ], batch size: 56, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 21:03:47,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1219693.3333333333, ans=0.125 2023-11-20 21:03:57,811 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.668e+01 8.265e+01 9.445e+01 1.045e+02 1.287e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-20 21:04:05,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1219760.0, ans=0.2 2023-11-20 21:04:07,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1219760.0, ans=0.05 2023-11-20 21:04:16,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2023-11-20 21:04:22,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1219826.6666666667, ans=0.1 2023-11-20 21:04:43,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183000 2023-11-20 21:04:52,359 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2650, loss[loss=0.06576, simple_loss=0.08014, pruned_loss=0.01421, audio_tagging_loss=0.01148, over 15729.00 frames. ], tot_loss[loss=0.07722, simple_loss=0.0985, pruned_loss=0.01814, audio_tagging_loss=0.009838, over 3047127.44 frames. ], batch size: 58, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 21:05:19,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1220160.0, ans=0.2 2023-11-20 21:05:22,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1220160.0, ans=0.2 2023-11-20 21:05:48,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183050 2023-11-20 21:05:51,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1220293.3333333333, ans=10.0 2023-11-20 21:05:56,273 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2700, loss[loss=0.06532, simple_loss=0.08068, pruned_loss=0.01502, audio_tagging_loss=0.009957, over 15210.00 frames. ], tot_loss[loss=0.07718, simple_loss=0.09822, pruned_loss=0.01814, audio_tagging_loss=0.009932, over 3049260.62 frames. ], batch size: 58, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 21:06:06,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.355e+01 9.034e+01 9.882e+01 1.379e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 21:06:50,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. 
limit=8.0 2023-11-20 21:06:52,686 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183100 2023-11-20 21:06:59,904 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2750, loss[loss=0.09293, simple_loss=0.1176, pruned_loss=0.02353, audio_tagging_loss=0.01061, over 15372.00 frames. ], tot_loss[loss=0.07711, simple_loss=0.09798, pruned_loss=0.01825, audio_tagging_loss=0.009866, over 3050806.46 frames. ], batch size: 57, lr: 4.34e-03, grad_scale: 16.0 2023-11-20 21:07:04,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-20 21:07:07,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1220693.3333333333, ans=0.2 2023-11-20 21:07:18,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1220760.0, ans=0.0 2023-11-20 21:07:23,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1220760.0, ans=0.5 2023-11-20 21:07:31,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1220826.6666666667, ans=0.0 2023-11-20 21:07:52,666 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 21:07:54,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1220960.0, ans=0.0 2023-11-20 21:07:55,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0 2023-11-20 21:07:56,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183150 2023-11-20 21:08:02,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.19 vs. limit=12.0 2023-11-20 21:08:04,593 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2800, loss[loss=0.07674, simple_loss=0.09331, pruned_loss=0.01992, audio_tagging_loss=0.01016, over 15980.00 frames. ], tot_loss[loss=0.07639, simple_loss=0.09677, pruned_loss=0.01806, audio_tagging_loss=0.009948, over 3053152.75 frames. 
], batch size: 60, lr: 4.34e-03, grad_scale: 32.0 2023-11-20 21:08:07,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1221026.6666666667, ans=0.1 2023-11-20 21:08:15,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.121e+01 8.798e+01 9.627e+01 1.849e+02, threshold=1.760e+02, percent-clipped=1.0 2023-11-20 21:08:15,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1221093.3333333333, ans=0.1 2023-11-20 21:08:44,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1221226.6666666667, ans=0.125 2023-11-20 21:08:56,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1221293.3333333333, ans=0.1 2023-11-20 21:09:01,383 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183200 2023-11-20 21:09:01,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1221293.3333333333, ans=0.125 2023-11-20 21:09:02,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1221293.3333333333, ans=0.125 2023-11-20 21:09:08,976 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2850, loss[loss=0.0954, simple_loss=0.1025, pruned_loss=0.03051, audio_tagging_loss=0.01365, over 15007.00 frames. ], tot_loss[loss=0.0762, simple_loss=0.09657, pruned_loss=0.01802, audio_tagging_loss=0.009885, over 3049040.55 frames. ], batch size: 58, lr: 4.34e-03, grad_scale: 16.0 2023-11-20 21:09:14,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1221360.0, ans=0.125 2023-11-20 21:09:24,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=22.5 2023-11-20 21:09:31,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1221426.6666666667, ans=0.0 2023-11-20 21:09:42,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1221493.3333333333, ans=0.125 2023-11-20 21:09:43,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1221493.3333333333, ans=0.1 2023-11-20 21:09:48,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1221560.0, ans=0.1 2023-11-20 21:09:58,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.69 vs. limit=22.5 2023-11-20 21:10:05,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183250 2023-11-20 21:10:13,744 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2900, loss[loss=0.1001, simple_loss=0.1288, pruned_loss=0.0275, audio_tagging_loss=0.008158, over 15872.00 frames. ], tot_loss[loss=0.07657, simple_loss=0.09713, pruned_loss=0.01811, audio_tagging_loss=0.009895, over 3042203.51 frames. 
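
The scaling.py:1022 entries compare a per-module whitening metric against a limit, penalizing activations whose channel covariance drifts far from a multiple of the identity. One plausible formulation, offered here as a sketch rather than the scaling.py implementation, is the ratio mean(eigenvalue^2) / mean(eigenvalue)^2 of the channel covariance, which equals 1.0 for perfectly white features and grows as the spectrum becomes lopsided:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """How far the channel covariance of x is from a multiple of the identity.

    Returns 1.0 for perfectly 'white' features; larger values indicate a more
    lopsided covariance spectrum. x has shape (num_frames, num_channels).
    """
    x = x - x.mean(dim=0)                    # center each channel
    cov = (x.t() @ x) / x.shape[0]           # (C, C) channel covariance
    n = cov.shape[0]
    mean_sq_eig = (cov * cov).sum() / n      # mean squared eigenvalue: trace(C @ C) / n
    sq_mean_eig = (cov.diagonal().sum() / n) ** 2  # squared mean eigenvalue
    return (mean_sq_eig / sq_mean_eig).item()

# Roughly 1.2 for random features of this shape; the log's metrics of 3-20 are
# compared against per-module limits (10.0, 15.0, 22.5), and a gradient penalty
# would apply only when the metric exceeds its limit.
print(whitening_metric(torch.randn(1000, 192)))
```

This reading matches the log's behavior: "metric=X vs. limit=Y" lines appear frequently, but the reported metrics stay below their limits almost everywhere, so the constraint is mostly inactive.
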
], batch size: 58, lr: 4.34e-03, grad_scale: 16.0 2023-11-20 21:10:22,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1221693.3333333333, ans=0.0 2023-11-20 21:10:26,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 7.821e+01 8.647e+01 9.509e+01 1.495e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 21:10:42,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1221826.6666666667, ans=0.5 2023-11-20 21:10:55,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-20 21:11:09,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183300 2023-11-20 21:11:17,190 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 2950, loss[loss=0.07407, simple_loss=0.09894, pruned_loss=0.01823, audio_tagging_loss=0.006365, over 15186.00 frames. ], tot_loss[loss=0.07685, simple_loss=0.09768, pruned_loss=0.01816, audio_tagging_loss=0.009845, over 3042672.35 frames. ], batch size: 56, lr: 4.34e-03, grad_scale: 16.0 2023-11-20 21:11:26,582 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 21:12:11,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1222293.3333333333, ans=0.0 2023-11-20 21:12:13,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183350 2023-11-20 21:12:20,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.31 vs. limit=10.0 2023-11-20 21:12:21,090 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3000, loss[loss=0.1132, simple_loss=0.1402, pruned_loss=0.0324, audio_tagging_loss=0.0107, over 15727.00 frames. ], tot_loss[loss=0.07752, simple_loss=0.09842, pruned_loss=0.01842, audio_tagging_loss=0.009889, over 3035402.78 frames. ], batch size: 59, lr: 4.34e-03, grad_scale: 16.0 2023-11-20 21:12:21,091 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 21:12:59,439 INFO [train_asr.py:1253] (1/4) Epoch 16, validation: loss=0.06057, simple_loss=0.053, pruned_loss=0.005481, audio_tagging_loss=0.02859, over 4681554.00 frames. 2023-11-20 21:12:59,440 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 21:13:03,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2023-11-20 21:13:12,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.173e+01 8.695e+01 9.591e+01 1.296e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 21:13:26,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1222493.3333333333, ans=0.125 2023-11-20 21:13:39,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1222560.0, ans=0.0 2023-11-20 21:13:55,682 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183400 2023-11-20 21:14:04,217 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3050, loss[loss=0.05099, simple_loss=0.0612, pruned_loss=0.008464, audio_tagging_loss=0.01193, over 16735.00 frames. 
], tot_loss[loss=0.07804, simple_loss=0.09907, pruned_loss=0.01861, audio_tagging_loss=0.009894, over 3043539.18 frames. ], batch size: 65, lr: 4.34e-03, grad_scale: 16.0 2023-11-20 21:14:08,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2023-11-20 21:14:26,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1222760.0, ans=0.1 2023-11-20 21:14:39,848 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 21:15:00,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183450 2023-11-20 21:15:02,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2023-11-20 21:15:07,766 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3100, loss[loss=0.07955, simple_loss=0.09862, pruned_loss=0.01925, audio_tagging_loss=0.011, over 17033.00 frames. ], tot_loss[loss=0.07875, simple_loss=0.1004, pruned_loss=0.01874, audio_tagging_loss=0.00983, over 3055218.04 frames. ], batch size: 63, lr: 4.34e-03, grad_scale: 16.0 2023-11-20 21:15:20,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.627e+01 8.321e+01 8.878e+01 9.736e+01 1.416e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 21:15:34,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1223160.0, ans=0.0 2023-11-20 21:15:42,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1223160.0, ans=0.125 2023-11-20 21:16:03,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183500 2023-11-20 21:16:10,925 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 21:16:11,724 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3150, loss[loss=0.06107, simple_loss=0.0678, pruned_loss=0.01536, audio_tagging_loss=0.01181, over 13667.00 frames. ], tot_loss[loss=0.07917, simple_loss=0.1009, pruned_loss=0.01883, audio_tagging_loss=0.009912, over 3060801.73 frames. 
], batch size: 53, lr: 4.33e-03, grad_scale: 16.0 2023-11-20 21:16:13,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1223360.0, ans=0.125 2023-11-20 21:16:15,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1223360.0, ans=0.125 2023-11-20 21:16:19,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1223360.0, ans=0.125 2023-11-20 21:16:19,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1223360.0, ans=0.125 2023-11-20 21:16:30,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1223426.6666666667, ans=0.125 2023-11-20 21:16:36,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1223493.3333333333, ans=0.1 2023-11-20 21:16:46,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1223493.3333333333, ans=0.5 2023-11-20 21:16:50,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1223560.0, ans=0.125 2023-11-20 21:17:05,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2023-11-20 21:17:07,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183550 2023-11-20 21:17:10,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.64 vs. limit=10.0 2023-11-20 21:17:16,820 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3200, loss[loss=0.05737, simple_loss=0.08256, pruned_loss=0.007218, audio_tagging_loss=0.008874, over 15718.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.1002, pruned_loss=0.0185, audio_tagging_loss=0.01004, over 3061323.16 frames. ], batch size: 59, lr: 4.33e-03, grad_scale: 32.0 2023-11-20 21:17:25,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1223693.3333333333, ans=0.05 2023-11-20 21:17:28,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.173e+01 8.835e+01 9.713e+01 1.308e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 21:17:29,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=12.0 2023-11-20 21:17:32,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1223760.0, ans=0.5 2023-11-20 21:17:53,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1223893.3333333333, ans=0.0 2023-11-20 21:17:57,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-20 21:18:10,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. 
2023-11-20 21:18:10,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0
2023-11-20 21:18:12,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183600
2023-11-20 21:18:20,296 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3250, loss[loss=0.07918, simple_loss=0.09976, pruned_loss=0.02049, audio_tagging_loss=0.00881, over 15149.00 frames. ], tot_loss[loss=0.07819, simple_loss=0.09944, pruned_loss=0.0184, audio_tagging_loss=0.01006, over 3059148.75 frames. ], batch size: 55, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:18:27,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0
2023-11-20 21:18:40,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1224093.3333333333, ans=0.2
2023-11-20 21:18:48,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1224160.0, ans=0.1
2023-11-20 21:18:53,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1224160.0, ans=0.0
2023-11-20 21:19:16,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183650
2023-11-20 21:19:24,567 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3300, loss[loss=0.0794, simple_loss=0.1075, pruned_loss=0.01681, audio_tagging_loss=0.008834, over 16034.00 frames. ], tot_loss[loss=0.07813, simple_loss=0.09903, pruned_loss=0.01846, audio_tagging_loss=0.01015, over 3054593.27 frames. ], batch size: 59, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:19:28,754 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 21:19:35,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1224426.6666666667, ans=0.125
2023-11-20 21:19:36,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0
2023-11-20 21:19:38,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.036e+01 8.648e+01 9.640e+01 1.721e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-20 21:20:04,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1224560.0, ans=0.1
2023-11-20 21:20:13,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0
2023-11-20 21:20:20,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183700
2023-11-20 21:20:27,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1224693.3333333333, ans=0.125
2023-11-20 21:20:28,150 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3350, loss[loss=0.06875, simple_loss=0.07548, pruned_loss=0.0184, audio_tagging_loss=0.01261, over 14767.00 frames. ], tot_loss[loss=0.07818, simple_loss=0.09901, pruned_loss=0.01859, audio_tagging_loss=0.01009, over 3057871.72 frames. ], batch size: 57, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:20:51,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1224760.0, ans=0.125
2023-11-20 21:21:05,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0
2023-11-20 21:21:05,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0
2023-11-20 21:21:22,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1224960.0, ans=0.1
2023-11-20 21:21:26,398 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183750
2023-11-20 21:21:26,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1224960.0, ans=0.125
2023-11-20 21:21:33,803 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3400, loss[loss=0.07621, simple_loss=0.09898, pruned_loss=0.01808, audio_tagging_loss=0.008639, over 15565.00 frames. ], tot_loss[loss=0.07807, simple_loss=0.09929, pruned_loss=0.01851, audio_tagging_loss=0.009918, over 3053954.96 frames. ], batch size: 57, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:21:47,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.162e+01 8.769e+01 9.534e+01 1.233e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-20 21:21:54,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1225093.3333333333, ans=0.1
2023-11-20 21:21:57,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1225160.0, ans=0.2
2023-11-20 21:22:05,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0
2023-11-20 21:22:11,231 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.015e-01
2023-11-20 21:22:14,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1225226.6666666667, ans=0.0
2023-11-20 21:22:22,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1225226.6666666667, ans=0.125
2023-11-20 21:22:29,950 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183800
2023-11-20 21:22:31,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1225293.3333333333, ans=0.125
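The scaling.py:213 records each report the current value (ans) of a named ScheduledFloat: a scalar hyperparameter (dropout probability, skip rate, balancer limit, and so on) that follows a piecewise-linear schedule over batch_count. A minimal re-implementation of that idea (the breakpoints below are illustrative, not the ones used by any module in this run):

```python
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Piecewise-linear schedule through (batch_count, value) breakpoints."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches and
# then stays flat; by batch_count ~ 1.22e6 it sits at its final value:
print(scheduled_float(1224960.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
```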
2023-11-20 21:22:38,211 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3450, loss[loss=0.08343, simple_loss=0.1136, pruned_loss=0.01838, audio_tagging_loss=0.008268, over 14862.00 frames. ], tot_loss[loss=0.07728, simple_loss=0.0983, pruned_loss=0.01834, audio_tagging_loss=0.009789, over 3054846.34 frames. ], batch size: 54, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:22:50,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1225426.6666666667, ans=0.1
2023-11-20 21:23:14,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1225493.3333333333, ans=0.0
2023-11-20 21:23:16,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1225560.0, ans=0.125
2023-11-20 21:23:22,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1225560.0, ans=0.125
2023-11-20 21:23:32,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1225626.6666666667, ans=0.2
2023-11-20 21:23:34,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183850
2023-11-20 21:23:41,689 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3500, loss[loss=0.06324, simple_loss=0.07695, pruned_loss=0.01522, audio_tagging_loss=0.00954, over 15843.00 frames. ], tot_loss[loss=0.07704, simple_loss=0.09806, pruned_loss=0.01827, audio_tagging_loss=0.009736, over 3051585.89 frames. ], batch size: 61, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:23:57,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.353e+01 9.150e+01 1.014e+02 1.535e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 21:23:57,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5
2023-11-20 21:24:08,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1225826.6666666667, ans=0.125
2023-11-20 21:24:08,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1225826.6666666667, ans=0.1
2023-11-20 21:24:11,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1225826.6666666667, ans=0.1
2023-11-20 21:24:13,354 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 21:24:19,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1225826.6666666667, ans=0.125
2023-11-20 21:24:38,949 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 21:24:39,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183900
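The periodic WARNINGs all drop one-second AudioSet placeholder cuts for the same reason: after the encoder frontend's roughly 4x subsampling, 100 input frames shrink to 23, which is fewer than the 24 BPE tokens of the dummy transcript, and the transducer loss cannot align more symbols than output frames. A sketch of that filter, assuming the usual icefall subsampled-length arithmetic for this kind of Conv2d frontend:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed frontend arithmetic: two stride-2 stages after a 7-frame
    # context window, i.e. ((T - 7) // 2 + 1) // 2.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as logged in the warnings
print(keep_cut(100, 24))              # False -> the cut is excluded
```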
2023-11-20 21:24:47,808 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3550, loss[loss=0.06974, simple_loss=0.09158, pruned_loss=0.01604, audio_tagging_loss=0.007909, over 15289.00 frames. ], tot_loss[loss=0.07705, simple_loss=0.09828, pruned_loss=0.01825, audio_tagging_loss=0.00966, over 3053535.69 frames. ], batch size: 55, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:24:54,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1226026.6666666667, ans=0.125
2023-11-20 21:24:54,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1226026.6666666667, ans=0.2
2023-11-20 21:25:30,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1226226.6666666667, ans=0.0
2023-11-20 21:25:44,371 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 183950
2023-11-20 21:25:51,798 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3600, loss[loss=0.06607, simple_loss=0.07972, pruned_loss=0.01782, audio_tagging_loss=0.0084, over 14707.00 frames. ], tot_loss[loss=0.07703, simple_loss=0.09843, pruned_loss=0.01815, audio_tagging_loss=0.009665, over 3050633.40 frames. ], batch size: 54, lr: 4.33e-03, grad_scale: 32.0
2023-11-20 21:26:05,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.147e+01 8.022e+01 8.880e+01 9.578e+01 1.162e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 21:26:23,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0
2023-11-20 21:26:28,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1226493.3333333333, ans=0.025
2023-11-20 21:26:29,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-11-20 21:26:48,805 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184000
2023-11-20 21:27:00,013 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3650, loss[loss=0.09149, simple_loss=0.1204, pruned_loss=0.0191, audio_tagging_loss=0.01219, over 14552.00 frames. ], tot_loss[loss=0.07722, simple_loss=0.0988, pruned_loss=0.01817, audio_tagging_loss=0.009654, over 3052023.04 frames. ], batch size: 55, lr: 4.33e-03, grad_scale: 32.0
2023-11-20 21:27:15,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1226760.0, ans=0.125
2023-11-20 21:27:25,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1226826.6666666667, ans=0.5
2023-11-20 21:27:38,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0
2023-11-20 21:27:40,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0
2023-11-20 21:27:43,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1226893.3333333333, ans=0.125
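The Whitening records compare a per-module statistic against a (possibly scheduled) limit: the metric grows as the covariance of the module's output channels drifts away from a multiple of the identity, and the Whiten module only intervenes with a corrective gradient once the metric exceeds the limit, which is why most records here are purely informational. One plausible form of such a metric, shown for illustration only (not necessarily the exact formula in scaling.py): the mean squared eigenvalue of the channel covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and larger otherwise.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). mean(eig^2) / mean(eig)^2 of the channel
    covariance; equals 1.0 iff the covariance is a multiple of the identity."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    n = cov.shape[0]
    mean_eig = torch.diagonal(cov).sum() / n  # trace(C) / n
    mean_sq_eig = (cov * cov).sum() / n       # trace(C @ C) / n (C is symmetric)
    return (mean_sq_eig / mean_eig**2).item()

x = torch.randn(4000, 384)
print(whitening_metric(x))                                  # ~1.0: already white
print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # >1: unequal channel scales
```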
2023-11-20 21:27:55,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0
2023-11-20 21:27:57,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184050
2023-11-20 21:28:05,148 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3700, loss[loss=0.09008, simple_loss=0.1172, pruned_loss=0.0224, audio_tagging_loss=0.00909, over 15916.00 frames. ], tot_loss[loss=0.07748, simple_loss=0.09884, pruned_loss=0.01834, audio_tagging_loss=0.009724, over 3054606.64 frames. ], batch size: 58, lr: 4.33e-03, grad_scale: 32.0
2023-11-20 21:28:09,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1227026.6666666667, ans=0.0
2023-11-20 21:28:16,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1227026.6666666667, ans=0.125
2023-11-20 21:28:19,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.009e+01 8.576e+01 9.316e+01 1.305e+02, threshold=1.715e+02, percent-clipped=0.0
2023-11-20 21:28:23,574 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 21:28:51,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0
2023-11-20 21:29:02,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184100
2023-11-20 21:29:10,250 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3750, loss[loss=0.08934, simple_loss=0.1079, pruned_loss=0.02441, audio_tagging_loss=0.01097, over 15653.00 frames. ], tot_loss[loss=0.07851, simple_loss=0.1004, pruned_loss=0.01863, audio_tagging_loss=0.009693, over 3062751.39 frames. ], batch size: 59, lr: 4.33e-03, grad_scale: 32.0
2023-11-20 21:29:41,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1227493.3333333333, ans=0.0
2023-11-20 21:29:50,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1227560.0, ans=0.125
2023-11-20 21:29:52,876 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 21:30:06,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184150
2023-11-20 21:30:13,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5
2023-11-20 21:30:13,949 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3800, loss[loss=0.08108, simple_loss=0.1161, pruned_loss=0.01711, audio_tagging_loss=0.005918, over 15952.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.1014, pruned_loss=0.01885, audio_tagging_loss=0.009652, over 3067127.63 frames. ], batch size: 58, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:30:26,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=8.0
2023-11-20 21:30:29,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.281e+01 9.046e+01 9.670e+01 1.407e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-20 21:30:47,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.58 vs. limit=22.5
2023-11-20 21:31:08,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1227960.0, ans=0.0
2023-11-20 21:31:11,008 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184200
2023-11-20 21:31:14,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1227960.0, ans=0.0
2023-11-20 21:31:15,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1227960.0, ans=0.125
2023-11-20 21:31:18,618 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3850, loss[loss=0.06463, simple_loss=0.08307, pruned_loss=0.01268, audio_tagging_loss=0.01041, over 14328.00 frames. ], tot_loss[loss=0.07856, simple_loss=0.1007, pruned_loss=0.01848, audio_tagging_loss=0.009735, over 3065199.33 frames. ], batch size: 56, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:31:42,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1228093.3333333333, ans=0.0
2023-11-20 21:31:43,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1228160.0, ans=0.1
2023-11-20 21:31:55,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1228226.6666666667, ans=0.2
2023-11-20 21:31:58,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1228226.6666666667, ans=0.125
2023-11-20 21:32:09,344 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 21:32:12,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1228293.3333333333, ans=0.0
2023-11-20 21:32:14,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0
2023-11-20 21:32:15,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184250
2023-11-20 21:32:23,165 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3900, loss[loss=0.07109, simple_loss=0.0856, pruned_loss=0.01766, audio_tagging_loss=0.01063, over 15873.00 frames. ], tot_loss[loss=0.07767, simple_loss=0.09934, pruned_loss=0.01816, audio_tagging_loss=0.009832, over 3054515.87 frames. ], batch size: 63, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:32:38,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.176e+01 8.890e+01 9.815e+01 1.146e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-20 21:32:39,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1228426.6666666667, ans=0.125
2023-11-20 21:32:50,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1228493.3333333333, ans=0.1
2023-11-20 21:32:53,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1228493.3333333333, ans=0.0
2023-11-20 21:33:20,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184300
2023-11-20 21:33:27,626 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 3950, loss[loss=0.08096, simple_loss=0.1049, pruned_loss=0.01868, audio_tagging_loss=0.00981, over 14872.00 frames. ], tot_loss[loss=0.0777, simple_loss=0.09907, pruned_loss=0.01823, audio_tagging_loss=0.009934, over 3045771.59 frames. ], batch size: 55, lr: 4.33e-03, grad_scale: 16.0
2023-11-20 21:33:29,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1228693.3333333333, ans=0.0
2023-11-20 21:33:32,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1228693.3333333333, ans=0.125
2023-11-20 21:33:39,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1228760.0, ans=0.0
2023-11-20 21:33:51,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1228760.0, ans=0.125
2023-11-20 21:33:54,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0
2023-11-20 21:34:09,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1228893.3333333333, ans=0.2
2023-11-20 21:34:11,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1228893.3333333333, ans=0.125
2023-11-20 21:34:11,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1228893.3333333333, ans=0.125
2023-11-20 21:34:19,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1228960.0, ans=0.0
2023-11-20 21:34:24,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184350
2023-11-20 21:34:27,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1228960.0, ans=0.125
2023-11-20 21:34:32,586 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4000, loss[loss=0.07774, simple_loss=0.09484, pruned_loss=0.02023, audio_tagging_loss=0.01009, over 15289.00 frames. ], tot_loss[loss=0.07845, simple_loss=0.09987, pruned_loss=0.01857, audio_tagging_loss=0.009947, over 3044961.73 frames. ], batch size: 57, lr: 4.32e-03, grad_scale: 32.0
2023-11-20 21:34:40,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1229026.6666666667, ans=0.1
2023-11-20 21:34:47,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.600e+01 8.174e+01 8.766e+01 9.644e+01 1.258e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-20 21:35:17,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1229226.6666666667, ans=0.09899494936611666
2023-11-20 21:35:27,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1229293.3333333333, ans=0.125
2023-11-20 21:35:28,670 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184400
2023-11-20 21:35:36,143 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4050, loss[loss=0.08074, simple_loss=0.1079, pruned_loss=0.01912, audio_tagging_loss=0.007682, over 15067.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09978, pruned_loss=0.01855, audio_tagging_loss=0.009897, over 3045144.01 frames. ], batch size: 56, lr: 4.32e-03, grad_scale: 32.0
2023-11-20 21:35:37,459 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 21:35:42,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1229360.0, ans=0.0
2023-11-20 21:35:45,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1229360.0, ans=0.07
2023-11-20 21:35:51,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1229426.6666666667, ans=0.125
2023-11-20 21:36:10,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1229493.3333333333, ans=0.0
2023-11-20 21:36:22,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1229560.0, ans=0.0
2023-11-20 21:36:28,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1229626.6666666667, ans=0.125
2023-11-20 21:36:32,139 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184450
2023-11-20 21:36:40,459 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4100, loss[loss=0.08281, simple_loss=0.1007, pruned_loss=0.01867, audio_tagging_loss=0.01379, over 15650.00 frames. ], tot_loss[loss=0.07781, simple_loss=0.09899, pruned_loss=0.01831, audio_tagging_loss=0.01001, over 3043965.23 frames. ], batch size: 61, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:36:41,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1229693.3333333333, ans=0.125
2023-11-20 21:36:57,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.177e+01 8.908e+01 9.687e+01 1.316e+02, threshold=1.782e+02, percent-clipped=0.0
2023-11-20 21:37:08,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1229826.6666666667, ans=0.0
2023-11-20 21:37:11,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1229826.6666666667, ans=0.0
2023-11-20 21:37:15,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1229826.6666666667, ans=0.07
2023-11-20 21:37:16,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1229826.6666666667, ans=0.2
2023-11-20 21:37:36,944 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184500
2023-11-20 21:37:44,743 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4150, loss[loss=0.09822, simple_loss=0.1271, pruned_loss=0.02818, audio_tagging_loss=0.006488, over 14938.00 frames. ], tot_loss[loss=0.07821, simple_loss=0.09961, pruned_loss=0.01858, audio_tagging_loss=0.009823, over 3046297.43 frames. ], batch size: 56, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:37:47,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1230026.6666666667, ans=0.2
2023-11-20 21:37:49,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1230026.6666666667, ans=0.125
2023-11-20 21:37:50,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1230026.6666666667, ans=0.125
2023-11-20 21:37:52,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0
2023-11-20 21:37:54,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1230026.6666666667, ans=0.1
2023-11-20 21:37:55,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1230026.6666666667, ans=0.125
2023-11-20 21:38:00,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1230093.3333333333, ans=0.2
2023-11-20 21:38:03,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1230093.3333333333, ans=0.125
2023-11-20 21:38:29,423 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 21:38:36,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1230293.3333333333, ans=0.0
2023-11-20 21:38:36,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1230293.3333333333, ans=0.0
2023-11-20 21:38:42,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184550
2023-11-20 21:38:42,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0
2023-11-20 21:38:48,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1230360.0, ans=0.125
2023-11-20 21:38:49,331 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4200, loss[loss=0.06327, simple_loss=0.07869, pruned_loss=0.01367, audio_tagging_loss=0.01026, over 15677.00 frames. ], tot_loss[loss=0.07756, simple_loss=0.09889, pruned_loss=0.01835, audio_tagging_loss=0.009761, over 3046896.80 frames. ], batch size: 58, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:38:56,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5
2023-11-20 21:39:07,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.428e+01 8.014e+01 8.553e+01 9.363e+01 1.203e+02, threshold=1.711e+02, percent-clipped=0.0
2023-11-20 21:39:19,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0
2023-11-20 21:39:31,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2023-11-20 21:39:43,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1230626.6666666667, ans=0.125
2023-11-20 21:39:45,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184600
2023-11-20 21:39:53,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0
2023-11-20 21:39:53,576 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4250, loss[loss=0.07731, simple_loss=0.09724, pruned_loss=0.01629, audio_tagging_loss=0.0124, over 15106.00 frames. ], tot_loss[loss=0.07821, simple_loss=0.1002, pruned_loss=0.01851, audio_tagging_loss=0.009617, over 3053482.10 frames. ], batch size: 56, lr: 4.32e-03, grad_scale: 8.0
2023-11-20 21:39:53,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1230693.3333333333, ans=0.1
2023-11-20 21:39:53,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1230693.3333333333, ans=0.0
2023-11-20 21:40:04,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1230693.3333333333, ans=0.0
2023-11-20 21:40:18,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1230826.6666666667, ans=0.0
2023-11-20 21:40:50,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184650
2023-11-20 21:40:57,891 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4300, loss[loss=0.06833, simple_loss=0.08429, pruned_loss=0.01456, audio_tagging_loss=0.01163, over 16093.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.1005, pruned_loss=0.01855, audio_tagging_loss=0.009575, over 3057636.69 frames. ], batch size: 61, lr: 4.32e-03, grad_scale: 8.0
2023-11-20 21:40:58,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1231026.6666666667, ans=0.2
2023-11-20 21:41:15,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.115e+01 8.273e+01 8.996e+01 9.641e+01 1.140e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-20 21:41:27,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1231160.0, ans=0.2
2023-11-20 21:41:54,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184700
2023-11-20 21:42:01,378 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4350, loss[loss=0.06235, simple_loss=0.08902, pruned_loss=0.01174, audio_tagging_loss=0.006096, over 15412.00 frames. ], tot_loss[loss=0.07872, simple_loss=0.1008, pruned_loss=0.01882, audio_tagging_loss=0.009489, over 3050636.09 frames. ], batch size: 59, lr: 4.32e-03, grad_scale: 8.0
2023-11-20 21:42:01,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1231360.0, ans=0.1
2023-11-20 21:42:04,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1231360.0, ans=0.0
2023-11-20 21:42:29,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1231493.3333333333, ans=0.125
2023-11-20 21:42:46,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1231560.0, ans=0.125
2023-11-20 21:42:57,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184750
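grad_scale in the train records drifts between 32.0, 16.0 and 8.0 across this stretch. With fp16 training enabled, this reads as the dynamic loss scale: it is halved whenever a step produces inf/nan gradients and grown back after a long enough run of clean steps. PyTorch's stock GradScaler shows the mechanism (illustrative only; the trainer manages its scale through its own optimizer wrapper):

```python
import torch

# Requires a CUDA device, as in this training run.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)
model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.045)

x = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).square().mean()

scaler.scale(loss).backward()  # backward through the scaled loss
scaler.step(opt)               # skips the update if the grads overflowed
scaler.update()                # halve on overflow, otherwise slowly regrow
print(scaler.get_scale())
```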
2023-11-20 21:43:05,113 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4400, loss[loss=0.09488, simple_loss=0.1233, pruned_loss=0.02437, audio_tagging_loss=0.008873, over 15157.00 frames. ], tot_loss[loss=0.07825, simple_loss=0.1003, pruned_loss=0.01866, audio_tagging_loss=0.009456, over 3053273.40 frames. ], batch size: 57, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:43:05,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1231693.3333333333, ans=0.125
2023-11-20 21:43:15,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1231693.3333333333, ans=0.125
2023-11-20 21:43:23,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1231760.0, ans=10.0
2023-11-20 21:43:23,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.140e+01 8.871e+01 9.490e+01 1.181e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-20 21:43:24,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0
2023-11-20 21:43:31,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1231826.6666666667, ans=0.125
2023-11-20 21:43:31,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1231826.6666666667, ans=0.125
2023-11-20 21:43:41,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0
2023-11-20 21:43:49,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1231893.3333333333, ans=0.125
2023-11-20 21:43:59,366 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 21:44:00,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1231960.0, ans=0.125
2023-11-20 21:44:01,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184800
2023-11-20 21:44:09,925 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4450, loss[loss=0.07613, simple_loss=0.1049, pruned_loss=0.01424, audio_tagging_loss=0.009452, over 15487.00 frames. ], tot_loss[loss=0.07836, simple_loss=0.1004, pruned_loss=0.01871, audio_tagging_loss=0.009459, over 3056936.85 frames. ], batch size: 57, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:44:14,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1232026.6666666667, ans=0.125
2023-11-20 21:44:50,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1232226.6666666667, ans=0.125
2023-11-20 21:44:59,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1232226.6666666667, ans=0.1
2023-11-20 21:45:07,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184850
2023-11-20 21:45:07,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1232293.3333333333, ans=0.125
2023-11-20 21:45:15,001 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4500, loss[loss=0.06904, simple_loss=0.0855, pruned_loss=0.01516, audio_tagging_loss=0.01112, over 16082.00 frames. ], tot_loss[loss=0.07771, simple_loss=0.09956, pruned_loss=0.01837, audio_tagging_loss=0.009551, over 3054259.32 frames. ], batch size: 60, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:45:16,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1232360.0, ans=0.125
2023-11-20 21:45:32,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.142e+01 8.902e+01 9.690e+01 1.395e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-20 21:45:44,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1232493.3333333333, ans=0.125
2023-11-20 21:46:03,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1232560.0, ans=0.0
2023-11-20 21:46:11,430 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184900
2023-11-20 21:46:18,660 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4550, loss[loss=0.06926, simple_loss=0.09136, pruned_loss=0.01422, audio_tagging_loss=0.009364, over 14244.00 frames. ], tot_loss[loss=0.07795, simple_loss=0.1001, pruned_loss=0.01843, audio_tagging_loss=0.009483, over 3052853.45 frames. ], batch size: 55, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:46:29,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1232693.3333333333, ans=0.125
2023-11-20 21:47:05,296 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 21:47:05,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1232893.3333333333, ans=0.125
2023-11-20 21:47:09,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1232960.0, ans=6.0
2023-11-20 21:47:10,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1232960.0, ans=0.0
2023-11-20 21:47:15,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 184950
2023-11-20 21:47:22,846 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4600, loss[loss=0.08517, simple_loss=0.113, pruned_loss=0.01998, audio_tagging_loss=0.008683, over 14754.00 frames. ], tot_loss[loss=0.07752, simple_loss=0.09904, pruned_loss=0.01828, audio_tagging_loss=0.009715, over 3049249.96 frames. ], batch size: 57, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:47:26,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1233026.6666666667, ans=0.125
2023-11-20 21:47:40,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 7.895e+01 8.619e+01 9.471e+01 1.250e+02, threshold=1.724e+02, percent-clipped=0.0
2023-11-20 21:47:47,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1233160.0, ans=0.125
2023-11-20 21:47:52,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1233160.0, ans=0.09899494936611666
2023-11-20 21:47:54,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1233160.0, ans=0.0
2023-11-20 21:48:09,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0
2023-11-20 21:48:19,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185000
2023-11-20 21:48:25,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1233293.3333333333, ans=0.0
2023-11-20 21:48:27,597 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4650, loss[loss=0.06999, simple_loss=0.07621, pruned_loss=0.01861, audio_tagging_loss=0.01328, over 15107.00 frames. ], tot_loss[loss=0.07831, simple_loss=0.09991, pruned_loss=0.01862, audio_tagging_loss=0.009737, over 3054364.30 frames. ], batch size: 59, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:48:36,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1233360.0, ans=0.125
2023-11-20 21:48:45,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. limit=10.0
2023-11-20 21:49:01,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0
2023-11-20 21:49:04,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1233560.0, ans=0.0
2023-11-20 21:49:23,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185050
2023-11-20 21:49:30,676 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4700, loss[loss=0.08178, simple_loss=0.1052, pruned_loss=0.01909, audio_tagging_loss=0.0101, over 14382.00 frames. ], tot_loss[loss=0.07837, simple_loss=0.09976, pruned_loss=0.01861, audio_tagging_loss=0.009881, over 3060480.46 frames. ], batch size: 54, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:49:43,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1233760.0, ans=0.0
2023-11-20 21:49:48,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.027e+01 8.804e+01 9.916e+01 1.555e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-20 21:50:02,459 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 21:50:02,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1233826.6666666667, ans=0.1
2023-11-20 21:50:15,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1233893.3333333333, ans=0.2
2023-11-20 21:50:28,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185100
2023-11-20 21:50:35,791 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4750, loss[loss=0.05836, simple_loss=0.06421, pruned_loss=0.01187, audio_tagging_loss=0.01439, over 15558.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.09941, pruned_loss=0.01872, audio_tagging_loss=0.009967, over 3058297.85 frames. ], batch size: 61, lr: 4.32e-03, grad_scale: 16.0
2023-11-20 21:50:39,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1234026.6666666667, ans=0.0
2023-11-20 21:50:52,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0
2023-11-20 21:51:09,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1234160.0, ans=0.0
2023-11-20 21:51:28,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1234293.3333333333, ans=0.125
2023-11-20 21:51:31,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1234293.3333333333, ans=0.2
2023-11-20 21:51:32,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185150
2023-11-20 21:51:39,954 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4800, loss[loss=0.06904, simple_loss=0.08408, pruned_loss=0.01752, audio_tagging_loss=0.009475, over 15785.00 frames. ], tot_loss[loss=0.07858, simple_loss=0.09966, pruned_loss=0.01876, audio_tagging_loss=0.009988, over 3063143.55 frames. ], batch size: 59, lr: 4.32e-03, grad_scale: 32.0
2023-11-20 21:51:42,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1234360.0, ans=0.1
2023-11-20 21:51:43,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1234360.0, ans=0.125
2023-11-20 21:51:58,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.531e+01 9.078e+01 1.027e+02 1.282e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-20 21:52:14,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1234493.3333333333, ans=0.1
2023-11-20 21:52:15,462 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.138e-02
2023-11-20 21:52:17,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.08 vs. limit=10.0
2023-11-20 21:52:17,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0
2023-11-20 21:52:35,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185200
2023-11-20 21:52:43,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1234693.3333333333, ans=0.1
2023-11-20 21:52:44,125 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4850, loss[loss=0.07378, simple_loss=0.09793, pruned_loss=0.01473, audio_tagging_loss=0.01009, over 17642.00 frames. ], tot_loss[loss=0.07795, simple_loss=0.09891, pruned_loss=0.01854, audio_tagging_loss=0.009957, over 3052668.73 frames. ], batch size: 66, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 21:52:49,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1234693.3333333333, ans=0.1
2023-11-20 21:52:59,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1234760.0, ans=0.1
2023-11-20 21:53:07,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1234760.0, ans=0.125
2023-11-20 21:53:09,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1234826.6666666667, ans=0.09899494936611666
2023-11-20 21:53:20,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1234826.6666666667, ans=0.07
2023-11-20 21:53:34,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1234960.0, ans=0.09899494936611666
2023-11-20 21:53:40,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185250
2023-11-20 21:53:40,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1234960.0, ans=0.125
2023-11-20 21:53:47,771 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4900, loss[loss=0.07624, simple_loss=0.09687, pruned_loss=0.01693, audio_tagging_loss=0.01088, over 14859.00 frames. ], tot_loss[loss=0.07765, simple_loss=0.09866, pruned_loss=0.01847, audio_tagging_loss=0.009856, over 3053594.89 frames. ], batch size: 57, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 21:53:54,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=15.0
2023-11-20 21:54:06,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.089e+01 8.842e+01 9.821e+01 1.319e+02, threshold=1.768e+02, percent-clipped=0.0
2023-11-20 21:54:09,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1235093.3333333333, ans=15.0
2023-11-20 21:54:10,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1235093.3333333333, ans=0.2
2023-11-20 21:54:33,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1235226.6666666667, ans=0.0
2023-11-20 21:54:35,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1235226.6666666667, ans=0.1
2023-11-20 21:54:40,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1235293.3333333333, ans=0.125
2023-11-20 21:54:43,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185300
2023-11-20 21:54:44,713 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 21:54:52,109 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 4950, loss[loss=0.06287, simple_loss=0.07762, pruned_loss=0.01545, audio_tagging_loss=0.008615, over 15222.00 frames. ], tot_loss[loss=0.07764, simple_loss=0.0988, pruned_loss=0.01849, audio_tagging_loss=0.009747, over 3050353.68 frames. ], batch size: 59, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 21:55:00,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1235360.0, ans=0.1
2023-11-20 21:55:04,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1235426.6666666667, ans=0.125
2023-11-20 21:55:30,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2023-11-20 21:55:48,235 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185350
2023-11-20 21:55:55,703 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5000, loss[loss=0.07135, simple_loss=0.09384, pruned_loss=0.01551, audio_tagging_loss=0.008916, over 15032.00 frames. ], tot_loss[loss=0.07741, simple_loss=0.0989, pruned_loss=0.01843, audio_tagging_loss=0.009527, over 3047230.54 frames. ], batch size: 57, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 21:56:02,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1235693.3333333333, ans=0.125
2023-11-20 21:56:02,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1235693.3333333333, ans=0.125
2023-11-20 21:56:08,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1235760.0, ans=0.125
2023-11-20 21:56:16,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 7.784e+01 8.478e+01 9.107e+01 1.087e+02, threshold=1.696e+02, percent-clipped=0.0
2023-11-20 21:56:28,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1235826.6666666667, ans=0.0
2023-11-20 21:56:32,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0
2023-11-20 21:56:45,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1235893.3333333333, ans=0.125
2023-11-20 21:56:52,776 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185400
2023-11-20 21:56:57,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1235960.0, ans=0.125
2023-11-20 21:57:01,012 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5050, loss[loss=0.09093, simple_loss=0.1225, pruned_loss=0.02445, audio_tagging_loss=0.005217, over 15315.00 frames. ], tot_loss[loss=0.07636, simple_loss=0.09749, pruned_loss=0.01799, audio_tagging_loss=0.009621, over 3050329.94 frames. ], batch size: 57, lr: 4.31e-03, grad_scale: 8.0
2023-11-20 21:57:57,550 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185450
2023-11-20 21:58:05,600 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5100, loss[loss=0.07288, simple_loss=0.09338, pruned_loss=0.01767, audio_tagging_loss=0.008518, over 15319.00 frames. ], tot_loss[loss=0.07667, simple_loss=0.09806, pruned_loss=0.01805, audio_tagging_loss=0.009591, over 3049731.17 frames. ], batch size: 61, lr: 4.31e-03, grad_scale: 8.0
2023-11-20 21:58:23,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1236426.6666666667, ans=0.125
2023-11-20 21:58:25,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.564e+01 8.354e+01 9.032e+01 1.014e+02 3.240e+02, threshold=1.806e+02, percent-clipped=1.0
2023-11-20 21:58:56,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0
2023-11-20 21:59:02,167 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185500
2023-11-20 21:59:04,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1236626.6666666667, ans=0.125
2023-11-20 21:59:09,300 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5150, loss[loss=0.1031, simple_loss=0.1339, pruned_loss=0.0283, audio_tagging_loss=0.007858, over 15652.00 frames. ], tot_loss[loss=0.07701, simple_loss=0.09853, pruned_loss=0.01816, audio_tagging_loss=0.009576, over 3046134.30 frames. ], batch size: 55, lr: 4.31e-03, grad_scale: 8.0
2023-11-20 21:59:19,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1236693.3333333333, ans=0.125
2023-11-20 21:59:36,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1236826.6666666667, ans=0.0
2023-11-20 21:59:47,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1236893.3333333333, ans=0.2
2023-11-20 21:59:47,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1236893.3333333333, ans=0.0
2023-11-20 21:59:57,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1236893.3333333333, ans=0.125
2023-11-20 21:59:57,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1236893.3333333333, ans=0.125
2023-11-20 22:00:03,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1236960.0, ans=0.125
2023-11-20 22:00:05,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185550
2023-11-20 22:00:06,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1236960.0, ans=0.1
2023-11-20 22:00:14,336 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5200, loss[loss=0.05228, simple_loss=0.06607, pruned_loss=0.008971, audio_tagging_loss=0.01027, over 14465.00 frames. ], tot_loss[loss=0.07717, simple_loss=0.09846, pruned_loss=0.01829, audio_tagging_loss=0.009645, over 3044308.48 frames. ], batch size: 55, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 22:00:21,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1237026.6666666667, ans=0.1
2023-11-20 22:00:29,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0
2023-11-20 22:00:31,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1237093.3333333333, ans=0.125
2023-11-20 22:00:34,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.252e+01 8.979e+01 9.707e+01 1.443e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-20 22:01:10,311 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185600
2023-11-20 22:01:10,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0
2023-11-20 22:01:15,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0
2023-11-20 22:01:18,591 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5250, loss[loss=0.09181, simple_loss=0.1243, pruned_loss=0.02466, audio_tagging_loss=0.005028, over 14354.00 frames. ], tot_loss[loss=0.07755, simple_loss=0.09915, pruned_loss=0.01846, audio_tagging_loss=0.009507, over 3043627.18 frames. ], batch size: 55, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 22:01:24,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0
2023-11-20 22:01:30,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1237426.6666666667, ans=0.0
2023-11-20 22:01:37,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=8.0
2023-11-20 22:02:10,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=12.0
2023-11-20 22:02:15,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185650
2023-11-20 22:02:22,647 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5300, loss[loss=0.1064, simple_loss=0.1345, pruned_loss=0.03019, audio_tagging_loss=0.00893, over 15179.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.09977, pruned_loss=0.01892, audio_tagging_loss=0.009585, over 3045010.50 frames. ], batch size: 54, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 22:02:26,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0
2023-11-20 22:02:29,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0
2023-11-20 22:02:42,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.578e+01 8.164e+01 8.663e+01 9.355e+01 1.569e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 22:02:58,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1237826.6666666667, ans=0.1
2023-11-20 22:03:18,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185700
2023-11-20 22:03:26,269 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5350, loss[loss=0.06856, simple_loss=0.08948, pruned_loss=0.01546, audio_tagging_loss=0.008366, over 13541.00 frames. ], tot_loss[loss=0.07746, simple_loss=0.09833, pruned_loss=0.01856, audio_tagging_loss=0.009733, over 3035973.81 frames. ], batch size: 53, lr: 4.31e-03, grad_scale: 16.0
2023-11-20 22:03:31,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1238026.6666666667, ans=0.0
2023-11-20 22:03:58,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1238160.0, ans=0.125
2023-11-20 22:03:59,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1238160.0, ans=0.2
2023-11-20 22:04:14,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs.
limit=15.0 2023-11-20 22:04:15,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1238226.6666666667, ans=0.125 2023-11-20 22:04:22,999 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185750 2023-11-20 22:04:26,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1238293.3333333333, ans=0.0 2023-11-20 22:04:31,029 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5400, loss[loss=0.082, simple_loss=0.1006, pruned_loss=0.02116, audio_tagging_loss=0.01056, over 16752.00 frames. ], tot_loss[loss=0.07719, simple_loss=0.09821, pruned_loss=0.01824, audio_tagging_loss=0.009845, over 3048282.77 frames. ], batch size: 63, lr: 4.31e-03, grad_scale: 16.0 2023-11-20 22:04:33,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1238360.0, ans=0.0 2023-11-20 22:04:37,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1238360.0, ans=0.0 2023-11-20 22:04:51,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.664e+01 7.937e+01 8.475e+01 9.191e+01 1.261e+02, threshold=1.695e+02, percent-clipped=0.0 2023-11-20 22:04:56,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1238493.3333333333, ans=0.1 2023-11-20 22:04:57,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1238493.3333333333, ans=0.125 2023-11-20 22:05:14,954 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 22:05:28,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185800 2023-11-20 22:05:35,790 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5450, loss[loss=0.06746, simple_loss=0.08809, pruned_loss=0.01217, audio_tagging_loss=0.01125, over 14915.00 frames. ], tot_loss[loss=0.07703, simple_loss=0.09803, pruned_loss=0.01809, audio_tagging_loss=0.009932, over 3046394.94 frames. ], batch size: 56, lr: 4.31e-03, grad_scale: 16.0 2023-11-20 22:05:42,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1238693.3333333333, ans=0.1 2023-11-20 22:05:48,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1238760.0, ans=0.125 2023-11-20 22:05:51,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2023-11-20 22:05:59,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1238826.6666666667, ans=0.1 2023-11-20 22:06:07,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.79 vs. 
limit=15.0 2023-11-20 22:06:23,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1238893.3333333333, ans=0.0 2023-11-20 22:06:31,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185850 2023-11-20 22:06:38,728 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5500, loss[loss=0.0687, simple_loss=0.08643, pruned_loss=0.01627, audio_tagging_loss=0.009214, over 15079.00 frames. ], tot_loss[loss=0.07728, simple_loss=0.09812, pruned_loss=0.01826, audio_tagging_loss=0.009957, over 3039405.53 frames. ], batch size: 58, lr: 4.31e-03, grad_scale: 16.0 2023-11-20 22:06:43,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0 2023-11-20 22:06:49,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1239026.6666666667, ans=0.125 2023-11-20 22:07:00,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.468e+01 9.055e+01 9.983e+01 1.429e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-20 22:07:07,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=12.0 2023-11-20 22:07:11,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-20 22:07:20,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1239226.6666666667, ans=0.1 2023-11-20 22:07:28,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1239226.6666666667, ans=0.125 2023-11-20 22:07:35,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185900 2023-11-20 22:07:40,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1239293.3333333333, ans=0.2 2023-11-20 22:07:42,616 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5550, loss[loss=0.07042, simple_loss=0.085, pruned_loss=0.01495, audio_tagging_loss=0.01297, over 15908.00 frames. ], tot_loss[loss=0.0773, simple_loss=0.09818, pruned_loss=0.01816, audio_tagging_loss=0.01005, over 3045742.63 frames. ], batch size: 62, lr: 4.31e-03, grad_scale: 16.0 2023-11-20 22:08:04,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1239426.6666666667, ans=0.125 2023-11-20 22:08:06,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1239426.6666666667, ans=0.1 2023-11-20 22:08:07,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.78 vs. 
limit=10.0 2023-11-20 22:08:08,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1239493.3333333333, ans=0.125 2023-11-20 22:08:09,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1239493.3333333333, ans=0.125 2023-11-20 22:08:16,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1239493.3333333333, ans=0.2 2023-11-20 22:08:19,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1239493.3333333333, ans=0.1 2023-11-20 22:08:40,170 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 185950 2023-11-20 22:08:48,138 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5600, loss[loss=0.08377, simple_loss=0.1024, pruned_loss=0.02351, audio_tagging_loss=0.009055, over 14519.00 frames. ], tot_loss[loss=0.07754, simple_loss=0.09861, pruned_loss=0.01818, audio_tagging_loss=0.01006, over 3046309.48 frames. ], batch size: 54, lr: 4.31e-03, grad_scale: 32.0 2023-11-20 22:08:51,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1239693.3333333333, ans=0.0 2023-11-20 22:08:51,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2023-11-20 22:09:08,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 7.915e+01 8.506e+01 9.381e+01 1.103e+02, threshold=1.701e+02, percent-clipped=0.0 2023-11-20 22:09:31,238 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 22:09:43,290 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186000 2023-11-20 22:09:50,695 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5650, loss[loss=0.09498, simple_loss=0.1213, pruned_loss=0.02507, audio_tagging_loss=0.009247, over 15318.00 frames. ], tot_loss[loss=0.07738, simple_loss=0.09851, pruned_loss=0.01809, audio_tagging_loss=0.01004, over 3044706.17 frames. ], batch size: 58, lr: 4.31e-03, grad_scale: 32.0 2023-11-20 22:10:12,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1240093.3333333333, ans=0.125 2023-11-20 22:10:17,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.23 vs. 
limit=15.0 2023-11-20 22:10:33,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1240226.6666666667, ans=0.0 2023-11-20 22:10:47,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186050 2023-11-20 22:10:47,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1240293.3333333333, ans=0.04949747468305833 2023-11-20 22:10:54,510 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5700, loss[loss=0.07317, simple_loss=0.09205, pruned_loss=0.01634, audio_tagging_loss=0.01081, over 15807.00 frames. ], tot_loss[loss=0.07713, simple_loss=0.09833, pruned_loss=0.01794, audio_tagging_loss=0.01003, over 3048348.91 frames. ], batch size: 60, lr: 4.31e-03, grad_scale: 16.0 2023-11-20 22:10:58,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1240360.0, ans=0.125 2023-11-20 22:11:01,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1240360.0, ans=0.0 2023-11-20 22:11:16,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.611e+01 8.091e+01 8.750e+01 9.672e+01 1.332e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 22:11:26,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1240493.3333333333, ans=0.0 2023-11-20 22:11:41,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1240560.0, ans=22.5 2023-11-20 22:11:51,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186100 2023-11-20 22:11:51,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-20 22:11:54,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1240626.6666666667, ans=0.2 2023-11-20 22:11:54,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1240626.6666666667, ans=0.125 2023-11-20 22:11:59,005 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5750, loss[loss=0.09583, simple_loss=0.1253, pruned_loss=0.02477, audio_tagging_loss=0.00839, over 15698.00 frames. ], tot_loss[loss=0.07715, simple_loss=0.09843, pruned_loss=0.01805, audio_tagging_loss=0.009879, over 3043620.16 frames. ], batch size: 56, lr: 4.30e-03, grad_scale: 16.0 2023-11-20 22:12:01,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1240693.3333333333, ans=0.0 2023-11-20 22:12:02,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1240693.3333333333, ans=0.1 2023-11-20 22:12:08,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. 
limit=15.0 2023-11-20 22:12:34,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240826.6666666667, ans=0.1 2023-11-20 22:12:38,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1240893.3333333333, ans=0.125 2023-11-20 22:12:42,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240893.3333333333, ans=0.1 2023-11-20 22:12:55,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186150 2023-11-20 22:12:58,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2023-11-20 22:13:02,395 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5800, loss[loss=0.08113, simple_loss=0.1022, pruned_loss=0.02282, audio_tagging_loss=0.007192, over 15230.00 frames. ], tot_loss[loss=0.07719, simple_loss=0.09841, pruned_loss=0.01819, audio_tagging_loss=0.009795, over 3039722.99 frames. ], batch size: 56, lr: 4.30e-03, grad_scale: 16.0 2023-11-20 22:13:16,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1241093.3333333333, ans=0.125 2023-11-20 22:13:23,625 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.344e+01 8.039e+01 8.914e+01 9.519e+01 1.369e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 22:13:31,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1241160.0, ans=0.0 2023-11-20 22:13:33,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1241160.0, ans=0.0 2023-11-20 22:13:33,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1241160.0, ans=0.0 2023-11-20 22:13:40,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1241226.6666666667, ans=0.125 2023-11-20 22:13:51,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1241226.6666666667, ans=0.125 2023-11-20 22:13:57,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1241293.3333333333, ans=0.125 2023-11-20 22:13:58,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186200 2023-11-20 22:14:06,334 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5850, loss[loss=0.1029, simple_loss=0.1382, pruned_loss=0.02535, audio_tagging_loss=0.008412, over 14501.00 frames. ], tot_loss[loss=0.07818, simple_loss=0.09977, pruned_loss=0.01859, audio_tagging_loss=0.0097, over 3035200.92 frames. 
], batch size: 52, lr: 4.30e-03, grad_scale: 16.0 2023-11-20 22:14:11,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1241360.0, ans=0.125 2023-11-20 22:14:16,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1241360.0, ans=0.125 2023-11-20 22:14:26,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1241426.6666666667, ans=0.2 2023-11-20 22:14:31,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1241493.3333333333, ans=0.0 2023-11-20 22:14:32,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1241493.3333333333, ans=0.2 2023-11-20 22:14:45,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-11-20 22:15:02,755 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186250 2023-11-20 22:15:11,170 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5900, loss[loss=0.08053, simple_loss=0.1079, pruned_loss=0.01977, audio_tagging_loss=0.006783, over 14822.00 frames. ], tot_loss[loss=0.07844, simple_loss=0.1004, pruned_loss=0.01866, audio_tagging_loss=0.009584, over 3041802.51 frames. ], batch size: 54, lr: 4.30e-03, grad_scale: 16.0 2023-11-20 22:15:11,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1241693.3333333333, ans=0.1 2023-11-20 22:15:32,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.032e+01 8.653e+01 9.726e+01 1.327e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 22:15:32,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1241760.0, ans=0.125 2023-11-20 22:15:38,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-20 22:15:42,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1241826.6666666667, ans=0.125 2023-11-20 22:15:59,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1241893.3333333333, ans=0.0 2023-11-20 22:15:59,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1241893.3333333333, ans=0.2 2023-11-20 22:16:06,731 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186300 2023-11-20 22:16:14,539 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 5950, loss[loss=0.06227, simple_loss=0.07986, pruned_loss=0.01286, audio_tagging_loss=0.009484, over 16317.00 frames. ], tot_loss[loss=0.07806, simple_loss=0.09993, pruned_loss=0.0185, audio_tagging_loss=0.009587, over 3043922.51 frames. 
], batch size: 62, lr: 4.30e-03, grad_scale: 16.0 2023-11-20 22:16:18,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1242026.6666666667, ans=0.0 2023-11-20 22:16:35,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1242093.3333333333, ans=0.125 2023-11-20 22:17:06,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-20 22:17:10,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1242293.3333333333, ans=0.025 2023-11-20 22:17:11,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186350 2023-11-20 22:17:18,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1242360.0, ans=0.125 2023-11-20 22:17:19,222 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6000, loss[loss=0.09101, simple_loss=0.1097, pruned_loss=0.02885, audio_tagging_loss=0.007327, over 14507.00 frames. ], tot_loss[loss=0.07786, simple_loss=0.0995, pruned_loss=0.0184, audio_tagging_loss=0.00971, over 3042471.74 frames. ], batch size: 55, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:17:19,223 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 22:17:48,354 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9862, 5.9005, 5.7273, 5.5606], device='cuda:1') 2023-11-20 22:17:57,725 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8457, 4.9588, 4.9932, 4.9448], device='cuda:1') 2023-11-20 22:18:03,608 INFO [train_asr.py:1253] (1/4) Epoch 16, validation: loss=0.06177, simple_loss=0.05296, pruned_loss=0.005445, audio_tagging_loss=0.02985, over 4681554.00 frames. 2023-11-20 22:18:03,609 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 22:18:25,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.385e+01 8.060e+01 8.625e+01 9.633e+01 1.979e+02, threshold=1.725e+02, percent-clipped=1.0 2023-11-20 22:18:47,832 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 22:18:57,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1242626.6666666667, ans=0.0 2023-11-20 22:18:59,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186400 2023-11-20 22:19:05,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1242626.6666666667, ans=0.125 2023-11-20 22:19:07,789 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6050, loss[loss=0.06884, simple_loss=0.08346, pruned_loss=0.01447, audio_tagging_loss=0.01264, over 15661.00 frames. ], tot_loss[loss=0.07779, simple_loss=0.09947, pruned_loss=0.01839, audio_tagging_loss=0.009668, over 3045445.57 frames. 
], batch size: 58, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:19:10,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1242693.3333333333, ans=0.125 2023-11-20 22:19:11,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1242693.3333333333, ans=0.125 2023-11-20 22:19:15,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1242693.3333333333, ans=0.125 2023-11-20 22:19:18,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2023-11-20 22:19:39,309 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 22:19:50,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1242893.3333333333, ans=0.125 2023-11-20 22:20:04,997 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186450 2023-11-20 22:20:06,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1242960.0, ans=0.0 2023-11-20 22:20:12,053 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6100, loss[loss=0.07438, simple_loss=0.091, pruned_loss=0.01811, audio_tagging_loss=0.01077, over 14354.00 frames. ], tot_loss[loss=0.07776, simple_loss=0.09956, pruned_loss=0.01837, audio_tagging_loss=0.00961, over 3043789.38 frames. ], batch size: 55, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:20:14,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=22.5 2023-11-20 22:20:16,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1243026.6666666667, ans=0.05 2023-11-20 22:20:28,142 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 22:20:34,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.632e+01 8.005e+01 8.728e+01 9.355e+01 1.254e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-20 22:20:43,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1243160.0, ans=0.0 2023-11-20 22:20:43,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1243160.0, ans=0.09899494936611666 2023-11-20 22:20:57,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1243226.6666666667, ans=0.125 2023-11-20 22:21:08,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186500 2023-11-20 22:21:16,644 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6150, loss[loss=0.07268, simple_loss=0.1009, pruned_loss=0.01489, audio_tagging_loss=0.007373, over 15086.00 frames. ], tot_loss[loss=0.07699, simple_loss=0.09814, pruned_loss=0.01818, audio_tagging_loss=0.009747, over 3049473.76 frames. 
], batch size: 57, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:21:17,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.43 vs. limit=15.0 2023-11-20 22:21:21,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1243360.0, ans=0.125 2023-11-20 22:21:26,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-20 22:22:12,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186550 2023-11-20 22:22:20,051 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6200, loss[loss=0.08839, simple_loss=0.1088, pruned_loss=0.02333, audio_tagging_loss=0.01065, over 16446.00 frames. ], tot_loss[loss=0.07744, simple_loss=0.09851, pruned_loss=0.01844, audio_tagging_loss=0.009747, over 3052989.09 frames. ], batch size: 61, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:22:28,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1243693.3333333333, ans=0.125 2023-11-20 22:22:34,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-11-20 22:22:41,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0 2023-11-20 22:22:42,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.113e+01 8.838e+01 9.519e+01 1.384e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-20 22:22:50,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-11-20 22:22:57,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1243826.6666666667, ans=0.125 2023-11-20 22:23:17,841 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186600 2023-11-20 22:23:25,493 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6250, loss[loss=0.05863, simple_loss=0.07035, pruned_loss=0.007001, audio_tagging_loss=0.01646, over 15801.00 frames. ], tot_loss[loss=0.07741, simple_loss=0.09816, pruned_loss=0.01832, audio_tagging_loss=0.01, over 3053465.56 frames. 
], batch size: 62, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:23:29,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1244026.6666666667, ans=0.2 2023-11-20 22:23:52,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1244160.0, ans=0.0 2023-11-20 22:23:53,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1244160.0, ans=0.125 2023-11-20 22:23:57,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1244160.0, ans=0.0 2023-11-20 22:24:09,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1244226.6666666667, ans=0.1 2023-11-20 22:24:09,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1244226.6666666667, ans=0.1 2023-11-20 22:24:14,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1244226.6666666667, ans=0.125 2023-11-20 22:24:21,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186650 2023-11-20 22:24:23,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1244293.3333333333, ans=0.125 2023-11-20 22:24:29,276 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6300, loss[loss=0.09585, simple_loss=0.126, pruned_loss=0.02419, audio_tagging_loss=0.008665, over 16483.00 frames. ], tot_loss[loss=0.07745, simple_loss=0.0984, pruned_loss=0.01825, audio_tagging_loss=0.009998, over 3053032.81 frames. ], batch size: 58, lr: 4.30e-03, grad_scale: 16.0 2023-11-20 22:24:42,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1244426.6666666667, ans=0.125 2023-11-20 22:24:51,658 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.323e+01 8.066e+01 8.583e+01 9.288e+01 1.245e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 22:25:06,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-20 22:25:15,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-11-20 22:25:22,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1244626.6666666667, ans=0.1 2023-11-20 22:25:25,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186700 2023-11-20 22:25:32,742 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6350, loss[loss=0.06748, simple_loss=0.08512, pruned_loss=0.01244, audio_tagging_loss=0.01248, over 16228.00 frames. ], tot_loss[loss=0.07713, simple_loss=0.09803, pruned_loss=0.01807, audio_tagging_loss=0.01005, over 3050439.91 frames. 
], batch size: 62, lr: 4.30e-03, grad_scale: 16.0 2023-11-20 22:25:57,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1244826.6666666667, ans=0.1 2023-11-20 22:26:03,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1244826.6666666667, ans=0.0 2023-11-20 22:26:06,666 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 22:26:07,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1244826.6666666667, ans=0.2 2023-11-20 22:26:15,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1244893.3333333333, ans=0.0 2023-11-20 22:26:20,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1244893.3333333333, ans=0.125 2023-11-20 22:26:26,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1244960.0, ans=0.0 2023-11-20 22:26:29,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186750 2023-11-20 22:26:37,566 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6400, loss[loss=0.08483, simple_loss=0.1064, pruned_loss=0.02089, audio_tagging_loss=0.01072, over 15282.00 frames. ], tot_loss[loss=0.0774, simple_loss=0.09834, pruned_loss=0.01818, audio_tagging_loss=0.01006, over 3046465.42 frames. ], batch size: 56, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:26:37,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1245026.6666666667, ans=0.0 2023-11-20 22:26:50,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1245093.3333333333, ans=0.125 2023-11-20 22:26:54,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2023-11-20 22:27:00,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.359e+01 8.103e+01 8.559e+01 9.205e+01 1.089e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-20 22:27:19,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1245226.6666666667, ans=0.025 2023-11-20 22:27:31,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1245293.3333333333, ans=0.125 2023-11-20 22:27:34,043 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186800 2023-11-20 22:27:41,527 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6450, loss[loss=0.08288, simple_loss=0.09817, pruned_loss=0.02592, audio_tagging_loss=0.007874, over 15769.00 frames. ], tot_loss[loss=0.07687, simple_loss=0.09741, pruned_loss=0.01803, audio_tagging_loss=0.01013, over 3040880.67 frames. 
], batch size: 59, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:28:00,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1245426.6666666667, ans=0.2 2023-11-20 22:28:24,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1245560.0, ans=0.1 2023-11-20 22:28:26,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-20 22:28:27,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.98 vs. limit=10.0 2023-11-20 22:28:34,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1245626.6666666667, ans=0.2 2023-11-20 22:28:39,179 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186850 2023-11-20 22:28:39,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1245626.6666666667, ans=0.0 2023-11-20 22:28:46,227 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6500, loss[loss=0.07501, simple_loss=0.09758, pruned_loss=0.01477, audio_tagging_loss=0.01145, over 15517.00 frames. ], tot_loss[loss=0.07648, simple_loss=0.09694, pruned_loss=0.01785, audio_tagging_loss=0.01016, over 3045275.76 frames. ], batch size: 60, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:29:09,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.106e+01 8.759e+01 9.505e+01 1.385e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-20 22:29:11,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1245826.6666666667, ans=0.125 2023-11-20 22:29:28,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=15.0 2023-11-20 22:29:29,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1245893.3333333333, ans=0.05 2023-11-20 22:29:42,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186900 2023-11-20 22:29:50,869 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6550, loss[loss=0.07717, simple_loss=0.1056, pruned_loss=0.01365, audio_tagging_loss=0.01071, over 16152.00 frames. ], tot_loss[loss=0.07682, simple_loss=0.09777, pruned_loss=0.018, audio_tagging_loss=0.009938, over 3054716.11 frames. ], batch size: 60, lr: 4.30e-03, grad_scale: 32.0 2023-11-20 22:29:56,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2023-11-20 22:30:03,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-20 22:30:17,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1246160.0, ans=0.0 2023-11-20 22:30:18,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.44 vs. 
limit=22.5 2023-11-20 22:30:45,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1246293.3333333333, ans=0.125 2023-11-20 22:30:48,475 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 186950 2023-11-20 22:30:55,606 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6600, loss[loss=0.07325, simple_loss=0.09383, pruned_loss=0.01942, audio_tagging_loss=0.006908, over 14781.00 frames. ], tot_loss[loss=0.07589, simple_loss=0.09692, pruned_loss=0.01767, audio_tagging_loss=0.009762, over 3054420.08 frames. ], batch size: 56, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:31:18,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 8.367e+01 8.854e+01 9.653e+01 1.222e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 22:31:25,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0 2023-11-20 22:31:45,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1246626.6666666667, ans=0.0 2023-11-20 22:31:51,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1246626.6666666667, ans=0.1 2023-11-20 22:31:52,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187000 2023-11-20 22:32:00,540 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6650, loss[loss=0.04694, simple_loss=0.04784, pruned_loss=0.009431, audio_tagging_loss=0.01359, over 13956.00 frames. ], tot_loss[loss=0.07576, simple_loss=0.09674, pruned_loss=0.01765, audio_tagging_loss=0.009733, over 3054147.74 frames. ], batch size: 56, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:32:01,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1246693.3333333333, ans=0.125 2023-11-20 22:32:22,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1246760.0, ans=0.09899494936611666 2023-11-20 22:32:30,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2023-11-20 22:32:32,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2023-11-20 22:32:53,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1246960.0, ans=0.0 2023-11-20 22:32:56,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187050 2023-11-20 22:32:56,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1246960.0, ans=0.125 2023-11-20 22:32:59,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1246960.0, ans=0.05 2023-11-20 22:33:04,396 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6700, loss[loss=0.07021, simple_loss=0.08908, pruned_loss=0.01381, audio_tagging_loss=0.01186, over 15035.00 frames. ], tot_loss[loss=0.07541, simple_loss=0.09628, pruned_loss=0.01756, audio_tagging_loss=0.00971, over 3046321.07 frames. 
], batch size: 56, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:33:07,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1247026.6666666667, ans=0.125 2023-11-20 22:33:16,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1247093.3333333333, ans=0.015 2023-11-20 22:33:27,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.055e+01 8.769e+01 9.493e+01 1.327e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 22:33:33,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1247160.0, ans=0.125 2023-11-20 22:33:33,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1247160.0, ans=0.1 2023-11-20 22:33:35,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1247160.0, ans=0.1 2023-11-20 22:34:00,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187100 2023-11-20 22:34:00,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1247293.3333333333, ans=0.125 2023-11-20 22:34:08,150 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6750, loss[loss=0.06863, simple_loss=0.08762, pruned_loss=0.01502, audio_tagging_loss=0.009803, over 15298.00 frames. ], tot_loss[loss=0.07586, simple_loss=0.09692, pruned_loss=0.01774, audio_tagging_loss=0.009661, over 3048005.09 frames. ], batch size: 58, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:34:16,856 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 22:34:54,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1247560.0, ans=0.125 2023-11-20 22:35:05,434 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187150 2023-11-20 22:35:13,206 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6800, loss[loss=0.06742, simple_loss=0.08549, pruned_loss=0.01452, audio_tagging_loss=0.01016, over 15239.00 frames. ], tot_loss[loss=0.07597, simple_loss=0.09705, pruned_loss=0.01783, audio_tagging_loss=0.009613, over 3048280.22 frames. ], batch size: 58, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:35:17,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.20 vs. limit=10.0 2023-11-20 22:35:28,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2023-11-20 22:35:35,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.835e+01 8.011e+01 8.725e+01 9.590e+01 1.326e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 22:36:08,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.21 vs. 
limit=15.0 2023-11-20 22:36:08,995 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187200 2023-11-20 22:36:09,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1247960.0, ans=0.125 2023-11-20 22:36:16,467 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6850, loss[loss=0.06605, simple_loss=0.0832, pruned_loss=0.01376, audio_tagging_loss=0.01069, over 14985.00 frames. ], tot_loss[loss=0.07639, simple_loss=0.09764, pruned_loss=0.01793, audio_tagging_loss=0.009638, over 3049794.84 frames. ], batch size: 57, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:36:19,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1248026.6666666667, ans=0.07 2023-11-20 22:36:44,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1248160.0, ans=0.0 2023-11-20 22:36:51,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1248160.0, ans=0.0 2023-11-20 22:37:06,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-20 22:37:12,954 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187250 2023-11-20 22:37:20,286 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6900, loss[loss=0.09809, simple_loss=0.121, pruned_loss=0.03015, audio_tagging_loss=0.007449, over 15565.00 frames. ], tot_loss[loss=0.07648, simple_loss=0.09765, pruned_loss=0.01794, audio_tagging_loss=0.009719, over 3047403.47 frames. ], batch size: 56, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:37:32,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1248426.6666666667, ans=0.0 2023-11-20 22:37:44,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.341e+01 8.921e+01 9.872e+01 1.364e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 22:37:49,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1248493.3333333333, ans=0.0 2023-11-20 22:37:54,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0 2023-11-20 22:38:06,839 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 22:38:14,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1248626.6666666667, ans=0.05 2023-11-20 22:38:16,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187300 2023-11-20 22:38:21,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. 
limit=5.0 2023-11-20 22:38:24,999 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 6950, loss[loss=0.1015, simple_loss=0.1419, pruned_loss=0.02281, audio_tagging_loss=0.007738, over 15803.00 frames. ], tot_loss[loss=0.07594, simple_loss=0.09685, pruned_loss=0.01767, audio_tagging_loss=0.009843, over 3049757.41 frames. ], batch size: 55, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:38:28,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1248693.3333333333, ans=0.1 2023-11-20 22:38:39,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1248760.0, ans=0.125 2023-11-20 22:38:43,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1248760.0, ans=0.125 2023-11-20 22:39:01,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1248893.3333333333, ans=0.0 2023-11-20 22:39:03,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1248893.3333333333, ans=0.125 2023-11-20 22:39:17,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1248960.0, ans=0.05 2023-11-20 22:39:21,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187350 2023-11-20 22:39:25,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1248960.0, ans=0.125 2023-11-20 22:39:26,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1248960.0, ans=0.0 2023-11-20 22:39:27,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1249026.6666666667, ans=0.07 2023-11-20 22:39:28,595 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7000, loss[loss=0.04961, simple_loss=0.06374, pruned_loss=0.007891, audio_tagging_loss=0.009847, over 15282.00 frames. ], tot_loss[loss=0.076, simple_loss=0.09691, pruned_loss=0.0177, audio_tagging_loss=0.009845, over 3050896.12 frames. ], batch size: 60, lr: 4.29e-03, grad_scale: 16.0 2023-11-20 22:39:47,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1249093.3333333333, ans=0.0 2023-11-20 22:39:52,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.760e+01 8.148e+01 8.784e+01 9.512e+01 1.267e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-20 22:39:58,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1249160.0, ans=0.0 2023-11-20 22:40:01,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1249160.0, ans=0.125 2023-11-20 22:40:15,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1249226.6666666667, ans=0.125 2023-11-20 22:40:25,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187400 2023-11-20 22:40:29,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.45 vs. 
limit=22.5 2023-11-20 22:40:33,240 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7050, loss[loss=0.08114, simple_loss=0.09917, pruned_loss=0.02042, audio_tagging_loss=0.01113, over 14446.00 frames. ], tot_loss[loss=0.07564, simple_loss=0.09629, pruned_loss=0.01752, audio_tagging_loss=0.009973, over 3049383.59 frames. ], batch size: 53, lr: 4.29e-03, grad_scale: 16.0 2023-11-20 22:40:43,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2023-11-20 22:40:45,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2023-11-20 22:41:04,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1249493.3333333333, ans=0.0 2023-11-20 22:41:24,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1249626.6666666667, ans=0.0 2023-11-20 22:41:29,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187450 2023-11-20 22:41:37,924 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7100, loss[loss=0.08336, simple_loss=0.1058, pruned_loss=0.01999, audio_tagging_loss=0.01049, over 14730.00 frames. ], tot_loss[loss=0.0763, simple_loss=0.09699, pruned_loss=0.01777, audio_tagging_loss=0.01003, over 3054976.07 frames. ], batch size: 56, lr: 4.29e-03, grad_scale: 16.0 2023-11-20 22:41:44,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1249693.3333333333, ans=0.125 2023-11-20 22:41:47,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1249693.3333333333, ans=0.125 2023-11-20 22:42:01,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1249760.0, ans=0.0 2023-11-20 22:42:01,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1249760.0, ans=0.2 2023-11-20 22:42:02,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.541e+01 8.191e+01 8.839e+01 9.583e+01 1.103e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-20 22:42:34,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187500 2023-11-20 22:42:38,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1249960.0, ans=0.125 2023-11-20 22:42:41,828 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7150, loss[loss=0.08406, simple_loss=0.107, pruned_loss=0.02213, audio_tagging_loss=0.008448, over 13869.00 frames. ], tot_loss[loss=0.07714, simple_loss=0.09805, pruned_loss=0.0181, audio_tagging_loss=0.01001, over 3049326.97 frames. 
], batch size: 52, lr: 4.29e-03, grad_scale: 16.0 2023-11-20 22:43:09,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1250160.0, ans=0.2 2023-11-20 22:43:28,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1250226.6666666667, ans=0.0 2023-11-20 22:43:38,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187550 2023-11-20 22:43:45,992 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7200, loss[loss=0.06427, simple_loss=0.08318, pruned_loss=0.01271, audio_tagging_loss=0.009961, over 15252.00 frames. ], tot_loss[loss=0.07742, simple_loss=0.09849, pruned_loss=0.01812, audio_tagging_loss=0.01005, over 3048603.81 frames. ], batch size: 56, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:43:51,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1250360.0, ans=0.5 2023-11-20 22:43:57,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1250426.6666666667, ans=0.0 2023-11-20 22:44:04,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1250426.6666666667, ans=0.0 2023-11-20 22:44:10,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.608e+01 8.172e+01 8.869e+01 9.437e+01 2.740e+02, threshold=1.774e+02, percent-clipped=1.0 2023-11-20 22:44:12,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.16 vs. limit=22.5 2023-11-20 22:44:13,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1250493.3333333333, ans=0.0 2023-11-20 22:44:24,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1250560.0, ans=0.1 2023-11-20 22:44:30,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1250560.0, ans=0.0 2023-11-20 22:44:39,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2023-11-20 22:44:41,994 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187600 2023-11-20 22:44:50,281 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7250, loss[loss=0.0959, simple_loss=0.1141, pruned_loss=0.02864, audio_tagging_loss=0.01021, over 15110.00 frames. ], tot_loss[loss=0.07706, simple_loss=0.09811, pruned_loss=0.01792, audio_tagging_loss=0.01009, over 3042575.59 frames. ], batch size: 55, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:44:52,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1250693.3333333333, ans=0.0 2023-11-20 22:45:29,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.12 vs. limit=22.5 2023-11-20 22:45:47,115 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187650 2023-11-20 22:45:55,051 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7300, loss[loss=0.07069, simple_loss=0.09425, pruned_loss=0.01549, audio_tagging_loss=0.008073, over 15297.00 frames. 
], tot_loss[loss=0.07725, simple_loss=0.09839, pruned_loss=0.01807, audio_tagging_loss=0.009985, over 3042404.01 frames. ], batch size: 58, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:46:16,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1251093.3333333333, ans=0.1 2023-11-20 22:46:18,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.538e+01 8.202e+01 8.886e+01 9.738e+01 1.326e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 22:46:22,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1251160.0, ans=0.0 2023-11-20 22:46:51,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187700 2023-11-20 22:46:59,067 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7350, loss[loss=0.0469, simple_loss=0.05377, pruned_loss=0.008198, audio_tagging_loss=0.01181, over 15631.00 frames. ], tot_loss[loss=0.07715, simple_loss=0.09868, pruned_loss=0.01806, audio_tagging_loss=0.009753, over 3039189.17 frames. ], batch size: 63, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:47:11,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-11-20 22:47:28,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1251493.3333333333, ans=0.1 2023-11-20 22:47:37,509 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 22:47:52,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1251626.6666666667, ans=0.0 2023-11-20 22:47:52,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1251626.6666666667, ans=0.025 2023-11-20 22:47:55,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187750 2023-11-20 22:48:02,463 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7400, loss[loss=0.05608, simple_loss=0.06559, pruned_loss=0.01126, audio_tagging_loss=0.01203, over 15345.00 frames. ], tot_loss[loss=0.07703, simple_loss=0.09865, pruned_loss=0.01799, audio_tagging_loss=0.00971, over 3036599.99 frames. 
], batch size: 61, lr: 4.29e-03, grad_scale: 32.0 2023-11-20 22:48:05,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1251693.3333333333, ans=0.125 2023-11-20 22:48:15,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1251760.0, ans=0.1 2023-11-20 22:48:27,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.213e+01 8.738e+01 9.709e+01 1.431e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 22:48:33,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1251826.6666666667, ans=0.0 2023-11-20 22:48:37,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1251826.6666666667, ans=0.125 2023-11-20 22:48:38,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1251826.6666666667, ans=0.125 2023-11-20 22:48:48,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1251893.3333333333, ans=0.5 2023-11-20 22:48:59,314 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187800 2023-11-20 22:49:06,808 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7450, loss[loss=0.07587, simple_loss=0.1015, pruned_loss=0.01781, audio_tagging_loss=0.007295, over 14573.00 frames. ], tot_loss[loss=0.07682, simple_loss=0.09833, pruned_loss=0.01802, audio_tagging_loss=0.009634, over 3034897.29 frames. ], batch size: 54, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:49:33,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.78 vs. limit=12.0 2023-11-20 22:49:45,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1252226.6666666667, ans=0.125 2023-11-20 22:50:02,379 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187850 2023-11-20 22:50:10,841 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7500, loss[loss=0.07816, simple_loss=0.09804, pruned_loss=0.02024, audio_tagging_loss=0.008902, over 15763.00 frames. ], tot_loss[loss=0.07762, simple_loss=0.09926, pruned_loss=0.01832, audio_tagging_loss=0.009666, over 3031321.02 frames. ], batch size: 59, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:50:17,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1252360.0, ans=0.1 2023-11-20 22:50:34,835 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.198e+01 8.879e+01 9.496e+01 1.257e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 22:50:41,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1252493.3333333333, ans=0.0 2023-11-20 22:51:02,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1252626.6666666667, ans=0.125 2023-11-20 22:51:03,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. 
limit=15.0 2023-11-20 22:51:06,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187900 2023-11-20 22:51:13,956 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7550, loss[loss=0.08032, simple_loss=0.1106, pruned_loss=0.01793, audio_tagging_loss=0.007091, over 14292.00 frames. ], tot_loss[loss=0.07769, simple_loss=0.09913, pruned_loss=0.01847, audio_tagging_loss=0.009648, over 3039840.73 frames. ], batch size: 53, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:51:15,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1252693.3333333333, ans=0.125 2023-11-20 22:52:03,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1252960.0, ans=0.125 2023-11-20 22:52:10,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 187950 2023-11-20 22:52:17,965 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7600, loss[loss=0.05244, simple_loss=0.06468, pruned_loss=0.008207, audio_tagging_loss=0.0119, over 16413.00 frames. ], tot_loss[loss=0.07715, simple_loss=0.09848, pruned_loss=0.01824, audio_tagging_loss=0.009668, over 3048572.11 frames. ], batch size: 62, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:52:31,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1253093.3333333333, ans=0.0 2023-11-20 22:52:42,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.631e+01 8.162e+01 8.819e+01 9.727e+01 1.348e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 22:52:49,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1253160.0, ans=0.2 2023-11-20 22:52:51,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1253160.0, ans=0.125 2023-11-20 22:53:01,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2023-11-20 22:53:14,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188000 2023-11-20 22:53:22,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1253293.3333333333, ans=0.0 2023-11-20 22:53:25,784 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7650, loss[loss=0.06017, simple_loss=0.07425, pruned_loss=0.01364, audio_tagging_loss=0.009401, over 14239.00 frames. ], tot_loss[loss=0.07715, simple_loss=0.09847, pruned_loss=0.0182, audio_tagging_loss=0.009717, over 3049433.96 frames. 
], batch size: 56, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:53:31,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1253360.0, ans=0.125 2023-11-20 22:53:31,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1253360.0, ans=0.125 2023-11-20 22:54:03,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1253560.0, ans=0.07 2023-11-20 22:54:21,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1253626.6666666667, ans=0.2 2023-11-20 22:54:22,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188050 2023-11-20 22:54:27,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5 2023-11-20 22:54:30,090 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7700, loss[loss=0.06235, simple_loss=0.07822, pruned_loss=0.01476, audio_tagging_loss=0.008471, over 15177.00 frames. ], tot_loss[loss=0.0773, simple_loss=0.09873, pruned_loss=0.01816, audio_tagging_loss=0.009778, over 3046848.96 frames. ], batch size: 59, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:54:30,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1253693.3333333333, ans=0.125 2023-11-20 22:54:46,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1253760.0, ans=0.2 2023-11-20 22:54:49,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1253760.0, ans=0.2 2023-11-20 22:54:54,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.666e+01 7.984e+01 8.508e+01 9.260e+01 1.144e+02, threshold=1.702e+02, percent-clipped=0.0 2023-11-20 22:54:59,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1253826.6666666667, ans=0.0 2023-11-20 22:55:06,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-20 22:55:14,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1253893.3333333333, ans=0.0 2023-11-20 22:55:26,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188100 2023-11-20 22:55:34,507 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7750, loss[loss=0.06934, simple_loss=0.08532, pruned_loss=0.0135, audio_tagging_loss=0.01318, over 14353.00 frames. ], tot_loss[loss=0.07656, simple_loss=0.09755, pruned_loss=0.01785, audio_tagging_loss=0.009943, over 3035619.33 frames. 
], batch size: 54, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:55:57,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1254093.3333333333, ans=0.125 2023-11-20 22:55:57,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1254093.3333333333, ans=0.0 2023-11-20 22:56:01,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1254160.0, ans=0.2 2023-11-20 22:56:01,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2023-11-20 22:56:02,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1254160.0, ans=0.025 2023-11-20 22:56:03,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1254160.0, ans=0.0 2023-11-20 22:56:14,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0 2023-11-20 22:56:30,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188150 2023-11-20 22:56:31,903 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 22:56:36,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1254360.0, ans=0.125 2023-11-20 22:56:37,569 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7800, loss[loss=0.07208, simple_loss=0.09075, pruned_loss=0.01658, audio_tagging_loss=0.01013, over 15597.00 frames. ], tot_loss[loss=0.07651, simple_loss=0.09754, pruned_loss=0.0178, audio_tagging_loss=0.009935, over 3045017.95 frames. ], batch size: 60, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:56:46,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1254360.0, ans=0.125 2023-11-20 22:57:01,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.318e+01 8.003e+01 8.886e+01 9.530e+01 1.591e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 22:57:02,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1254493.3333333333, ans=0.04949747468305833 2023-11-20 22:57:15,746 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.512e-01 2023-11-20 22:57:15,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1254560.0, ans=0.1 2023-11-20 22:57:18,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1254560.0, ans=0.125 2023-11-20 22:57:20,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1254560.0, ans=10.0 2023-11-20 22:57:22,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1254560.0, ans=0.0 2023-11-20 22:57:24,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. 
limit=22.5 2023-11-20 22:57:27,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.95 vs. limit=10.0 2023-11-20 22:57:34,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188200 2023-11-20 22:57:42,494 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7850, loss[loss=0.0913, simple_loss=0.1141, pruned_loss=0.02519, audio_tagging_loss=0.009066, over 15256.00 frames. ], tot_loss[loss=0.07701, simple_loss=0.09825, pruned_loss=0.01798, audio_tagging_loss=0.009907, over 3041065.54 frames. ], batch size: 59, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:57:50,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1254693.3333333333, ans=0.125 2023-11-20 22:57:53,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1254693.3333333333, ans=0.1 2023-11-20 22:58:36,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1254960.0, ans=0.125 2023-11-20 22:58:37,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.54 vs. limit=10.0 2023-11-20 22:58:39,016 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188250 2023-11-20 22:58:41,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1254960.0, ans=0.125 2023-11-20 22:58:47,262 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7900, loss[loss=0.07988, simple_loss=0.1002, pruned_loss=0.01845, audio_tagging_loss=0.01132, over 16762.00 frames. ], tot_loss[loss=0.07659, simple_loss=0.0973, pruned_loss=0.01788, audio_tagging_loss=0.01006, over 3037454.20 frames. ], batch size: 63, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:59:10,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.630e+01 8.310e+01 8.933e+01 9.750e+01 1.293e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 22:59:12,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1255160.0, ans=0.125 2023-11-20 22:59:43,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188300 2023-11-20 22:59:48,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1255293.3333333333, ans=0.0 2023-11-20 22:59:50,546 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 7950, loss[loss=0.08443, simple_loss=0.101, pruned_loss=0.02057, audio_tagging_loss=0.01334, over 14594.00 frames. ], tot_loss[loss=0.07636, simple_loss=0.09679, pruned_loss=0.01783, audio_tagging_loss=0.01014, over 3044570.16 frames. ], batch size: 54, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 22:59:52,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1255360.0, ans=0.0 2023-11-20 23:00:04,606 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 23:00:17,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1255493.3333333333, ans=0.125 2023-11-20 23:00:26,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1255493.3333333333, ans=0.2 2023-11-20 23:00:32,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=12.0 2023-11-20 23:00:38,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1255560.0, ans=0.125 2023-11-20 23:00:46,750 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188350 2023-11-20 23:00:48,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1255626.6666666667, ans=0.0 2023-11-20 23:00:50,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1255626.6666666667, ans=0.0 2023-11-20 23:00:52,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1255626.6666666667, ans=0.125 2023-11-20 23:00:54,431 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8000, loss[loss=0.07635, simple_loss=0.09025, pruned_loss=0.01933, audio_tagging_loss=0.01189, over 14709.00 frames. ], tot_loss[loss=0.07643, simple_loss=0.09678, pruned_loss=0.01793, audio_tagging_loss=0.01011, over 3042106.40 frames. ], batch size: 55, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 23:01:01,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1255693.3333333333, ans=0.125 2023-11-20 23:01:01,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0 2023-11-20 23:01:09,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1255760.0, ans=0.1 2023-11-20 23:01:19,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.042e+01 8.522e+01 9.463e+01 1.266e+02, threshold=1.704e+02, percent-clipped=0.0 2023-11-20 23:01:29,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1255826.6666666667, ans=0.1 2023-11-20 23:01:30,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1255826.6666666667, ans=0.125 2023-11-20 23:01:48,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1255960.0, ans=0.1 2023-11-20 23:01:50,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188400 2023-11-20 23:01:59,856 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8050, loss[loss=0.08513, simple_loss=0.1086, pruned_loss=0.02198, audio_tagging_loss=0.008857, over 15706.00 frames. ], tot_loss[loss=0.07583, simple_loss=0.09604, pruned_loss=0.01763, audio_tagging_loss=0.01018, over 3044321.55 frames. 
], batch size: 59, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 23:02:30,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.14 vs. limit=6.0 2023-11-20 23:02:40,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2023-11-20 23:02:41,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1256226.6666666667, ans=0.02 2023-11-20 23:02:56,188 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188450 2023-11-20 23:03:03,304 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8100, loss[loss=0.04903, simple_loss=0.06106, pruned_loss=0.006735, audio_tagging_loss=0.01177, over 13999.00 frames. ], tot_loss[loss=0.07674, simple_loss=0.09758, pruned_loss=0.0179, audio_tagging_loss=0.01004, over 3044402.25 frames. ], batch size: 55, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 23:03:04,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-20 23:03:22,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.10 vs. limit=22.5 2023-11-20 23:03:23,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1256426.6666666667, ans=0.125 2023-11-20 23:03:26,969 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.476e+01 8.237e+01 9.309e+01 1.004e+02 1.229e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-20 23:03:38,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1256493.3333333333, ans=0.125 2023-11-20 23:03:50,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-20 23:03:58,841 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188500 2023-11-20 23:03:59,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-20 23:04:06,670 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8150, loss[loss=0.09047, simple_loss=0.116, pruned_loss=0.01828, audio_tagging_loss=0.01416, over 15400.00 frames. ], tot_loss[loss=0.07669, simple_loss=0.09771, pruned_loss=0.01793, audio_tagging_loss=0.009905, over 3049264.60 frames. ], batch size: 56, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 23:04:12,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2023-11-20 23:04:33,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1256826.6666666667, ans=0.125 2023-11-20 23:04:51,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1256893.3333333333, ans=0.02 2023-11-20 23:04:54,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.66 vs. 
limit=15.0 2023-11-20 23:05:02,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188550 2023-11-20 23:05:08,396 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 23:05:09,571 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8200, loss[loss=0.07779, simple_loss=0.1095, pruned_loss=0.01585, audio_tagging_loss=0.00718, over 15906.00 frames. ], tot_loss[loss=0.07659, simple_loss=0.09786, pruned_loss=0.0178, audio_tagging_loss=0.009856, over 3043253.35 frames. ], batch size: 58, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 23:05:34,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.637e+01 8.183e+01 8.865e+01 9.614e+01 1.183e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 23:05:40,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1257160.0, ans=0.0 2023-11-20 23:05:43,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1257160.0, ans=0.0 2023-11-20 23:05:54,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1257226.6666666667, ans=0.0 2023-11-20 23:06:07,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188600 2023-11-20 23:06:11,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1257293.3333333333, ans=0.09899494936611666 2023-11-20 23:06:15,087 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8250, loss[loss=0.08231, simple_loss=0.1119, pruned_loss=0.01882, audio_tagging_loss=0.007534, over 15697.00 frames. ], tot_loss[loss=0.07612, simple_loss=0.09694, pruned_loss=0.0178, audio_tagging_loss=0.00985, over 3049376.72 frames. ], batch size: 57, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 23:06:52,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=22.5 2023-11-20 23:06:54,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1257560.0, ans=0.0 2023-11-20 23:07:01,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1257560.0, ans=0.1 2023-11-20 23:07:05,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1257626.6666666667, ans=0.0 2023-11-20 23:07:07,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1257626.6666666667, ans=0.2 2023-11-20 23:07:11,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188650 2023-11-20 23:07:18,990 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8300, loss[loss=0.06696, simple_loss=0.09112, pruned_loss=0.01314, audio_tagging_loss=0.008253, over 15815.00 frames. ], tot_loss[loss=0.07537, simple_loss=0.09608, pruned_loss=0.01746, audio_tagging_loss=0.00987, over 3048088.11 frames. 
], batch size: 59, lr: 4.28e-03, grad_scale: 32.0 2023-11-20 23:07:43,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.445e+01 8.113e+01 8.858e+01 9.502e+01 1.553e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 23:07:50,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1257826.6666666667, ans=0.09899494936611666 2023-11-20 23:07:54,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1257826.6666666667, ans=0.5 2023-11-20 23:08:12,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1257960.0, ans=0.0 2023-11-20 23:08:15,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188700 2023-11-20 23:08:22,652 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8350, loss[loss=0.07533, simple_loss=0.1006, pruned_loss=0.01676, audio_tagging_loss=0.00826, over 16172.00 frames. ], tot_loss[loss=0.0754, simple_loss=0.09625, pruned_loss=0.0175, audio_tagging_loss=0.009769, over 3044292.88 frames. ], batch size: 59, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:08:33,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1258026.6666666667, ans=0.125 2023-11-20 23:08:41,893 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 23:08:43,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1258093.3333333333, ans=0.0 2023-11-20 23:08:51,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1258160.0, ans=0.125 2023-11-20 23:09:19,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188750 2023-11-20 23:09:22,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2023-11-20 23:09:27,459 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8400, loss[loss=0.06266, simple_loss=0.07856, pruned_loss=0.0119, audio_tagging_loss=0.01148, over 15169.00 frames. ], tot_loss[loss=0.07477, simple_loss=0.09561, pruned_loss=0.01723, audio_tagging_loss=0.009728, over 3046961.87 frames. 
], batch size: 57, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:09:31,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1258360.0, ans=0.0 2023-11-20 23:09:41,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1258426.6666666667, ans=0.1 2023-11-20 23:09:44,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1258426.6666666667, ans=0.0 2023-11-20 23:09:51,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.077e+01 8.855e+01 9.404e+01 1.397e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 23:10:08,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1258560.0, ans=0.1 2023-11-20 23:10:23,344 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188800 2023-11-20 23:10:24,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1258626.6666666667, ans=0.125 2023-11-20 23:10:31,298 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8450, loss[loss=0.05868, simple_loss=0.0745, pruned_loss=0.01207, audio_tagging_loss=0.009362, over 15261.00 frames. ], tot_loss[loss=0.07501, simple_loss=0.09571, pruned_loss=0.01732, audio_tagging_loss=0.009831, over 3040526.07 frames. ], batch size: 58, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:10:32,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1258693.3333333333, ans=0.1 2023-11-20 23:10:42,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1258760.0, ans=0.125 2023-11-20 23:11:02,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1258826.6666666667, ans=0.0 2023-11-20 23:11:13,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1258893.3333333333, ans=0.1 2023-11-20 23:11:15,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1258893.3333333333, ans=0.0 2023-11-20 23:11:27,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188850 2023-11-20 23:11:28,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1258960.0, ans=0.0 2023-11-20 23:11:28,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1258960.0, ans=0.025 2023-11-20 23:11:34,537 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8500, loss[loss=0.07512, simple_loss=0.0931, pruned_loss=0.01642, audio_tagging_loss=0.01215, over 14141.00 frames. ], tot_loss[loss=0.07567, simple_loss=0.09647, pruned_loss=0.01754, audio_tagging_loss=0.009901, over 3039197.25 frames. 
], batch size: 53, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:11:46,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1259026.6666666667, ans=0.1 2023-11-20 23:11:52,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1259093.3333333333, ans=0.125 2023-11-20 23:11:59,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.153e+01 8.872e+01 9.707e+01 1.209e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 23:12:31,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188900 2023-11-20 23:12:33,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1259293.3333333333, ans=0.125 2023-11-20 23:12:39,225 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8550, loss[loss=0.06932, simple_loss=0.07928, pruned_loss=0.01733, audio_tagging_loss=0.01234, over 15933.00 frames. ], tot_loss[loss=0.07522, simple_loss=0.09585, pruned_loss=0.01737, audio_tagging_loss=0.009924, over 3038384.14 frames. ], batch size: 61, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:12:42,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-20 23:12:45,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2023-11-20 23:13:01,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1259426.6666666667, ans=0.1 2023-11-20 23:13:16,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1259560.0, ans=0.2 2023-11-20 23:13:18,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2023-11-20 23:13:20,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-20 23:13:21,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1259560.0, ans=0.2 2023-11-20 23:13:34,996 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 188950 2023-11-20 23:13:42,785 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8600, loss[loss=0.07586, simple_loss=0.1011, pruned_loss=0.01648, audio_tagging_loss=0.008843, over 15557.00 frames. ], tot_loss[loss=0.07579, simple_loss=0.0967, pruned_loss=0.01756, audio_tagging_loss=0.00988, over 3042853.80 frames. 
], batch size: 58, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:14:06,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.038e+01 8.664e+01 9.287e+01 1.205e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 23:14:21,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1259893.3333333333, ans=0.2 2023-11-20 23:14:37,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1259960.0, ans=0.0 2023-11-20 23:14:39,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189000 2023-11-20 23:14:41,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-20 23:14:47,229 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8650, loss[loss=0.0672, simple_loss=0.07969, pruned_loss=0.01722, audio_tagging_loss=0.01014, over 14612.00 frames. ], tot_loss[loss=0.07528, simple_loss=0.09583, pruned_loss=0.01739, audio_tagging_loss=0.00998, over 3048630.30 frames. ], batch size: 55, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:14:51,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1260026.6666666667, ans=0.0 2023-11-20 23:15:16,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1260160.0, ans=0.125 2023-11-20 23:15:33,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1260226.6666666667, ans=0.125 2023-11-20 23:15:39,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1260293.3333333333, ans=0.2 2023-11-20 23:15:43,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189050 2023-11-20 23:15:46,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=12.0 2023-11-20 23:15:52,062 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8700, loss[loss=0.08784, simple_loss=0.1164, pruned_loss=0.02106, audio_tagging_loss=0.008602, over 15122.00 frames. ], tot_loss[loss=0.07574, simple_loss=0.09631, pruned_loss=0.01751, audio_tagging_loss=0.01007, over 3048489.95 frames. 
], batch size: 55, lr: 4.27e-03, grad_scale: 16.0 2023-11-20 23:16:01,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1260360.0, ans=0.125 2023-11-20 23:16:15,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1260426.6666666667, ans=0.125 2023-11-20 23:16:17,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.100e+01 8.533e+01 9.341e+01 1.224e+02, threshold=1.707e+02, percent-clipped=0.0 2023-11-20 23:16:27,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1260493.3333333333, ans=0.125 2023-11-20 23:16:43,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1260626.6666666667, ans=0.0 2023-11-20 23:16:48,062 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189100 2023-11-20 23:16:54,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1260693.3333333333, ans=0.125 2023-11-20 23:16:55,192 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8750, loss[loss=0.07375, simple_loss=0.09945, pruned_loss=0.0166, audio_tagging_loss=0.007431, over 16729.00 frames. ], tot_loss[loss=0.07672, simple_loss=0.09765, pruned_loss=0.01786, audio_tagging_loss=0.01004, over 3048150.08 frames. ], batch size: 62, lr: 4.27e-03, grad_scale: 16.0 2023-11-20 23:17:02,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1260693.3333333333, ans=0.0 2023-11-20 23:17:07,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1260760.0, ans=0.125 2023-11-20 23:17:17,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1260760.0, ans=0.125 2023-11-20 23:17:29,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1260826.6666666667, ans=0.95 2023-11-20 23:17:30,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1260826.6666666667, ans=0.5 2023-11-20 23:17:36,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1260893.3333333333, ans=0.125 2023-11-20 23:17:36,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.53 vs. limit=10.0 2023-11-20 23:17:37,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2023-11-20 23:17:51,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189150 2023-11-20 23:17:55,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1260960.0, ans=0.2 2023-11-20 23:17:58,589 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8800, loss[loss=0.07331, simple_loss=0.09061, pruned_loss=0.01661, audio_tagging_loss=0.0114, over 15242.00 frames. 
], tot_loss[loss=0.07797, simple_loss=0.09937, pruned_loss=0.01821, audio_tagging_loss=0.01007, over 3045325.44 frames. ], batch size: 57, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:18:19,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1261093.3333333333, ans=0.0 2023-11-20 23:18:20,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1261093.3333333333, ans=0.125 2023-11-20 23:18:24,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.201e+01 8.889e+01 9.657e+01 1.880e+02, threshold=1.778e+02, percent-clipped=1.0 2023-11-20 23:18:43,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-20 23:18:54,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189200 2023-11-20 23:19:03,166 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8850, loss[loss=0.06895, simple_loss=0.08247, pruned_loss=0.0169, audio_tagging_loss=0.01081, over 14538.00 frames. ], tot_loss[loss=0.07807, simple_loss=0.09924, pruned_loss=0.01834, audio_tagging_loss=0.01011, over 3045128.64 frames. ], batch size: 54, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:19:13,004 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 23:19:27,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1261493.3333333333, ans=0.125 2023-11-20 23:19:28,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-11-20 23:19:55,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1261626.6666666667, ans=0.05 2023-11-20 23:19:58,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189250 2023-11-20 23:20:01,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1261626.6666666667, ans=0.1 2023-11-20 23:20:05,959 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8900, loss[loss=0.08327, simple_loss=0.1097, pruned_loss=0.02001, audio_tagging_loss=0.008427, over 14871.00 frames. ], tot_loss[loss=0.07838, simple_loss=0.09983, pruned_loss=0.0185, audio_tagging_loss=0.009967, over 3051599.70 frames. 
], batch size: 56, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:20:17,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1261760.0, ans=0.1 2023-11-20 23:20:25,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1261760.0, ans=0.125 2023-11-20 23:20:31,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.863e+01 8.227e+01 8.677e+01 9.378e+01 1.174e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 23:20:32,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1261826.6666666667, ans=0.1 2023-11-20 23:20:37,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1261826.6666666667, ans=0.1 2023-11-20 23:21:00,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1261960.0, ans=0.125 2023-11-20 23:21:01,339 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189300 2023-11-20 23:21:03,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1261960.0, ans=0.07 2023-11-20 23:21:09,617 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 8950, loss[loss=0.09091, simple_loss=0.1274, pruned_loss=0.02044, audio_tagging_loss=0.006764, over 14818.00 frames. ], tot_loss[loss=0.07807, simple_loss=0.09936, pruned_loss=0.0185, audio_tagging_loss=0.00989, over 3051346.21 frames. ], batch size: 55, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:21:17,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1262026.6666666667, ans=0.0 2023-11-20 23:21:18,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1262026.6666666667, ans=0.1 2023-11-20 23:21:23,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1262093.3333333333, ans=0.2 2023-11-20 23:21:25,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1262093.3333333333, ans=0.125 2023-11-20 23:22:05,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189350 2023-11-20 23:22:13,074 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9000, loss[loss=0.07062, simple_loss=0.09402, pruned_loss=0.01539, audio_tagging_loss=0.008218, over 15944.00 frames. ], tot_loss[loss=0.07826, simple_loss=0.1, pruned_loss=0.01854, audio_tagging_loss=0.009708, over 3054581.11 frames. ], batch size: 59, lr: 4.27e-03, grad_scale: 16.0 2023-11-20 23:22:13,075 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-20 23:22:34,652 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2744, 0.9372, 3.3264, 3.0211, 2.8238, 3.0193, 2.4487, 2.8075], device='cuda:1') 2023-11-20 23:22:55,290 INFO [train_asr.py:1253] (1/4) Epoch 16, validation: loss=0.06115, simple_loss=0.05296, pruned_loss=0.005511, audio_tagging_loss=0.02916, over 4681554.00 frames. 
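The per-batch loss[...] and running tot_loss[...] entries above are consistent with a weighted sum of the transducer losses and the audio-tagging loss. The validation entry just above checks out exactly: 0.5 * 0.05296 + 0.005511 + 0.02916 = 0.06115, and the same relation reproduces the training entries (e.g. the batch 6850 tot_loss: 0.5 * 0.09764 + 0.01793 + 0.009638 = 0.07639). Below is a minimal sketch, assuming scales of 0.5 and 1.0 inferred from those numbers, of that combination plus a frame-weighted running average of the kind tot_loss reports; the actual script may decay or periodically reset its statistics, so treat the names and details as hypothetical.

```python
# Sketch only: hypothetical helpers reconstructing the logged quantities;
# the 0.5/1.0 scales are inferred from the log, not quoted from the script.
from typing import Dict


def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  tagging_scale: float = 1.0) -> float:
    """Weighted sum matching the log, e.g. batch 6850:
    0.5 * 0.0832 + 0.01376 + 1.0 * 0.01069 = 0.06605."""
    return (simple_scale * simple_loss + pruned_loss
            + tagging_scale * audio_tagging_loss)


class FrameWeightedAverage:
    """Frame-weighted running averages, as in 'tot_loss[..., over N frames]'."""

    def __init__(self) -> None:
        self.sums: Dict[str, float] = {}
        self.frames = 0.0

    def update(self, per_frame_losses: Dict[str, float], frames: float) -> None:
        # Each logged value is already normalized per frame, so weight by
        # the batch's frame count before accumulating.
        for name, value in per_frame_losses.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * frames
        self.frames += frames

    def averages(self) -> Dict[str, float]:
        return {name: total / self.frames for name, total in self.sums.items()}


tracker = FrameWeightedAverage()
tracker.update({"simple_loss": 0.0832, "pruned_loss": 0.01376,
                "audio_tagging_loss": 0.01069}, frames=14985.0)
avg = tracker.averages()
print(combined_loss(avg["simple_loss"], avg["pruned_loss"],
                    avg["audio_tagging_loss"]))  # 0.06605 for this single batch
```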
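The WARNING entries (most recently for cut unbalanced/1Dq7QH61iXQ_0.000_1.000.wav above) drop AudioSet placeholder cuts whose encoder output would be shorter than their token sequence: 100 input frames shrink to 23 after subsampling, fewer than the 24 BPE tokens, so the pruned transducer loss cannot align them. A sketch of such a filter follows, with hypothetical names; the subsampling formula is inferred from the logged 100 -> 23 example rather than taken from the training script.

```python
# Sketch only: hypothetical filter reproducing the "Exclude cut ..." warnings.
import logging
from typing import List


def frames_after_subsampling(num_frames: int) -> int:
    # Inferred from the log (100 -> 23); assumes two stride-2 stages
    # with a 7-frame context window.
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(cut_id: str, num_frames: int, tokens: List[str]) -> bool:
    """Keep a cut only if the subsampled length can cover its tokens;
    the transducer here needs at least one encoder frame per token."""
    t = frames_after_subsampling(num_frames)
    if t < len(tokens):
        logging.warning(
            "Exclude cut with ID %s from training. "
            "Number of frames (before subsampling): %d. "
            "Number of frames (after subsampling): %d. "
            "Number of tokens: %d", cut_id, num_frames, t, len(tokens)
        )
        return False
    return True


assert frames_after_subsampling(100) == 23
assert not keep_cut("unbalanced/example.wav", 100, ["tok"] * 24)
```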
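The optim.py entries record adaptive gradient clipping: each reported threshold equals Clipping_scale times the median of recently observed gradient norms (e.g. the 22:37:44 quartile entry above: 2.0 * 8.921e+01 = 1.784e+02), and percent-clipped counts how often a batch exceeded that threshold. The sketch below illustrates one plausible mechanism under those assumptions; the window size and exact statistics are guesses, not the optimizer's real implementation.

```python
# Sketch only: hypothetical clipper matching the logged relation
# threshold = clipping_scale * median(recent grad norms),
# e.g. 2.0 * 8.921e+01 = 1.784e+02. Window size is a guess.
from typing import Iterable, List

import torch


class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 100) -> None:
        self.scale = clipping_scale
        self.window = window
        self.history: List[float] = []

    def __call__(self, parameters: Iterable[torch.nn.Parameter]) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        # Total L2 norm across all parameter gradients.
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.history = (self.history + [norm])[-self.window:]
        median = sorted(self.history)[len(self.history) // 2]
        threshold = self.scale * median
        if norm > threshold:  # contributes to "percent-clipped"
            for g in grads:
                g.mul_(threshold / norm)
        return norm
```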
2023-11-20 23:22:55,291 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-20 23:22:57,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0 2023-11-20 23:23:13,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2023-11-20 23:23:14,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1262426.6666666667, ans=0.125 2023-11-20 23:23:15,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1262426.6666666667, ans=0.125 2023-11-20 23:23:20,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1262493.3333333333, ans=0.125 2023-11-20 23:23:22,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.262e+01 9.175e+01 9.933e+01 1.311e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 23:23:27,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1262493.3333333333, ans=0.2 2023-11-20 23:23:27,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1262493.3333333333, ans=0.2 2023-11-20 23:23:32,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1262493.3333333333, ans=0.125 2023-11-20 23:23:40,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1262560.0, ans=0.125 2023-11-20 23:23:49,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1262626.6666666667, ans=0.125 2023-11-20 23:23:50,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1262626.6666666667, ans=0.1 2023-11-20 23:23:51,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189400 2023-11-20 23:23:57,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1262626.6666666667, ans=0.125 2023-11-20 23:23:59,683 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9050, loss[loss=0.09295, simple_loss=0.1161, pruned_loss=0.02506, audio_tagging_loss=0.009847, over 15274.00 frames. ], tot_loss[loss=0.07768, simple_loss=0.0993, pruned_loss=0.01847, audio_tagging_loss=0.009557, over 3045689.31 frames. 
], batch size: 57, lr: 4.27e-03, grad_scale: 16.0 2023-11-20 23:24:04,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1262693.3333333333, ans=0.125 2023-11-20 23:24:06,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1262693.3333333333, ans=0.125 2023-11-20 23:24:13,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1262760.0, ans=0.125 2023-11-20 23:24:16,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-20 23:24:17,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1262760.0, ans=0.125 2023-11-20 23:24:23,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. limit=10.0 2023-11-20 23:24:33,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1262826.6666666667, ans=0.1 2023-11-20 23:24:42,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1262893.3333333333, ans=0.125 2023-11-20 23:24:56,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189450 2023-11-20 23:25:04,468 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9100, loss[loss=0.08033, simple_loss=0.1085, pruned_loss=0.01857, audio_tagging_loss=0.007512, over 14930.00 frames. ], tot_loss[loss=0.07694, simple_loss=0.09837, pruned_loss=0.01821, audio_tagging_loss=0.009541, over 3044393.69 frames. ], batch size: 55, lr: 4.27e-03, grad_scale: 16.0 2023-11-20 23:25:11,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1263026.6666666667, ans=0.1 2023-11-20 23:25:21,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1263093.3333333333, ans=0.0 2023-11-20 23:25:22,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1263093.3333333333, ans=0.035 2023-11-20 23:25:30,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.183e+01 8.786e+01 9.651e+01 1.253e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-20 23:25:33,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1263160.0, ans=0.1 2023-11-20 23:25:39,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1263160.0, ans=0.125 2023-11-20 23:25:44,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1263226.6666666667, ans=0.125 2023-11-20 23:26:00,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189500 2023-11-20 23:26:07,509 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9150, loss[loss=0.06189, simple_loss=0.08161, pruned_loss=0.01071, audio_tagging_loss=0.01037, over 14205.00 frames. 
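The ScheduledFloat lines record hyperparameters (dropout_p, balancer prob, bypass skip_rate, ...) that are functions of the module's batch_count rather than constants; by this point (batch_count ≈ 1.26e6) every schedule has long since reached its final value, which is why the ans=... numbers repeat unchanged. A hedged sketch of such a schedule, assuming piecewise-linear interpolation between breakpoints; the breakpoints below are placeholders, not this run's actual settings:

```python
import bisect

class PiecewiseSchedule:
    def __init__(self, *points: tuple):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]              # clamp past the last breakpoint
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))  # placeholder breakpoints
print(dropout_p(1261960.0))  # 0.1 -- far past the last breakpoint, as in the log
```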
], tot_loss[loss=0.07715, simple_loss=0.09886, pruned_loss=0.01824, audio_tagging_loss=0.009481, over 3036405.75 frames. ], batch size: 55, lr: 4.27e-03, grad_scale: 16.0 2023-11-20 23:26:19,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1263426.6666666667, ans=0.125 2023-11-20 23:26:20,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1263426.6666666667, ans=0.125 2023-11-20 23:26:29,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0 2023-11-20 23:27:03,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189550 2023-11-20 23:27:03,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1263626.6666666667, ans=0.125 2023-11-20 23:27:10,367 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9200, loss[loss=0.08164, simple_loss=0.1128, pruned_loss=0.0162, audio_tagging_loss=0.009043, over 14754.00 frames. ], tot_loss[loss=0.0771, simple_loss=0.09852, pruned_loss=0.01834, audio_tagging_loss=0.009493, over 3045591.06 frames. ], batch size: 55, lr: 4.27e-03, grad_scale: 32.0 2023-11-20 23:27:24,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1263760.0, ans=0.0 2023-11-20 23:27:37,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.197e+01 8.151e+01 8.702e+01 9.490e+01 1.171e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-20 23:27:55,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-11-20 23:28:06,287 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189600 2023-11-20 23:28:14,506 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9250, loss[loss=0.05509, simple_loss=0.06279, pruned_loss=0.01094, audio_tagging_loss=0.01275, over 15968.00 frames. ], tot_loss[loss=0.07679, simple_loss=0.09819, pruned_loss=0.01817, audio_tagging_loss=0.009524, over 3046961.06 frames. ], batch size: 62, lr: 4.26e-03, grad_scale: 32.0 2023-11-20 23:28:28,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1264093.3333333333, ans=0.0 2023-11-20 23:28:38,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2023-11-20 23:28:55,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2023-11-20 23:28:56,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1264226.6666666667, ans=0.125 2023-11-20 23:28:56,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1264226.6666666667, ans=0.125 2023-11-20 23:29:10,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189650 2023-11-20 23:29:18,271 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9300, loss[loss=0.05285, simple_loss=0.05817, pruned_loss=0.01129, audio_tagging_loss=0.01247, over 14114.00 frames. 
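The optim.py lines summarize gradient norms over a window of recent steps. In every entry in this stretch the printed threshold equals Clipping_scale times the median quartile (e.g. 2.0 * 8.677e+01 ≈ 1.735e+02 in the 23:20:31 entry above), so the bookkeeping plausibly looks like the sketch below; the window contents and the percent-clipped definition are assumptions:

```python
import torch

def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    # recent_norms: gradient norms gathered over recent optimizer steps.
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()              # scale x median
    pct = 100.0 * (recent_norms > threshold).float().mean().item()
    print("grad-norm quartiles " + " ".join(f"{v:.3e}" for v in q.tolist())
          + f", threshold={threshold:.3e}, percent-clipped={pct}")
    return threshold

# Using the five quartile values from the 23:20:31 entry as a stand-in sample:
clipping_report(torch.tensor([68.63, 82.27, 86.77, 93.78, 117.4]))
# -> threshold=1.735e+02, percent-clipped=0.0, matching the log
```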
], tot_loss[loss=0.07636, simple_loss=0.09775, pruned_loss=0.01791, audio_tagging_loss=0.009569, over 3051293.99 frames. ], batch size: 55, lr: 4.26e-03, grad_scale: 32.0 2023-11-20 23:29:30,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1264426.6666666667, ans=0.125 2023-11-20 23:29:44,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.124e+01 8.662e+01 9.573e+01 1.521e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 23:29:46,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1264493.3333333333, ans=0.0 2023-11-20 23:29:49,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0 2023-11-20 23:30:09,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.56 vs. limit=8.0 2023-11-20 23:30:14,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189700 2023-11-20 23:30:22,381 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9350, loss[loss=0.06852, simple_loss=0.08177, pruned_loss=0.01736, audio_tagging_loss=0.01028, over 13244.00 frames. ], tot_loss[loss=0.07669, simple_loss=0.09826, pruned_loss=0.01791, audio_tagging_loss=0.009648, over 3051434.67 frames. ], batch size: 52, lr: 4.26e-03, grad_scale: 32.0 2023-11-20 23:30:26,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1264693.3333333333, ans=0.125 2023-11-20 23:30:35,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1264760.0, ans=0.125 2023-11-20 23:30:37,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1264760.0, ans=0.125 2023-11-20 23:30:53,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1264826.6666666667, ans=0.1 2023-11-20 23:31:18,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189750 2023-11-20 23:31:20,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1264960.0, ans=0.125 2023-11-20 23:31:26,042 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9400, loss[loss=0.07554, simple_loss=0.096, pruned_loss=0.01467, audio_tagging_loss=0.01287, over 15591.00 frames. ], tot_loss[loss=0.07684, simple_loss=0.09801, pruned_loss=0.01798, audio_tagging_loss=0.009857, over 3054929.90 frames. 
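The Whitening lines compare an anisotropy statistic of a module's activations against a limit (metric=7.56 vs. limit=8.0 just above, and so on), nudging features toward a well-conditioned covariance. The exact formula lives in scaling.py; as a toy proxy only, one can use mean(λ²)/mean(λ)² over the eigenvalues of the feature covariance, which is 1.0 for perfectly white features and grows as the spectrum becomes lopsided:

```python
import torch

def whiteness_metric(feats: torch.Tensor) -> float:
    # feats: (num_frames, num_channels). Toy proxy, not the library's metric.
    x = feats - feats.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eig = torch.linalg.eigvalsh(cov)                   # covariance spectrum
    return ((eig ** 2).mean() / eig.mean() ** 2).item()

print(whiteness_metric(torch.randn(10000, 512)))                       # ~1.05: nearly white
print(whiteness_metric(torch.randn(10000, 1) * torch.ones(1, 512)))    # ~512: rank-1, maximally non-white
```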
], batch size: 59, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:31:36,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1265026.6666666667, ans=0.125 2023-11-20 23:31:45,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1265093.3333333333, ans=0.125 2023-11-20 23:31:54,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.398e+01 8.103e+01 8.759e+01 9.361e+01 1.221e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-20 23:31:58,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1265160.0, ans=0.0 2023-11-20 23:31:58,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1265160.0, ans=0.125 2023-11-20 23:32:22,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189800 2023-11-20 23:32:27,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-11-20 23:32:27,733 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 23:32:31,397 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9450, loss[loss=0.08741, simple_loss=0.1209, pruned_loss=0.01969, audio_tagging_loss=0.007258, over 15526.00 frames. ], tot_loss[loss=0.07722, simple_loss=0.09858, pruned_loss=0.0181, audio_tagging_loss=0.009829, over 3051625.14 frames. ], batch size: 56, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:32:39,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-20 23:32:46,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-11-20 23:32:55,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1265493.3333333333, ans=0.125 2023-11-20 23:32:58,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2023-11-20 23:33:22,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1265626.6666666667, ans=0.125 2023-11-20 23:33:27,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189850 2023-11-20 23:33:34,923 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9500, loss[loss=0.0909, simple_loss=0.1197, pruned_loss=0.02346, audio_tagging_loss=0.00759, over 15909.00 frames. ], tot_loss[loss=0.07756, simple_loss=0.09915, pruned_loss=0.01818, audio_tagging_loss=0.009804, over 3050245.79 frames. 
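The WARNING lines about excluded cuts are a sanity filter, seen firing just above on a 1-second AudioSet clip carrying the dummy transcript: a transducer needs at least as many encoder frames as target tokens, and 100 input frames shrink to 23 after subsampling while the dummy text tokenizes to 24. A sketch that reproduces those numbers; the exact front-end arithmetic is an assumption:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d front-end reduction (two stride-2 stages); it
    # reproduces the warning's numbers: 100 frames in -> 23 out.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ... from training."
```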
], batch size: 58, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:33:37,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1265693.3333333333, ans=0.0 2023-11-20 23:33:47,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1265760.0, ans=0.125 2023-11-20 23:33:54,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1265760.0, ans=0.125 2023-11-20 23:34:02,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.467e+01 9.097e+01 1.010e+02 1.655e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-20 23:34:09,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1265826.6666666667, ans=0.125 2023-11-20 23:34:15,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1265893.3333333333, ans=0.125 2023-11-20 23:34:26,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1265960.0, ans=0.1 2023-11-20 23:34:27,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1265960.0, ans=0.1 2023-11-20 23:34:30,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189900 2023-11-20 23:34:32,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. limit=10.0 2023-11-20 23:34:38,073 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9550, loss[loss=0.08119, simple_loss=0.09955, pruned_loss=0.01922, audio_tagging_loss=0.01219, over 14995.00 frames. ], tot_loss[loss=0.0773, simple_loss=0.0985, pruned_loss=0.01811, audio_tagging_loss=0.009944, over 3048127.87 frames. 
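The grad_scale printed with each batch is fp16 dynamic loss scaling at work: the scale halves when a step produces inf/nan gradients (32.0 -> 16.0 around batch 9000 above; later in this log it dips to 8.0 near batch 10350 before recovering) and doubles back after a long enough run of clean updates. Assuming the stock torch.cuda.amp.GradScaler rather than a custom wrapper, the standard update step is:

```python
import torch

def fp16_step(model, optimizer, scaler: torch.cuda.amp.GradScaler, batch) -> None:
    with torch.cuda.amp.autocast():
        loss = model(batch)                # forward pass in mixed precision
    optimizer.zero_grad()
    scaler.scale(loss).backward()          # gradients carry the loss scale
    scaler.step(optimizer)                 # skipped if gradients overflowed
    scaler.update()                        # halve on overflow, else slowly grow
```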
], batch size: 55, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:35:16,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1266226.6666666667, ans=0.2 2023-11-20 23:35:16,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1266226.6666666667, ans=0.2 2023-11-20 23:35:17,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1266226.6666666667, ans=0.0 2023-11-20 23:35:20,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1266226.6666666667, ans=0.2 2023-11-20 23:35:25,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1266226.6666666667, ans=0.07 2023-11-20 23:35:35,277 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 189950 2023-11-20 23:35:36,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1266293.3333333333, ans=0.125 2023-11-20 23:35:41,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1266360.0, ans=0.125 2023-11-20 23:35:42,331 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9600, loss[loss=0.06328, simple_loss=0.08032, pruned_loss=0.01475, audio_tagging_loss=0.008371, over 15665.00 frames. ], tot_loss[loss=0.07731, simple_loss=0.09835, pruned_loss=0.0181, audio_tagging_loss=0.01003, over 3046660.99 frames. ], batch size: 61, lr: 4.26e-03, grad_scale: 32.0 2023-11-20 23:35:52,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1266360.0, ans=0.0 2023-11-20 23:35:55,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1266426.6666666667, ans=0.5 2023-11-20 23:35:58,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1266426.6666666667, ans=0.125 2023-11-20 23:36:10,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.084e+01 8.487e+01 9.415e+01 1.113e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 23:36:20,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1266560.0, ans=0.0 2023-11-20 23:36:21,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1266560.0, ans=0.125 2023-11-20 23:36:29,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1266560.0, ans=0.125 2023-11-20 23:36:29,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1266560.0, ans=0.0 2023-11-20 23:36:39,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190000 2023-11-20 23:36:44,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1266626.6666666667, ans=0.0 2023-11-20 23:36:47,521 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9650, loss[loss=0.08899, simple_loss=0.1113, pruned_loss=0.02386, audio_tagging_loss=0.009454, over 15172.00 frames. 
], tot_loss[loss=0.0778, simple_loss=0.09901, pruned_loss=0.01836, audio_tagging_loss=0.009935, over 3046465.13 frames. ], batch size: 55, lr: 4.26e-03, grad_scale: 32.0 2023-11-20 23:36:57,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1266693.3333333333, ans=0.125 2023-11-20 23:37:11,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1266760.0, ans=0.125 2023-11-20 23:37:19,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1266826.6666666667, ans=0.0 2023-11-20 23:37:23,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1266826.6666666667, ans=0.125 2023-11-20 23:37:29,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1266893.3333333333, ans=0.0 2023-11-20 23:37:40,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1266960.0, ans=0.0 2023-11-20 23:37:44,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190050 2023-11-20 23:37:44,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-20 23:37:51,307 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9700, loss[loss=0.08018, simple_loss=0.1035, pruned_loss=0.02051, audio_tagging_loss=0.007926, over 14848.00 frames. ], tot_loss[loss=0.07846, simple_loss=0.1003, pruned_loss=0.01858, audio_tagging_loss=0.009752, over 3044595.27 frames. ], batch size: 56, lr: 4.26e-03, grad_scale: 32.0 2023-11-20 23:38:20,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1267160.0, ans=0.0 2023-11-20 23:38:21,190 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.452e+01 8.015e+01 8.951e+01 9.775e+01 1.301e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 23:38:48,755 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190100 2023-11-20 23:38:56,266 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9750, loss[loss=0.07417, simple_loss=0.08478, pruned_loss=0.01672, audio_tagging_loss=0.01506, over 15958.00 frames. ], tot_loss[loss=0.07777, simple_loss=0.09957, pruned_loss=0.01834, audio_tagging_loss=0.009647, over 3045365.15 frames. ], batch size: 61, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:39:00,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1267360.0, ans=0.1 2023-11-20 23:39:10,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1267426.6666666667, ans=0.125 2023-11-20 23:39:21,458 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 23:39:38,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1267560.0, ans=0.125 2023-11-20 23:39:43,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=12.0 2023-11-20 23:39:52,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190150 2023-11-20 23:40:00,675 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9800, loss[loss=0.07148, simple_loss=0.0865, pruned_loss=0.01813, audio_tagging_loss=0.0101, over 15168.00 frames. ], tot_loss[loss=0.07724, simple_loss=0.09881, pruned_loss=0.01824, audio_tagging_loss=0.009598, over 3038490.91 frames. ], batch size: 56, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:40:30,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 7.945e+01 8.740e+01 9.752e+01 1.304e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 23:40:56,335 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 23:40:56,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-20 23:40:57,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190200 2023-11-20 23:41:05,206 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9850, loss[loss=0.07354, simple_loss=0.08922, pruned_loss=0.01806, audio_tagging_loss=0.01088, over 14793.00 frames. ], tot_loss[loss=0.07758, simple_loss=0.09946, pruned_loss=0.01824, audio_tagging_loss=0.009611, over 3044731.38 frames. ], batch size: 56, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:41:05,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2023-11-20 23:41:31,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1268160.0, ans=0.0 2023-11-20 23:41:37,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-11-20 23:41:42,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1268226.6666666667, ans=0.125 2023-11-20 23:41:44,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1268226.6666666667, ans=0.125 2023-11-20 23:42:00,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190250 2023-11-20 23:42:02,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1268293.3333333333, ans=0.125 2023-11-20 23:42:06,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1268293.3333333333, ans=0.0 2023-11-20 23:42:08,625 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9900, loss[loss=0.07374, simple_loss=0.1023, pruned_loss=0.01316, audio_tagging_loss=0.009438, over 14132.00 frames. ], tot_loss[loss=0.07759, simple_loss=0.09956, pruned_loss=0.01827, audio_tagging_loss=0.009539, over 3045722.31 frames. 
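The attn_weights_entropy tensor dumped during validation (the zipformer.py line logged further above at 23:22:34) reports one entropy per attention head: values near zero mean a head attends to very few positions, values near log(src_len) mean it spreads attention almost uniformly. A hedged sketch of that diagnostic:

```python
import torch

def attn_entropy_per_head(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, tgt_len, src_len); each row sums to 1.
    ent = -(attn_weights * attn_weights.clamp_min(1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)                # average over query positions

w = torch.softmax(torch.randn(8, 10, 50), dim=-1)
print(attn_entropy_per_head(w))            # a bit below log(50) ~ 3.9 per head
```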
], batch size: 52, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:42:14,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-20 23:42:16,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1268360.0, ans=0.0 2023-11-20 23:42:37,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.516e+01 8.053e+01 8.764e+01 9.412e+01 1.307e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-20 23:42:43,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1268493.3333333333, ans=0.125 2023-11-20 23:42:46,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2023-11-20 23:43:00,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1268626.6666666667, ans=0.125 2023-11-20 23:43:00,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-20 23:43:04,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190300 2023-11-20 23:43:12,079 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 9950, loss[loss=0.07773, simple_loss=0.09816, pruned_loss=0.02155, audio_tagging_loss=0.007098, over 14732.00 frames. ], tot_loss[loss=0.07665, simple_loss=0.09827, pruned_loss=0.01791, audio_tagging_loss=0.009606, over 3042809.89 frames. ], batch size: 56, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:43:12,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1268693.3333333333, ans=0.125 2023-11-20 23:43:18,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0 2023-11-20 23:43:21,051 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 23:43:30,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1268760.0, ans=0.125 2023-11-20 23:43:39,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. 
limit=6.0 2023-11-20 23:43:42,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1268826.6666666667, ans=0.125 2023-11-20 23:43:50,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1268893.3333333333, ans=0.09899494936611666 2023-11-20 23:43:52,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1268893.3333333333, ans=0.125 2023-11-20 23:43:52,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1268893.3333333333, ans=0.125 2023-11-20 23:44:08,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190350 2023-11-20 23:44:10,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1268960.0, ans=0.0 2023-11-20 23:44:14,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1268960.0, ans=0.125 2023-11-20 23:44:16,755 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10000, loss[loss=0.06134, simple_loss=0.08099, pruned_loss=0.01193, audio_tagging_loss=0.008913, over 15974.00 frames. ], tot_loss[loss=0.07649, simple_loss=0.09805, pruned_loss=0.0179, audio_tagging_loss=0.009566, over 3042507.23 frames. ], batch size: 61, lr: 4.26e-03, grad_scale: 32.0 2023-11-20 23:44:41,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1269160.0, ans=0.125 2023-11-20 23:44:45,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.435e+01 8.173e+01 8.806e+01 9.776e+01 1.486e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 23:44:56,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1269226.6666666667, ans=0.125 2023-11-20 23:45:05,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1269226.6666666667, ans=0.2 2023-11-20 23:45:13,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190400 2023-11-20 23:45:17,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1269293.3333333333, ans=0.1 2023-11-20 23:45:21,714 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10050, loss[loss=0.06792, simple_loss=0.08219, pruned_loss=0.01607, audio_tagging_loss=0.01074, over 15146.00 frames. ], tot_loss[loss=0.07614, simple_loss=0.09753, pruned_loss=0.01775, audio_tagging_loss=0.009631, over 3045691.09 frames. 
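The lr field creeps from 4.27e-03 at the start of this stretch down to 4.24e-03 by batch 11050, consistent with a schedule that decays smoothly in both batch index and epoch, such as icefall's Eden scheduler. A sketch from memory, with placeholder decay constants (lr_batches, lr_epochs):

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0,        # placeholder constant
            lr_epochs: float = 3.5) -> float:  # placeholder constant
    # Both factors fall off as inverse quartic roots, so late in training
    # the learning rate changes very slowly, as seen in this log.
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)
```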
], batch size: 56, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:45:25,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1269360.0, ans=0.1 2023-11-20 23:45:35,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1269426.6666666667, ans=0.025 2023-11-20 23:45:40,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1269426.6666666667, ans=0.0 2023-11-20 23:45:54,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1269493.3333333333, ans=0.125 2023-11-20 23:46:01,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2023-11-20 23:46:13,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1269626.6666666667, ans=0.125 2023-11-20 23:46:14,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2023-11-20 23:46:17,965 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190450 2023-11-20 23:46:25,270 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10100, loss[loss=0.07239, simple_loss=0.09608, pruned_loss=0.01635, audio_tagging_loss=0.008005, over 15680.00 frames. ], tot_loss[loss=0.07648, simple_loss=0.09787, pruned_loss=0.01789, audio_tagging_loss=0.009653, over 3049681.16 frames. ], batch size: 58, lr: 4.26e-03, grad_scale: 16.0 2023-11-20 23:46:27,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1269693.3333333333, ans=0.125 2023-11-20 23:46:41,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2023-11-20 23:46:51,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1269826.6666666667, ans=0.2 2023-11-20 23:46:56,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.119e+01 8.749e+01 9.415e+01 1.269e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 23:47:04,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1269893.3333333333, ans=0.0 2023-11-20 23:47:15,148 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 23:47:22,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190500 2023-11-20 23:47:24,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1269960.0, ans=0.125 2023-11-20 23:47:28,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-20 23:47:29,427 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10150, loss[loss=0.07178, simple_loss=0.1012, pruned_loss=0.01266, audio_tagging_loss=0.008537, over 14501.00 frames. ], tot_loss[loss=0.07727, simple_loss=0.09898, pruned_loss=0.01807, audio_tagging_loss=0.009707, over 3051088.09 frames. ], batch size: 56, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:47:32,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1270026.6666666667, ans=0.0 2023-11-20 23:47:46,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1270093.3333333333, ans=0.125 2023-11-20 23:47:57,646 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 23:48:01,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1270160.0, ans=0.07 2023-11-20 23:48:25,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190550 2023-11-20 23:48:26,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1270293.3333333333, ans=0.125 2023-11-20 23:48:33,564 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10200, loss[loss=0.07192, simple_loss=0.09365, pruned_loss=0.01782, audio_tagging_loss=0.007274, over 15172.00 frames. ], tot_loss[loss=0.07702, simple_loss=0.09845, pruned_loss=0.01794, audio_tagging_loss=0.009862, over 3047581.00 frames. ], batch size: 58, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:48:43,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1270360.0, ans=0.1 2023-11-20 23:48:48,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1270426.6666666667, ans=0.125 2023-11-20 23:48:54,070 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 23:48:57,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1270493.3333333333, ans=0.125 2023-11-20 23:48:58,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1270493.3333333333, ans=0.05 2023-11-20 23:49:02,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.808e+01 8.167e+01 8.853e+01 9.625e+01 1.935e+02, threshold=1.771e+02, percent-clipped=1.0 2023-11-20 23:49:03,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1270493.3333333333, ans=0.125 2023-11-20 23:49:11,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2023-11-20 23:49:11,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1270560.0, ans=0.125 2023-11-20 23:49:28,629 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190600 2023-11-20 23:49:31,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1270626.6666666667, ans=0.025 2023-11-20 23:49:33,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1270626.6666666667, ans=0.0 2023-11-20 23:49:35,990 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10250, loss[loss=0.09144, simple_loss=0.127, pruned_loss=0.0197, audio_tagging_loss=0.008221, over 16638.00 frames. ], tot_loss[loss=0.07745, simple_loss=0.09865, pruned_loss=0.01819, audio_tagging_loss=0.009936, over 3045195.03 frames. ], batch size: 59, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:49:41,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1270693.3333333333, ans=0.0 2023-11-20 23:49:57,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1270760.0, ans=0.04949747468305833 2023-11-20 23:50:01,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1270826.6666666667, ans=0.2 2023-11-20 23:50:11,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1270826.6666666667, ans=0.0 2023-11-20 23:50:27,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0 2023-11-20 23:50:28,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1270960.0, ans=0.125 2023-11-20 23:50:32,741 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190650 2023-11-20 23:50:39,974 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10300, loss[loss=0.07684, simple_loss=0.1064, pruned_loss=0.01555, audio_tagging_loss=0.008097, over 15054.00 frames. ], tot_loss[loss=0.07766, simple_loss=0.09915, pruned_loss=0.01825, audio_tagging_loss=0.009829, over 3043588.16 frames. 
], batch size: 56, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:50:40,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1271026.6666666667, ans=0.0 2023-11-20 23:50:47,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1271026.6666666667, ans=15.0 2023-11-20 23:50:59,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1271093.3333333333, ans=0.125 2023-11-20 23:51:07,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1271160.0, ans=0.125 2023-11-20 23:51:11,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.201e+01 8.870e+01 9.848e+01 1.361e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 23:51:17,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1271226.6666666667, ans=0.125 2023-11-20 23:51:18,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1271226.6666666667, ans=0.0 2023-11-20 23:51:33,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2023-11-20 23:51:36,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190700 2023-11-20 23:51:39,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1271293.3333333333, ans=0.2 2023-11-20 23:51:41,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1271293.3333333333, ans=0.05 2023-11-20 23:51:44,188 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10350, loss[loss=0.08848, simple_loss=0.1116, pruned_loss=0.02386, audio_tagging_loss=0.008842, over 15071.00 frames. ], tot_loss[loss=0.07789, simple_loss=0.09938, pruned_loss=0.01828, audio_tagging_loss=0.009917, over 3045314.30 frames. ], batch size: 56, lr: 4.25e-03, grad_scale: 8.0 2023-11-20 23:51:44,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1271360.0, ans=0.2 2023-11-20 23:51:48,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1271360.0, ans=0.1 2023-11-20 23:51:49,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1271360.0, ans=0.025 2023-11-20 23:52:22,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1271560.0, ans=10.0 2023-11-20 23:52:39,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1271626.6666666667, ans=0.0 2023-11-20 23:52:40,396 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190750 2023-11-20 23:52:47,673 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10400, loss[loss=0.1041, simple_loss=0.1465, pruned_loss=0.02375, audio_tagging_loss=0.007053, over 15593.00 frames. ], tot_loss[loss=0.0779, simple_loss=0.09931, pruned_loss=0.01825, audio_tagging_loss=0.01, over 3053699.84 frames. 
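tot_loss is not the current batch's loss: each value is an average over a trailing window of batches, weighted by frame count, which is what the "over 3045314.30 frames" suffix tallies. A minimal sketch of that bookkeeping, with an assumed window length:

```python
from collections import deque

class WindowedFrameAverage:
    # Frame-weighted running average over the last max_batches batches;
    # the window length here is an assumption.
    def __init__(self, max_batches: int = 200):
        self.window = deque(maxlen=max_batches)

    def update(self, batch_loss: float, num_frames: float) -> tuple:
        self.window.append((batch_loss * num_frames, num_frames))
        frames = sum(f for _, f in self.window)
        avg = sum(s for s, _ in self.window) / frames
        return avg, frames   # (tot_loss, the "over N frames" count)
```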
], batch size: 54, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:53:12,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1271826.6666666667, ans=0.125 2023-11-20 23:53:18,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. limit=10.0 2023-11-20 23:53:18,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.195e+01 8.838e+01 9.520e+01 1.388e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-20 23:53:43,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190800 2023-11-20 23:53:50,995 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10450, loss[loss=0.1175, simple_loss=0.1575, pruned_loss=0.03194, audio_tagging_loss=0.006811, over 15033.00 frames. ], tot_loss[loss=0.0775, simple_loss=0.09836, pruned_loss=0.01825, audio_tagging_loss=0.01007, over 3046944.77 frames. ], batch size: 55, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:53:59,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1272026.6666666667, ans=0.0 2023-11-20 23:54:01,069 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 23:54:05,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1272093.3333333333, ans=0.125 2023-11-20 23:54:10,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1272093.3333333333, ans=0.0 2023-11-20 23:54:21,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1272160.0, ans=0.125 2023-11-20 23:54:46,555 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190850 2023-11-20 23:54:51,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1272293.3333333333, ans=0.125 2023-11-20 23:54:53,834 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10500, loss[loss=0.07339, simple_loss=0.08546, pruned_loss=0.02123, audio_tagging_loss=0.009428, over 15020.00 frames. ], tot_loss[loss=0.07729, simple_loss=0.09826, pruned_loss=0.01827, audio_tagging_loss=0.009884, over 3041830.49 frames. ], batch size: 56, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:54:58,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1272360.0, ans=0.125 2023-11-20 23:55:05,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1272360.0, ans=0.0 2023-11-20 23:55:05,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.90 vs. 
limit=12.0 2023-11-20 23:55:16,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1272426.6666666667, ans=0.125 2023-11-20 23:55:26,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.127e+01 8.997e+01 9.660e+01 1.233e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 23:55:31,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1272560.0, ans=0.0 2023-11-20 23:55:50,114 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190900 2023-11-20 23:55:57,800 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10550, loss[loss=0.063, simple_loss=0.08688, pruned_loss=0.01201, audio_tagging_loss=0.007549, over 16119.00 frames. ], tot_loss[loss=0.07697, simple_loss=0.09814, pruned_loss=0.01816, audio_tagging_loss=0.009741, over 3047486.75 frames. ], batch size: 60, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:56:10,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1272760.0, ans=0.1 2023-11-20 23:56:28,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1272826.6666666667, ans=0.125 2023-11-20 23:56:32,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1272826.6666666667, ans=0.125 2023-11-20 23:56:53,470 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 190950 2023-11-20 23:57:01,258 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10600, loss[loss=0.08668, simple_loss=0.1047, pruned_loss=0.01817, audio_tagging_loss=0.01615, over 14252.00 frames. ], tot_loss[loss=0.07694, simple_loss=0.09845, pruned_loss=0.01807, audio_tagging_loss=0.009645, over 3044817.87 frames. ], batch size: 56, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:57:04,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1273026.6666666667, ans=0.125 2023-11-20 23:57:07,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1273026.6666666667, ans=0.125 2023-11-20 23:57:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1273026.6666666667, ans=0.0 2023-11-20 23:57:32,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 7.871e+01 8.482e+01 9.357e+01 1.125e+02, threshold=1.696e+02, percent-clipped=0.0 2023-11-20 23:57:56,266 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191000 2023-11-20 23:58:03,701 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10650, loss[loss=0.08183, simple_loss=0.1063, pruned_loss=0.01838, audio_tagging_loss=0.0103, over 16217.00 frames. ], tot_loss[loss=0.07704, simple_loss=0.09875, pruned_loss=0.01803, audio_tagging_loss=0.009645, over 3040778.67 frames. ], batch size: 61, lr: 4.25e-03, grad_scale: 16.0 2023-11-20 23:58:04,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.86 vs. limit=10.0 2023-11-20 23:58:09,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.21 vs. 
limit=10.0
2023-11-20 23:58:21,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1273426.6666666667, ans=0.2
2023-11-20 23:59:00,055 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191050
2023-11-20 23:59:07,288 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10700, loss[loss=0.08767, simple_loss=0.1, pruned_loss=0.02073, audio_tagging_loss=0.01693, over 15938.00 frames. ], tot_loss[loss=0.07738, simple_loss=0.09943, pruned_loss=0.01813, audio_tagging_loss=0.009528, over 3044591.70 frames. ], batch size: 60, lr: 4.25e-03, grad_scale: 16.0
2023-11-20 23:59:26,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1273760.0, ans=0.0
2023-11-20 23:59:29,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2023-11-20 23:59:34,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1273826.6666666667, ans=0.125
2023-11-20 23:59:38,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 7.953e+01 8.635e+01 9.470e+01 1.499e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-21 00:00:00,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1273960.0, ans=0.125
2023-11-21 00:00:03,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191100
2023-11-21 00:00:10,865 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10750, loss[loss=0.08321, simple_loss=0.1125, pruned_loss=0.01952, audio_tagging_loss=0.007441, over 14824.00 frames. ], tot_loss[loss=0.07738, simple_loss=0.0995, pruned_loss=0.01819, audio_tagging_loss=0.009435, over 3038489.75 frames. ], batch size: 53, lr: 4.25e-03, grad_scale: 16.0
2023-11-21 00:00:12,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0
2023-11-21 00:00:13,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1274026.6666666667, ans=0.0
2023-11-21 00:00:15,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1274026.6666666667, ans=0.0
2023-11-21 00:00:27,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1274093.3333333333, ans=0.125
2023-11-21 00:00:47,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0
2023-11-21 00:00:56,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1274226.6666666667, ans=0.0
2023-11-21 00:01:06,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191150
2023-11-21 00:01:13,354 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10800, loss[loss=0.05866, simple_loss=0.07963, pruned_loss=0.00985, audio_tagging_loss=0.008995, over 15474.00 frames. ], tot_loss[loss=0.07734, simple_loss=0.0993, pruned_loss=0.01818, audio_tagging_loss=0.009509, over 3042209.28 frames. ], batch size: 60, lr: 4.25e-03, grad_scale: 32.0
2023-11-21 00:01:13,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1274360.0, ans=0.125
2023-11-21 00:01:19,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1274360.0, ans=0.95
2023-11-21 00:01:20,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1274360.0, ans=0.1
2023-11-21 00:01:30,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1274426.6666666667, ans=0.125
2023-11-21 00:01:30,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1274426.6666666667, ans=0.125
2023-11-21 00:01:40,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1274493.3333333333, ans=0.0
2023-11-21 00:01:44,976 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.243e+01 8.007e+01 8.718e+01 9.213e+01 1.138e+02, threshold=1.744e+02, percent-clipped=0.0
2023-11-21 00:02:01,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1274560.0, ans=0.125
2023-11-21 00:02:08,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191200
2023-11-21 00:02:16,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1274693.3333333333, ans=0.125
2023-11-21 00:02:17,044 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10850, loss[loss=0.0832, simple_loss=0.1057, pruned_loss=0.01756, audio_tagging_loss=0.01277, over 16232.00 frames. ], tot_loss[loss=0.07686, simple_loss=0.09855, pruned_loss=0.01806, audio_tagging_loss=0.009529, over 3040278.30 frames. ], batch size: 59, lr: 4.25e-03, grad_scale: 32.0
2023-11-21 00:02:19,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1274693.3333333333, ans=0.1
2023-11-21 00:02:20,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1274693.3333333333, ans=0.2
2023-11-21 00:02:24,695 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:02:40,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1274760.0, ans=0.1
2023-11-21 00:02:50,783 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:02:53,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1274826.6666666667, ans=0.125
2023-11-21 00:02:54,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0
2023-11-21 00:02:59,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=15.0
2023-11-21 00:03:13,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191250
2023-11-21 00:03:15,972 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 00:03:21,483 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10900, loss[loss=0.1161, simple_loss=0.1555, pruned_loss=0.03082, audio_tagging_loss=0.007498, over 16036.00 frames. ], tot_loss[loss=0.07744, simple_loss=0.09915, pruned_loss=0.01818, audio_tagging_loss=0.009675, over 3052320.15 frames. ], batch size: 55, lr: 4.25e-03, grad_scale: 32.0
2023-11-21 00:03:52,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.158e+01 8.756e+01 9.674e+01 1.348e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-21 00:04:04,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1275226.6666666667, ans=0.07
2023-11-21 00:04:16,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191300
2023-11-21 00:04:23,769 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 10950, loss[loss=0.09175, simple_loss=0.1203, pruned_loss=0.02382, audio_tagging_loss=0.007764, over 15873.00 frames. ], tot_loss[loss=0.07705, simple_loss=0.09851, pruned_loss=0.018, audio_tagging_loss=0.009799, over 3044610.31 frames. ], batch size: 59, lr: 4.25e-03, grad_scale: 32.0
2023-11-21 00:04:27,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1275360.0, ans=0.0
2023-11-21 00:05:19,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191350
2023-11-21 00:05:27,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1275693.3333333333, ans=0.1
2023-11-21 00:05:27,995 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11000, loss[loss=0.09298, simple_loss=0.1232, pruned_loss=0.02226, audio_tagging_loss=0.00914, over 15560.00 frames. ], tot_loss[loss=0.07673, simple_loss=0.09775, pruned_loss=0.01796, audio_tagging_loss=0.009888, over 3047231.49 frames. ], batch size: 60, lr: 4.25e-03, grad_scale: 32.0
2023-11-21 00:05:32,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1275693.3333333333, ans=0.125
2023-11-21 00:05:36,713 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 00:05:59,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.401e+01 8.037e+01 8.729e+01 9.395e+01 1.274e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-21 00:06:02,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1275826.6666666667, ans=0.0
2023-11-21 00:06:12,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0
2023-11-21 00:06:13,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1275893.3333333333, ans=0.125
2023-11-21 00:06:15,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1275893.3333333333, ans=0.025
2023-11-21 00:06:24,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191400
2023-11-21 00:06:27,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1275960.0, ans=0.125
2023-11-21 00:06:29,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5
2023-11-21 00:06:32,584 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11050, loss[loss=0.05755, simple_loss=0.06953, pruned_loss=0.00928, audio_tagging_loss=0.01351, over 15273.00 frames. ], tot_loss[loss=0.0771, simple_loss=0.09839, pruned_loss=0.0179, audio_tagging_loss=0.01, over 3061768.42 frames. ], batch size: 59, lr: 4.24e-03, grad_scale: 16.0
2023-11-21 00:06:53,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1276093.3333333333, ans=0.0
2023-11-21 00:07:17,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1276226.6666666667, ans=0.035
2023-11-21 00:07:29,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191450
2023-11-21 00:07:30,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1276293.3333333333, ans=0.2
2023-11-21 00:07:30,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1276293.3333333333, ans=0.125
2023-11-21 00:07:33,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1276293.3333333333, ans=0.0
2023-11-21 00:07:36,881 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11100, loss[loss=0.09159, simple_loss=0.1202, pruned_loss=0.02352, audio_tagging_loss=0.007948, over 15551.00 frames. ], tot_loss[loss=0.07726, simple_loss=0.09842, pruned_loss=0.01802, audio_tagging_loss=0.01002, over 3054257.83 frames. ], batch size: 58, lr: 4.24e-03, grad_scale: 16.0
2023-11-21 00:07:38,343 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:07:39,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.47 vs. limit=10.0
2023-11-21 00:07:43,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1276360.0, ans=0.0
2023-11-21 00:08:09,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.351e+01 9.133e+01 9.830e+01 1.252e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-21 00:08:32,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191500
2023-11-21 00:08:34,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0
2023-11-21 00:08:39,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1276693.3333333333, ans=0.2
2023-11-21 00:08:40,575 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11150, loss[loss=0.064, simple_loss=0.08209, pruned_loss=0.01085, audio_tagging_loss=0.01211, over 14903.00 frames. ], tot_loss[loss=0.07802, simple_loss=0.09927, pruned_loss=0.01828, audio_tagging_loss=0.0101, over 3046689.29 frames. ], batch size: 59, lr: 4.24e-03, grad_scale: 16.0
2023-11-21 00:08:47,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1276693.3333333333, ans=0.1
2023-11-21 00:08:52,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1276760.0, ans=0.125
2023-11-21 00:09:04,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1276826.6666666667, ans=0.0
2023-11-21 00:09:12,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0
2023-11-21 00:09:37,022 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191550
2023-11-21 00:09:40,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5
2023-11-21 00:09:44,274 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11200, loss[loss=0.09199, simple_loss=0.118, pruned_loss=0.02304, audio_tagging_loss=0.009944, over 15449.00 frames. ], tot_loss[loss=0.07783, simple_loss=0.09877, pruned_loss=0.01822, audio_tagging_loss=0.01022, over 3049542.69 frames. ], batch size: 58, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:09:45,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1277026.6666666667, ans=0.0
2023-11-21 00:10:09,055 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:10:18,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.863e+01 8.318e+01 8.964e+01 9.909e+01 1.504e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-21 00:10:24,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1277226.6666666667, ans=0.035
2023-11-21 00:10:41,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191600
2023-11-21 00:10:44,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1277293.3333333333, ans=0.2
2023-11-21 00:10:48,726 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11250, loss[loss=0.06612, simple_loss=0.07743, pruned_loss=0.01591, audio_tagging_loss=0.01149, over 16379.00 frames. ], tot_loss[loss=0.07738, simple_loss=0.09843, pruned_loss=0.01808, audio_tagging_loss=0.01009, over 3050776.33 frames. ], batch size: 63, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:10:55,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1277360.0, ans=0.0
2023-11-21 00:10:58,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1277360.0, ans=0.2
2023-11-21 00:11:10,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1277426.6666666667, ans=0.1
2023-11-21 00:11:17,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0
2023-11-21 00:11:20,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1277493.3333333333, ans=0.125
2023-11-21 00:11:37,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5
2023-11-21 00:11:46,134 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191650
2023-11-21 00:11:48,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1277626.6666666667, ans=0.0
2023-11-21 00:11:53,463 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11300, loss[loss=0.08113, simple_loss=0.1106, pruned_loss=0.01914, audio_tagging_loss=0.006679, over 15606.00 frames. ], tot_loss[loss=0.07767, simple_loss=0.09898, pruned_loss=0.0183, audio_tagging_loss=0.009888, over 3047566.18 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:11:59,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1277693.3333333333, ans=0.125
2023-11-21 00:12:01,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=15.0
2023-11-21 00:12:26,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.170e+01 7.843e+01 8.601e+01 9.135e+01 1.178e+02, threshold=1.720e+02, percent-clipped=0.0
2023-11-21 00:12:37,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1277893.3333333333, ans=0.04949747468305833
2023-11-21 00:12:45,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1277960.0, ans=0.125
2023-11-21 00:12:50,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191700
2023-11-21 00:12:52,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1277960.0, ans=0.0
2023-11-21 00:12:56,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0
2023-11-21 00:12:57,876 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11350, loss[loss=0.079, simple_loss=0.1038, pruned_loss=0.02005, audio_tagging_loss=0.007027, over 15323.00 frames. ], tot_loss[loss=0.07725, simple_loss=0.09854, pruned_loss=0.01821, audio_tagging_loss=0.009772, over 3041823.14 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:13:53,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191750
2023-11-21 00:14:00,781 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11400, loss[loss=0.0887, simple_loss=0.1127, pruned_loss=0.02503, audio_tagging_loss=0.007334, over 15536.00 frames. ], tot_loss[loss=0.07709, simple_loss=0.09855, pruned_loss=0.01814, audio_tagging_loss=0.009668, over 3041953.54 frames. ], batch size: 55, lr: 4.24e-03, grad_scale: 16.0
2023-11-21 00:14:35,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.048e+01 8.638e+01 9.436e+01 1.129e+02, threshold=1.728e+02, percent-clipped=0.0
2023-11-21 00:14:57,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191800
2023-11-21 00:14:58,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1278626.6666666667, ans=0.0
2023-11-21 00:15:05,468 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11450, loss[loss=0.08177, simple_loss=0.09754, pruned_loss=0.02512, audio_tagging_loss=0.00788, over 14787.00 frames. ], tot_loss[loss=0.07677, simple_loss=0.09786, pruned_loss=0.0182, audio_tagging_loss=0.009647, over 3040547.81 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 16.0
2023-11-21 00:15:06,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1278693.3333333333, ans=0.0
2023-11-21 00:15:12,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1278693.3333333333, ans=0.1
2023-11-21 00:15:26,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1278760.0, ans=0.0
2023-11-21 00:15:27,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1278760.0, ans=0.0
2023-11-21 00:15:28,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1278760.0, ans=0.1
2023-11-21 00:15:33,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1278826.6666666667, ans=0.0
2023-11-21 00:15:47,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1278893.3333333333, ans=0.025
2023-11-21 00:16:02,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191850
2023-11-21 00:16:09,432 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11500, loss[loss=0.086, simple_loss=0.1066, pruned_loss=0.02211, audio_tagging_loss=0.01061, over 15168.00 frames. ], tot_loss[loss=0.0773, simple_loss=0.09865, pruned_loss=0.01835, audio_tagging_loss=0.009622, over 3040630.32 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 16.0
2023-11-21 00:16:11,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=22.5
2023-11-21 00:16:14,457 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:16:31,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1279093.3333333333, ans=0.125
2023-11-21 00:16:42,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.941e+01 7.974e+01 8.630e+01 9.256e+01 1.145e+02, threshold=1.726e+02, percent-clipped=0.0
2023-11-21 00:16:44,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1279160.0, ans=0.125
2023-11-21 00:16:48,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=12.0
2023-11-21 00:17:05,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191900
2023-11-21 00:17:13,187 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11550, loss[loss=0.08275, simple_loss=0.1071, pruned_loss=0.01942, audio_tagging_loss=0.009799, over 15246.00 frames. ], tot_loss[loss=0.07678, simple_loss=0.0981, pruned_loss=0.01805, audio_tagging_loss=0.009681, over 3053879.38 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 16.0
2023-11-21 00:17:13,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1279360.0, ans=0.0
2023-11-21 00:17:24,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1279426.6666666667, ans=0.025
2023-11-21 00:17:25,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1279426.6666666667, ans=0.0
2023-11-21 00:17:43,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1279493.3333333333, ans=0.125
2023-11-21 00:17:45,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1279493.3333333333, ans=0.0
2023-11-21 00:17:49,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.92 vs. limit=10.0
2023-11-21 00:17:51,077 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 00:17:51,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1279560.0, ans=0.125
2023-11-21 00:18:08,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 191950
2023-11-21 00:18:11,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0
2023-11-21 00:18:16,012 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11600, loss[loss=0.04206, simple_loss=0.04329, pruned_loss=0.005318, audio_tagging_loss=0.0151, over 14102.00 frames. ], tot_loss[loss=0.07736, simple_loss=0.09895, pruned_loss=0.01822, audio_tagging_loss=0.009668, over 3058431.32 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:18:31,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1279760.0, ans=0.125
2023-11-21 00:18:36,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.35 vs. limit=22.5
2023-11-21 00:18:51,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.133e+01 8.974e+01 9.770e+01 1.687e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-21 00:18:53,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1279893.3333333333, ans=0.125
2023-11-21 00:19:08,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1279960.0, ans=0.0
2023-11-21 00:19:13,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192000
2023-11-21 00:19:24,872 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11650, loss[loss=0.07047, simple_loss=0.09291, pruned_loss=0.01323, audio_tagging_loss=0.01079, over 16167.00 frames. ], tot_loss[loss=0.07714, simple_loss=0.09851, pruned_loss=0.0182, audio_tagging_loss=0.009684, over 3055028.85 frames. ], batch size: 59, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:19:32,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1280026.6666666667, ans=0.2
2023-11-21 00:19:36,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1280093.3333333333, ans=0.1
2023-11-21 00:19:41,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1280093.3333333333, ans=0.125
2023-11-21 00:20:21,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192050
2023-11-21 00:20:28,809 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11700, loss[loss=0.07626, simple_loss=0.08886, pruned_loss=0.02024, audio_tagging_loss=0.01159, over 14038.00 frames. ], tot_loss[loss=0.07734, simple_loss=0.09876, pruned_loss=0.01824, audio_tagging_loss=0.009714, over 3054771.84 frames. ], batch size: 53, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:20:30,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0
2023-11-21 00:20:50,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1280426.6666666667, ans=0.125
2023-11-21 00:20:56,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0
2023-11-21 00:21:03,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.317e+01 8.852e+01 9.641e+01 1.228e+02, threshold=1.770e+02, percent-clipped=0.0
2023-11-21 00:21:03,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1280493.3333333333, ans=0.0
2023-11-21 00:21:24,874 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192100
2023-11-21 00:21:28,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1280626.6666666667, ans=0.125
2023-11-21 00:21:32,255 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11750, loss[loss=0.08115, simple_loss=0.1019, pruned_loss=0.02243, audio_tagging_loss=0.007768, over 15384.00 frames. ], tot_loss[loss=0.077, simple_loss=0.09815, pruned_loss=0.01811, audio_tagging_loss=0.009812, over 3050877.18 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:21:37,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1280693.3333333333, ans=0.07
2023-11-21 00:22:02,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.35 vs. limit=15.0
2023-11-21 00:22:27,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192150
2023-11-21 00:22:34,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1281026.6666666667, ans=0.0
2023-11-21 00:22:35,291 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11800, loss[loss=0.048, simple_loss=0.05497, pruned_loss=0.009887, audio_tagging_loss=0.01062, over 14843.00 frames. ], tot_loss[loss=0.07717, simple_loss=0.09835, pruned_loss=0.01818, audio_tagging_loss=0.009821, over 3045857.13 frames. ], batch size: 57, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:22:45,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0
2023-11-21 00:22:46,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1281026.6666666667, ans=0.0
2023-11-21 00:23:00,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1281160.0, ans=0.125
2023-11-21 00:23:09,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 8.064e+01 8.621e+01 9.481e+01 1.242e+02, threshold=1.724e+02, percent-clipped=0.0
2023-11-21 00:23:18,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1281226.6666666667, ans=0.0
2023-11-21 00:23:31,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192200
2023-11-21 00:23:34,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1281293.3333333333, ans=0.1
2023-11-21 00:23:36,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.45 vs. limit=22.5
2023-11-21 00:23:39,827 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11850, loss[loss=0.06653, simple_loss=0.0857, pruned_loss=0.01492, audio_tagging_loss=0.008762, over 14699.00 frames. ], tot_loss[loss=0.0771, simple_loss=0.09818, pruned_loss=0.01816, audio_tagging_loss=0.009842, over 3049436.57 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:23:40,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1281360.0, ans=0.2
2023-11-21 00:23:41,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1281360.0, ans=0.125
2023-11-21 00:23:58,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1281426.6666666667, ans=0.125
2023-11-21 00:24:23,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1281560.0, ans=0.2
2023-11-21 00:24:35,086 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192250
2023-11-21 00:24:42,290 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11900, loss[loss=0.09356, simple_loss=0.12, pruned_loss=0.02681, audio_tagging_loss=0.006753, over 15282.00 frames. ], tot_loss[loss=0.07726, simple_loss=0.09827, pruned_loss=0.01816, audio_tagging_loss=0.009959, over 3054152.36 frames. ], batch size: 57, lr: 4.24e-03, grad_scale: 32.0
2023-11-21 00:24:54,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0
2023-11-21 00:25:04,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.60 vs. limit=15.0
2023-11-21 00:25:13,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0
2023-11-21 00:25:17,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.295e+01 9.404e+01 1.030e+02 1.674e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-21 00:25:18,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1281826.6666666667, ans=0.125
2023-11-21 00:25:21,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0
2023-11-21 00:25:36,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
2023-11-21 00:25:37,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192300
2023-11-21 00:25:45,982 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 11950, loss[loss=0.06546, simple_loss=0.0809, pruned_loss=0.01513, audio_tagging_loss=0.00988, over 15458.00 frames. ], tot_loss[loss=0.07694, simple_loss=0.09804, pruned_loss=0.01789, audio_tagging_loss=0.01004, over 3056227.61 frames. ], batch size: 59, lr: 4.23e-03, grad_scale: 32.0
2023-11-21 00:26:07,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5
2023-11-21 00:26:26,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1282226.6666666667, ans=0.125
2023-11-21 00:26:27,983 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:26:27,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1282226.6666666667, ans=0.125
2023-11-21 00:26:33,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1282293.3333333333, ans=0.0
2023-11-21 00:26:38,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1282293.3333333333, ans=0.0
2023-11-21 00:26:39,413 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192350
2023-11-21 00:26:46,466 INFO [train_asr.py:1221] (1/4) Epoch 16, batch 12000, loss[loss=0.1042, simple_loss=0.1306, pruned_loss=0.02858, audio_tagging_loss=0.01035, over 15826.00 frames. ], tot_loss[loss=0.07664, simple_loss=0.09726, pruned_loss=0.01781, audio_tagging_loss=0.0102, over 3051480.21 frames. ], batch size: 59, lr: 4.23e-03, grad_scale: 32.0
2023-11-21 00:26:46,467 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 00:27:15,082 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1159, 2.2589, 5.0327, 2.5184], device='cuda:1')
2023-11-21 00:27:30,235 INFO [train_asr.py:1253] (1/4) Epoch 16, validation: loss=0.06114, simple_loss=0.05299, pruned_loss=0.005583, audio_tagging_loss=0.02906, over 4681554.00 frames.
2023-11-21 00:27:30,236 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 00:27:38,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1282360.0, ans=0.1
2023-11-21 00:27:41,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1282426.6666666667, ans=0.125
2023-11-21 00:27:45,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1282426.6666666667, ans=0.07
2023-11-21 00:28:33,016 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 0, loss[loss=0.1006, simple_loss=0.1203, pruned_loss=0.02211, audio_tagging_loss=0.01837, over 15617.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1203, pruned_loss=0.02211, audio_tagging_loss=0.01837, over 15617.00 frames. ], batch size: 56, lr: 4.11e-03, grad_scale: 32.0
2023-11-21 00:28:33,017 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 00:28:55,009 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9841, 2.2982, 4.6987, 2.7861], device='cuda:1')
2023-11-21 00:29:03,477 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8083, 4.9078, 4.9145, 4.8846], device='cuda:1')
2023-11-21 00:29:12,186 INFO [train_asr.py:1253] (1/4) Epoch 17, validation: loss=0.06074, simple_loss=0.05295, pruned_loss=0.005487, audio_tagging_loss=0.02878, over 4681554.00 frames.
2023-11-21 00:29:12,187 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 00:29:18,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.373e+01 8.043e+01 8.765e+01 9.548e+01 1.252e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-21 00:29:24,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1282580.0, ans=0.0
2023-11-21 00:29:39,202 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192400
2023-11-21 00:30:05,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1282780.0, ans=0.125
2023-11-21 00:30:16,439 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 50, loss[loss=0.08558, simple_loss=0.0969, pruned_loss=0.01798, audio_tagging_loss=0.01915, over 15372.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.09234, pruned_loss=0.01633, audio_tagging_loss=0.01956, over 689288.65 frames. ], batch size: 57, lr: 4.11e-03, grad_scale: 32.0
2023-11-21 00:30:16,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1282846.6666666667, ans=0.0
2023-11-21 00:30:22,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1282846.6666666667, ans=0.0
2023-11-21 00:30:29,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1282913.3333333333, ans=0.125
2023-11-21 00:30:42,581 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192450
2023-11-21 00:31:18,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1283180.0, ans=0.125
2023-11-21 00:31:20,482 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 100, loss[loss=0.07649, simple_loss=0.08049, pruned_loss=0.01947, audio_tagging_loss=0.01678, over 16485.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.09589, pruned_loss=0.01745, audio_tagging_loss=0.01834, over 1205346.55 frames. ], batch size: 64, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:31:27,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.625e+01 9.421e+01 9.913e+01 1.432e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-21 00:31:47,767 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192500
2023-11-21 00:31:56,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1283313.3333333333, ans=0.125
2023-11-21 00:31:58,896 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:32:06,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0
2023-11-21 00:32:24,259 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 150, loss[loss=0.08058, simple_loss=0.1067, pruned_loss=0.01655, audio_tagging_loss=0.01066, over 16322.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.09815, pruned_loss=0.01785, audio_tagging_loss=0.01614, over 1611342.53 frames. ], batch size: 60, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:32:40,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0
2023-11-21 00:32:42,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1283580.0, ans=0.125
2023-11-21 00:32:43,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0
2023-11-21 00:32:51,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192550
2023-11-21 00:32:54,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1283646.6666666667, ans=0.125
2023-11-21 00:33:02,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1283713.3333333333, ans=0.2
2023-11-21 00:33:02,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1283713.3333333333, ans=0.125
2023-11-21 00:33:12,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1283713.3333333333, ans=0.07
2023-11-21 00:33:22,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0
2023-11-21 00:33:25,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1283780.0, ans=0.125
2023-11-21 00:33:27,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1283780.0, ans=0.95
2023-11-21 00:33:29,839 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 200, loss[loss=0.07329, simple_loss=0.09081, pruned_loss=0.01389, audio_tagging_loss=0.014, over 14688.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.09706, pruned_loss=0.01727, audio_tagging_loss=0.01429, over 1941412.30 frames. ], batch size: 55, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:33:37,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.748e+01 8.165e+01 8.890e+01 9.863e+01 2.020e+02, threshold=1.778e+02, percent-clipped=1.0
2023-11-21 00:33:37,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1283846.6666666667, ans=0.04949747468305833
2023-11-21 00:33:55,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1283980.0, ans=0.125
2023-11-21 00:33:56,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192600
2023-11-21 00:33:58,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1283980.0, ans=0.0
2023-11-21 00:34:01,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=12.0
2023-11-21 00:34:17,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1284046.6666666667, ans=0.1
2023-11-21 00:34:21,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1284113.3333333333, ans=0.2
2023-11-21 00:34:33,592 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 250, loss[loss=0.09543, simple_loss=0.122, pruned_loss=0.02737, audio_tagging_loss=0.007042, over 16039.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.09792, pruned_loss=0.01758, audio_tagging_loss=0.01282, over 2183909.49 frames. ], batch size: 60, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:34:47,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1284246.6666666667, ans=0.125
2023-11-21 00:35:00,432 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192650
2023-11-21 00:35:26,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1284446.6666666667, ans=0.125
2023-11-21 00:35:27,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1284446.6666666667, ans=0.5
2023-11-21 00:35:37,462 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 300, loss[loss=0.06463, simple_loss=0.07951, pruned_loss=0.01625, audio_tagging_loss=0.00862, over 15304.00 frames. ], tot_loss[loss=0.0789, simple_loss=0.09813, pruned_loss=0.01795, audio_tagging_loss=0.01189, over 2379120.70 frames. ], batch size: 58, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:35:45,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.137e+01 8.919e+01 9.657e+01 1.268e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-21 00:36:03,711 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192700
2023-11-21 00:36:17,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0
2023-11-21 00:36:30,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1284780.0, ans=0.125
2023-11-21 00:36:35,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5
2023-11-21 00:36:40,660 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 350, loss[loss=0.05854, simple_loss=0.07065, pruned_loss=0.01197, audio_tagging_loss=0.01124, over 14330.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.09933, pruned_loss=0.0183, audio_tagging_loss=0.01123, over 2528790.08 frames. ], batch size: 58, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:36:42,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1284846.6666666667, ans=0.2
2023-11-21 00:36:52,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5
2023-11-21 00:37:07,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192750
2023-11-21 00:37:10,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1284980.0, ans=0.125
2023-11-21 00:37:20,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1285046.6666666667, ans=0.125
2023-11-21 00:37:28,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0
2023-11-21 00:37:38,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1285113.3333333333, ans=0.2
2023-11-21 00:37:39,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1285113.3333333333, ans=0.125
2023-11-21 00:37:44,022 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 400, loss[loss=0.07779, simple_loss=0.1035, pruned_loss=0.01664, audio_tagging_loss=0.009417, over 16350.00 frames. ], tot_loss[loss=0.07889, simple_loss=0.09961, pruned_loss=0.01826, audio_tagging_loss=0.01082, over 2647478.37 frames. ], batch size: 60, lr: 4.10e-03, grad_scale: 32.0
2023-11-21 00:37:51,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.176e+01 8.863e+01 9.795e+01 2.108e+02, threshold=1.773e+02, percent-clipped=1.0
2023-11-21 00:37:59,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1285246.6666666667, ans=0.125
2023-11-21 00:38:11,654 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192800
2023-11-21 00:38:22,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1285380.0, ans=0.125
2023-11-21 00:38:41,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1285446.6666666667, ans=0.125
2023-11-21 00:38:47,674 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 450, loss[loss=0.08344, simple_loss=0.1175, pruned_loss=0.0167, audio_tagging_loss=0.007971, over 15555.00 frames. ], tot_loss[loss=0.07833, simple_loss=0.09956, pruned_loss=0.0181, audio_tagging_loss=0.01045, over 2741003.62 frames. ], batch size: 57, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:38:56,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1285513.3333333333, ans=0.0
2023-11-21 00:39:14,756 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192850
2023-11-21 00:39:36,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1285713.3333333333, ans=0.05
2023-11-21 00:39:50,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1285846.6666666667, ans=0.125
2023-11-21 00:39:50,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1285846.6666666667, ans=0.04949747468305833
2023-11-21 00:39:51,818 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 500, loss[loss=0.1041, simple_loss=0.1306, pruned_loss=0.02865, audio_tagging_loss=0.01019, over 16285.00 frames. ], tot_loss[loss=0.07806, simple_loss=0.09941, pruned_loss=0.01808, audio_tagging_loss=0.01027, over 2809569.64 frames. ], batch size: 58, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:39:54,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1285846.6666666667, ans=0.125
2023-11-21 00:40:00,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.902e+01 7.999e+01 8.766e+01 9.500e+01 1.232e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-21 00:40:12,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1285913.3333333333, ans=0.125
2023-11-21 00:40:18,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192900
2023-11-21 00:40:41,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1286113.3333333333, ans=0.125
2023-11-21 00:40:50,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1286113.3333333333, ans=0.125
2023-11-21 00:40:55,088 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 550, loss[loss=0.06449, simple_loss=0.08417, pruned_loss=0.01346, audio_tagging_loss=0.008953, over 13824.00 frames. ], tot_loss[loss=0.07793, simple_loss=0.09914, pruned_loss=0.01821, audio_tagging_loss=0.01015, over 2856978.26 frames. ], batch size: 53, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:41:02,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1286180.0, ans=0.2
2023-11-21 00:41:22,237 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 192950
2023-11-21 00:41:22,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1286313.3333333333, ans=0.1
2023-11-21 00:41:23,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5
2023-11-21 00:41:56,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1286446.6666666667, ans=0.0
2023-11-21 00:41:58,725 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 600, loss[loss=0.07375, simple_loss=0.1041, pruned_loss=0.01182, audio_tagging_loss=0.0099, over 13413.00 frames. ], tot_loss[loss=0.07642, simple_loss=0.09718, pruned_loss=0.01769, audio_tagging_loss=0.01013, over 2896739.59 frames. ], batch size: 53, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:42:07,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.459e+01 7.723e+01 8.573e+01 9.272e+01 1.267e+02, threshold=1.715e+02, percent-clipped=0.0
2023-11-21 00:42:24,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193000
2023-11-21 00:42:36,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1286713.3333333333, ans=0.125
2023-11-21 00:42:51,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1286780.0, ans=0.2
2023-11-21 00:43:02,787 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 650, loss[loss=0.0838, simple_loss=0.1097, pruned_loss=0.01756, audio_tagging_loss=0.0114, over 15151.00 frames. ], tot_loss[loss=0.07751, simple_loss=0.09886, pruned_loss=0.018, audio_tagging_loss=0.01008, over 2935997.94 frames. ], batch size: 59, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:43:22,793 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:43:29,213 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193050
2023-11-21 00:43:35,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1286980.0, ans=0.125
2023-11-21 00:43:44,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0
2023-11-21 00:43:45,402 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:43:55,500 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:44:01,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0
2023-11-21 00:44:05,983 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 700, loss[loss=0.05523, simple_loss=0.07042, pruned_loss=0.008892, audio_tagging_loss=0.01113, over 15121.00 frames. ], tot_loss[loss=0.07714, simple_loss=0.09837, pruned_loss=0.01798, audio_tagging_loss=0.00998, over 2962282.69 frames. ], batch size: 58, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:44:06,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1287180.0, ans=0.0
2023-11-21 00:44:11,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1287180.0, ans=0.125
2023-11-21 00:44:15,335 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.536e+01 7.999e+01 8.466e+01 9.113e+01 1.191e+02, threshold=1.693e+02, percent-clipped=0.0
2023-11-21 00:44:29,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0
2023-11-21 00:44:32,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193100
2023-11-21 00:44:33,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1287313.3333333333, ans=0.1
2023-11-21 00:45:09,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1287513.3333333333, ans=0.125
2023-11-21 00:45:10,271 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 750, loss[loss=0.09374, simple_loss=0.1285, pruned_loss=0.02307, audio_tagging_loss=0.0064, over 15184.00 frames. ], tot_loss[loss=0.07742, simple_loss=0.09884, pruned_loss=0.01804, audio_tagging_loss=0.009956, over 2980106.08 frames. ], batch size: 55, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:45:15,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1287513.3333333333, ans=0.125
2023-11-21 00:45:33,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2023-11-21 00:45:37,837 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193150
2023-11-21 00:45:58,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1287713.3333333333, ans=0.1
2023-11-21 00:45:59,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1287713.3333333333, ans=0.5
2023-11-21 00:46:02,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1287780.0, ans=0.125
2023-11-21 00:46:14,601 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 800, loss[loss=0.06935, simple_loss=0.08849, pruned_loss=0.01381, audio_tagging_loss=0.01129, over 16351.00 frames. ], tot_loss[loss=0.07748, simple_loss=0.09884, pruned_loss=0.01808, audio_tagging_loss=0.009985, over 3000363.95 frames. ], batch size: 62, lr: 4.10e-03, grad_scale: 32.0
2023-11-21 00:46:24,323 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.568e+01 8.140e+01 8.941e+01 9.385e+01 1.183e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-21 00:46:29,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1287913.3333333333, ans=0.125
2023-11-21 00:46:33,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1287913.3333333333, ans=0.1
2023-11-21 00:46:42,127 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193200
2023-11-21 00:46:49,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1287980.0, ans=0.125
2023-11-21 00:47:15,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1288113.3333333333, ans=0.125
2023-11-21 00:47:19,580 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 850, loss[loss=0.05651, simple_loss=0.06472, pruned_loss=0.01347, audio_tagging_loss=0.01069, over 14290.00 frames. ], tot_loss[loss=0.07715, simple_loss=0.09839, pruned_loss=0.01787, audio_tagging_loss=0.01008, over 3019468.66 frames. ], batch size: 57, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:47:25,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1288180.0, ans=0.125
2023-11-21 00:47:36,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1288246.6666666667, ans=0.1
2023-11-21 00:47:45,990 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193250
2023-11-21 00:47:54,589 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:47:58,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1288380.0, ans=0.125
2023-11-21 00:47:59,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1288380.0, ans=0.0
2023-11-21 00:48:07,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1288380.0, ans=0.125
2023-11-21 00:48:13,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1288446.6666666667, ans=0.125
2023-11-21 00:48:18,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1288446.6666666667, ans=0.1
2023-11-21 00:48:23,674 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 900, loss[loss=0.08751, simple_loss=0.1221, pruned_loss=0.01923, audio_tagging_loss=0.007242, over 16396.00 frames. ], tot_loss[loss=0.07752, simple_loss=0.09872, pruned_loss=0.01802, audio_tagging_loss=0.01014, over 3028910.64 frames. ], batch size: 59, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:48:33,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.920e+01 8.040e+01 8.568e+01 9.329e+01 1.303e+02, threshold=1.714e+02, percent-clipped=0.0
2023-11-21 00:48:50,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193300
2023-11-21 00:49:01,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1288713.3333333333, ans=0.125
2023-11-21 00:49:27,454 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 950, loss[loss=0.1056, simple_loss=0.137, pruned_loss=0.02674, audio_tagging_loss=0.01037, over 16058.00 frames. ], tot_loss[loss=0.07794, simple_loss=0.09907, pruned_loss=0.01826, audio_tagging_loss=0.01014, over 3028542.45 frames. ], batch size: 55, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:49:41,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1288913.3333333333, ans=0.125
2023-11-21 00:49:55,830 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193350
2023-11-21 00:50:04,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1288980.0, ans=0.125
2023-11-21 00:50:18,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1289113.3333333333, ans=0.125
2023-11-21 00:50:25,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1289113.3333333333, ans=0.0
2023-11-21 00:50:32,431 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1000, loss[loss=0.08725, simple_loss=0.1127, pruned_loss=0.02121, audio_tagging_loss=0.009704, over 14928.00 frames. ], tot_loss[loss=0.0774, simple_loss=0.0986, pruned_loss=0.01815, audio_tagging_loss=0.009957, over 3024345.93 frames. ], batch size: 54, lr: 4.10e-03, grad_scale: 16.0
2023-11-21 00:50:34,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0
2023-11-21 00:50:42,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.002e+01 8.372e+01 8.934e+01 9.683e+01 2.784e+02, threshold=1.787e+02, percent-clipped=1.0
2023-11-21 00:50:56,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1289246.6666666667, ans=0.125
2023-11-21 00:50:59,504 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 00:50:59,563 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193400
2023-11-21 00:51:02,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1289313.3333333333, ans=0.0
2023-11-21 00:51:05,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1289313.3333333333, ans=0.0
2023-11-21 00:51:24,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.04 vs. limit=15.0
2023-11-21 00:51:33,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1289446.6666666667, ans=0.2
2023-11-21 00:51:38,121 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1050, loss[loss=0.1041, simple_loss=0.1264, pruned_loss=0.03119, audio_tagging_loss=0.009653, over 15206.00 frames. ], tot_loss[loss=0.07736, simple_loss=0.09852, pruned_loss=0.01822, audio_tagging_loss=0.009882, over 3022065.42 frames. ], batch size: 56, lr: 4.09e-03, grad_scale: 16.0
2023-11-21 00:51:53,280 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 00:51:55,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1289580.0, ans=0.125
2023-11-21 00:52:01,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1289580.0, ans=0.125
2023-11-21 00:52:05,429 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193450
2023-11-21 00:52:25,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1289713.3333333333, ans=0.125
2023-11-21 00:52:36,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1289780.0, ans=0.0
2023-11-21 00:52:39,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1289780.0, ans=0.125
2023-11-21 00:52:41,795 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1100, loss[loss=0.07903, simple_loss=0.09981, pruned_loss=0.01981, audio_tagging_loss=0.009317, over 15381.00 frames. ], tot_loss[loss=0.07719, simple_loss=0.09866, pruned_loss=0.01814, audio_tagging_loss=0.009727, over 3028644.45 frames. ], batch size: 56, lr: 4.09e-03, grad_scale: 16.0
2023-11-21 00:52:44,417 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 00:52:52,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.642e+01 8.168e+01 8.741e+01 9.235e+01 1.137e+02, threshold=1.748e+02, percent-clipped=0.0
2023-11-21 00:52:56,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1289913.3333333333, ans=0.2
2023-11-21 00:52:57,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs.
limit=15.0 2023-11-21 00:53:09,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193500 2023-11-21 00:53:09,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1289980.0, ans=0.0 2023-11-21 00:53:11,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1289980.0, ans=0.125 2023-11-21 00:53:20,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1290046.6666666667, ans=0.125 2023-11-21 00:53:33,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1290113.3333333333, ans=0.0 2023-11-21 00:53:33,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1290113.3333333333, ans=0.125 2023-11-21 00:53:38,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1290113.3333333333, ans=0.125 2023-11-21 00:53:46,392 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1150, loss[loss=0.07349, simple_loss=0.08979, pruned_loss=0.01717, audio_tagging_loss=0.01142, over 15755.00 frames. ], tot_loss[loss=0.07697, simple_loss=0.09857, pruned_loss=0.01796, audio_tagging_loss=0.009723, over 3029969.78 frames. ], batch size: 58, lr: 4.09e-03, grad_scale: 16.0 2023-11-21 00:53:59,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=12.0 2023-11-21 00:54:11,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1290313.3333333333, ans=0.125 2023-11-21 00:54:13,296 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193550 2023-11-21 00:54:24,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1290380.0, ans=0.2 2023-11-21 00:54:37,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-21 00:54:51,244 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1200, loss[loss=0.07279, simple_loss=0.09152, pruned_loss=0.01762, audio_tagging_loss=0.009417, over 15559.00 frames. ], tot_loss[loss=0.0768, simple_loss=0.09851, pruned_loss=0.01795, audio_tagging_loss=0.009596, over 3026624.30 frames. ], batch size: 57, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 00:55:00,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 7.968e+01 8.692e+01 9.308e+01 1.785e+02, threshold=1.738e+02, percent-clipped=1.0 2023-11-21 00:55:04,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2023-11-21 00:55:18,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193600 2023-11-21 00:55:39,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.29 vs. 
limit=12.0 2023-11-21 00:55:44,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1290780.0, ans=0.5 2023-11-21 00:55:55,025 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1250, loss[loss=0.07094, simple_loss=0.08616, pruned_loss=0.01768, audio_tagging_loss=0.01018, over 14740.00 frames. ], tot_loss[loss=0.07649, simple_loss=0.09815, pruned_loss=0.01791, audio_tagging_loss=0.009504, over 3033587.04 frames. ], batch size: 56, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 00:55:56,746 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 00:56:13,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1290913.3333333333, ans=0.0 2023-11-21 00:56:22,011 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193650 2023-11-21 00:56:26,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1290980.0, ans=0.125 2023-11-21 00:56:41,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1291046.6666666667, ans=0.2 2023-11-21 00:56:49,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-21 00:56:59,676 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1300, loss[loss=0.06551, simple_loss=0.08447, pruned_loss=0.01367, audio_tagging_loss=0.009611, over 15833.00 frames. ], tot_loss[loss=0.07577, simple_loss=0.09717, pruned_loss=0.01763, audio_tagging_loss=0.009555, over 3031020.70 frames. ], batch size: 58, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 00:57:09,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.443e+01 7.829e+01 8.638e+01 9.326e+01 1.163e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-21 00:57:15,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1291246.6666666667, ans=0.125 2023-11-21 00:57:19,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1291246.6666666667, ans=0.125 2023-11-21 00:57:23,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1291246.6666666667, ans=0.125 2023-11-21 00:57:26,911 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193700 2023-11-21 00:57:45,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1291380.0, ans=0.1 2023-11-21 00:58:03,401 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1350, loss[loss=0.07101, simple_loss=0.08397, pruned_loss=0.01572, audio_tagging_loss=0.0133, over 16877.00 frames. ], tot_loss[loss=0.07603, simple_loss=0.09777, pruned_loss=0.01763, audio_tagging_loss=0.00951, over 3031393.70 frames. 
], batch size: 64, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 00:58:04,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1291513.3333333333, ans=0.125 2023-11-21 00:58:30,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193750 2023-11-21 00:58:49,292 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 00:59:05,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1291780.0, ans=0.95 2023-11-21 00:59:07,687 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1400, loss[loss=0.06039, simple_loss=0.07362, pruned_loss=0.01121, audio_tagging_loss=0.01236, over 16455.00 frames. ], tot_loss[loss=0.07598, simple_loss=0.09751, pruned_loss=0.01761, audio_tagging_loss=0.009608, over 3032429.39 frames. ], batch size: 63, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 00:59:12,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1291846.6666666667, ans=0.0 2023-11-21 00:59:18,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 7.749e+01 8.445e+01 9.345e+01 1.750e+02, threshold=1.689e+02, percent-clipped=1.0 2023-11-21 00:59:20,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1291913.3333333333, ans=0.125 2023-11-21 00:59:34,663 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193800 2023-11-21 00:59:37,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1291980.0, ans=0.125 2023-11-21 00:59:49,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1292046.6666666667, ans=0.1 2023-11-21 00:59:54,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1292046.6666666667, ans=0.0 2023-11-21 01:00:02,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-21 01:00:04,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1292113.3333333333, ans=15.0 2023-11-21 01:00:12,450 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1450, loss[loss=0.0741, simple_loss=0.0993, pruned_loss=0.01333, audio_tagging_loss=0.01113, over 15887.00 frames. ], tot_loss[loss=0.07598, simple_loss=0.09729, pruned_loss=0.01758, audio_tagging_loss=0.009762, over 3031206.92 frames. ], batch size: 56, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 01:00:16,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.66 vs. 
limit=22.5 2023-11-21 01:00:20,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-21 01:00:22,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1292180.0, ans=0.2 2023-11-21 01:00:26,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1292246.6666666667, ans=0.125 2023-11-21 01:00:26,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.28 vs. limit=15.0 2023-11-21 01:00:38,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193850 2023-11-21 01:01:03,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2023-11-21 01:01:16,045 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1500, loss[loss=0.08784, simple_loss=0.109, pruned_loss=0.02257, audio_tagging_loss=0.01076, over 14234.00 frames. ], tot_loss[loss=0.07645, simple_loss=0.09768, pruned_loss=0.01776, audio_tagging_loss=0.009851, over 3029871.70 frames. ], batch size: 53, lr: 4.09e-03, grad_scale: 16.0 2023-11-21 01:01:24,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1292513.3333333333, ans=0.125 2023-11-21 01:01:27,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.416e+01 9.067e+01 9.838e+01 1.487e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-21 01:01:42,970 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193900 2023-11-21 01:01:47,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-11-21 01:02:02,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1292713.3333333333, ans=0.125 2023-11-21 01:02:18,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1292780.0, ans=0.025 2023-11-21 01:02:20,423 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1550, loss[loss=0.08417, simple_loss=0.1058, pruned_loss=0.01999, audio_tagging_loss=0.01125, over 15033.00 frames. ], tot_loss[loss=0.07671, simple_loss=0.0979, pruned_loss=0.01784, audio_tagging_loss=0.009925, over 3032691.50 frames. ], batch size: 57, lr: 4.09e-03, grad_scale: 16.0 2023-11-21 01:02:34,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1292913.3333333333, ans=0.1 2023-11-21 01:02:36,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1292913.3333333333, ans=0.1 2023-11-21 01:02:46,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. 
limit=6.0 2023-11-21 01:02:47,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 193950 2023-11-21 01:02:53,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1292980.0, ans=0.125 2023-11-21 01:03:07,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2023-11-21 01:03:09,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2023-11-21 01:03:25,726 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1600, loss[loss=0.06769, simple_loss=0.08788, pruned_loss=0.01208, audio_tagging_loss=0.01166, over 15936.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.09632, pruned_loss=0.0176, audio_tagging_loss=0.009963, over 3036297.77 frames. ], batch size: 60, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 01:03:36,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.454e+01 8.126e+01 8.877e+01 9.641e+01 1.501e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-21 01:03:41,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1293246.6666666667, ans=0.125 2023-11-21 01:03:51,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1293313.3333333333, ans=0.0 2023-11-21 01:03:52,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194000 2023-11-21 01:03:57,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-11-21 01:04:00,530 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 01:04:00,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1293313.3333333333, ans=0.1 2023-11-21 01:04:18,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-11-21 01:04:18,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2023-11-21 01:04:24,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1293446.6666666667, ans=0.1 2023-11-21 01:04:30,481 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1650, loss[loss=0.06989, simple_loss=0.1019, pruned_loss=0.01159, audio_tagging_loss=0.007345, over 14217.00 frames. ], tot_loss[loss=0.07629, simple_loss=0.09698, pruned_loss=0.01773, audio_tagging_loss=0.01006, over 3042057.70 frames. ], batch size: 53, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 01:04:56,801 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194050 2023-11-21 01:05:00,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1293646.6666666667, ans=0.0 2023-11-21 01:05:32,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. 
limit=10.0 2023-11-21 01:05:35,476 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1700, loss[loss=0.06042, simple_loss=0.06997, pruned_loss=0.01404, audio_tagging_loss=0.01139, over 15593.00 frames. ], tot_loss[loss=0.07615, simple_loss=0.09678, pruned_loss=0.01765, audio_tagging_loss=0.0101, over 3041333.69 frames. ], batch size: 61, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 01:05:44,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1293846.6666666667, ans=0.0 2023-11-21 01:05:46,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.568e+01 8.167e+01 8.614e+01 9.209e+01 1.216e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-21 01:05:47,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-21 01:06:02,891 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194100 2023-11-21 01:06:17,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1294046.6666666667, ans=0.0 2023-11-21 01:06:32,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1294113.3333333333, ans=0.125 2023-11-21 01:06:34,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1294113.3333333333, ans=0.125 2023-11-21 01:06:40,568 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1750, loss[loss=0.06579, simple_loss=0.08162, pruned_loss=0.01381, audio_tagging_loss=0.01117, over 14826.00 frames. ], tot_loss[loss=0.07529, simple_loss=0.09603, pruned_loss=0.01732, audio_tagging_loss=0.009957, over 3042718.01 frames. ], batch size: 56, lr: 4.09e-03, grad_scale: 32.0 2023-11-21 01:06:44,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1294180.0, ans=0.125 2023-11-21 01:06:54,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1294246.6666666667, ans=0.0 2023-11-21 01:07:07,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194150 2023-11-21 01:07:17,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1294380.0, ans=0.125 2023-11-21 01:07:43,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1294513.3333333333, ans=0.125 2023-11-21 01:07:43,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1294513.3333333333, ans=0.1 2023-11-21 01:07:44,349 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1800, loss[loss=0.05571, simple_loss=0.06732, pruned_loss=0.01076, audio_tagging_loss=0.01129, over 16469.00 frames. ], tot_loss[loss=0.07507, simple_loss=0.09579, pruned_loss=0.01726, audio_tagging_loss=0.00992, over 3049822.00 frames. 
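The loss fields in these batch summaries are not independent: every tot_loss above satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss to within rounding (for batch 1800: 0.5 * 0.09579 + 0.01726 + 0.00992 = 0.07507). A sketch of that combination; reading the 0.5 and 1.0 weights as the configured simple-loss and audio-tagging scales is an inference from the logged numbers:

    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Reproduces the logged totals; the real training code may also fold in
        # CTC and warmup-dependent weighting, which do not appear in these logs.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    assert abs(combine_losses(0.09579, 0.01726, 0.00992) - 0.07507) < 1e-4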
], batch size: 63, lr: 4.09e-03, grad_scale: 16.0 2023-11-21 01:07:44,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1294513.3333333333, ans=0.125 2023-11-21 01:07:56,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1294580.0, ans=0.125 2023-11-21 01:07:57,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.340e+01 8.982e+01 9.918e+01 1.410e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-21 01:07:57,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1294580.0, ans=0.0 2023-11-21 01:08:10,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194200 2023-11-21 01:08:25,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2023-11-21 01:08:31,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-11-21 01:08:44,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1294780.0, ans=0.0 2023-11-21 01:08:48,990 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1850, loss[loss=0.08744, simple_loss=0.1189, pruned_loss=0.01981, audio_tagging_loss=0.008201, over 16373.00 frames. ], tot_loss[loss=0.07504, simple_loss=0.09577, pruned_loss=0.0173, audio_tagging_loss=0.009849, over 3047107.92 frames. ], batch size: 59, lr: 4.09e-03, grad_scale: 16.0 2023-11-21 01:09:05,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-21 01:09:16,927 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194250 2023-11-21 01:09:23,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-21 01:09:33,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1295046.6666666667, ans=0.125 2023-11-21 01:09:51,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1295180.0, ans=10.0 2023-11-21 01:09:52,803 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1900, loss[loss=0.07515, simple_loss=0.1052, pruned_loss=0.01473, audio_tagging_loss=0.007837, over 16393.00 frames. ], tot_loss[loss=0.0755, simple_loss=0.09644, pruned_loss=0.01757, audio_tagging_loss=0.009705, over 3047958.63 frames. 
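The recurring WARNING lines drop AudioSet cuts whose token sequence is longer than the acoustic sequence: 100 feature frames shrink to 23 after the roughly 4x convolutional subsampling, while the dummy transcript tokenizes to 24 BPE pieces, and the transducer loss needs at least as many frames as emitted tokens. A sketch of such a filter; the exact subsampling arithmetic is an assumption that happens to reproduce 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed Conv2dSubsampling length formula; maps 100 -> 23 as logged.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) > num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> excluded, as warned above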
], batch size: 59, lr: 4.09e-03, grad_scale: 16.0 2023-11-21 01:09:58,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1295180.0, ans=0.125 2023-11-21 01:10:04,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1295180.0, ans=0.125 2023-11-21 01:10:06,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1295246.6666666667, ans=0.125 2023-11-21 01:10:06,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.112e+01 7.944e+01 8.677e+01 9.299e+01 1.429e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-21 01:10:21,036 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194300 2023-11-21 01:10:29,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2023-11-21 01:10:33,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2023-11-21 01:10:40,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1295380.0, ans=0.125 2023-11-21 01:10:54,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1295446.6666666667, ans=0.125 2023-11-21 01:10:56,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1295446.6666666667, ans=0.125 2023-11-21 01:10:58,323 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 1950, loss[loss=0.1109, simple_loss=0.1461, pruned_loss=0.02844, audio_tagging_loss=0.009405, over 14965.00 frames. ], tot_loss[loss=0.07529, simple_loss=0.09631, pruned_loss=0.01744, audio_tagging_loss=0.009691, over 3045513.59 frames. ], batch size: 53, lr: 4.09e-03, grad_scale: 16.0 2023-11-21 01:11:24,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194350 2023-11-21 01:11:26,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1295646.6666666667, ans=0.2 2023-11-21 01:11:32,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1295646.6666666667, ans=0.1 2023-11-21 01:11:51,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0 2023-11-21 01:12:02,481 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2000, loss[loss=0.08552, simple_loss=0.117, pruned_loss=0.01915, audio_tagging_loss=0.007884, over 15500.00 frames. ], tot_loss[loss=0.07513, simple_loss=0.0959, pruned_loss=0.01749, audio_tagging_loss=0.009688, over 3047966.00 frames. 
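The learning rate drifting from 4.10e-03 toward 4.08e-03 across these summaries is consistent with an Eden-style schedule that decays as an inverse fourth root in both the global batch index and the epoch count. A sketch under that assumed functional form; plugging in the configured base_lr/lr_batches/lr_epochs, the logged batch indices (~193k-195k) and 16 completed epochs reproduces the logged values:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Assumed Eden-style decay (sketch); constants are the configured ones.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(f"{eden_lr(0.045, 193150, 16):.2e}")  # 4.10e-03
    print(f"{eden_lr(0.045, 195000, 16):.2e}")  # 4.08e-03, matching the drift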
], batch size: 59, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:12:11,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1295846.6666666667, ans=0.125 2023-11-21 01:12:14,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.274e+01 9.033e+01 9.827e+01 1.298e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-21 01:12:16,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2023-11-21 01:12:22,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.15 vs. limit=10.0 2023-11-21 01:12:23,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1295913.3333333333, ans=0.125 2023-11-21 01:12:27,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1295980.0, ans=0.0 2023-11-21 01:12:29,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194400 2023-11-21 01:12:43,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1296046.6666666667, ans=0.07 2023-11-21 01:12:44,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1296046.6666666667, ans=0.2 2023-11-21 01:12:52,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1296046.6666666667, ans=0.125 2023-11-21 01:12:58,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1296113.3333333333, ans=0.1 2023-11-21 01:13:06,214 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2050, loss[loss=0.09134, simple_loss=0.1218, pruned_loss=0.02135, audio_tagging_loss=0.009081, over 15136.00 frames. ], tot_loss[loss=0.07493, simple_loss=0.09553, pruned_loss=0.01741, audio_tagging_loss=0.009761, over 3045143.10 frames. ], batch size: 55, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:13:09,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1296180.0, ans=0.125 2023-11-21 01:13:14,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. 
limit=15.0 2023-11-21 01:13:16,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1296180.0, ans=0.125 2023-11-21 01:13:26,661 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 01:13:33,601 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194450 2023-11-21 01:13:45,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1296380.0, ans=0.0 2023-11-21 01:13:46,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1296380.0, ans=0.0 2023-11-21 01:13:50,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1296380.0, ans=0.125 2023-11-21 01:14:10,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1296513.3333333333, ans=0.125 2023-11-21 01:14:10,837 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2100, loss[loss=0.06142, simple_loss=0.0724, pruned_loss=0.01261, audio_tagging_loss=0.01261, over 15378.00 frames. ], tot_loss[loss=0.07581, simple_loss=0.09703, pruned_loss=0.0176, audio_tagging_loss=0.009695, over 3042096.42 frames. ], batch size: 57, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:14:24,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.053e+01 8.707e+01 9.415e+01 1.178e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-21 01:14:27,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2023-11-21 01:14:37,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194500 2023-11-21 01:14:38,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-21 01:14:50,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1296713.3333333333, ans=0.125 2023-11-21 01:14:51,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1296713.3333333333, ans=0.2 2023-11-21 01:15:13,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1296780.0, ans=0.125 2023-11-21 01:15:15,734 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2150, loss[loss=0.05076, simple_loss=0.06101, pruned_loss=0.009435, audio_tagging_loss=0.01082, over 14296.00 frames. ], tot_loss[loss=0.0756, simple_loss=0.09656, pruned_loss=0.01758, audio_tagging_loss=0.009747, over 3044162.07 frames. ], batch size: 55, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:15:18,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1296846.6666666667, ans=0.125 2023-11-21 01:15:42,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194550 2023-11-21 01:15:48,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=22.5 2023-11-21 01:15:51,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-21 01:15:52,154 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 01:15:54,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1297046.6666666667, ans=0.1 2023-11-21 01:16:18,802 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2200, loss[loss=0.04753, simple_loss=0.05288, pruned_loss=0.01002, audio_tagging_loss=0.01107, over 14975.00 frames. ], tot_loss[loss=0.07545, simple_loss=0.09624, pruned_loss=0.01748, audio_tagging_loss=0.009859, over 3046069.78 frames. ], batch size: 60, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:16:32,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 8.162e+01 8.764e+01 9.413e+01 1.161e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-21 01:16:45,743 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194600 2023-11-21 01:17:08,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1297380.0, ans=0.125 2023-11-21 01:17:17,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1297446.6666666667, ans=0.125 2023-11-21 01:17:20,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1297446.6666666667, ans=0.1 2023-11-21 01:17:23,469 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2250, loss[loss=0.09449, simple_loss=0.1063, pruned_loss=0.03276, audio_tagging_loss=0.00857, over 13925.00 frames. ], tot_loss[loss=0.07677, simple_loss=0.0982, pruned_loss=0.01786, audio_tagging_loss=0.009814, over 3050185.02 frames. ], batch size: 53, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:17:50,775 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194650 2023-11-21 01:18:01,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1297713.3333333333, ans=0.5 2023-11-21 01:18:03,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0 2023-11-21 01:18:08,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1297713.3333333333, ans=0.125 2023-11-21 01:18:15,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1297780.0, ans=0.09899494936611666 2023-11-21 01:18:27,464 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2300, loss[loss=0.07422, simple_loss=0.09226, pruned_loss=0.01862, audio_tagging_loss=0.00947, over 16495.00 frames. ], tot_loss[loss=0.0765, simple_loss=0.09783, pruned_loss=0.01775, audio_tagging_loss=0.009837, over 3048226.13 frames. 
], batch size: 63, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:18:27,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1297846.6666666667, ans=0.1 2023-11-21 01:18:39,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1297913.3333333333, ans=0.125 2023-11-21 01:18:41,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.557e+01 8.363e+01 8.884e+01 1.011e+02 1.300e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-21 01:18:55,306 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194700 2023-11-21 01:19:00,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1297980.0, ans=0.125 2023-11-21 01:19:11,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1298046.6666666667, ans=0.125 2023-11-21 01:19:23,578 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 01:19:28,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1298113.3333333333, ans=0.1 2023-11-21 01:19:31,993 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2350, loss[loss=0.08416, simple_loss=0.1052, pruned_loss=0.0206, audio_tagging_loss=0.01096, over 15330.00 frames. ], tot_loss[loss=0.07679, simple_loss=0.0981, pruned_loss=0.01782, audio_tagging_loss=0.009918, over 3052491.14 frames. ], batch size: 56, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:19:59,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194750 2023-11-21 01:20:02,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1298313.3333333333, ans=10.0 2023-11-21 01:20:04,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-11-21 01:20:05,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2023-11-21 01:20:16,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1298380.0, ans=0.2 2023-11-21 01:20:16,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1298380.0, ans=0.125 2023-11-21 01:20:36,878 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2400, loss[loss=0.05835, simple_loss=0.07, pruned_loss=0.01205, audio_tagging_loss=0.01129, over 15303.00 frames. ], tot_loss[loss=0.07735, simple_loss=0.09905, pruned_loss=0.01794, audio_tagging_loss=0.009887, over 3050527.85 frames. 
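The Whitening lines compare a statistic of a submodule's output covariance against a scheduled limit ("metric=2.59 vs. limit=15.0"); a value near 1 indicates already-white features, and a corrective penalty only engages when the metric exceeds the limit. One plausible formulation of such a metric, not necessarily the one implemented in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (N, C). Per channel group, measure how far the covariance C is from
        # a multiple of the identity via d * tr(C^2) / tr(C)^2, which is >= 1
        # with equality iff C is proportional to I (Cauchy-Schwarz on eigenvalues).
        n, c = x.shape
        d = c // num_groups
        xg = x.reshape(n, num_groups, d)
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = torch.einsum("ngi,ngj->gij", xg, xg) / n
        tr = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
        tr2 = (cov * cov).sum(dim=(-2, -1))  # tr(C @ C), since C is symmetric
        return (d * tr2 / tr.clamp(min=1e-20) ** 2).mean()

    x = torch.randn(1024, 384)
    print(whitening_metric(x))  # close to 1 for white Gaussian features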
], batch size: 58, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:20:39,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=22.5 2023-11-21 01:20:46,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2023-11-21 01:20:50,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.420e+01 8.164e+01 8.845e+01 9.386e+01 1.132e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-21 01:21:03,866 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194800 2023-11-21 01:21:04,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-11-21 01:21:37,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1298780.0, ans=0.2 2023-11-21 01:21:37,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1298780.0, ans=0.125 2023-11-21 01:21:40,702 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2450, loss[loss=0.07125, simple_loss=0.08617, pruned_loss=0.01632, audio_tagging_loss=0.01185, over 14832.00 frames. ], tot_loss[loss=0.07777, simple_loss=0.09924, pruned_loss=0.01817, audio_tagging_loss=0.009976, over 3052340.29 frames. ], batch size: 55, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:22:08,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194850 2023-11-21 01:22:09,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1298980.0, ans=0.125 2023-11-21 01:22:12,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1298980.0, ans=0.125 2023-11-21 01:22:13,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=8.0 2023-11-21 01:22:14,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1298980.0, ans=0.2 2023-11-21 01:22:17,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1298980.0, ans=0.025 2023-11-21 01:22:37,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1299113.3333333333, ans=0.5 2023-11-21 01:22:41,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2023-11-21 01:22:45,479 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2500, loss[loss=0.07352, simple_loss=0.09458, pruned_loss=0.01609, audio_tagging_loss=0.01014, over 15266.00 frames. ], tot_loss[loss=0.07729, simple_loss=0.09863, pruned_loss=0.01801, audio_tagging_loss=0.009964, over 3049411.77 frames. 
], batch size: 57, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:22:56,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1299180.0, ans=0.125 2023-11-21 01:23:01,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.181e+01 8.676e+01 9.218e+01 1.349e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-21 01:23:12,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194900 2023-11-21 01:23:43,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1299446.6666666667, ans=0.1 2023-11-21 01:23:46,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0 2023-11-21 01:23:50,119 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2550, loss[loss=0.1022, simple_loss=0.147, pruned_loss=0.02347, audio_tagging_loss=0.005223, over 15929.00 frames. ], tot_loss[loss=0.07726, simple_loss=0.09883, pruned_loss=0.01798, audio_tagging_loss=0.009869, over 3046075.54 frames. ], batch size: 56, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:24:06,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-11-21 01:24:16,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 194950 2023-11-21 01:24:25,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1299646.6666666667, ans=0.125 2023-11-21 01:24:29,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.14 vs. limit=10.0 2023-11-21 01:24:53,748 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2600, loss[loss=0.07413, simple_loss=0.09756, pruned_loss=0.01801, audio_tagging_loss=0.007341, over 14740.00 frames. ], tot_loss[loss=0.07727, simple_loss=0.09883, pruned_loss=0.01811, audio_tagging_loss=0.009746, over 3046405.20 frames. 
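The grad_scale value in the batch summaries toggles between 32.0 and 16.0 because training runs in fp16 with a dynamic loss scaler: the scale is halved whenever inf/nan gradients force a skipped step and periodically doubled after a run of clean batches. A minimal AMP step showing where that number comes from; the model and batch interfaces are generic stand-ins:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)          # combined loss, as logged above
        scaler.scale(loss).backward()    # scaling keeps fp16 grads representable
        scaler.step(optimizer)           # unscales; skips the step on inf/nan
        scaler.update()                  # halves or doubles the scale dynamically
        return loss.detach(), scaler.get_scale()  # the logged grad_scale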
], batch size: 55, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:25:02,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1299846.6666666667, ans=0.125 2023-11-21 01:25:04,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1299846.6666666667, ans=0.125 2023-11-21 01:25:09,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 7.907e+01 8.660e+01 9.294e+01 1.206e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 01:25:14,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1299913.3333333333, ans=0.035 2023-11-21 01:25:16,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1299913.3333333333, ans=0.125 2023-11-21 01:25:20,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195000 2023-11-21 01:25:33,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1300046.6666666667, ans=0.07 2023-11-21 01:25:58,425 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2650, loss[loss=0.08714, simple_loss=0.1202, pruned_loss=0.01799, audio_tagging_loss=0.009054, over 15136.00 frames. ], tot_loss[loss=0.07717, simple_loss=0.09833, pruned_loss=0.0182, audio_tagging_loss=0.009799, over 3038118.27 frames. ], batch size: 55, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:26:03,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1300180.0, ans=0.125 2023-11-21 01:26:26,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195050 2023-11-21 01:26:29,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1300313.3333333333, ans=0.1 2023-11-21 01:26:31,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1300313.3333333333, ans=0.1 2023-11-21 01:26:40,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1300380.0, ans=0.1 2023-11-21 01:26:56,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.03 vs. limit=10.0 2023-11-21 01:27:03,574 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2700, loss[loss=0.05759, simple_loss=0.07005, pruned_loss=0.01261, audio_tagging_loss=0.009962, over 15514.00 frames. ], tot_loss[loss=0.07709, simple_loss=0.09876, pruned_loss=0.01823, audio_tagging_loss=0.009488, over 3043377.30 frames. ], batch size: 62, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:27:13,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0 2023-11-21 01:27:18,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 7.841e+01 8.793e+01 9.412e+01 1.255e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-21 01:27:22,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.15 vs. 
limit=15.0 2023-11-21 01:27:23,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1300580.0, ans=0.0 2023-11-21 01:27:25,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1300580.0, ans=0.125 2023-11-21 01:27:30,617 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195100 2023-11-21 01:28:02,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1300780.0, ans=0.5 2023-11-21 01:28:08,077 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2750, loss[loss=0.08314, simple_loss=0.111, pruned_loss=0.0198, audio_tagging_loss=0.00784, over 15973.00 frames. ], tot_loss[loss=0.07675, simple_loss=0.09843, pruned_loss=0.01807, audio_tagging_loss=0.009466, over 3042118.51 frames. ], batch size: 58, lr: 4.08e-03, grad_scale: 16.0 2023-11-21 01:28:16,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1300846.6666666667, ans=0.1 2023-11-21 01:28:27,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1300913.3333333333, ans=0.07 2023-11-21 01:28:34,574 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195150 2023-11-21 01:28:39,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1300980.0, ans=0.0 2023-11-21 01:28:55,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1301046.6666666667, ans=0.1 2023-11-21 01:29:02,058 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 01:29:05,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1301113.3333333333, ans=0.2 2023-11-21 01:29:09,044 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 01:29:12,437 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2800, loss[loss=0.1042, simple_loss=0.1342, pruned_loss=0.02934, audio_tagging_loss=0.007737, over 15970.00 frames. ], tot_loss[loss=0.07664, simple_loss=0.09789, pruned_loss=0.01816, audio_tagging_loss=0.009531, over 3035526.12 frames. 
], batch size: 55, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:29:12,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301180.0, ans=0.1 2023-11-21 01:29:27,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.332e+01 8.111e+01 8.659e+01 9.368e+01 1.203e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 01:29:38,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1301313.3333333333, ans=0.125 2023-11-21 01:29:39,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195200 2023-11-21 01:29:49,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1301380.0, ans=0.0 2023-11-21 01:29:52,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1301380.0, ans=0.125 2023-11-21 01:30:00,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-21 01:30:03,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1301446.6666666667, ans=0.125 2023-11-21 01:30:07,105 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 01:30:16,066 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2850, loss[loss=0.08471, simple_loss=0.1177, pruned_loss=0.01769, audio_tagging_loss=0.008151, over 15802.00 frames. ], tot_loss[loss=0.07637, simple_loss=0.09783, pruned_loss=0.01796, audio_tagging_loss=0.009487, over 3038417.02 frames. ], batch size: 55, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:30:16,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1301513.3333333333, ans=0.2 2023-11-21 01:30:30,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2023-11-21 01:30:43,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195250 2023-11-21 01:30:57,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1301713.3333333333, ans=0.125 2023-11-21 01:31:05,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1301713.3333333333, ans=0.5 2023-11-21 01:31:21,294 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2900, loss[loss=0.1102, simple_loss=0.1553, pruned_loss=0.02704, audio_tagging_loss=0.005521, over 16081.00 frames. ], tot_loss[loss=0.0767, simple_loss=0.0983, pruned_loss=0.01801, audio_tagging_loss=0.009539, over 3043749.13 frames. ], batch size: 56, lr: 4.08e-03, grad_scale: 32.0 2023-11-21 01:31:28,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2023-11-21 01:31:29,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. 
2023-11-21 01:31:33,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1301913.3333333333, ans=0.1
2023-11-21 01:31:36,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1301913.3333333333, ans=0.125
2023-11-21 01:31:36,835 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.167e+01 8.153e+01 9.015e+01 9.723e+01 1.263e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-21 01:31:37,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1301913.3333333333, ans=0.125
2023-11-21 01:31:48,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195300
2023-11-21 01:31:55,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1301980.0, ans=0.125
2023-11-21 01:32:00,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1302046.6666666667, ans=0.0
2023-11-21 01:32:07,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1302046.6666666667, ans=0.0
2023-11-21 01:32:18,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1302113.3333333333, ans=0.0
2023-11-21 01:32:26,133 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 2950, loss[loss=0.05826, simple_loss=0.06583, pruned_loss=0.01155, audio_tagging_loss=0.0138, over 14283.00 frames. ], tot_loss[loss=0.07679, simple_loss=0.09828, pruned_loss=0.01795, audio_tagging_loss=0.009696, over 3039954.14 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:32:30,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1302180.0, ans=0.0
2023-11-21 01:32:53,750 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195350
2023-11-21 01:32:55,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1302313.3333333333, ans=0.125
2023-11-21 01:33:04,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1302380.0, ans=0.125
2023-11-21 01:33:28,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1302513.3333333333, ans=0.0
2023-11-21 01:33:29,879 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3000, loss[loss=0.09396, simple_loss=0.1251, pruned_loss=0.02154, audio_tagging_loss=0.009878, over 15860.00 frames. ], tot_loss[loss=0.07686, simple_loss=0.09859, pruned_loss=0.01786, audio_tagging_loss=0.009699, over 3038834.52 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:33:29,880 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 01:33:50,736 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3734, 3.5376, 3.1970, 3.0239], device='cuda:1')
2023-11-21 01:34:10,807 INFO [train_asr.py:1253] (1/4) Epoch 17, validation: loss=0.06009, simple_loss=0.05276, pruned_loss=0.005332, audio_tagging_loss=0.02838, over 4681554.00 frames.
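The zipformer.py:1873 line printed during the validation pass is an entropy diagnostic for the self-attention weights, one value per head. A hedged sketch of such a diagnostic, assuming a (heads, queries, keys) weight layout:

```python
# Sketch of a per-head attention-entropy diagnostic: the entropy of each
# head's attention distribution, averaged over queries. The tensor layout
# is an assumption for illustration.
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys); rows sum to 1.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)                          # one value per head

attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(attn))  # four entropies (nats), one per head
```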
2023-11-21 01:34:10,808 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 01:34:11,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2023-11-21 01:34:12,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1302513.3333333333, ans=0.0
2023-11-21 01:34:19,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1302513.3333333333, ans=0.125
2023-11-21 01:34:26,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0
2023-11-21 01:34:27,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.276e+01 8.086e+01 8.879e+01 9.716e+01 1.329e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-21 01:34:29,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1302580.0, ans=0.0
2023-11-21 01:34:32,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1302580.0, ans=0.0
2023-11-21 01:34:37,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195400
2023-11-21 01:34:42,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0
2023-11-21 01:34:50,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2023-11-21 01:34:56,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1302713.3333333333, ans=0.125
2023-11-21 01:35:15,247 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3050, loss[loss=0.08254, simple_loss=0.1119, pruned_loss=0.01942, audio_tagging_loss=0.007189, over 14594.00 frames. ], tot_loss[loss=0.07656, simple_loss=0.09809, pruned_loss=0.01771, audio_tagging_loss=0.009805, over 3045303.61 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:35:26,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1302913.3333333333, ans=0.125
2023-11-21 01:35:41,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195450
2023-11-21 01:35:52,268 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
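The train_asr.py:1253/1254 pair above summarizes a full pass over the validation cuts followed by a peak-GPU-memory report. A hedged sketch of that step; `loss_fn` is a stand-in for the recipe's actual loss computation, not its real signature:

```python
# Sketch of a validation step: average the loss over the validation
# loader, then report peak GPU memory via torch.cuda.max_memory_allocated().
import torch

def validate(model, valid_loader, loss_fn, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            # loss_fn is hypothetical: returns (summed loss, frame count)
            loss, num_frames = loss_fn(model, batch, device)
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4} "
          f"over {tot_frames:.2f} frames")
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```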
2023-11-21 01:35:59,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1303046.6666666667, ans=0.125
2023-11-21 01:36:01,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1303046.6666666667, ans=0.125
2023-11-21 01:36:02,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1303046.6666666667, ans=0.125
2023-11-21 01:36:04,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1303046.6666666667, ans=0.04949747468305833
2023-11-21 01:36:07,673 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 01:36:15,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1303113.3333333333, ans=0.0
2023-11-21 01:36:18,599 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3100, loss[loss=0.08638, simple_loss=0.1114, pruned_loss=0.02116, audio_tagging_loss=0.009515, over 15534.00 frames. ], tot_loss[loss=0.07724, simple_loss=0.09867, pruned_loss=0.01811, audio_tagging_loss=0.009795, over 3040833.12 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:36:26,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1303180.0, ans=0.015
2023-11-21 01:36:36,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.198e+01 8.813e+01 9.373e+01 1.166e+02, threshold=1.763e+02, percent-clipped=0.0
2023-11-21 01:36:43,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0
2023-11-21 01:36:46,874 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195500
2023-11-21 01:36:58,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1303380.0, ans=0.1
2023-11-21 01:37:02,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0
2023-11-21 01:37:04,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-11-21 01:37:20,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1303446.6666666667, ans=0.2
2023-11-21 01:37:21,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1303446.6666666667, ans=0.125
2023-11-21 01:37:23,933 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3150, loss[loss=0.06945, simple_loss=0.09159, pruned_loss=0.01505, audio_tagging_loss=0.008602, over 15103.00 frames. ], tot_loss[loss=0.0772, simple_loss=0.0983, pruned_loss=0.01812, audio_tagging_loss=0.00993, over 3043019.35 frames. ], batch size: 55, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:37:50,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195550
2023-11-21 01:38:04,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0
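Each WARNING above drops an AudioSet cut whose placeholder transcript yields more BPE tokens (24) than encoder frames remain after the 4x subsampling front end (100 -> 23): a transducer cannot emit more symbols than it has frames. The arithmetic below reproduces the logged numbers; treating ((T - 7) // 2 + 1) // 2 as this recipe's exact Conv2dSubsampling behavior is an assumption:

```python
# Sketch of the filter behind the WARNING lines: drop cuts whose token
# count exceeds the frame count after subsampling. (100 frames -> 23,
# so a 24-token placeholder transcript is excluded.)
def frames_after_subsampling(t: int) -> int:
    # Assumed convolutional front-end arithmetic for 4x subsampling.
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    t = frames_after_subsampling(num_frames)
    if t < len(tokens):
        print(f"Exclude cut: {num_frames} frames -> {t} after subsampling, "
              f"but {len(tokens)} tokens")
        return False
    return True

print(frames_after_subsampling(100))  # -> 23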
2023-11-21 01:38:19,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1303780.0, ans=0.0
2023-11-21 01:38:25,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1303780.0, ans=0.0
2023-11-21 01:38:28,821 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3200, loss[loss=0.0451, simple_loss=0.04909, pruned_loss=0.00734, audio_tagging_loss=0.01321, over 14691.00 frames. ], tot_loss[loss=0.07715, simple_loss=0.09825, pruned_loss=0.01803, audio_tagging_loss=0.009994, over 3047468.92 frames. ], batch size: 58, lr: 4.07e-03, grad_scale: 32.0
2023-11-21 01:38:34,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1303846.6666666667, ans=0.125
2023-11-21 01:38:40,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1303913.3333333333, ans=0.125
2023-11-21 01:38:44,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.338e+01 8.946e+01 9.583e+01 3.907e+02, threshold=1.789e+02, percent-clipped=1.0
2023-11-21 01:38:52,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0
2023-11-21 01:38:55,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195600
2023-11-21 01:38:57,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0
2023-11-21 01:39:07,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1304046.6666666667, ans=0.125
2023-11-21 01:39:32,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1304180.0, ans=0.125
2023-11-21 01:39:33,072 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3250, loss[loss=0.08155, simple_loss=0.1108, pruned_loss=0.01745, audio_tagging_loss=0.008714, over 16359.00 frames. ], tot_loss[loss=0.07758, simple_loss=0.09872, pruned_loss=0.01812, audio_tagging_loss=0.01011, over 3050310.93 frames. ], batch size: 60, lr: 4.07e-03, grad_scale: 32.0
2023-11-21 01:39:34,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1304180.0, ans=0.0
2023-11-21 01:40:00,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195650
2023-11-21 01:40:09,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1304313.3333333333, ans=0.0
2023-11-21 01:40:11,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1304380.0, ans=0.0
2023-11-21 01:40:33,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1304446.6666666667, ans=0.0
2023-11-21 01:40:37,912 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3300, loss[loss=0.09396, simple_loss=0.1268, pruned_loss=0.01995, audio_tagging_loss=0.01059, over 15222.00 frames. ], tot_loss[loss=0.07733, simple_loss=0.09862, pruned_loss=0.01792, audio_tagging_loss=0.01009, over 3057005.85 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 32.0
2023-11-21 01:40:49,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1304580.0, ans=0.125
2023-11-21 01:40:56,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.228e+01 8.124e+01 8.901e+01 9.520e+01 1.377e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-21 01:41:04,020 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 01:41:05,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195700
2023-11-21 01:41:13,817 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 01:41:17,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.81 vs. limit=10.0
2023-11-21 01:41:34,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1304780.0, ans=0.1
2023-11-21 01:41:37,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1304780.0, ans=0.125
2023-11-21 01:41:41,960 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3350, loss[loss=0.09909, simple_loss=0.1303, pruned_loss=0.02452, audio_tagging_loss=0.009414, over 15507.00 frames. ], tot_loss[loss=0.0766, simple_loss=0.09769, pruned_loss=0.0177, audio_tagging_loss=0.01005, over 3055148.93 frames. ], batch size: 59, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:41:47,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1304846.6666666667, ans=0.025
2023-11-21 01:42:04,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1304913.3333333333, ans=0.0
2023-11-21 01:42:07,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1304980.0, ans=0.1
2023-11-21 01:42:08,294 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195750
2023-11-21 01:42:15,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1304980.0, ans=0.125
2023-11-21 01:42:18,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1305046.6666666667, ans=0.0
2023-11-21 01:42:37,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1305113.3333333333, ans=0.0
2023-11-21 01:42:38,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1305113.3333333333, ans=0.125
2023-11-21 01:42:45,579 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3400, loss[loss=0.08617, simple_loss=0.1205, pruned_loss=0.02177, audio_tagging_loss=0.004143, over 15372.00 frames. ], tot_loss[loss=0.07723, simple_loss=0.09879, pruned_loss=0.01803, audio_tagging_loss=0.009807, over 3057269.13 frames. ], batch size: 59, lr: 4.07e-03, grad_scale: 16.0
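A note on how the per-batch "loss" relates to its logged components: with simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 from the config at the top of this log, the figures combine as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; e.g. batch 2750 above: 0.5*0.111 + 0.0198 + 0.00784 = 0.08314. (Any warm-up interpolation applied over the first warm_step batches is ignored here; it no longer matters at batch_idx ~195k.)

```python
# Sketch of the loss combination implied by the logged numbers.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(combined_loss(0.111, 0.0198, 0.00784))  # -> 0.08314, as logged
```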
2023-11-21 01:42:55,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1305180.0, ans=0.2
2023-11-21 01:42:58,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1305246.6666666667, ans=0.0
2023-11-21 01:43:03,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.561e+01 8.029e+01 8.814e+01 9.645e+01 1.986e+02, threshold=1.763e+02, percent-clipped=1.0
2023-11-21 01:43:04,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1305246.6666666667, ans=0.125
2023-11-21 01:43:10,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0
2023-11-21 01:43:11,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1305313.3333333333, ans=0.0
2023-11-21 01:43:12,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195800
2023-11-21 01:43:24,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1305380.0, ans=0.125
2023-11-21 01:43:33,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1305380.0, ans=0.125
2023-11-21 01:43:43,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1305446.6666666667, ans=0.125
2023-11-21 01:43:47,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0
2023-11-21 01:43:48,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1305446.6666666667, ans=0.125
2023-11-21 01:43:51,472 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3450, loss[loss=0.07912, simple_loss=0.09595, pruned_loss=0.01984, audio_tagging_loss=0.0113, over 15905.00 frames. ], tot_loss[loss=0.07765, simple_loss=0.09961, pruned_loss=0.01825, audio_tagging_loss=0.009598, over 3060726.61 frames. ], batch size: 60, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:44:18,458 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195850
2023-11-21 01:44:28,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1305713.3333333333, ans=0.125
2023-11-21 01:44:29,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1305713.3333333333, ans=0.125
2023-11-21 01:44:34,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1305713.3333333333, ans=0.1
2023-11-21 01:44:41,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1305713.3333333333, ans=0.125
2023-11-21 01:44:44,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0
2023-11-21 01:44:51,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1305780.0, ans=0.05
2023-11-21 01:44:56,178 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3500, loss[loss=0.06146, simple_loss=0.0712, pruned_loss=0.01397, audio_tagging_loss=0.01189, over 16138.00 frames. ], tot_loss[loss=0.07826, simple_loss=0.1007, pruned_loss=0.01838, audio_tagging_loss=0.009526, over 3056161.31 frames. ], batch size: 64, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:45:13,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.440e+01 8.307e+01 8.800e+01 9.894e+01 1.360e+02, threshold=1.760e+02, percent-clipped=0.0
2023-11-21 01:45:23,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195900
2023-11-21 01:45:29,940 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 01:45:36,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1306046.6666666667, ans=0.015
2023-11-21 01:46:00,380 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3550, loss[loss=0.09189, simple_loss=0.1165, pruned_loss=0.02417, audio_tagging_loss=0.009483, over 14451.00 frames. ], tot_loss[loss=0.07711, simple_loss=0.09884, pruned_loss=0.01809, audio_tagging_loss=0.009601, over 3046933.35 frames. ], batch size: 53, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:46:21,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1306246.6666666667, ans=0.125
2023-11-21 01:46:27,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0
2023-11-21 01:46:27,494 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 195950
2023-11-21 01:46:33,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0
2023-11-21 01:46:34,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1306313.3333333333, ans=0.125
2023-11-21 01:46:41,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1306380.0, ans=0.2
2023-11-21 01:46:52,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1306446.6666666667, ans=0.1
2023-11-21 01:47:04,446 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3600, loss[loss=0.06925, simple_loss=0.08452, pruned_loss=0.01662, audio_tagging_loss=0.01037, over 14870.00 frames. ], tot_loss[loss=0.07741, simple_loss=0.09936, pruned_loss=0.01816, audio_tagging_loss=0.009566, over 3049657.52 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 32.0
2023-11-21 01:47:12,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1306513.3333333333, ans=0.125
2023-11-21 01:47:18,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1306580.0, ans=0.125
2023-11-21 01:47:22,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.055e+01 8.078e+01 8.527e+01 9.382e+01 1.423e+02, threshold=1.705e+02, percent-clipped=0.0
2023-11-21 01:47:23,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1306580.0, ans=0.125
2023-11-21 01:47:31,147 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196000
2023-11-21 01:47:41,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0
2023-11-21 01:47:53,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=12.0
2023-11-21 01:48:08,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=12.0
2023-11-21 01:48:12,003 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3650, loss[loss=0.04788, simple_loss=0.05897, pruned_loss=0.007447, audio_tagging_loss=0.01094, over 14388.00 frames. ], tot_loss[loss=0.07717, simple_loss=0.09885, pruned_loss=0.0181, audio_tagging_loss=0.009651, over 3047677.66 frames. ], batch size: 55, lr: 4.07e-03, grad_scale: 32.0
2023-11-21 01:48:22,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1306846.6666666667, ans=0.0
2023-11-21 01:48:39,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196050
2023-11-21 01:49:08,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1307113.3333333333, ans=0.0
2023-11-21 01:49:15,789 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3700, loss[loss=0.08064, simple_loss=0.1085, pruned_loss=0.01606, audio_tagging_loss=0.01034, over 14786.00 frames. ], tot_loss[loss=0.07726, simple_loss=0.09919, pruned_loss=0.01813, audio_tagging_loss=0.009539, over 3054483.33 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:49:17,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1307180.0, ans=0.125
2023-11-21 01:49:22,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1307180.0, ans=0.07
2023-11-21 01:49:36,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.365e+01 9.163e+01 1.051e+02 1.464e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-21 01:49:42,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1307313.3333333333, ans=0.1
2023-11-21 01:49:43,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196100
2023-11-21 01:49:45,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5
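The Whitening lines fire when a module's output covariance drifts from "white": the metric is about 1.0 for an isotropic covariance and grows as variance concentrates in a few directions, with a corrective penalty applied above the limit. One plausible formulation of that metric (a sketch of the idea, not necessarily scaling.py verbatim):

```python
# Hypothetical whitening metric: with C the channel covariance,
# metric = num_channels * trace(C @ C) / trace(C)**2. It equals 1.0 for a
# scaled-identity covariance and grows as the spectrum becomes lopsided.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    c = (x.T @ x) / x.shape[0]
    d = c.shape[0]
    return d * torch.trace(c @ c) / torch.trace(c) ** 2

white = torch.randn(10000, 384)
print(whitening_metric(white))                    # ~1.0: already white
print(whitening_metric(white * torch.rand(384)))  # >1.0: less white
```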
2023-11-21 01:49:52,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0
2023-11-21 01:50:02,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1307380.0, ans=0.2
2023-11-21 01:50:17,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1307446.6666666667, ans=0.035
2023-11-21 01:50:21,109 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3750, loss[loss=0.05891, simple_loss=0.07415, pruned_loss=0.01221, audio_tagging_loss=0.009632, over 16143.00 frames. ], tot_loss[loss=0.07771, simple_loss=0.09989, pruned_loss=0.0183, audio_tagging_loss=0.009472, over 3061587.16 frames. ], batch size: 61, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:50:25,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1307513.3333333333, ans=0.2
2023-11-21 01:50:34,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0
2023-11-21 01:50:48,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196150
2023-11-21 01:50:49,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0
2023-11-21 01:51:05,806 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 01:51:25,133 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3800, loss[loss=0.08781, simple_loss=0.1106, pruned_loss=0.02051, audio_tagging_loss=0.01198, over 14560.00 frames. ], tot_loss[loss=0.07759, simple_loss=0.09994, pruned_loss=0.0181, audio_tagging_loss=0.009527, over 3055505.83 frames. ], batch size: 55, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:51:44,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.062e+01 7.915e+01 8.603e+01 9.483e+01 1.210e+02, threshold=1.721e+02, percent-clipped=0.0
2023-11-21 01:51:52,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196200
2023-11-21 01:52:05,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=12.0
2023-11-21 01:52:13,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1308046.6666666667, ans=0.125
2023-11-21 01:52:16,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1308113.3333333333, ans=0.0
2023-11-21 01:52:29,391 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3850, loss[loss=0.07155, simple_loss=0.09565, pruned_loss=0.01344, audio_tagging_loss=0.01029, over 15505.00 frames. ], tot_loss[loss=0.0774, simple_loss=0.0993, pruned_loss=0.0181, audio_tagging_loss=0.009655, over 3047193.08 frames. ], batch size: 56, lr: 4.07e-03, grad_scale: 16.0
2023-11-21 01:52:35,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1308180.0, ans=0.125
2023-11-21 01:52:44,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1308246.6666666667, ans=0.0
2023-11-21 01:52:44,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1308246.6666666667, ans=0.0
2023-11-21 01:52:56,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0
2023-11-21 01:52:56,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196250
2023-11-21 01:53:15,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1308380.0, ans=0.125
2023-11-21 01:53:19,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0
2023-11-21 01:53:26,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.83 vs. limit=15.0
2023-11-21 01:53:28,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2023-11-21 01:53:33,584 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3900, loss[loss=0.07344, simple_loss=0.09945, pruned_loss=0.01517, audio_tagging_loss=0.008543, over 15624.00 frames. ], tot_loss[loss=0.07631, simple_loss=0.09764, pruned_loss=0.01771, audio_tagging_loss=0.009777, over 3040098.49 frames. ], batch size: 60, lr: 4.06e-03, grad_scale: 16.0
2023-11-21 01:53:33,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1308513.3333333333, ans=0.125
2023-11-21 01:53:46,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1308580.0, ans=0.125
2023-11-21 01:53:52,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 8.160e+01 8.803e+01 9.797e+01 1.334e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-21 01:54:00,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196300
2023-11-21 01:54:06,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1308646.6666666667, ans=0.125
2023-11-21 01:54:09,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1308646.6666666667, ans=0.125
2023-11-21 01:54:23,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1308780.0, ans=0.125
2023-11-21 01:54:36,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5
2023-11-21 01:54:37,445 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 3950, loss[loss=0.1011, simple_loss=0.1282, pruned_loss=0.02784, audio_tagging_loss=0.009147, over 17054.00 frames. ], tot_loss[loss=0.07636, simple_loss=0.09752, pruned_loss=0.01775, audio_tagging_loss=0.009847, over 3037609.30 frames. ], batch size: 60, lr: 4.06e-03, grad_scale: 16.0
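The slow decay of lr here (4.08e-03 down to 4.06e-03 across a few thousand batches) is consistent with icefall's Eden schedule given base_lr=0.045, lr_batches=7500, lr_epochs=3.5 from the config; the fractional epoch used in the example below is an assumption:

```python
# Sketch of the Eden learning-rate schedule: two power-law decay factors,
# one in the batch index and one in the (fractional) epoch.
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# epoch=16.5 is a guessed fractional epoch at this point in training.
print(eden_lr(0.045, batch=196000, epoch=16.5))  # ~4.0e-03, near the log's lr
```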
2023-11-21 01:54:45,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1308846.6666666667, ans=0.125
2023-11-21 01:54:54,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1308913.3333333333, ans=0.0
2023-11-21 01:54:59,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1308913.3333333333, ans=0.125
2023-11-21 01:55:03,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196350
2023-11-21 01:55:11,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1308980.0, ans=0.2
2023-11-21 01:55:34,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1309113.3333333333, ans=0.125
2023-11-21 01:55:39,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1309113.3333333333, ans=0.1
2023-11-21 01:55:41,766 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4000, loss[loss=0.07955, simple_loss=0.1035, pruned_loss=0.01886, audio_tagging_loss=0.008951, over 15681.00 frames. ], tot_loss[loss=0.07716, simple_loss=0.09824, pruned_loss=0.01816, audio_tagging_loss=0.009887, over 3038948.49 frames. ], batch size: 59, lr: 4.06e-03, grad_scale: 32.0
2023-11-21 01:55:51,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1309180.0, ans=0.1
2023-11-21 01:55:57,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1309246.6666666667, ans=0.015
2023-11-21 01:55:58,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1309246.6666666667, ans=0.0
2023-11-21 01:56:00,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.301e+01 8.874e+01 9.780e+01 1.152e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-21 01:56:09,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196400
2023-11-21 01:56:16,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2023-11-21 01:56:22,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1309380.0, ans=0.125
2023-11-21 01:56:34,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1309446.6666666667, ans=0.125
2023-11-21 01:56:45,644 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4050, loss[loss=0.07825, simple_loss=0.1149, pruned_loss=0.01228, audio_tagging_loss=0.008507, over 15798.00 frames. ], tot_loss[loss=0.07672, simple_loss=0.09796, pruned_loss=0.01784, audio_tagging_loss=0.0099, over 3039456.89 frames. ], batch size: 58, lr: 4.06e-03, grad_scale: 32.0
2023-11-21 01:56:48,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1309513.3333333333, ans=0.125
2023-11-21 01:56:49,893 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 01:57:13,967 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196450
2023-11-21 01:57:30,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1309713.3333333333, ans=0.125
2023-11-21 01:57:41,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1309780.0, ans=0.125
2023-11-21 01:57:51,781 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4100, loss[loss=0.08458, simple_loss=0.1027, pruned_loss=0.02008, audio_tagging_loss=0.01316, over 15486.00 frames. ], tot_loss[loss=0.0771, simple_loss=0.09862, pruned_loss=0.01791, audio_tagging_loss=0.009888, over 3037705.19 frames. ], batch size: 59, lr: 4.06e-03, grad_scale: 32.0
2023-11-21 01:58:10,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.862e+01 8.112e+01 8.710e+01 9.440e+01 1.199e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-21 01:58:18,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196500
2023-11-21 01:58:24,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1309980.0, ans=0.0
2023-11-21 01:58:40,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5
2023-11-21 01:58:44,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1310113.3333333333, ans=0.1
2023-11-21 01:58:56,145 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4150, loss[loss=0.06979, simple_loss=0.08107, pruned_loss=0.0167, audio_tagging_loss=0.01255, over 15299.00 frames. ], tot_loss[loss=0.07686, simple_loss=0.09847, pruned_loss=0.01781, audio_tagging_loss=0.009823, over 3034754.99 frames. ], batch size: 58, lr: 4.06e-03, grad_scale: 32.0
2023-11-21 01:58:58,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0
2023-11-21 01:59:00,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1310180.0, ans=0.0
2023-11-21 01:59:21,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1310313.3333333333, ans=0.125
2023-11-21 01:59:23,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196550
2023-11-21 01:59:23,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1310313.3333333333, ans=0.1
2023-11-21 01:59:30,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1310313.3333333333, ans=0.125
2023-11-21 01:59:41,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1310380.0, ans=0.125
2023-11-21 01:59:43,272 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 01:59:46,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0
2023-11-21 01:59:54,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0
2023-11-21 01:59:59,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1310513.3333333333, ans=0.125
2023-11-21 02:00:00,195 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4200, loss[loss=0.08324, simple_loss=0.1087, pruned_loss=0.02059, audio_tagging_loss=0.008278, over 15166.00 frames. ], tot_loss[loss=0.07718, simple_loss=0.09918, pruned_loss=0.01789, audio_tagging_loss=0.009704, over 3037202.52 frames. ], batch size: 56, lr: 4.06e-03, grad_scale: 16.0
2023-11-21 02:00:09,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1310513.3333333333, ans=0.125
2023-11-21 02:00:14,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1310580.0, ans=0.125
2023-11-21 02:00:21,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1310580.0, ans=0.1
2023-11-21 02:00:21,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.242e+01 8.811e+01 9.654e+01 1.216e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-21 02:00:23,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1310580.0, ans=0.0
2023-11-21 02:00:24,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1310580.0, ans=0.125
2023-11-21 02:00:28,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196600
2023-11-21 02:00:57,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1310780.0, ans=0.125
2023-11-21 02:01:05,047 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4250, loss[loss=0.0725, simple_loss=0.08402, pruned_loss=0.02125, audio_tagging_loss=0.009236, over 14666.00 frames. ], tot_loss[loss=0.07678, simple_loss=0.0988, pruned_loss=0.01774, audio_tagging_loss=0.009636, over 3043022.04 frames. ], batch size: 56, lr: 4.06e-03, grad_scale: 16.0
2023-11-21 02:01:12,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=22.5
2023-11-21 02:01:30,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1310980.0, ans=0.0
2023-11-21 02:01:32,419 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196650
2023-11-21 02:01:39,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1310980.0, ans=0.0
2023-11-21 02:01:45,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0
2023-11-21 02:01:49,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0
2023-11-21 02:01:54,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=12.0
2023-11-21 02:02:05,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1311113.3333333333, ans=0.0
2023-11-21 02:02:09,888 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4300, loss[loss=0.06596, simple_loss=0.08176, pruned_loss=0.01617, audio_tagging_loss=0.008912, over 15246.00 frames. ], tot_loss[loss=0.078, simple_loss=0.1005, pruned_loss=0.01826, audio_tagging_loss=0.009482, over 3048670.08 frames. ], batch size: 55, lr: 4.06e-03, grad_scale: 16.0
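The periodic model.py:792 lines report whether the encoder is frozen, every 50 batches. With freeze_encoder=False and freeze_encoder_steps=-1 in the config it never is, but the same hook supports freezing the encoder for the first N steps of a run; an illustrative helper (not the recipe's actual code):

```python
# Hypothetical batch-conditional encoder freezing. Assumes the model
# exposes its encoder as `model.encoder`.
def maybe_freeze_encoder(model, batch_idx, freeze_encoder=False,
                         freeze_encoder_steps=-1):
    freeze = freeze_encoder or (0 <= batch_idx < freeze_encoder_steps)
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
    if batch_idx % 50 == 0:  # matches the 50-batch logging cadence above
        print(f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")
```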
2023-11-21 02:02:18,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1311180.0, ans=0.5
2023-11-21 02:02:29,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.098e+01 8.656e+01 9.268e+01 1.139e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-21 02:02:36,110 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196700
2023-11-21 02:02:43,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1311313.3333333333, ans=0.0
2023-11-21 02:02:49,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0
2023-11-21 02:02:50,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1311380.0, ans=0.125
2023-11-21 02:02:54,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1311380.0, ans=0.0
2023-11-21 02:03:01,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1311446.6666666667, ans=0.125
2023-11-21 02:03:02,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1311446.6666666667, ans=0.125
2023-11-21 02:03:03,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1311446.6666666667, ans=0.125
2023-11-21 02:03:03,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1311446.6666666667, ans=0.125
2023-11-21 02:03:07,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1311446.6666666667, ans=0.2
2023-11-21 02:03:11,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1311446.6666666667, ans=0.0
2023-11-21 02:03:13,561 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4350, loss[loss=0.08263, simple_loss=0.1045, pruned_loss=0.02304, audio_tagging_loss=0.00733, over 15245.00 frames. ], tot_loss[loss=0.07786, simple_loss=0.1005, pruned_loss=0.01826, audio_tagging_loss=0.009358, over 3051875.07 frames. ], batch size: 58, lr: 4.06e-03, grad_scale: 16.0
2023-11-21 02:03:16,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1311513.3333333333, ans=0.125
2023-11-21 02:03:33,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1311580.0, ans=0.1
2023-11-21 02:03:40,655 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196750
2023-11-21 02:03:45,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1311646.6666666667, ans=0.0
2023-11-21 02:04:13,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1311780.0, ans=0.0
2023-11-21 02:04:16,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1311846.6666666667, ans=0.1
2023-11-21 02:04:17,743 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4400, loss[loss=0.0893, simple_loss=0.1238, pruned_loss=0.02126, audio_tagging_loss=0.006161, over 14611.00 frames. ], tot_loss[loss=0.0779, simple_loss=0.1004, pruned_loss=0.01823, audio_tagging_loss=0.009451, over 3046270.97 frames. ], batch size: 57, lr: 4.06e-03, grad_scale: 32.0
2023-11-21 02:04:33,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0
2023-11-21 02:04:37,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1311913.3333333333, ans=0.125
2023-11-21 02:04:38,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 7.827e+01 8.393e+01 8.998e+01 1.090e+02, threshold=1.679e+02, percent-clipped=0.0
2023-11-21 02:04:44,784 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196800
2023-11-21 02:04:48,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1311980.0, ans=0.0
2023-11-21 02:05:02,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1312046.6666666667, ans=0.1
2023-11-21 02:05:22,831 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4450, loss[loss=0.07267, simple_loss=0.09125, pruned_loss=0.01645, audio_tagging_loss=0.0106, over 14649.00 frames. ], tot_loss[loss=0.07699, simple_loss=0.09932, pruned_loss=0.01788, audio_tagging_loss=0.009446, over 3047741.55 frames. ], batch size: 55, lr: 4.06e-03, grad_scale: 32.0
2023-11-21 02:05:23,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1312180.0, ans=0.95
2023-11-21 02:05:35,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1312246.6666666667, ans=0.125
2023-11-21 02:05:49,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196850
2023-11-21 02:05:53,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1312313.3333333333, ans=0.125
2023-11-21 02:05:57,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1312313.3333333333, ans=0.125
2023-11-21 02:06:13,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1312446.6666666667, ans=0.125
2023-11-21 02:06:26,529 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4500, loss[loss=0.0611, simple_loss=0.07829, pruned_loss=0.01158, audio_tagging_loss=0.01038, over 15536.00 frames. ], tot_loss[loss=0.07657, simple_loss=0.09906, pruned_loss=0.01767, audio_tagging_loss=0.009371, over 3050334.23 frames. ], batch size: 61, lr: 4.06e-03, grad_scale: 16.0
2023-11-21 02:06:48,061 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.230e+01 8.666e+01 9.490e+01 1.272e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-21 02:06:53,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196900
2023-11-21 02:07:16,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1312780.0, ans=0.1
2023-11-21 02:07:29,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1312846.6666666667, ans=0.0
2023-11-21 02:07:30,290 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4550, loss[loss=0.07529, simple_loss=0.1075, pruned_loss=0.01391, audio_tagging_loss=0.007624, over 15298.00 frames. ], tot_loss[loss=0.07619, simple_loss=0.09865, pruned_loss=0.0175, audio_tagging_loss=0.009362, over 3047353.68 frames. ], batch size: 55, lr: 4.06e-03, grad_scale: 16.0
2023-11-21 02:07:34,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1312846.6666666667, ans=0.0
2023-11-21 02:07:39,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5
2023-11-21 02:07:55,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1312980.0, ans=0.2
2023-11-21 02:07:56,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1312980.0, ans=0.125
2023-11-21 02:07:57,534 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 196950
2023-11-21 02:08:00,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0
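The grad_scale values bouncing between 16.0 and 32.0 are the dynamic loss scale of fp16 training (use_fp16=True in the config): the scale is halved when a step overflows and doubled after a run of stable steps. A minimal torch.cuda.amp loop, with model and optimizer as stand-ins:

```python
# Sketch of dynamic loss scaling with torch.cuda.amp; the init_scale and
# growth_interval values are illustrative, not the recipe's settings.
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with autocast():
        loss = model(batch)            # stand-in for the recipe's loss
    scaler.scale(loss).backward()      # backprop scaled gradients
    scaler.step(optimizer)             # unscales; skips the step on inf/nan
    scaler.update()                    # grows or backs off the scale
    return scaler.get_scale()          # the value logged as grad_scale
```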
2023-11-21 02:08:15,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1313046.6666666667, ans=0.0
2023-11-21 02:08:19,054 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 02:08:24,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1313113.3333333333, ans=0.2
2023-11-21 02:08:34,856 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4600, loss[loss=0.07219, simple_loss=0.09359, pruned_loss=0.01652, audio_tagging_loss=0.008875, over 15532.00 frames. ], tot_loss[loss=0.07592, simple_loss=0.09785, pruned_loss=0.01745, audio_tagging_loss=0.009545, over 3049378.96 frames. ], batch size: 59, lr: 4.06e-03, grad_scale: 16.0
2023-11-21 02:08:37,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1313180.0, ans=0.035
2023-11-21 02:08:46,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5
2023-11-21 02:08:56,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.061e+01 8.654e+01 9.730e+01 1.161e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-21 02:09:01,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197000
2023-11-21 02:09:13,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1313380.0, ans=0.2
2023-11-21 02:09:14,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1313380.0, ans=0.125
2023-11-21 02:09:27,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1313446.6666666667, ans=0.0
2023-11-21 02:09:38,899 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4650, loss[loss=0.08467, simple_loss=0.09769, pruned_loss=0.02611, audio_tagging_loss=0.009722, over 15464.00 frames. ], tot_loss[loss=0.07561, simple_loss=0.09711, pruned_loss=0.01739, audio_tagging_loss=0.009661, over 3057360.81 frames. ], batch size: 58, lr: 4.06e-03, grad_scale: 16.0
], batch size: 58, lr: 4.06e-03, grad_scale: 16.0 2023-11-21 02:09:46,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1313513.3333333333, ans=0.125 2023-11-21 02:09:46,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1313513.3333333333, ans=0.125 2023-11-21 02:09:59,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1313580.0, ans=0.125 2023-11-21 02:10:05,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197050 2023-11-21 02:10:07,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1313646.6666666667, ans=0.2 2023-11-21 02:10:24,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1313713.3333333333, ans=0.125 2023-11-21 02:10:35,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1313780.0, ans=0.0 2023-11-21 02:10:42,950 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4700, loss[loss=0.1074, simple_loss=0.1426, pruned_loss=0.0291, audio_tagging_loss=0.007051, over 15430.00 frames. ], tot_loss[loss=0.07602, simple_loss=0.09744, pruned_loss=0.01751, audio_tagging_loss=0.009796, over 3060844.72 frames. ], batch size: 57, lr: 4.06e-03, grad_scale: 16.0 2023-11-21 02:11:04,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.068e+01 8.673e+01 9.283e+01 1.102e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-21 02:11:09,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197100 2023-11-21 02:11:19,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1313980.0, ans=0.5 2023-11-21 02:11:47,012 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4750, loss[loss=0.07751, simple_loss=0.1056, pruned_loss=0.01585, audio_tagging_loss=0.008852, over 14457.00 frames. ], tot_loss[loss=0.07641, simple_loss=0.09783, pruned_loss=0.01754, audio_tagging_loss=0.009955, over 3054896.52 frames. ], batch size: 53, lr: 4.06e-03, grad_scale: 16.0 2023-11-21 02:12:13,871 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197150 2023-11-21 02:12:15,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1314313.3333333333, ans=0.125 2023-11-21 02:12:44,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1314446.6666666667, ans=0.125 2023-11-21 02:12:50,784 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4800, loss[loss=0.07207, simple_loss=0.08913, pruned_loss=0.01389, audio_tagging_loss=0.01361, over 14237.00 frames. ], tot_loss[loss=0.07637, simple_loss=0.09748, pruned_loss=0.0175, audio_tagging_loss=0.01013, over 3049397.10 frames. 
], batch size: 52, lr: 4.06e-03, grad_scale: 32.0 2023-11-21 02:13:07,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1314580.0, ans=0.2 2023-11-21 02:13:12,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 8.143e+01 8.825e+01 9.696e+01 1.559e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-21 02:13:18,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197200 2023-11-21 02:13:21,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1314646.6666666667, ans=0.0 2023-11-21 02:13:52,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1314780.0, ans=0.2 2023-11-21 02:13:55,403 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4850, loss[loss=0.074, simple_loss=0.09678, pruned_loss=0.0169, audio_tagging_loss=0.008703, over 13702.00 frames. ], tot_loss[loss=0.07673, simple_loss=0.09777, pruned_loss=0.01769, audio_tagging_loss=0.01015, over 3045627.49 frames. ], batch size: 53, lr: 4.06e-03, grad_scale: 32.0 2023-11-21 02:14:13,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2023-11-21 02:14:16,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1314913.3333333333, ans=0.125 2023-11-21 02:14:22,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197250 2023-11-21 02:14:31,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1314980.0, ans=0.125 2023-11-21 02:14:44,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1315046.6666666667, ans=0.125 2023-11-21 02:14:58,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1315180.0, ans=0.2 2023-11-21 02:14:59,942 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4900, loss[loss=0.07623, simple_loss=0.1042, pruned_loss=0.01534, audio_tagging_loss=0.008794, over 15143.00 frames. ], tot_loss[loss=0.07655, simple_loss=0.09762, pruned_loss=0.01764, audio_tagging_loss=0.0101, over 3046127.08 frames. ], batch size: 57, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:15:21,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.125e+01 8.648e+01 9.394e+01 1.252e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-21 02:15:26,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197300 2023-11-21 02:15:28,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1315313.3333333333, ans=0.1 2023-11-21 02:15:39,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1315380.0, ans=0.0 2023-11-21 02:15:49,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. 
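Each optim.py:476 line reports a five-number summary (apparently min/25%/median/75%/max) of recent gradient norms, and the logged threshold is consistently Clipping_scale times the median: just above, 2.0 * 8.825e+01 = 1.765e+02. percent-clipped is then the share of recent steps whose norm exceeded that threshold. A sketch of that policy under those assumptions, not the recipe's exact code:

```python
import math
from collections import deque


class MedianGradClipper:
    """Clip at scale * running median of recent gradient norms;
    the arithmetic matches the log (2.0 * 8.825e+01 = 1.765e+02)."""

    def __init__(self, scale: float = 2.0, window: int = 1000):
        self.scale = scale
        self.norms = deque(maxlen=window)
        self.seen = 0
        self.clipped = 0

    def __call__(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = math.sqrt(sum(float(g.norm()) ** 2 for g in grads))
        self.norms.append(norm)
        threshold = self.scale * sorted(self.norms)[len(self.norms) // 2]
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)   # rescale in place
        return norm

    def percent_clipped(self) -> float:
        return 100.0 * self.clipped / max(self.seen, 1)
```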
limit=15.0 2023-11-21 02:15:56,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1315446.6666666667, ans=0.2 2023-11-21 02:16:03,673 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 4950, loss[loss=0.05121, simple_loss=0.05456, pruned_loss=0.01089, audio_tagging_loss=0.01304, over 13393.00 frames. ], tot_loss[loss=0.07587, simple_loss=0.09684, pruned_loss=0.01753, audio_tagging_loss=0.009923, over 3045697.38 frames. ], batch size: 54, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:16:15,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1315580.0, ans=0.2 2023-11-21 02:16:27,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1315580.0, ans=0.125 2023-11-21 02:16:28,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1315646.6666666667, ans=0.2 2023-11-21 02:16:31,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197350 2023-11-21 02:16:31,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1315646.6666666667, ans=0.125 2023-11-21 02:16:41,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.51 vs. limit=15.0 2023-11-21 02:16:48,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2023-11-21 02:16:50,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2023-11-21 02:16:51,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1315713.3333333333, ans=0.125 2023-11-21 02:17:00,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2023-11-21 02:17:07,633 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5000, loss[loss=0.1007, simple_loss=0.1254, pruned_loss=0.0269, audio_tagging_loss=0.01107, over 16283.00 frames. ], tot_loss[loss=0.07614, simple_loss=0.09728, pruned_loss=0.01775, audio_tagging_loss=0.009753, over 3040949.39 frames. 
], batch size: 60, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:17:28,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1315913.3333333333, ans=10.0 2023-11-21 02:17:29,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.139e+01 8.846e+01 9.798e+01 1.314e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-21 02:17:29,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1315913.3333333333, ans=0.125 2023-11-21 02:17:34,891 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197400 2023-11-21 02:18:00,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1316113.3333333333, ans=0.125 2023-11-21 02:18:05,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1316113.3333333333, ans=0.125 2023-11-21 02:18:12,062 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5050, loss[loss=0.08088, simple_loss=0.09867, pruned_loss=0.02164, audio_tagging_loss=0.009899, over 15890.00 frames. ], tot_loss[loss=0.07637, simple_loss=0.09771, pruned_loss=0.01788, audio_tagging_loss=0.009633, over 3040119.63 frames. ], batch size: 57, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:18:38,398 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197450 2023-11-21 02:18:53,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2023-11-21 02:19:02,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1316446.6666666667, ans=0.125 2023-11-21 02:19:03,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1316446.6666666667, ans=0.0 2023-11-21 02:19:16,012 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5100, loss[loss=0.08902, simple_loss=0.1087, pruned_loss=0.0241, audio_tagging_loss=0.01059, over 15397.00 frames. ], tot_loss[loss=0.07617, simple_loss=0.09737, pruned_loss=0.01779, audio_tagging_loss=0.009693, over 3039312.74 frames. ], batch size: 56, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:19:32,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1316580.0, ans=10.0 2023-11-21 02:19:37,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 7.939e+01 8.776e+01 9.689e+01 1.456e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-21 02:19:42,562 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197500 2023-11-21 02:20:07,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1316780.0, ans=0.125 2023-11-21 02:20:18,818 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5150, loss[loss=0.07144, simple_loss=0.09308, pruned_loss=0.01445, audio_tagging_loss=0.01046, over 16033.00 frames. ], tot_loss[loss=0.07629, simple_loss=0.09779, pruned_loss=0.01767, audio_tagging_loss=0.009724, over 3045752.15 frames. 
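The scaling.py:213 lines sample ScheduledFloat hyperparameters: named per-module constants (balancer probs, skip rates, dropout p, min/max scales) that are scheduled against batch_count and, this deep into training, have mostly settled at their final values (ans=0.125, 0.0, 0.1, 0.2 and so on). A sketch of a piecewise-linear schedule in that spirit; the breakpoints below are invented for illustration:

```python
from bisect import bisect_right


class ScheduledFloat:
    """Piecewise-linear in batch_count, constant outside the
    breakpoints, mirroring the (name, batch_count, ans) log lines."""

    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# A dropout-style value that decayed early in training and has long
# since reached its floor, like the ans=0.125 entries above:
prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
print(prob(1315913.0))  # 0.125
```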
], batch size: 59, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:20:28,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1316846.6666666667, ans=0.0 2023-11-21 02:20:35,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1316913.3333333333, ans=0.0 2023-11-21 02:20:36,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1316913.3333333333, ans=0.125 2023-11-21 02:20:46,686 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197550 2023-11-21 02:21:23,759 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5200, loss[loss=0.08982, simple_loss=0.1155, pruned_loss=0.02249, audio_tagging_loss=0.00958, over 15522.00 frames. ], tot_loss[loss=0.07711, simple_loss=0.099, pruned_loss=0.01797, audio_tagging_loss=0.00964, over 3047445.64 frames. ], batch size: 58, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:21:37,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1317246.6666666667, ans=0.0 2023-11-21 02:21:39,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1317246.6666666667, ans=0.125 2023-11-21 02:21:44,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.863e+01 8.249e+01 8.824e+01 9.467e+01 1.175e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-21 02:21:49,705 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197600 2023-11-21 02:21:54,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=15.0 2023-11-21 02:21:59,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1317380.0, ans=0.0 2023-11-21 02:22:03,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1317380.0, ans=0.125 2023-11-21 02:22:27,581 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5250, loss[loss=0.07572, simple_loss=0.1008, pruned_loss=0.0184, audio_tagging_loss=0.006899, over 14269.00 frames. ], tot_loss[loss=0.07687, simple_loss=0.09866, pruned_loss=0.01793, audio_tagging_loss=0.009605, over 3041912.22 frames. ], batch size: 55, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:22:27,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1317513.3333333333, ans=0.125 2023-11-21 02:22:38,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1317580.0, ans=0.0 2023-11-21 02:22:40,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1317580.0, ans=0.0 2023-11-21 02:22:54,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197650 2023-11-21 02:23:24,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2023-11-21 02:23:25,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. 
limit=10.0 2023-11-21 02:23:30,858 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5300, loss[loss=0.06631, simple_loss=0.08381, pruned_loss=0.01163, audio_tagging_loss=0.01278, over 14437.00 frames. ], tot_loss[loss=0.07641, simple_loss=0.0981, pruned_loss=0.01773, audio_tagging_loss=0.009631, over 3049909.81 frames. ], batch size: 55, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:23:43,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1317913.3333333333, ans=0.0 2023-11-21 02:23:53,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.065e+01 8.722e+01 9.594e+01 1.316e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-21 02:23:58,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197700 2023-11-21 02:23:58,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1317980.0, ans=0.2 2023-11-21 02:24:09,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1318046.6666666667, ans=0.125 2023-11-21 02:24:12,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1318046.6666666667, ans=0.2 2023-11-21 02:24:34,802 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5350, loss[loss=0.08292, simple_loss=0.1056, pruned_loss=0.02042, audio_tagging_loss=0.00971, over 14305.00 frames. ], tot_loss[loss=0.07654, simple_loss=0.09831, pruned_loss=0.01776, audio_tagging_loss=0.009629, over 3048485.99 frames. ], batch size: 54, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:24:43,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0 2023-11-21 02:24:53,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=10.0 2023-11-21 02:24:55,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318246.6666666667, ans=0.1 2023-11-21 02:24:57,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2023-11-21 02:25:02,183 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197750 2023-11-21 02:25:10,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1318313.3333333333, ans=0.125 2023-11-21 02:25:27,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1318446.6666666667, ans=0.04949747468305833 2023-11-21 02:25:38,675 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5400, loss[loss=0.07975, simple_loss=0.1079, pruned_loss=0.016, audio_tagging_loss=0.009782, over 15903.00 frames. ], tot_loss[loss=0.07651, simple_loss=0.09827, pruned_loss=0.01769, audio_tagging_loss=0.009687, over 3051379.75 frames. 
], batch size: 58, lr: 4.05e-03, grad_scale: 32.0 2023-11-21 02:25:49,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1318580.0, ans=0.125 2023-11-21 02:25:52,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=22.5 2023-11-21 02:26:00,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.677e+01 8.214e+01 8.750e+01 9.356e+01 1.169e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-21 02:26:04,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197800 2023-11-21 02:26:05,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2023-11-21 02:26:08,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2023-11-21 02:26:08,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1318646.6666666667, ans=0.1 2023-11-21 02:26:14,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318646.6666666667, ans=0.1 2023-11-21 02:26:30,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5 2023-11-21 02:26:37,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1318780.0, ans=0.05 2023-11-21 02:26:38,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1318780.0, ans=0.125 2023-11-21 02:26:41,769 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5450, loss[loss=0.07443, simple_loss=0.09936, pruned_loss=0.01526, audio_tagging_loss=0.00948, over 14584.00 frames. ], tot_loss[loss=0.07664, simple_loss=0.09851, pruned_loss=0.0177, audio_tagging_loss=0.009683, over 3054726.30 frames. ], batch size: 55, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:26:46,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1318846.6666666667, ans=0.0 2023-11-21 02:26:51,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1318846.6666666667, ans=0.0 2023-11-21 02:26:51,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1318846.6666666667, ans=0.0 2023-11-21 02:26:58,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1318913.3333333333, ans=0.2 2023-11-21 02:27:09,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197850 2023-11-21 02:27:11,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1318980.0, ans=0.0 2023-11-21 02:27:17,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. 
limit=6.0 2023-11-21 02:27:19,459 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 02:27:41,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1319113.3333333333, ans=0.035 2023-11-21 02:27:45,086 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5500, loss[loss=0.05355, simple_loss=0.07299, pruned_loss=0.007497, audio_tagging_loss=0.009563, over 14293.00 frames. ], tot_loss[loss=0.07611, simple_loss=0.09755, pruned_loss=0.01757, audio_tagging_loss=0.009774, over 3052531.10 frames. ], batch size: 52, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:28:02,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2023-11-21 02:28:08,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.510e+01 8.155e+01 8.783e+01 9.511e+01 1.245e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-21 02:28:12,526 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197900 2023-11-21 02:28:18,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1319313.3333333333, ans=0.2 2023-11-21 02:28:23,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1319380.0, ans=0.2 2023-11-21 02:28:49,924 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5550, loss[loss=0.09959, simple_loss=0.1282, pruned_loss=0.0242, audio_tagging_loss=0.01131, over 14384.00 frames. ], tot_loss[loss=0.07721, simple_loss=0.09901, pruned_loss=0.01789, audio_tagging_loss=0.009818, over 3051421.14 frames. ], batch size: 55, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:28:57,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1319513.3333333333, ans=0.125 2023-11-21 02:29:11,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1319580.0, ans=0.0 2023-11-21 02:29:15,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 197950 2023-11-21 02:29:29,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1319713.3333333333, ans=0.0 2023-11-21 02:29:40,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=15.0 2023-11-21 02:29:43,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1319780.0, ans=0.125 2023-11-21 02:29:47,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1319780.0, ans=0.125 2023-11-21 02:29:50,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1319780.0, ans=0.0 2023-11-21 02:29:52,785 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5600, loss[loss=0.09242, simple_loss=0.1275, pruned_loss=0.02125, audio_tagging_loss=0.007426, over 15399.00 frames. ], tot_loss[loss=0.07757, simple_loss=0.09965, pruned_loss=0.0178, audio_tagging_loss=0.009941, over 3053506.02 frames. 
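The scaling.py:1022 Whitening lines compare a covariance-spread statistic against a limit (e.g. metric=5.47 vs. limit=6.0 just above); the module only needs to intervene when activations drift far from an isotropic, "white" covariance. One standard such statistic is c * tr(C^2) / tr(C)^2, which equals 1 for a perfectly white covariance and grows as a few directions dominate; the exact formula here is an assumption, not read out of icefall:

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """c * tr(C @ C) / tr(C)^2 per channel group, averaged: 1.0 for
    isotropic features, larger as the eigenvalue spectrum spreads."""
    n, c = x.shape
    assert c % num_groups == 0
    total = 0.0
    for g in x.split(c // num_groups, dim=1):
        g = g - g.mean(dim=0)
        cov = (g.T @ g) / n
        d = cov.shape[0]
        total += float(d * torch.trace(cov @ cov) / torch.trace(cov) ** 2)
    return total / num_groups


x = torch.randn(2000, 256)      # near-white features
print(whitening_metric(x))      # close to 1, far under a 22.5-style limit
x[:, 0] *= 30.0                 # let one channel dominate
print(whitening_metric(x))      # metric shoots up
```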
], batch size: 56, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:29:55,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1319846.6666666667, ans=0.1 2023-11-21 02:30:06,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1319913.3333333333, ans=0.125 2023-11-21 02:30:17,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.123e+01 8.788e+01 9.495e+01 1.515e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-21 02:30:17,632 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 02:30:19,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198000 2023-11-21 02:30:26,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1319980.0, ans=0.125 2023-11-21 02:30:37,692 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 02:30:39,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1320046.6666666667, ans=0.125 2023-11-21 02:30:45,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1320113.3333333333, ans=0.1 2023-11-21 02:30:47,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1320113.3333333333, ans=0.04949747468305833 2023-11-21 02:30:55,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1320180.0, ans=0.1 2023-11-21 02:30:55,900 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5650, loss[loss=0.06297, simple_loss=0.07795, pruned_loss=0.01421, audio_tagging_loss=0.009787, over 15330.00 frames. ], tot_loss[loss=0.0775, simple_loss=0.09945, pruned_loss=0.0178, audio_tagging_loss=0.009977, over 3055406.28 frames. ], batch size: 57, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:31:10,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1320246.6666666667, ans=0.2 2023-11-21 02:31:23,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198050 2023-11-21 02:31:33,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1320380.0, ans=0.2 2023-11-21 02:31:35,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1320380.0, ans=0.125 2023-11-21 02:31:50,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1320446.6666666667, ans=0.0 2023-11-21 02:31:59,793 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5700, loss[loss=0.08718, simple_loss=0.11, pruned_loss=0.0234, audio_tagging_loss=0.00879, over 15914.00 frames. 
], tot_loss[loss=0.07664, simple_loss=0.0981, pruned_loss=0.01757, audio_tagging_loss=0.01002, over 3056786.65 frames. ], batch size: 58, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:32:07,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1320513.3333333333, ans=0.0 2023-11-21 02:32:09,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0 2023-11-21 02:32:15,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0 2023-11-21 02:32:22,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1320580.0, ans=0.0 2023-11-21 02:32:23,969 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.139e+01 8.085e+01 8.943e+01 9.516e+01 1.355e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-21 02:32:24,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1320646.6666666667, ans=0.0 2023-11-21 02:32:26,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198100 2023-11-21 02:32:31,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1320646.6666666667, ans=0.0 2023-11-21 02:32:37,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2023-11-21 02:32:38,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1320713.3333333333, ans=0.0 2023-11-21 02:33:01,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1320780.0, ans=0.2 2023-11-21 02:33:03,899 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5750, loss[loss=0.0844, simple_loss=0.104, pruned_loss=0.02266, audio_tagging_loss=0.009721, over 15132.00 frames. ], tot_loss[loss=0.07594, simple_loss=0.09711, pruned_loss=0.0175, audio_tagging_loss=0.009878, over 3051887.34 frames. ], batch size: 56, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:33:15,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1320913.3333333333, ans=0.2 2023-11-21 02:33:30,766 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198150 2023-11-21 02:33:41,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1321046.6666666667, ans=0.125 2023-11-21 02:33:56,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1321113.3333333333, ans=0.125 2023-11-21 02:34:02,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-21 02:34:06,838 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5800, loss[loss=0.07779, simple_loss=0.1013, pruned_loss=0.01911, audio_tagging_loss=0.00803, over 14966.00 frames. ], tot_loss[loss=0.07616, simple_loss=0.09752, pruned_loss=0.01764, audio_tagging_loss=0.009765, over 3045566.85 frames. 
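Across this section the logged totals decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: for the batch 5750 entry above, 0.5 * 0.104 + 0.02266 + 0.009721 = 0.0844, exactly as logged. The weights are inferred by fitting the logged numbers, not read out of train_asr.py; a quick check:

```python
def total_loss(simple: float, pruned: float, tagging: float) -> float:
    # Weights inferred from the logged values: 0.5 on the simple
    # (linear) transducer loss, 1.0 on the pruned and audio-tagging
    # losses.
    return 0.5 * simple + pruned + tagging


# batch 5750 above:
print(f"{total_loss(0.104, 0.02266, 0.009721):.4f}")   # 0.0844, as logged
# batch 4700 earlier in the log:
print(f"{total_loss(0.1426, 0.0291, 0.007051):.4f}")   # 0.1075 vs. logged
# 0.1074 -- the inputs are themselves rounded in the log
```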
], batch size: 56, lr: 4.05e-03, grad_scale: 16.0 2023-11-21 02:34:21,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1321246.6666666667, ans=22.5 2023-11-21 02:34:22,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1321246.6666666667, ans=0.125 2023-11-21 02:34:26,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1321246.6666666667, ans=0.1 2023-11-21 02:34:29,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1321246.6666666667, ans=0.125 2023-11-21 02:34:31,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.619e+01 8.028e+01 8.750e+01 9.322e+01 1.418e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-21 02:34:32,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1321313.3333333333, ans=0.1 2023-11-21 02:34:34,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198200 2023-11-21 02:34:40,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1321313.3333333333, ans=0.0 2023-11-21 02:34:46,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2023-11-21 02:34:48,580 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 02:35:07,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0 2023-11-21 02:35:11,406 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5850, loss[loss=0.07176, simple_loss=0.09992, pruned_loss=0.0144, audio_tagging_loss=0.0074, over 15596.00 frames. ], tot_loss[loss=0.07571, simple_loss=0.09713, pruned_loss=0.01746, audio_tagging_loss=0.00969, over 3053937.72 frames. ], batch size: 58, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:35:19,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1321513.3333333333, ans=0.125 2023-11-21 02:35:38,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198250 2023-11-21 02:35:40,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2023-11-21 02:35:40,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1321646.6666666667, ans=0.1 2023-11-21 02:36:14,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1321846.6666666667, ans=0.0 2023-11-21 02:36:15,452 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5900, loss[loss=0.07383, simple_loss=0.1044, pruned_loss=0.01388, audio_tagging_loss=0.007754, over 15770.00 frames. ], tot_loss[loss=0.07581, simple_loss=0.09753, pruned_loss=0.01742, audio_tagging_loss=0.009626, over 3054416.78 frames. 
], batch size: 56, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:36:26,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1321913.3333333333, ans=10.0 2023-11-21 02:36:30,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2023-11-21 02:36:35,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1321913.3333333333, ans=0.0 2023-11-21 02:36:39,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.061e+01 8.738e+01 9.398e+01 2.090e+02, threshold=1.748e+02, percent-clipped=1.0 2023-11-21 02:36:41,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198300 2023-11-21 02:36:46,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0 2023-11-21 02:36:48,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2023-11-21 02:36:59,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2023-11-21 02:37:11,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=10.0 2023-11-21 02:37:13,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1322113.3333333333, ans=0.0 2023-11-21 02:37:14,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1322113.3333333333, ans=0.125 2023-11-21 02:37:18,282 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 5950, loss[loss=0.07612, simple_loss=0.1001, pruned_loss=0.01738, audio_tagging_loss=0.008684, over 13789.00 frames. ], tot_loss[loss=0.07568, simple_loss=0.09742, pruned_loss=0.01735, audio_tagging_loss=0.009615, over 3053053.88 frames. ], batch size: 53, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:37:24,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1322180.0, ans=0.125 2023-11-21 02:37:40,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2023-11-21 02:37:42,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1322246.6666666667, ans=0.125 2023-11-21 02:37:45,628 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198350 2023-11-21 02:38:01,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1322380.0, ans=0.125 2023-11-21 02:38:19,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=22.5 2023-11-21 02:38:22,696 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6000, loss[loss=0.08407, simple_loss=0.1178, pruned_loss=0.01806, audio_tagging_loss=0.007112, over 15662.00 frames. 
], tot_loss[loss=0.07588, simple_loss=0.09782, pruned_loss=0.01745, audio_tagging_loss=0.009524, over 3050568.25 frames. ], batch size: 54, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:38:22,697 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 02:38:41,935 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4119, 5.0880, 4.9080, 4.9382], device='cuda:1') 2023-11-21 02:38:44,277 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4495, 2.8502, 4.2366, 3.0548], device='cuda:1') 2023-11-21 02:39:04,194 INFO [train_asr.py:1253] (1/4) Epoch 17, validation: loss=0.06056, simple_loss=0.05273, pruned_loss=0.005281, audio_tagging_loss=0.02892, over 4681554.00 frames. 2023-11-21 02:39:04,195 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 02:39:06,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5 2023-11-21 02:39:09,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-11-21 02:39:11,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1322513.3333333333, ans=0.0 2023-11-21 02:39:14,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1322513.3333333333, ans=0.125 2023-11-21 02:39:28,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.088e+01 8.691e+01 9.569e+01 1.376e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-21 02:39:31,143 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198400 2023-11-21 02:39:47,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1322713.3333333333, ans=0.125 2023-11-21 02:39:50,589 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 02:39:55,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=22.5 2023-11-21 02:40:01,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2023-11-21 02:40:07,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.66 vs. limit=10.0 2023-11-21 02:40:08,173 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6050, loss[loss=0.06013, simple_loss=0.07736, pruned_loss=0.01181, audio_tagging_loss=0.009632, over 14517.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.09746, pruned_loss=0.01743, audio_tagging_loss=0.009565, over 3049895.07 frames. 
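At batch 6000 the trainer pauses for a mid-epoch validation pass: it logs attention-weight entropies as a diagnostic, a frame-weighted dev loss over ~4.68M frames, and the peak CUDA memory (25607MB). A sketch of a frame-weighted validation loop in that spirit; the loader keys and model signature below are assumptions, not the recipe's API:

```python
import torch


@torch.no_grad()
def compute_validation_loss(model, valid_loader, device) -> float:
    """Frame-weighted average dev loss, mirroring the
    'validation: loss=... over 4681554.00 frames' line."""
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    for batch in valid_loader:
        feats = batch["inputs"].to(device)         # (N, T, 80) fbank
        frames = float(batch["num_frames"].sum())  # assumed key
        loss = model(feats, batch)                 # assumed signature
        loss_sum += float(loss) * frames
        frame_sum += frames
    model.train()
    if torch.cuda.is_available():
        peak_mb = torch.cuda.max_memory_allocated() // 2**20
        print(f"Maximum memory allocated so far is {peak_mb}MB")
    return loss_sum / max(frame_sum, 1.0)
```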
], batch size: 58, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:40:21,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1322913.3333333333, ans=0.125 2023-11-21 02:40:35,378 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198450 2023-11-21 02:41:10,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1323113.3333333333, ans=0.0 2023-11-21 02:41:10,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1323113.3333333333, ans=15.0 2023-11-21 02:41:11,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1323180.0, ans=0.125 2023-11-21 02:41:12,323 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6100, loss[loss=0.05287, simple_loss=0.06399, pruned_loss=0.01149, audio_tagging_loss=0.009385, over 15747.00 frames. ], tot_loss[loss=0.07582, simple_loss=0.09744, pruned_loss=0.01751, audio_tagging_loss=0.009589, over 3049796.68 frames. ], batch size: 62, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:41:18,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=22.5 2023-11-21 02:41:20,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1323180.0, ans=0.0 2023-11-21 02:41:36,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.186e+01 8.873e+01 9.701e+01 1.239e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-21 02:41:39,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198500 2023-11-21 02:41:42,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1323313.3333333333, ans=0.0 2023-11-21 02:41:46,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2023-11-21 02:42:12,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1323446.6666666667, ans=0.125 2023-11-21 02:42:16,014 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6150, loss[loss=0.07893, simple_loss=0.1011, pruned_loss=0.01978, audio_tagging_loss=0.008589, over 15047.00 frames. ], tot_loss[loss=0.07633, simple_loss=0.09788, pruned_loss=0.0177, audio_tagging_loss=0.009691, over 3045743.42 frames. ], batch size: 56, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:42:32,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1323580.0, ans=0.2 2023-11-21 02:42:37,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-21 02:42:39,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1323580.0, ans=0.07 2023-11-21 02:42:43,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198550 2023-11-21 02:42:53,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. 
limit=15.0 2023-11-21 02:43:19,572 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6200, loss[loss=0.08532, simple_loss=0.1068, pruned_loss=0.02413, audio_tagging_loss=0.007808, over 15564.00 frames. ], tot_loss[loss=0.07575, simple_loss=0.09695, pruned_loss=0.01752, audio_tagging_loss=0.009755, over 3048629.15 frames. ], batch size: 60, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:43:45,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.158e+01 8.699e+01 9.470e+01 1.213e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 02:43:47,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198600 2023-11-21 02:43:57,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1324046.6666666667, ans=0.2 2023-11-21 02:44:07,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1324046.6666666667, ans=0.125 2023-11-21 02:44:23,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1324180.0, ans=0.0 2023-11-21 02:44:24,698 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6250, loss[loss=0.05039, simple_loss=0.05806, pruned_loss=0.007904, audio_tagging_loss=0.01345, over 16957.00 frames. ], tot_loss[loss=0.0758, simple_loss=0.09671, pruned_loss=0.01752, audio_tagging_loss=0.009919, over 3050569.93 frames. ], batch size: 64, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:44:44,936 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 02:44:49,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1324313.3333333333, ans=0.125 2023-11-21 02:44:50,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198650 2023-11-21 02:45:00,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1324313.3333333333, ans=0.0 2023-11-21 02:45:07,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1324380.0, ans=0.0 2023-11-21 02:45:12,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1324380.0, ans=0.0 2023-11-21 02:45:17,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-21 02:45:26,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1324446.6666666667, ans=0.0 2023-11-21 02:45:28,191 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6300, loss[loss=0.09601, simple_loss=0.1167, pruned_loss=0.02644, audio_tagging_loss=0.01125, over 15066.00 frames. ], tot_loss[loss=0.0761, simple_loss=0.09689, pruned_loss=0.0176, audio_tagging_loss=0.01006, over 3044245.21 frames. 
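grad_scale in the batch summaries is the fp16 loss scale, and over this stretch it hops between 16.0 and 32.0 (e.g. 32.0 at batch 6150, 16.0 at batch 6200). That pattern is what standard dynamic loss scaling produces: halve the scale when a step overflows, double it back after a long run of overflow-free steps. A minimal sketch of that generic AMP policy, not the recipe's exact scaler:

```python
class DynamicLossScaler:
    """Halve on overflow, double after `growth_interval` clean steps;
    this kind of policy makes grad_scale oscillate between 16 and 32."""

    def __init__(self, scale: float = 16.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            self.scale /= 2.0        # back off immediately
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps >= self.growth_interval:
                self.scale *= 2.0    # cautiously grow back
                self._clean_steps = 0
```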
], batch size: 58, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:45:29,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1324513.3333333333, ans=0.1 2023-11-21 02:45:33,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1324513.3333333333, ans=0.0 2023-11-21 02:45:44,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1324580.0, ans=0.125 2023-11-21 02:45:45,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1324580.0, ans=0.0 2023-11-21 02:45:53,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.207e+01 8.844e+01 9.523e+01 1.211e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-21 02:45:55,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198700 2023-11-21 02:46:12,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1324713.3333333333, ans=0.125 2023-11-21 02:46:13,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1324713.3333333333, ans=0.0 2023-11-21 02:46:16,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1324713.3333333333, ans=0.0 2023-11-21 02:46:16,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1324713.3333333333, ans=0.025 2023-11-21 02:46:20,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1324780.0, ans=0.0 2023-11-21 02:46:23,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1324780.0, ans=10.0 2023-11-21 02:46:31,993 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6350, loss[loss=0.07779, simple_loss=0.1004, pruned_loss=0.01652, audio_tagging_loss=0.01107, over 15938.00 frames. ], tot_loss[loss=0.07628, simple_loss=0.09707, pruned_loss=0.01756, audio_tagging_loss=0.01019, over 3039862.33 frames. ], batch size: 61, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:46:50,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1324913.3333333333, ans=0.2 2023-11-21 02:46:59,703 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198750 2023-11-21 02:47:12,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1325046.6666666667, ans=0.125 2023-11-21 02:47:13,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1325046.6666666667, ans=0.0 2023-11-21 02:47:18,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.00 vs. 
limit=15.0 2023-11-21 02:47:22,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1325113.3333333333, ans=0.0 2023-11-21 02:47:33,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1325113.3333333333, ans=0.5 2023-11-21 02:47:36,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2023-11-21 02:47:37,352 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6400, loss[loss=0.0676, simple_loss=0.08033, pruned_loss=0.01541, audio_tagging_loss=0.01202, over 15027.00 frames. ], tot_loss[loss=0.07653, simple_loss=0.09776, pruned_loss=0.0176, audio_tagging_loss=0.01005, over 3036530.21 frames. ], batch size: 57, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:48:02,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.215e+01 8.788e+01 9.513e+01 1.963e+02, threshold=1.758e+02, percent-clipped=1.0 2023-11-21 02:48:03,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198800 2023-11-21 02:48:04,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1325313.3333333333, ans=0.125 2023-11-21 02:48:13,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1325313.3333333333, ans=0.0 2023-11-21 02:48:18,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1325380.0, ans=0.125 2023-11-21 02:48:22,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2023-11-21 02:48:24,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1325380.0, ans=0.0 2023-11-21 02:48:38,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 02:48:41,903 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6450, loss[loss=0.08911, simple_loss=0.1175, pruned_loss=0.0221, audio_tagging_loss=0.008271, over 17031.00 frames. ], tot_loss[loss=0.07639, simple_loss=0.09732, pruned_loss=0.01762, audio_tagging_loss=0.01011, over 3036950.24 frames. ], batch size: 61, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:48:42,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1325513.3333333333, ans=0.125 2023-11-21 02:48:46,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1325513.3333333333, ans=0.0 2023-11-21 02:49:04,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. 
limit=15.0 2023-11-21 02:49:07,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198850 2023-11-21 02:49:21,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1325713.3333333333, ans=0.125 2023-11-21 02:49:27,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1325713.3333333333, ans=0.0 2023-11-21 02:49:27,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1325713.3333333333, ans=0.0 2023-11-21 02:49:45,158 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6500, loss[loss=0.04976, simple_loss=0.05314, pruned_loss=0.009948, audio_tagging_loss=0.01324, over 15282.00 frames. ], tot_loss[loss=0.07575, simple_loss=0.09634, pruned_loss=0.01748, audio_tagging_loss=0.0101, over 3034273.89 frames. ], batch size: 59, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:49:45,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-11-21 02:49:48,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1325846.6666666667, ans=0.1 2023-11-21 02:49:55,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1325846.6666666667, ans=0.1 2023-11-21 02:49:56,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1325913.3333333333, ans=0.0 2023-11-21 02:50:05,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2023-11-21 02:50:10,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.009e+01 8.684e+01 9.310e+01 1.221e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 02:50:11,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5 2023-11-21 02:50:11,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198900 2023-11-21 02:50:26,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1326046.6666666667, ans=0.125 2023-11-21 02:50:48,728 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6550, loss[loss=0.08162, simple_loss=0.114, pruned_loss=0.01611, audio_tagging_loss=0.008501, over 15511.00 frames. ], tot_loss[loss=0.07584, simple_loss=0.09681, pruned_loss=0.01755, audio_tagging_loss=0.009875, over 3036272.61 frames. 
], batch size: 57, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:50:58,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1326180.0, ans=15.0 2023-11-21 02:51:09,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1326246.6666666667, ans=0.2 2023-11-21 02:51:15,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 198950 2023-11-21 02:51:19,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1326313.3333333333, ans=0.1 2023-11-21 02:51:44,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1326446.6666666667, ans=0.0 2023-11-21 02:51:52,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2023-11-21 02:51:52,734 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6600, loss[loss=0.07402, simple_loss=0.09703, pruned_loss=0.01581, audio_tagging_loss=0.009687, over 16068.00 frames. ], tot_loss[loss=0.07614, simple_loss=0.09744, pruned_loss=0.01764, audio_tagging_loss=0.009777, over 3033310.49 frames. ], batch size: 59, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:51:57,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1326513.3333333333, ans=0.0 2023-11-21 02:51:59,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1326513.3333333333, ans=0.125 2023-11-21 02:52:17,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.252e+01 8.813e+01 9.521e+01 1.214e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-21 02:52:19,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199000 2023-11-21 02:52:24,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-21 02:52:54,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=12.0 2023-11-21 02:52:56,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1326846.6666666667, ans=0.125 2023-11-21 02:52:57,303 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6650, loss[loss=0.04255, simple_loss=0.05509, pruned_loss=0.007069, audio_tagging_loss=0.007939, over 14992.00 frames. ], tot_loss[loss=0.07578, simple_loss=0.09681, pruned_loss=0.01762, audio_tagging_loss=0.009758, over 3030801.21 frames. 
], batch size: 57, lr: 4.04e-03, grad_scale: 32.0 2023-11-21 02:52:57,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1326846.6666666667, ans=0.125 2023-11-21 02:52:58,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1326846.6666666667, ans=0.09899494936611666 2023-11-21 02:53:00,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1326846.6666666667, ans=0.125 2023-11-21 02:53:25,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199050 2023-11-21 02:53:42,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2023-11-21 02:53:44,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1327046.6666666667, ans=0.125 2023-11-21 02:53:53,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1327113.3333333333, ans=0.125 2023-11-21 02:54:01,142 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6700, loss[loss=0.08375, simple_loss=0.1094, pruned_loss=0.02273, audio_tagging_loss=0.006313, over 16090.00 frames. ], tot_loss[loss=0.07554, simple_loss=0.09667, pruned_loss=0.01747, audio_tagging_loss=0.009735, over 3034864.38 frames. ], batch size: 59, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:54:08,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1327180.0, ans=0.0 2023-11-21 02:54:10,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2023-11-21 02:54:13,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-21 02:54:22,414 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 02:54:26,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1327246.6666666667, ans=0.125 2023-11-21 02:54:29,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 7.825e+01 8.466e+01 9.524e+01 1.131e+02, threshold=1.693e+02, percent-clipped=0.0 2023-11-21 02:54:29,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199100 2023-11-21 02:54:32,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1327313.3333333333, ans=0.2 2023-11-21 02:54:40,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1327380.0, ans=0.125 2023-11-21 02:55:06,854 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6750, loss[loss=0.07618, simple_loss=0.09973, pruned_loss=0.01686, audio_tagging_loss=0.009454, over 15165.00 frames. ], tot_loss[loss=0.07508, simple_loss=0.09597, pruned_loss=0.01738, audio_tagging_loss=0.009716, over 3033292.47 frames. 
], batch size: 54, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:55:19,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1327580.0, ans=0.125 2023-11-21 02:55:32,914 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199150 2023-11-21 02:56:10,670 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6800, loss[loss=0.07802, simple_loss=0.1096, pruned_loss=0.0161, audio_tagging_loss=0.007134, over 16211.00 frames. ], tot_loss[loss=0.075, simple_loss=0.0962, pruned_loss=0.0173, audio_tagging_loss=0.009599, over 3035129.72 frames. ], batch size: 59, lr: 4.04e-03, grad_scale: 16.0 2023-11-21 02:56:12,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1327846.6666666667, ans=0.2 2023-11-21 02:56:24,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1327913.3333333333, ans=0.2 2023-11-21 02:56:37,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199200 2023-11-21 02:56:38,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.071e+01 8.833e+01 9.608e+01 1.176e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-21 02:57:11,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1328113.3333333333, ans=0.2 2023-11-21 02:57:14,621 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6850, loss[loss=0.07404, simple_loss=0.09443, pruned_loss=0.01793, audio_tagging_loss=0.00889, over 15228.00 frames. ], tot_loss[loss=0.075, simple_loss=0.09656, pruned_loss=0.01716, audio_tagging_loss=0.009554, over 3042302.82 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 02:57:42,700 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199250 2023-11-21 02:58:06,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1328446.6666666667, ans=0.0 2023-11-21 02:58:19,381 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6900, loss[loss=0.09869, simple_loss=0.1292, pruned_loss=0.025, audio_tagging_loss=0.009103, over 15003.00 frames. ], tot_loss[loss=0.07532, simple_loss=0.09703, pruned_loss=0.01724, audio_tagging_loss=0.009557, over 3042214.15 frames. ], batch size: 54, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 02:58:23,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2023-11-21 02:58:26,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1328513.3333333333, ans=0.125 2023-11-21 02:58:44,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1328646.6666666667, ans=0.125 2023-11-21 02:58:46,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199300 2023-11-21 02:58:47,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 7.846e+01 8.588e+01 9.245e+01 1.215e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-21 02:59:08,608 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 02:59:12,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1328780.0, ans=0.0 2023-11-21 02:59:23,845 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 6950, loss[loss=0.0819, simple_loss=0.1094, pruned_loss=0.01987, audio_tagging_loss=0.007303, over 14674.00 frames. ], tot_loss[loss=0.07594, simple_loss=0.09765, pruned_loss=0.01755, audio_tagging_loss=0.009561, over 3046817.34 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 02:59:38,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2023-11-21 02:59:51,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199350 2023-11-21 03:00:00,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1329046.6666666667, ans=0.0 2023-11-21 03:00:02,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1329046.6666666667, ans=0.1 2023-11-21 03:00:12,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1329046.6666666667, ans=0.125 2023-11-21 03:00:25,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1329113.3333333333, ans=0.125 2023-11-21 03:00:27,843 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7000, loss[loss=0.07806, simple_loss=0.1041, pruned_loss=0.01702, audio_tagging_loss=0.008973, over 15305.00 frames. ], tot_loss[loss=0.07568, simple_loss=0.09715, pruned_loss=0.01741, audio_tagging_loss=0.009687, over 3049523.44 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:00:38,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=22.5 2023-11-21 03:00:43,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1329246.6666666667, ans=0.0 2023-11-21 03:00:53,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1329313.3333333333, ans=0.1 2023-11-21 03:00:55,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199400 2023-11-21 03:00:57,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.260e+01 8.901e+01 9.434e+01 1.193e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-21 03:01:05,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1329313.3333333333, ans=0.125 2023-11-21 03:01:07,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. 
limit=22.5 2023-11-21 03:01:27,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1329446.6666666667, ans=0.125 2023-11-21 03:01:28,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0 2023-11-21 03:01:30,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1329446.6666666667, ans=0.125 2023-11-21 03:01:32,995 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7050, loss[loss=0.09413, simple_loss=0.1268, pruned_loss=0.02179, audio_tagging_loss=0.008953, over 15522.00 frames. ], tot_loss[loss=0.07598, simple_loss=0.0974, pruned_loss=0.01761, audio_tagging_loss=0.009673, over 3045819.44 frames. ], batch size: 55, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:01:39,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1329513.3333333333, ans=0.125 2023-11-21 03:01:42,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1329513.3333333333, ans=0.0 2023-11-21 03:01:54,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1329580.0, ans=0.05 2023-11-21 03:01:56,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1329580.0, ans=0.125 2023-11-21 03:02:00,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199450 2023-11-21 03:02:03,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.44 vs. limit=10.0 2023-11-21 03:02:04,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1329646.6666666667, ans=0.125 2023-11-21 03:02:35,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1329780.0, ans=0.0 2023-11-21 03:02:38,000 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7100, loss[loss=0.08322, simple_loss=0.1056, pruned_loss=0.02201, audio_tagging_loss=0.008406, over 14199.00 frames. ], tot_loss[loss=0.07589, simple_loss=0.09699, pruned_loss=0.01755, audio_tagging_loss=0.009846, over 3046844.29 frames. 
], batch size: 54, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:02:48,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329846.6666666667, ans=0.1 2023-11-21 03:02:59,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1329913.3333333333, ans=0.125 2023-11-21 03:02:59,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1329913.3333333333, ans=0.125 2023-11-21 03:03:05,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199500 2023-11-21 03:03:06,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 7.897e+01 8.673e+01 9.642e+01 1.180e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-21 03:03:20,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1330046.6666666667, ans=0.0 2023-11-21 03:03:33,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1330113.3333333333, ans=0.0 2023-11-21 03:03:42,082 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7150, loss[loss=0.08593, simple_loss=0.1127, pruned_loss=0.021, audio_tagging_loss=0.008584, over 14824.00 frames. ], tot_loss[loss=0.07656, simple_loss=0.09802, pruned_loss=0.01774, audio_tagging_loss=0.009804, over 3040413.77 frames. ], batch size: 57, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:03:46,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1330180.0, ans=0.125 2023-11-21 03:03:55,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0 2023-11-21 03:03:55,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.01 vs. limit=15.0 2023-11-21 03:03:58,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=12.0 2023-11-21 03:04:09,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199550 2023-11-21 03:04:10,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0 2023-11-21 03:04:21,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2023-11-21 03:04:46,669 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7200, loss[loss=0.07897, simple_loss=0.1, pruned_loss=0.01769, audio_tagging_loss=0.01126, over 16289.00 frames. ], tot_loss[loss=0.07707, simple_loss=0.0987, pruned_loss=0.01782, audio_tagging_loss=0.009899, over 3043667.19 frames. ], batch size: 58, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:05:02,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. 
limit=6.0 2023-11-21 03:05:13,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199600 2023-11-21 03:05:16,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 7.951e+01 8.767e+01 9.502e+01 1.229e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-21 03:05:19,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1330646.6666666667, ans=0.0 2023-11-21 03:05:20,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1330646.6666666667, ans=0.125 2023-11-21 03:05:25,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1330713.3333333333, ans=0.0 2023-11-21 03:05:34,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1330713.3333333333, ans=0.0 2023-11-21 03:05:50,868 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7250, loss[loss=0.06296, simple_loss=0.07242, pruned_loss=0.01383, audio_tagging_loss=0.01293, over 15439.00 frames. ], tot_loss[loss=0.07643, simple_loss=0.09767, pruned_loss=0.01756, audio_tagging_loss=0.01003, over 3044677.13 frames. ], batch size: 58, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:05:51,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1330846.6666666667, ans=0.2 2023-11-21 03:05:56,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1330846.6666666667, ans=0.125 2023-11-21 03:06:02,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1330913.3333333333, ans=0.0 2023-11-21 03:06:18,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199650 2023-11-21 03:06:54,326 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7300, loss[loss=0.07929, simple_loss=0.1164, pruned_loss=0.0144, audio_tagging_loss=0.006675, over 15744.00 frames. ], tot_loss[loss=0.07644, simple_loss=0.09809, pruned_loss=0.01751, audio_tagging_loss=0.009888, over 3044009.36 frames. 
], batch size: 57, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:07:14,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1331246.6666666667, ans=0.125 2023-11-21 03:07:20,991 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199700 2023-11-21 03:07:23,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.086e+01 8.011e+01 8.733e+01 9.607e+01 1.282e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 03:07:24,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1331313.3333333333, ans=0.125 2023-11-21 03:07:30,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1331313.3333333333, ans=0.125 2023-11-21 03:07:38,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1331380.0, ans=0.2 2023-11-21 03:07:56,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1331446.6666666667, ans=0.125 2023-11-21 03:07:58,460 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7350, loss[loss=0.09616, simple_loss=0.1264, pruned_loss=0.02313, audio_tagging_loss=0.009813, over 15569.00 frames. ], tot_loss[loss=0.0775, simple_loss=0.0997, pruned_loss=0.01799, audio_tagging_loss=0.009659, over 3043264.81 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:08:07,152 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 03:08:07,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1331513.3333333333, ans=0.0 2023-11-21 03:08:24,790 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199750 2023-11-21 03:08:27,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1331646.6666666667, ans=0.125 2023-11-21 03:08:36,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1331713.3333333333, ans=0.0 2023-11-21 03:08:40,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1331713.3333333333, ans=0.1 2023-11-21 03:08:52,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1331780.0, ans=0.125 2023-11-21 03:08:59,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1331780.0, ans=0.1 2023-11-21 03:09:02,762 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7400, loss[loss=0.05911, simple_loss=0.07734, pruned_loss=0.01237, audio_tagging_loss=0.008071, over 14589.00 frames. ], tot_loss[loss=0.07642, simple_loss=0.09834, pruned_loss=0.0176, audio_tagging_loss=0.009647, over 3042701.34 frames. 
], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:09:29,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199800 2023-11-21 03:09:32,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.394e+01 8.010e+01 8.731e+01 9.621e+01 1.537e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-21 03:09:37,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1331980.0, ans=0.04949747468305833 2023-11-21 03:09:43,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1332046.6666666667, ans=10.0 2023-11-21 03:09:49,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1332046.6666666667, ans=0.0 2023-11-21 03:10:01,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1332113.3333333333, ans=0.125 2023-11-21 03:10:06,587 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7450, loss[loss=0.04529, simple_loss=0.05342, pruned_loss=0.007621, audio_tagging_loss=0.01096, over 14804.00 frames. ], tot_loss[loss=0.07612, simple_loss=0.09781, pruned_loss=0.01763, audio_tagging_loss=0.00959, over 3045717.83 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:10:06,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1332180.0, ans=0.0 2023-11-21 03:10:33,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199850 2023-11-21 03:10:47,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1332380.0, ans=0.0 2023-11-21 03:10:51,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1332380.0, ans=0.0 2023-11-21 03:11:10,949 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7500, loss[loss=0.07501, simple_loss=0.09815, pruned_loss=0.01825, audio_tagging_loss=0.007688, over 15081.00 frames. ], tot_loss[loss=0.07648, simple_loss=0.09842, pruned_loss=0.01772, audio_tagging_loss=0.009545, over 3043283.84 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:11:21,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1332513.3333333333, ans=0.125 2023-11-21 03:11:29,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2023-11-21 03:11:37,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199900 2023-11-21 03:11:39,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.350e+01 8.922e+01 9.540e+01 1.272e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-21 03:12:14,029 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7550, loss[loss=0.08076, simple_loss=0.09975, pruned_loss=0.02076, audio_tagging_loss=0.01012, over 14138.00 frames. ], tot_loss[loss=0.07609, simple_loss=0.0979, pruned_loss=0.01753, audio_tagging_loss=0.009603, over 3042659.06 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 16.0 2023-11-21 03:12:19,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.79 vs. 
limit=15.0 2023-11-21 03:12:26,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1332913.3333333333, ans=0.2 2023-11-21 03:12:28,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1332913.3333333333, ans=0.125 2023-11-21 03:12:40,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 199950 2023-11-21 03:12:53,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1333046.6666666667, ans=15.0 2023-11-21 03:13:01,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2023-11-21 03:13:10,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2023-11-21 03:13:15,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1333113.3333333333, ans=0.125 2023-11-21 03:13:17,615 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7600, loss[loss=0.08008, simple_loss=0.1, pruned_loss=0.02165, audio_tagging_loss=0.008427, over 14532.00 frames. ], tot_loss[loss=0.07577, simple_loss=0.0974, pruned_loss=0.0174, audio_tagging_loss=0.00967, over 3043020.82 frames. ], batch size: 54, lr: 4.03e-03, grad_scale: 32.0 2023-11-21 03:13:20,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1333180.0, ans=0.125 2023-11-21 03:13:22,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1333180.0, ans=0.125 2023-11-21 03:13:28,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1333246.6666666667, ans=0.5 2023-11-21 03:13:28,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1333246.6666666667, ans=0.125 2023-11-21 03:13:45,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200000 2023-11-21 03:13:50,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.060e+01 8.807e+01 9.387e+01 1.277e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-21 03:13:52,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1333313.3333333333, ans=0.025 2023-11-21 03:14:01,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333380.0, ans=0.1 2023-11-21 03:14:02,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1333380.0, ans=0.02 2023-11-21 03:14:18,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. 
limit=15.0 2023-11-21 03:14:18,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1333446.6666666667, ans=0.125 2023-11-21 03:14:25,256 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7650, loss[loss=0.05337, simple_loss=0.06678, pruned_loss=0.009989, audio_tagging_loss=0.009998, over 14974.00 frames. ], tot_loss[loss=0.07531, simple_loss=0.09665, pruned_loss=0.01734, audio_tagging_loss=0.009647, over 3046770.11 frames. ], batch size: 56, lr: 4.03e-03, grad_scale: 32.0 2023-11-21 03:14:36,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1333513.3333333333, ans=0.0 2023-11-21 03:14:39,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-21 03:14:41,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1333580.0, ans=0.07 2023-11-21 03:14:50,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333646.6666666667, ans=0.1 2023-11-21 03:14:52,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200050 2023-11-21 03:14:57,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1333646.6666666667, ans=0.1 2023-11-21 03:15:10,251 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 03:15:18,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1333780.0, ans=0.125 2023-11-21 03:15:29,945 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7700, loss[loss=0.09689, simple_loss=0.1273, pruned_loss=0.0253, audio_tagging_loss=0.007942, over 15185.00 frames. ], tot_loss[loss=0.07544, simple_loss=0.09698, pruned_loss=0.01732, audio_tagging_loss=0.009631, over 3051949.12 frames. ], batch size: 55, lr: 4.03e-03, grad_scale: 32.0 2023-11-21 03:15:37,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.00 vs. limit=10.0 2023-11-21 03:15:55,907 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200100 2023-11-21 03:15:58,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.520e+01 8.098e+01 8.698e+01 9.708e+01 1.361e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 03:16:14,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1334046.6666666667, ans=0.0 2023-11-21 03:16:21,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1334113.3333333333, ans=0.1 2023-11-21 03:16:21,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1334113.3333333333, ans=0.125 2023-11-21 03:16:22,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1334113.3333333333, ans=0.125 2023-11-21 03:16:33,104 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7750, loss[loss=0.08298, simple_loss=0.1038, pruned_loss=0.02068, audio_tagging_loss=0.01042, over 15537.00 frames. 
], tot_loss[loss=0.07586, simple_loss=0.09744, pruned_loss=0.01744, audio_tagging_loss=0.009692, over 3053634.73 frames. ], batch size: 57, lr: 4.03e-03, grad_scale: 32.0 2023-11-21 03:16:33,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1334180.0, ans=0.025 2023-11-21 03:16:44,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1334246.6666666667, ans=0.125 2023-11-21 03:17:00,618 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200150 2023-11-21 03:17:00,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1334313.3333333333, ans=10.0 2023-11-21 03:17:16,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1334380.0, ans=0.2 2023-11-21 03:17:17,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2023-11-21 03:17:24,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1334446.6666666667, ans=0.125 2023-11-21 03:17:24,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1334446.6666666667, ans=0.125 2023-11-21 03:17:36,766 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7800, loss[loss=0.07293, simple_loss=0.1031, pruned_loss=0.01318, audio_tagging_loss=0.008181, over 16225.00 frames. ], tot_loss[loss=0.07558, simple_loss=0.09687, pruned_loss=0.01735, audio_tagging_loss=0.009793, over 3055006.41 frames. ], batch size: 58, lr: 4.03e-03, grad_scale: 32.0 2023-11-21 03:17:40,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-21 03:18:04,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200200 2023-11-21 03:18:07,466 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.821e+01 7.982e+01 8.620e+01 9.218e+01 1.876e+02, threshold=1.724e+02, percent-clipped=1.0 2023-11-21 03:18:26,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1334713.3333333333, ans=0.0 2023-11-21 03:18:37,646 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 03:18:42,223 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7850, loss[loss=0.09554, simple_loss=0.1247, pruned_loss=0.02454, audio_tagging_loss=0.008652, over 14654.00 frames. ], tot_loss[loss=0.07565, simple_loss=0.09673, pruned_loss=0.01747, audio_tagging_loss=0.009813, over 3050095.20 frames. 
], batch size: 54, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:18:55,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1334913.3333333333, ans=0.125 2023-11-21 03:19:08,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200250 2023-11-21 03:19:16,210 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 03:19:29,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1335046.6666666667, ans=0.125 2023-11-21 03:19:46,680 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7900, loss[loss=0.07311, simple_loss=0.08552, pruned_loss=0.01724, audio_tagging_loss=0.01311, over 15165.00 frames. ], tot_loss[loss=0.07589, simple_loss=0.09708, pruned_loss=0.01749, audio_tagging_loss=0.009858, over 3050836.74 frames. ], batch size: 58, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:19:48,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1335180.0, ans=0.1 2023-11-21 03:19:50,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1335180.0, ans=0.0 2023-11-21 03:20:04,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-21 03:20:05,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1335246.6666666667, ans=0.2 2023-11-21 03:20:13,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200300 2023-11-21 03:20:16,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.233e+01 9.036e+01 9.766e+01 1.372e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-21 03:20:18,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1335313.3333333333, ans=0.125 2023-11-21 03:20:33,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1335380.0, ans=0.125 2023-11-21 03:20:42,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1335446.6666666667, ans=15.0 2023-11-21 03:20:47,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1335446.6666666667, ans=0.125 2023-11-21 03:20:49,703 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 7950, loss[loss=0.06512, simple_loss=0.0835, pruned_loss=0.0125, audio_tagging_loss=0.01087, over 14724.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.09661, pruned_loss=0.01748, audio_tagging_loss=0.009934, over 3044959.77 frames. ], batch size: 55, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:20:52,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1335513.3333333333, ans=0.0 2023-11-21 03:21:04,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1335580.0, ans=0.125 2023-11-21 03:21:05,450 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 03:21:17,883 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200350 2023-11-21 03:21:51,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1335780.0, ans=0.2 2023-11-21 03:21:54,635 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8000, loss[loss=0.1039, simple_loss=0.1389, pruned_loss=0.02638, audio_tagging_loss=0.008105, over 14948.00 frames. ], tot_loss[loss=0.0755, simple_loss=0.09633, pruned_loss=0.01735, audio_tagging_loss=0.009988, over 3045714.29 frames. ], batch size: 54, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:22:08,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1335913.3333333333, ans=0.1 2023-11-21 03:22:20,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1335980.0, ans=0.1 2023-11-21 03:22:21,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200400 2023-11-21 03:22:22,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2023-11-21 03:22:24,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.612e+01 7.996e+01 8.659e+01 9.313e+01 1.449e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 03:22:27,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2023-11-21 03:22:35,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=15.0 2023-11-21 03:22:59,639 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8050, loss[loss=0.07891, simple_loss=0.09933, pruned_loss=0.02054, audio_tagging_loss=0.008705, over 15305.00 frames. ], tot_loss[loss=0.07616, simple_loss=0.09718, pruned_loss=0.01762, audio_tagging_loss=0.009944, over 3041207.93 frames. ], batch size: 56, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:23:06,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-11-21 03:23:08,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.53 vs. 
limit=12.0 2023-11-21 03:23:17,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1336246.6666666667, ans=0.125 2023-11-21 03:23:26,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200450 2023-11-21 03:23:41,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1336380.0, ans=0.0 2023-11-21 03:23:47,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1336380.0, ans=0.125 2023-11-21 03:24:01,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1336513.3333333333, ans=0.1 2023-11-21 03:24:02,432 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8100, loss[loss=0.08333, simple_loss=0.1086, pruned_loss=0.01991, audio_tagging_loss=0.009104, over 16356.00 frames. ], tot_loss[loss=0.0756, simple_loss=0.09678, pruned_loss=0.01743, audio_tagging_loss=0.009771, over 3040289.49 frames. ], batch size: 63, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:24:02,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1336513.3333333333, ans=0.0 2023-11-21 03:24:18,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1336580.0, ans=0.0 2023-11-21 03:24:29,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200500 2023-11-21 03:24:29,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1336646.6666666667, ans=0.0 2023-11-21 03:24:32,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.373e+01 8.135e+01 8.794e+01 9.348e+01 1.385e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-21 03:24:34,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1336646.6666666667, ans=0.0 2023-11-21 03:24:53,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.97 vs. limit=15.0 2023-11-21 03:24:57,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1336780.0, ans=0.1 2023-11-21 03:25:06,596 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8150, loss[loss=0.08925, simple_loss=0.1195, pruned_loss=0.02175, audio_tagging_loss=0.007739, over 16640.00 frames. ], tot_loss[loss=0.07558, simple_loss=0.09672, pruned_loss=0.01754, audio_tagging_loss=0.009688, over 3042117.58 frames. 
], batch size: 59, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:25:12,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1336846.6666666667, ans=0.1 2023-11-21 03:25:18,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1336913.3333333333, ans=0.0 2023-11-21 03:25:21,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1336913.3333333333, ans=0.125 2023-11-21 03:25:33,560 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200550 2023-11-21 03:25:43,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1337046.6666666667, ans=0.125 2023-11-21 03:26:11,135 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8200, loss[loss=0.09235, simple_loss=0.1122, pruned_loss=0.0276, audio_tagging_loss=0.008631, over 15615.00 frames. ], tot_loss[loss=0.07541, simple_loss=0.09659, pruned_loss=0.01742, audio_tagging_loss=0.009694, over 3041680.33 frames. ], batch size: 59, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:26:11,166 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 03:26:30,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1337246.6666666667, ans=0.0 2023-11-21 03:26:37,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200600 2023-11-21 03:26:40,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.735e+01 8.383e+01 9.269e+01 1.042e+02 1.722e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-21 03:26:45,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0 2023-11-21 03:26:45,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1337313.3333333333, ans=0.125 2023-11-21 03:26:50,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-21 03:26:52,066 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 03:26:56,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1337380.0, ans=0.5 2023-11-21 03:27:09,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1337446.6666666667, ans=0.125 2023-11-21 03:27:15,011 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8250, loss[loss=0.08469, simple_loss=0.1104, pruned_loss=0.01991, audio_tagging_loss=0.009565, over 15939.00 frames. ], tot_loss[loss=0.07567, simple_loss=0.09724, pruned_loss=0.01752, audio_tagging_loss=0.009539, over 3038460.08 frames. 
], batch size: 57, lr: 4.02e-03, grad_scale: 16.0 2023-11-21 03:27:25,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1337513.3333333333, ans=0.0 2023-11-21 03:27:37,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1337580.0, ans=0.05 2023-11-21 03:27:41,843 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200650 2023-11-21 03:27:51,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1337646.6666666667, ans=0.125 2023-11-21 03:28:00,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1337713.3333333333, ans=0.125 2023-11-21 03:28:13,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0 2023-11-21 03:28:13,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1337780.0, ans=0.125 2023-11-21 03:28:13,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1337780.0, ans=0.0 2023-11-21 03:28:19,774 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8300, loss[loss=0.08586, simple_loss=0.1148, pruned_loss=0.02102, audio_tagging_loss=0.007429, over 14148.00 frames. ], tot_loss[loss=0.07535, simple_loss=0.09657, pruned_loss=0.01746, audio_tagging_loss=0.009609, over 3042198.01 frames. ], batch size: 54, lr: 4.02e-03, grad_scale: 16.0 2023-11-21 03:28:46,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200700 2023-11-21 03:28:50,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.192e+01 8.783e+01 9.537e+01 1.149e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-21 03:29:15,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2023-11-21 03:29:16,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-21 03:29:23,816 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8350, loss[loss=0.08122, simple_loss=0.1084, pruned_loss=0.01901, audio_tagging_loss=0.008027, over 13899.00 frames. ], tot_loss[loss=0.07611, simple_loss=0.09764, pruned_loss=0.01767, audio_tagging_loss=0.009624, over 3050159.98 frames. ], batch size: 52, lr: 4.02e-03, grad_scale: 16.0 2023-11-21 03:29:27,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1338180.0, ans=0.07 2023-11-21 03:29:30,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1338180.0, ans=0.05 2023-11-21 03:29:50,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200750 2023-11-21 03:29:55,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.79 vs. 
limit=12.0 2023-11-21 03:30:11,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1338380.0, ans=0.04949747468305833 2023-11-21 03:30:27,687 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8400, loss[loss=0.06756, simple_loss=0.08041, pruned_loss=0.01777, audio_tagging_loss=0.009579, over 14013.00 frames. ], tot_loss[loss=0.07598, simple_loss=0.09721, pruned_loss=0.01771, audio_tagging_loss=0.00967, over 3047907.74 frames. ], batch size: 55, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:30:42,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1338580.0, ans=0.0 2023-11-21 03:30:54,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200800 2023-11-21 03:30:55,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1338646.6666666667, ans=0.2 2023-11-21 03:30:57,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.229e+01 8.784e+01 9.362e+01 1.197e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-21 03:31:10,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1338713.3333333333, ans=0.1 2023-11-21 03:31:16,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1338713.3333333333, ans=0.125 2023-11-21 03:31:21,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1338780.0, ans=0.1 2023-11-21 03:31:31,819 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8450, loss[loss=0.07936, simple_loss=0.09937, pruned_loss=0.01829, audio_tagging_loss=0.01139, over 14855.00 frames. ], tot_loss[loss=0.07634, simple_loss=0.09771, pruned_loss=0.01788, audio_tagging_loss=0.0096, over 3043146.29 frames. ], batch size: 56, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:31:49,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1338913.3333333333, ans=0.125 2023-11-21 03:31:52,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1338913.3333333333, ans=0.125 2023-11-21 03:31:58,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200850 2023-11-21 03:32:10,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1339046.6666666667, ans=0.125 2023-11-21 03:32:28,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1339113.3333333333, ans=0.09899494936611666 2023-11-21 03:32:34,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1339180.0, ans=0.125 2023-11-21 03:32:35,652 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8500, loss[loss=0.06368, simple_loss=0.07891, pruned_loss=0.01542, audio_tagging_loss=0.008801, over 13985.00 frames. ], tot_loss[loss=0.07637, simple_loss=0.09802, pruned_loss=0.01771, audio_tagging_loss=0.009649, over 3044154.44 frames. 
], batch size: 53, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:32:40,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1339180.0, ans=0.125 2023-11-21 03:33:01,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1339313.3333333333, ans=0.2 2023-11-21 03:33:02,950 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200900 2023-11-21 03:33:03,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2023-11-21 03:33:06,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 7.935e+01 8.661e+01 9.459e+01 1.270e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 03:33:28,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0 2023-11-21 03:33:39,579 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8550, loss[loss=0.08822, simple_loss=0.1169, pruned_loss=0.02251, audio_tagging_loss=0.007268, over 14237.00 frames. ], tot_loss[loss=0.07571, simple_loss=0.09706, pruned_loss=0.01751, audio_tagging_loss=0.009675, over 3040995.98 frames. ], batch size: 54, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:33:44,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1339513.3333333333, ans=0.125 2023-11-21 03:33:55,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. limit=10.0 2023-11-21 03:34:06,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 200950 2023-11-21 03:34:08,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1339646.6666666667, ans=0.0 2023-11-21 03:34:10,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2023-11-21 03:34:14,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1339646.6666666667, ans=0.125 2023-11-21 03:34:18,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-21 03:34:31,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1339780.0, ans=0.125 2023-11-21 03:34:32,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1339780.0, ans=0.125 2023-11-21 03:34:32,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1339780.0, ans=0.2 2023-11-21 03:34:39,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.44 vs. limit=10.0 2023-11-21 03:34:43,515 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8600, loss[loss=0.06817, simple_loss=0.08252, pruned_loss=0.01546, audio_tagging_loss=0.01145, over 14774.00 frames. 
], tot_loss[loss=0.07506, simple_loss=0.09598, pruned_loss=0.01727, audio_tagging_loss=0.009801, over 3040645.61 frames. ], batch size: 54, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:34:47,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-11-21 03:35:10,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201000 2023-11-21 03:35:14,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.482e+01 7.981e+01 8.707e+01 9.381e+01 1.149e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-21 03:35:16,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1339980.0, ans=0.125 2023-11-21 03:35:43,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1340113.3333333333, ans=0.0 2023-11-21 03:35:47,419 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8650, loss[loss=0.07145, simple_loss=0.07857, pruned_loss=0.0175, audio_tagging_loss=0.01466, over 15308.00 frames. ], tot_loss[loss=0.07578, simple_loss=0.09711, pruned_loss=0.01743, audio_tagging_loss=0.009791, over 3055808.34 frames. ], batch size: 59, lr: 4.02e-03, grad_scale: 16.0 2023-11-21 03:35:48,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1340180.0, ans=0.1 2023-11-21 03:36:14,555 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201050 2023-11-21 03:36:14,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1340313.3333333333, ans=0.125 2023-11-21 03:36:38,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1340446.6666666667, ans=0.125 2023-11-21 03:36:45,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1340446.6666666667, ans=0.125 2023-11-21 03:36:49,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1340513.3333333333, ans=0.0 2023-11-21 03:36:50,936 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8700, loss[loss=0.06759, simple_loss=0.07755, pruned_loss=0.01778, audio_tagging_loss=0.01103, over 13877.00 frames. ], tot_loss[loss=0.0763, simple_loss=0.0977, pruned_loss=0.01763, audio_tagging_loss=0.009819, over 3050203.58 frames. ], batch size: 54, lr: 4.02e-03, grad_scale: 16.0 2023-11-21 03:36:52,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=12.0 2023-11-21 03:37:13,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1340580.0, ans=0.125 2023-11-21 03:37:17,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201100 2023-11-21 03:37:22,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.240e+01 8.992e+01 1.008e+02 1.229e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-21 03:37:37,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. 
limit=22.5 2023-11-21 03:37:40,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1340780.0, ans=0.1 2023-11-21 03:37:46,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1340780.0, ans=0.125 2023-11-21 03:37:54,263 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8750, loss[loss=0.06799, simple_loss=0.0842, pruned_loss=0.0139, audio_tagging_loss=0.01198, over 16939.00 frames. ], tot_loss[loss=0.0766, simple_loss=0.098, pruned_loss=0.01767, audio_tagging_loss=0.009934, over 3054542.45 frames. ], batch size: 66, lr: 4.02e-03, grad_scale: 16.0 2023-11-21 03:37:54,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2023-11-21 03:37:57,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1340846.6666666667, ans=0.1 2023-11-21 03:38:21,163 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201150 2023-11-21 03:38:21,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1340980.0, ans=0.125 2023-11-21 03:38:28,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1340980.0, ans=0.125 2023-11-21 03:38:38,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1341046.6666666667, ans=0.125 2023-11-21 03:38:51,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1341113.3333333333, ans=0.0 2023-11-21 03:38:58,552 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8800, loss[loss=0.0817, simple_loss=0.1025, pruned_loss=0.01796, audio_tagging_loss=0.01252, over 15383.00 frames. ], tot_loss[loss=0.07747, simple_loss=0.09872, pruned_loss=0.01796, audio_tagging_loss=0.01015, over 3053926.68 frames. ], batch size: 58, lr: 4.02e-03, grad_scale: 32.0 2023-11-21 03:39:06,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1341180.0, ans=0.125 2023-11-21 03:39:07,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. 
limit=15.0 2023-11-21 03:39:13,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1341246.6666666667, ans=0.125 2023-11-21 03:39:24,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201200 2023-11-21 03:39:26,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1341313.3333333333, ans=0.2 2023-11-21 03:39:28,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1341313.3333333333, ans=0.1 2023-11-21 03:39:30,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.266e+01 8.872e+01 9.572e+01 1.229e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-21 03:39:40,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1341380.0, ans=0.125 2023-11-21 03:39:51,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2023-11-21 03:40:02,356 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8850, loss[loss=0.1076, simple_loss=0.1388, pruned_loss=0.02985, audio_tagging_loss=0.008347, over 15388.00 frames. ], tot_loss[loss=0.07697, simple_loss=0.09843, pruned_loss=0.01765, audio_tagging_loss=0.0101, over 3052372.79 frames. ], batch size: 54, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:40:13,280 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 03:40:28,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1341646.6666666667, ans=0.015 2023-11-21 03:40:29,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201250 2023-11-21 03:40:29,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1341646.6666666667, ans=0.125 2023-11-21 03:41:05,627 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8900, loss[loss=0.07215, simple_loss=0.09571, pruned_loss=0.01603, audio_tagging_loss=0.008268, over 15514.00 frames. ], tot_loss[loss=0.07706, simple_loss=0.09894, pruned_loss=0.01771, audio_tagging_loss=0.009886, over 3048253.64 frames. ], batch size: 56, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:41:20,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1341913.3333333333, ans=0.0 2023-11-21 03:41:32,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.45 vs. 
limit=12.0 2023-11-21 03:41:33,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201300 2023-11-21 03:41:34,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1341980.0, ans=0.125 2023-11-21 03:41:38,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.632e+01 8.120e+01 8.967e+01 9.567e+01 1.214e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-21 03:41:53,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1342046.6666666667, ans=0.125 2023-11-21 03:41:57,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1342113.3333333333, ans=0.125 2023-11-21 03:42:11,239 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 8950, loss[loss=0.07864, simple_loss=0.1021, pruned_loss=0.01863, audio_tagging_loss=0.00897, over 14516.00 frames. ], tot_loss[loss=0.07674, simple_loss=0.0988, pruned_loss=0.01768, audio_tagging_loss=0.009663, over 3045180.65 frames. ], batch size: 52, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:42:15,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1342180.0, ans=0.125 2023-11-21 03:42:37,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201350 2023-11-21 03:43:13,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1342513.3333333333, ans=0.0 2023-11-21 03:43:14,478 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9000, loss[loss=0.08473, simple_loss=0.1105, pruned_loss=0.01835, audio_tagging_loss=0.01113, over 15793.00 frames. ], tot_loss[loss=0.07686, simple_loss=0.09903, pruned_loss=0.01783, audio_tagging_loss=0.009517, over 3045196.20 frames. ], batch size: 58, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:43:14,479 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 03:43:42,117 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8885, 3.4193, 4.8589, 4.4097], device='cuda:1') 2023-11-21 03:43:55,719 INFO [train_asr.py:1253] (1/4) Epoch 17, validation: loss=0.06143, simple_loss=0.05268, pruned_loss=0.005433, audio_tagging_loss=0.02966, over 4681554.00 frames. 
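The loss[...] / tot_loss[...] fields report the total training objective and its three components. The printed totals are consistent with a weighted sum using the scales from this run's configuration (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0); the recipe may blend the simple and pruned terms differently during warm-up, but at this point in training, well past warm_step, the numbers fit the simple form below, checked here against the validation record just above.

    # Illustrative check only, using the components of the logged validation record.
    simple_loss_scale = 0.5         # from the run's configuration
    audio_tagging_loss_scale = 1.0  # from the run's configuration

    simple_loss, pruned_loss, audio_tagging_loss = 0.05268, 0.005433, 0.02966

    loss = (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)
    print(f"{loss:.5f}")  # -> 0.06143, matching the logged validation loss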
2023-11-21 03:43:55,720 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 03:44:23,550 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201400 2023-11-21 03:44:28,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.286e+01 9.090e+01 9.750e+01 1.340e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-21 03:44:30,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1342646.6666666667, ans=0.0 2023-11-21 03:44:31,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1342646.6666666667, ans=0.1 2023-11-21 03:44:41,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1342713.3333333333, ans=0.125 2023-11-21 03:44:46,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1342780.0, ans=0.125 2023-11-21 03:45:00,656 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9050, loss[loss=0.08458, simple_loss=0.0949, pruned_loss=0.02321, audio_tagging_loss=0.01392, over 14457.00 frames. ], tot_loss[loss=0.07715, simple_loss=0.09922, pruned_loss=0.01802, audio_tagging_loss=0.009521, over 3050123.50 frames. ], batch size: 54, lr: 4.01e-03, grad_scale: 16.0 2023-11-21 03:45:02,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1342846.6666666667, ans=0.0 2023-11-21 03:45:18,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1342913.3333333333, ans=0.125 2023-11-21 03:45:26,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201450 2023-11-21 03:45:28,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1342980.0, ans=0.1 2023-11-21 03:45:29,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1342980.0, ans=0.5 2023-11-21 03:45:30,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1342980.0, ans=0.125 2023-11-21 03:45:41,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=22.5 2023-11-21 03:45:50,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1343113.3333333333, ans=0.1 2023-11-21 03:45:52,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-11-21 03:45:57,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1343113.3333333333, ans=0.125 2023-11-21 03:45:57,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-11-21 03:46:04,611 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9100, loss[loss=0.09614, simple_loss=0.1266, pruned_loss=0.0257, audio_tagging_loss=0.007148, over 15543.00 frames. 
], tot_loss[loss=0.07701, simple_loss=0.09899, pruned_loss=0.01799, audio_tagging_loss=0.009524, over 3042597.60 frames. ], batch size: 58, lr: 4.01e-03, grad_scale: 16.0
2023-11-21 03:46:32,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201500
2023-11-21 03:46:33,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1343313.3333333333, ans=0.125
2023-11-21 03:46:34,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1343313.3333333333, ans=0.125
2023-11-21 03:46:38,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.193e+01 8.891e+01 9.703e+01 1.287e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-21 03:47:00,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5
2023-11-21 03:47:08,808 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9150, loss[loss=0.0946, simple_loss=0.1247, pruned_loss=0.02525, audio_tagging_loss=0.006968, over 14552.00 frames. ], tot_loss[loss=0.07706, simple_loss=0.09935, pruned_loss=0.01798, audio_tagging_loss=0.009403, over 3040411.01 frames. ], batch size: 54, lr: 4.01e-03, grad_scale: 16.0
2023-11-21 03:47:36,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201550
2023-11-21 03:47:38,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1343646.6666666667, ans=0.2
2023-11-21 03:47:38,375 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 03:47:40,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1343646.6666666667, ans=0.0
2023-11-21 03:47:45,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1343646.6666666667, ans=0.0
2023-11-21 03:48:06,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1343780.0, ans=0.0
2023-11-21 03:48:13,911 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9200, loss[loss=0.07755, simple_loss=0.09363, pruned_loss=0.01724, audio_tagging_loss=0.0135, over 14384.00 frames. ], tot_loss[loss=0.07733, simple_loss=0.09972, pruned_loss=0.01811, audio_tagging_loss=0.009365, over 3039956.66 frames. ], batch size: 55, lr: 4.01e-03, grad_scale: 32.0
2023-11-21 03:48:18,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1343846.6666666667, ans=0.07
2023-11-21 03:48:19,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=12.0
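The Whitening lines report a per-module statistic ("metric") against a limit; the metric measures how far a module's output covariance is from being proportional to the identity, and presumably a corrective gradient penalty only kicks in once the metric crosses its limit (every value logged here sits below its limit). A rough, illustrative reconstruction of such a statistic follows; the exact formula belongs to the Whiten module in scaling.py and may well differ.

    import numpy as np

    def whiteness_metric(feats: np.ndarray) -> float:
        """feats: (num_frames, num_channels). Returns >= 1.0; equals 1.0 when
        the channel covariance is a multiple of the identity ('white') and
        grows as the eigenvalue spectrum becomes lopsided. Assumed form."""
        x = feats - feats.mean(axis=0, keepdims=True)
        cov = (x.T @ x) / max(len(x), 1)
        eigs = np.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

    rng = np.random.default_rng(0)
    white = rng.standard_normal((1000, 384))
    skewed = white * np.linspace(0.1, 3.0, 384)  # uneven channel variances
    print(whiteness_metric(white) < whiteness_metric(skewed))  # -> True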
2023-11-21 03:48:20,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5
2023-11-21 03:48:23,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1343846.6666666667, ans=0.0
2023-11-21 03:48:31,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1343913.3333333333, ans=0.125
2023-11-21 03:48:32,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1343913.3333333333, ans=0.125
2023-11-21 03:48:32,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1343913.3333333333, ans=0.0
2023-11-21 03:48:40,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201600
2023-11-21 03:48:46,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.667e+01 8.039e+01 8.659e+01 9.327e+01 1.552e+02, threshold=1.732e+02, percent-clipped=0.0
2023-11-21 03:49:12,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1344113.3333333333, ans=0.125
2023-11-21 03:49:15,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0
2023-11-21 03:49:18,532 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9250, loss[loss=0.07617, simple_loss=0.1028, pruned_loss=0.0148, audio_tagging_loss=0.00996, over 14450.00 frames. ], tot_loss[loss=0.07698, simple_loss=0.09916, pruned_loss=0.01803, audio_tagging_loss=0.009368, over 3046726.30 frames. ], batch size: 56, lr: 4.01e-03, grad_scale: 32.0
2023-11-21 03:49:21,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344180.0, ans=0.1
2023-11-21 03:49:27,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344180.0, ans=0.1
2023-11-21 03:49:45,633 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201650
2023-11-21 03:50:12,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1344446.6666666667, ans=0.2
2023-11-21 03:50:13,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1344446.6666666667, ans=0.07
2023-11-21 03:50:15,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1344446.6666666667, ans=0.125
2023-11-21 03:50:16,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1344446.6666666667, ans=0.2
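In the optim.py:476 lines, the five grad-norm numbers read as min/25%/median/75%/max of recently observed gradient norms, and the printed threshold consistently equals Clipping_scale times the median (here 2.0 * 8.659e+01 = 1.732e+02), which suggests gradients are clipped at a multiple of the recent median norm; percent-clipped would then be the share of updates that hit the threshold. A sketch that reproduces the logged line from those assumptions (function name hypothetical):

    import numpy as np

    def clipping_report(grad_norms, clipping_scale=2.0):
        # grad_norms: gradient norms collected since the last report
        q = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * q[2]          # scale times the median
        pct = 100.0 * np.mean(np.asarray(grad_norms) > threshold)
        quartiles = " ".join(f"{v:.3e}" for v in q)
        return (f"Clipping_scale={clipping_scale}, grad-norm quartiles "
                f"{quartiles}, threshold={threshold:.3e}, percent-clipped={pct}")

    # Reproduces the 03:48:46 record above:
    print(clipping_report([66.67, 80.39, 86.59, 93.27, 155.2]))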
2023-11-21 03:50:22,244 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9300, loss[loss=0.08373, simple_loss=0.0924, pruned_loss=0.02617, audio_tagging_loss=0.01135, over 14978.00 frames. ], tot_loss[loss=0.07676, simple_loss=0.09903, pruned_loss=0.0178, audio_tagging_loss=0.009446, over 3053736.41 frames. ], batch size: 56, lr: 4.01e-03, grad_scale: 32.0
2023-11-21 03:50:29,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1344513.3333333333, ans=0.125
2023-11-21 03:50:37,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1344580.0, ans=0.0
2023-11-21 03:50:50,185 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201700
2023-11-21 03:50:53,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1344646.6666666667, ans=0.0
2023-11-21 03:50:55,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1344646.6666666667, ans=0.125
2023-11-21 03:50:56,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.829e+01 7.965e+01 8.483e+01 9.250e+01 1.325e+02, threshold=1.697e+02, percent-clipped=0.0
2023-11-21 03:51:27,500 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9350, loss[loss=0.0722, simple_loss=0.0934, pruned_loss=0.01448, audio_tagging_loss=0.01102, over 15409.00 frames. ], tot_loss[loss=0.07673, simple_loss=0.0988, pruned_loss=0.01783, audio_tagging_loss=0.009499, over 3051535.85 frames. ], batch size: 56, lr: 4.01e-03, grad_scale: 32.0
2023-11-21 03:51:42,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1344913.3333333333, ans=0.125
2023-11-21 03:51:49,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=12.0
2023-11-21 03:51:54,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201750
2023-11-21 03:51:54,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1344980.0, ans=0.0
2023-11-21 03:52:07,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0
2023-11-21 03:52:32,190 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9400, loss[loss=0.07734, simple_loss=0.09826, pruned_loss=0.02076, audio_tagging_loss=0.007445, over 16410.00 frames. ], tot_loss[loss=0.07671, simple_loss=0.09877, pruned_loss=0.01778, audio_tagging_loss=0.009549, over 3053263.33 frames. ], batch size: 60, lr: 4.01e-03, grad_scale: 32.0
2023-11-21 03:52:55,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1345313.3333333333, ans=0.2
2023-11-21 03:52:58,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201800
2023-11-21 03:52:58,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1345313.3333333333, ans=0.015
2023-11-21 03:53:04,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.174e+01 8.028e+01 8.710e+01 9.500e+01 1.230e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-21 03:53:08,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1345380.0, ans=0.125
2023-11-21 03:53:20,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs.
limit=15.0 2023-11-21 03:53:31,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1345446.6666666667, ans=0.125 2023-11-21 03:53:33,247 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 03:53:35,718 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9450, loss[loss=0.1199, simple_loss=0.1551, pruned_loss=0.03367, audio_tagging_loss=0.008638, over 15246.00 frames. ], tot_loss[loss=0.07688, simple_loss=0.09879, pruned_loss=0.01784, audio_tagging_loss=0.009646, over 3049609.40 frames. ], batch size: 57, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:53:40,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1345513.3333333333, ans=0.0 2023-11-21 03:54:03,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201850 2023-11-21 03:54:03,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1345646.6666666667, ans=0.1 2023-11-21 03:54:40,221 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9500, loss[loss=0.09905, simple_loss=0.1324, pruned_loss=0.02406, audio_tagging_loss=0.00879, over 16018.00 frames. ], tot_loss[loss=0.07719, simple_loss=0.09888, pruned_loss=0.01798, audio_tagging_loss=0.009771, over 3051855.90 frames. ], batch size: 60, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:54:48,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2023-11-21 03:55:01,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1345913.3333333333, ans=0.2 2023-11-21 03:55:03,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0 2023-11-21 03:55:06,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201900 2023-11-21 03:55:12,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.201e+01 8.880e+01 9.737e+01 1.239e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-21 03:55:19,365 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 03:55:19,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2023-11-21 03:55:25,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1346046.6666666667, ans=0.125 2023-11-21 03:55:31,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1346113.3333333333, ans=0.0 2023-11-21 03:55:38,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. 
limit=15.0 2023-11-21 03:55:41,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.98 vs. limit=22.5 2023-11-21 03:55:44,094 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9550, loss[loss=0.08035, simple_loss=0.1086, pruned_loss=0.01798, audio_tagging_loss=0.008085, over 15069.00 frames. ], tot_loss[loss=0.0774, simple_loss=0.09907, pruned_loss=0.01802, audio_tagging_loss=0.009848, over 3052026.06 frames. ], batch size: 55, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:55:48,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1346180.0, ans=0.125 2023-11-21 03:55:53,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1346180.0, ans=0.125 2023-11-21 03:55:54,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1346180.0, ans=0.04949747468305833 2023-11-21 03:56:10,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 201950 2023-11-21 03:56:16,616 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 03:56:27,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1346380.0, ans=0.2 2023-11-21 03:56:28,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1346380.0, ans=0.125 2023-11-21 03:56:40,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1346446.6666666667, ans=0.125 2023-11-21 03:56:42,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-21 03:56:48,196 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9600, loss[loss=0.07909, simple_loss=0.09965, pruned_loss=0.01894, audio_tagging_loss=0.01033, over 15902.00 frames. ], tot_loss[loss=0.07718, simple_loss=0.09898, pruned_loss=0.01777, audio_tagging_loss=0.00992, over 3047957.64 frames. ], batch size: 59, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:56:55,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1346513.3333333333, ans=0.0 2023-11-21 03:57:06,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1346580.0, ans=0.0 2023-11-21 03:57:15,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202000 2023-11-21 03:57:21,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.303e+01 7.767e+01 8.568e+01 9.180e+01 1.180e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-21 03:57:47,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1346780.0, ans=0.0 2023-11-21 03:57:53,420 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9650, loss[loss=0.09478, simple_loss=0.1254, pruned_loss=0.02549, audio_tagging_loss=0.006591, over 15308.00 frames. ], tot_loss[loss=0.07643, simple_loss=0.09755, pruned_loss=0.01759, audio_tagging_loss=0.01006, over 3044728.22 frames. 
], batch size: 53, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:57:55,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1346846.6666666667, ans=0.125 2023-11-21 03:58:20,085 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202050 2023-11-21 03:58:22,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1346980.0, ans=0.125 2023-11-21 03:58:32,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1347046.6666666667, ans=0.125 2023-11-21 03:58:35,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1347046.6666666667, ans=0.0 2023-11-21 03:58:38,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1347046.6666666667, ans=0.125 2023-11-21 03:58:39,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1347046.6666666667, ans=0.125 2023-11-21 03:58:48,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=22.5 2023-11-21 03:58:50,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1347113.3333333333, ans=0.1 2023-11-21 03:58:57,055 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9700, loss[loss=0.07896, simple_loss=0.09806, pruned_loss=0.02058, audio_tagging_loss=0.009355, over 14781.00 frames. ], tot_loss[loss=0.07616, simple_loss=0.09761, pruned_loss=0.01754, audio_tagging_loss=0.009817, over 3047432.58 frames. ], batch size: 56, lr: 4.01e-03, grad_scale: 32.0 2023-11-21 03:59:13,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1347246.6666666667, ans=0.1 2023-11-21 03:59:21,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1347313.3333333333, ans=0.125 2023-11-21 03:59:24,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202100 2023-11-21 03:59:31,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.702e+01 8.071e+01 8.852e+01 9.589e+01 1.175e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-21 03:59:32,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1347313.3333333333, ans=0.0 2023-11-21 03:59:55,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1347446.6666666667, ans=0.125 2023-11-21 04:00:01,937 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9750, loss[loss=0.088, simple_loss=0.1275, pruned_loss=0.01776, audio_tagging_loss=0.006485, over 16009.00 frames. ], tot_loss[loss=0.07596, simple_loss=0.09771, pruned_loss=0.01747, audio_tagging_loss=0.009641, over 3051054.62 frames. 
], batch size: 56, lr: 4.01e-03, grad_scale: 32.0
2023-11-21 04:00:07,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1347513.3333333333, ans=0.09899494936611666
2023-11-21 04:00:09,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5
2023-11-21 04:00:28,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202150
2023-11-21 04:00:47,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1347713.3333333333, ans=0.125
2023-11-21 04:00:56,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1347780.0, ans=0.125
2023-11-21 04:01:05,906 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9800, loss[loss=0.05171, simple_loss=0.05767, pruned_loss=0.01023, audio_tagging_loss=0.01264, over 15854.00 frames. ], tot_loss[loss=0.07662, simple_loss=0.09858, pruned_loss=0.01772, audio_tagging_loss=0.009607, over 3052662.67 frames. ], batch size: 63, lr: 4.01e-03, grad_scale: 32.0
2023-11-21 04:01:14,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1347846.6666666667, ans=0.2
2023-11-21 04:01:17,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1347913.3333333333, ans=0.125
2023-11-21 04:01:33,219 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202200
2023-11-21 04:01:36,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.23 vs. limit=6.0
2023-11-21 04:01:40,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.218e+01 8.911e+01 9.582e+01 1.351e+02, threshold=1.782e+02, percent-clipped=0.0
2023-11-21 04:01:53,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5
2023-11-21 04:01:56,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1348046.6666666667, ans=0.125
2023-11-21 04:01:59,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0
2023-11-21 04:02:02,310 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
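These WARNING lines come from a length filter in train_asr.py: the AudioSet cuts carry dummy transcripts, and a cut is dropped when its encoder output is too short to align with the token sequence. Here 100 input frames subsample to 23 frames, fewer than the 24 BPE tokens, so the transducer loss could not be computed. A sketch of the predicate; the exact subsampling arithmetic and comparison are assumptions chosen only to reproduce the logged 100 -> 23.

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # ~4x subsampling; (100 - 7) // 2 // 2 == 23, matching the warning
        num_frames_after = (num_frames - 7) // 2 // 2
        return num_frames_after >= num_tokens

    print(keep_cut(100, 24))  # -> False: the cut above is excluded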
2023-11-21 04:02:10,978 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9850, loss[loss=0.07719, simple_loss=0.09481, pruned_loss=0.01979, audio_tagging_loss=0.00999, over 14497.00 frames. ], tot_loss[loss=0.07697, simple_loss=0.09915, pruned_loss=0.01783, audio_tagging_loss=0.009566, over 3049632.02 frames. ], batch size: 53, lr: 4.00e-03, grad_scale: 32.0
2023-11-21 04:02:33,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1348246.6666666667, ans=0.2
2023-11-21 04:02:33,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1348246.6666666667, ans=0.1
2023-11-21 04:02:38,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202250
2023-11-21 04:03:02,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1348446.6666666667, ans=0.125
2023-11-21 04:03:15,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0
2023-11-21 04:03:16,209 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9900, loss[loss=0.08006, simple_loss=0.1117, pruned_loss=0.01833, audio_tagging_loss=0.005879, over 15382.00 frames. ], tot_loss[loss=0.07645, simple_loss=0.09849, pruned_loss=0.01759, audio_tagging_loss=0.009604, over 3045820.90 frames. ], batch size: 59, lr: 4.00e-03, grad_scale: 16.0
2023-11-21 04:03:24,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1348513.3333333333, ans=0.0
2023-11-21 04:03:44,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202300
2023-11-21 04:03:51,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.036e+01 8.846e+01 9.540e+01 1.221e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-21 04:03:58,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1348713.3333333333, ans=0.0
2023-11-21 04:04:12,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1348780.0, ans=0.125
2023-11-21 04:04:12,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0
2023-11-21 04:04:21,071 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 9950, loss[loss=0.09266, simple_loss=0.1261, pruned_loss=0.02037, audio_tagging_loss=0.00925, over 14680.00 frames. ], tot_loss[loss=0.07696, simple_loss=0.09907, pruned_loss=0.01773, audio_tagging_loss=0.009693, over 3044396.52 frames. ], batch size: 54, lr: 4.00e-03, grad_scale: 16.0
2023-11-21 04:04:23,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1348846.6666666667, ans=0.125
2023-11-21 04:04:48,406 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202350
2023-11-21 04:04:49,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1348980.0, ans=0.09899494936611666
2023-11-21 04:05:00,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1349046.6666666667, ans=0.125
2023-11-21 04:05:25,834 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10000, loss[loss=0.09385, simple_loss=0.1258, pruned_loss=0.02281, audio_tagging_loss=0.008126, over 14874.00 frames. ], tot_loss[loss=0.07585, simple_loss=0.09739, pruned_loss=0.01749, audio_tagging_loss=0.00967, over 3040407.74 frames.
], batch size: 54, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:05:26,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1349180.0, ans=0.125 2023-11-21 04:05:51,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1349313.3333333333, ans=0.125 2023-11-21 04:05:52,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202400 2023-11-21 04:06:02,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.414e+01 8.043e+01 8.738e+01 9.707e+01 1.240e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-21 04:06:30,497 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10050, loss[loss=0.07519, simple_loss=0.0947, pruned_loss=0.01734, audio_tagging_loss=0.0105, over 16385.00 frames. ], tot_loss[loss=0.07625, simple_loss=0.09788, pruned_loss=0.01764, audio_tagging_loss=0.00967, over 3042851.74 frames. ], batch size: 63, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:06:32,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1349513.3333333333, ans=0.1 2023-11-21 04:06:32,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1349513.3333333333, ans=0.125 2023-11-21 04:06:38,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.87 vs. limit=22.5 2023-11-21 04:06:57,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1349646.6666666667, ans=0.125 2023-11-21 04:06:58,170 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202450 2023-11-21 04:07:10,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1349713.3333333333, ans=0.2 2023-11-21 04:07:18,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1349713.3333333333, ans=0.2 2023-11-21 04:07:32,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1349780.0, ans=0.0 2023-11-21 04:07:34,512 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10100, loss[loss=0.07752, simple_loss=0.09751, pruned_loss=0.01941, audio_tagging_loss=0.009359, over 14820.00 frames. ], tot_loss[loss=0.07562, simple_loss=0.09707, pruned_loss=0.01745, audio_tagging_loss=0.00964, over 3041695.16 frames. ], batch size: 56, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:08:00,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.46 vs. 
limit=15.0 2023-11-21 04:08:02,804 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202500 2023-11-21 04:08:11,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.354e+01 8.956e+01 9.533e+01 1.205e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-21 04:08:12,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1350046.6666666667, ans=0.125 2023-11-21 04:08:21,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1350046.6666666667, ans=0.125 2023-11-21 04:08:24,721 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 04:08:39,847 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10150, loss[loss=0.08196, simple_loss=0.1094, pruned_loss=0.01711, audio_tagging_loss=0.01017, over 15467.00 frames. ], tot_loss[loss=0.07509, simple_loss=0.09619, pruned_loss=0.01718, audio_tagging_loss=0.009814, over 3041537.08 frames. ], batch size: 59, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:08:59,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1350246.6666666667, ans=0.1 2023-11-21 04:09:01,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1350246.6666666667, ans=0.125 2023-11-21 04:09:06,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202550 2023-11-21 04:09:07,622 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 04:09:15,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1350313.3333333333, ans=0.125 2023-11-21 04:09:44,011 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10200, loss[loss=0.08314, simple_loss=0.1071, pruned_loss=0.02017, audio_tagging_loss=0.009413, over 16180.00 frames. ], tot_loss[loss=0.07653, simple_loss=0.09788, pruned_loss=0.01782, audio_tagging_loss=0.009771, over 3052512.69 frames. ], batch size: 59, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:10:03,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.96 vs. limit=10.0 2023-11-21 04:10:04,740 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 04:10:10,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202600 2023-11-21 04:10:12,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1350646.6666666667, ans=0.125 2023-11-21 04:10:17,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1350646.6666666667, ans=0.0 2023-11-21 04:10:19,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.093e+01 8.508e+01 9.534e+01 1.524e+02, threshold=1.702e+02, percent-clipped=0.0 2023-11-21 04:10:45,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1350780.0, ans=0.0 2023-11-21 04:10:47,475 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10250, loss[loss=0.06006, simple_loss=0.0733, pruned_loss=0.01096, audio_tagging_loss=0.01245, over 15225.00 frames. ], tot_loss[loss=0.07615, simple_loss=0.09738, pruned_loss=0.01759, audio_tagging_loss=0.009868, over 3062563.63 frames. ], batch size: 56, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:10:47,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1350846.6666666667, ans=0.0 2023-11-21 04:10:55,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1350846.6666666667, ans=0.1 2023-11-21 04:10:58,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1350846.6666666667, ans=0.0 2023-11-21 04:11:14,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202650 2023-11-21 04:11:19,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1350980.0, ans=0.0 2023-11-21 04:11:19,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1350980.0, ans=0.0 2023-11-21 04:11:22,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1350980.0, ans=0.125 2023-11-21 04:11:52,696 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10300, loss[loss=0.08695, simple_loss=0.1118, pruned_loss=0.0203, audio_tagging_loss=0.01073, over 15111.00 frames. ], tot_loss[loss=0.07675, simple_loss=0.09803, pruned_loss=0.01782, audio_tagging_loss=0.009916, over 3060178.51 frames. 
], batch size: 56, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:12:10,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1351246.6666666667, ans=0.04949747468305833 2023-11-21 04:12:15,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1351246.6666666667, ans=0.1 2023-11-21 04:12:19,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202700 2023-11-21 04:12:27,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.129e+01 8.690e+01 9.404e+01 1.408e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-21 04:12:36,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2023-11-21 04:12:57,288 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10350, loss[loss=0.08563, simple_loss=0.1106, pruned_loss=0.02004, audio_tagging_loss=0.01028, over 14768.00 frames. ], tot_loss[loss=0.0768, simple_loss=0.09785, pruned_loss=0.01781, audio_tagging_loss=0.01007, over 3054911.28 frames. ], batch size: 56, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:13:03,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1351513.3333333333, ans=0.125 2023-11-21 04:13:18,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1351580.0, ans=0.125 2023-11-21 04:13:19,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1351580.0, ans=0.125 2023-11-21 04:13:19,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1351580.0, ans=0.125 2023-11-21 04:13:22,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1351646.6666666667, ans=0.04949747468305833 2023-11-21 04:13:24,391 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202750 2023-11-21 04:13:30,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1351646.6666666667, ans=0.125 2023-11-21 04:13:38,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1351713.3333333333, ans=0.2 2023-11-21 04:13:54,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1351780.0, ans=10.0 2023-11-21 04:13:59,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1351780.0, ans=0.125 2023-11-21 04:14:01,509 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10400, loss[loss=0.07731, simple_loss=0.09962, pruned_loss=0.01763, audio_tagging_loss=0.009871, over 15758.00 frames. ], tot_loss[loss=0.07645, simple_loss=0.09742, pruned_loss=0.01769, audio_tagging_loss=0.01005, over 3053485.51 frames. ], batch size: 58, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:14:15,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.30 vs. 
limit=15.0 2023-11-21 04:14:16,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1351913.3333333333, ans=0.125 2023-11-21 04:14:16,496 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 04:14:17,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1351913.3333333333, ans=0.1 2023-11-21 04:14:23,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-21 04:14:29,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202800 2023-11-21 04:14:39,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.428e+01 7.927e+01 8.563e+01 9.335e+01 1.364e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-21 04:14:45,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1352046.6666666667, ans=0.125 2023-11-21 04:15:06,487 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10450, loss[loss=0.06809, simple_loss=0.08673, pruned_loss=0.01303, audio_tagging_loss=0.0117, over 14476.00 frames. ], tot_loss[loss=0.07652, simple_loss=0.09776, pruned_loss=0.01765, audio_tagging_loss=0.009986, over 3053148.04 frames. ], batch size: 55, lr: 4.00e-03, grad_scale: 16.0 2023-11-21 04:15:18,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2023-11-21 04:15:28,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2023-11-21 04:15:33,455 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202850 2023-11-21 04:15:42,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1352313.3333333333, ans=0.0 2023-11-21 04:15:55,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1352380.0, ans=0.125 2023-11-21 04:15:57,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1352446.6666666667, ans=0.125 2023-11-21 04:16:07,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1352446.6666666667, ans=0.125 2023-11-21 04:16:10,735 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10500, loss[loss=0.06343, simple_loss=0.07696, pruned_loss=0.01541, audio_tagging_loss=0.009538, over 15727.00 frames. ], tot_loss[loss=0.07605, simple_loss=0.0975, pruned_loss=0.01749, audio_tagging_loss=0.009805, over 3050126.90 frames. 
2023-11-21 04:16:36,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1352646.6666666667, ans=0.125
2023-11-21 04:16:37,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202900
2023-11-21 04:16:37,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1352646.6666666667, ans=0.0
2023-11-21 04:16:46,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1352646.6666666667, ans=0.2
2023-11-21 04:16:50,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.399e+01 8.004e+01 8.567e+01 9.229e+01 1.198e+02, threshold=1.713e+02, percent-clipped=0.0
2023-11-21 04:16:52,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1352713.3333333333, ans=0.2
2023-11-21 04:17:16,058 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10550, loss[loss=0.07702, simple_loss=0.09773, pruned_loss=0.01962, audio_tagging_loss=0.008536, over 16080.00 frames. ], tot_loss[loss=0.0759, simple_loss=0.09738, pruned_loss=0.01747, audio_tagging_loss=0.009738, over 3044749.29 frames. ], batch size: 60, lr: 4.00e-03, grad_scale: 8.0
2023-11-21 04:17:43,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 202950
2023-11-21 04:17:51,982 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.750e-02
2023-11-21 04:18:21,415 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10600, loss[loss=0.05664, simple_loss=0.07155, pruned_loss=0.01168, audio_tagging_loss=0.009188, over 13932.00 frames. ], tot_loss[loss=0.07522, simple_loss=0.09662, pruned_loss=0.01723, audio_tagging_loss=0.009684, over 3041727.94 frames. ], batch size: 55, lr: 4.00e-03, grad_scale: 8.0
2023-11-21 04:18:21,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1353180.0, ans=0.2
2023-11-21 04:18:48,703 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203000
2023-11-21 04:18:59,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.372e+01 8.133e+01 8.810e+01 9.778e+01 1.215e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-21 04:19:15,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1353446.6666666667, ans=0.125
2023-11-21 04:19:18,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1353446.6666666667, ans=0.125
2023-11-21 04:19:26,436 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10650, loss[loss=0.07751, simple_loss=0.1082, pruned_loss=0.01584, audio_tagging_loss=0.007547, over 14962.00 frames. ], tot_loss[loss=0.07579, simple_loss=0.09753, pruned_loss=0.01744, audio_tagging_loss=0.009578, over 3039824.13 frames. ], batch size: 54, lr: 4.00e-03, grad_scale: 8.0
2023-11-21 04:19:26,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1353513.3333333333, ans=0.125
2023-11-21 04:19:41,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1353580.0, ans=0.125
2023-11-21 04:19:44,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1353580.0, ans=0.0
2023-11-21 04:19:53,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203050
2023-11-21 04:20:30,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1353846.6666666667, ans=0.2
2023-11-21 04:20:31,168 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10700, loss[loss=0.08473, simple_loss=0.1055, pruned_loss=0.02229, audio_tagging_loss=0.009689, over 14977.00 frames. ], tot_loss[loss=0.07625, simple_loss=0.09853, pruned_loss=0.01757, audio_tagging_loss=0.009413, over 3040418.92 frames. ], batch size: 55, lr: 4.00e-03, grad_scale: 8.0
2023-11-21 04:20:58,616 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203100
2023-11-21 04:21:03,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1353980.0, ans=0.1
2023-11-21 04:21:09,497 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 04:21:10,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 7.991e+01 8.733e+01 9.483e+01 1.206e+02, threshold=1.747e+02, percent-clipped=0.0
2023-11-21 04:21:35,716 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10750, loss[loss=0.07226, simple_loss=0.09591, pruned_loss=0.01554, audio_tagging_loss=0.008755, over 13772.00 frames. ], tot_loss[loss=0.07606, simple_loss=0.09805, pruned_loss=0.01762, audio_tagging_loss=0.009421, over 3041531.17 frames. ], batch size: 52, lr: 4.00e-03, grad_scale: 8.0
2023-11-21 04:21:41,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1354180.0, ans=0.125
2023-11-21 04:22:02,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1354313.3333333333, ans=0.125
2023-11-21 04:22:03,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203150
2023-11-21 04:22:06,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1354313.3333333333, ans=0.04949747468305833
2023-11-21 04:22:18,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1354380.0, ans=0.125
2023-11-21 04:22:21,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0
2023-11-21 04:22:41,285 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10800, loss[loss=0.09079, simple_loss=0.1238, pruned_loss=0.02064, audio_tagging_loss=0.008246, over 15445.00 frames. ], tot_loss[loss=0.07659, simple_loss=0.0989, pruned_loss=0.01773, audio_tagging_loss=0.009413, over 3051297.10 frames. ], batch size: 55, lr: 4.00e-03, grad_scale: 16.0
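The grad_scale field drops from 16.0 to 8.0 around batch 10500 and later climbs back through 16.0 to 32.0; with fp16 training enabled this is the signature of dynamic loss scaling: halve on an overflowing step, double again after a stretch of clean steps. A sketch using the standard torch.cuda.amp.GradScaler knobs (the growth_interval value is an assumption; the log does not reveal it):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # the grad_scale at the top of this section
        backoff_factor=0.5,    # produces the 16.0 -> 8.0 drop on overflow
        growth_factor=2.0,     # produces the 8.0 -> 16.0 -> 32.0 recovery
        growth_interval=2000,  # assumed; not recoverable from the log
    )
    # Typical step (model, optimizer, compute_loss are placeholders):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
    #   scaler.get_scale()  # the value logged as grad_scale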
2023-11-21 04:22:54,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1354580.0, ans=0.1
2023-11-21 04:23:08,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1354646.6666666667, ans=0.0
2023-11-21 04:23:09,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203200
2023-11-21 04:23:13,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1354646.6666666667, ans=0.125
2023-11-21 04:23:21,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.183e+01 7.867e+01 8.488e+01 9.244e+01 1.126e+02, threshold=1.698e+02, percent-clipped=0.0
2023-11-21 04:23:23,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0
2023-11-21 04:23:25,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1354713.3333333333, ans=0.0
2023-11-21 04:23:47,613 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10850, loss[loss=0.07492, simple_loss=0.1012, pruned_loss=0.01625, audio_tagging_loss=0.008076, over 14366.00 frames. ], tot_loss[loss=0.07679, simple_loss=0.09908, pruned_loss=0.01786, audio_tagging_loss=0.009394, over 3053806.08 frames. ], batch size: 53, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:24:02,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0
2023-11-21 04:24:03,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1354913.3333333333, ans=15.0
2023-11-21 04:24:08,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1354913.3333333333, ans=0.125
2023-11-21 04:24:14,726 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203250
2023-11-21 04:24:34,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1355046.6666666667, ans=0.125
2023-11-21 04:24:39,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1355113.3333333333, ans=0.0
2023-11-21 04:24:41,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5
2023-11-21 04:24:45,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0
2023-11-21 04:24:45,954 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
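The WARNING above drops a one-second AudioSet cut whose transcript is placeholder text: after frame subsampling only 23 frames remain for 24 BPE tokens, and a transducer needs at least one frame per emitted token. The logged 100 -> 23 mapping is consistent with the usual two-stage Conv2dSubsampling output-length formula, so the filter can be sketched as:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed length formula; it reproduces the logged 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, tokens: list) -> bool:
        # Exclude cuts with fewer post-subsampling frames than tokens.
        return frames_after_subsampling(num_frames) >= len(tokens)

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, ["tok"] * 24)  # 23 < 24 -> excluded, as logged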
2023-11-21 04:24:52,201 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10900, loss[loss=0.07878, simple_loss=0.1036, pruned_loss=0.01814, audio_tagging_loss=0.00886, over 14962.00 frames. ], tot_loss[loss=0.07663, simple_loss=0.09864, pruned_loss=0.01773, audio_tagging_loss=0.009586, over 3051731.10 frames. ], batch size: 55, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:25:01,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1355180.0, ans=10.0
2023-11-21 04:25:10,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5
2023-11-21 04:25:20,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203300
2023-11-21 04:25:31,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.236e+01 9.164e+01 1.009e+02 1.942e+02, threshold=1.833e+02, percent-clipped=1.0
2023-11-21 04:25:36,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1355380.0, ans=0.0
2023-11-21 04:25:46,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1355446.6666666667, ans=0.0
2023-11-21 04:25:57,477 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 10950, loss[loss=0.07064, simple_loss=0.08975, pruned_loss=0.0153, audio_tagging_loss=0.01046, over 14658.00 frames. ], tot_loss[loss=0.07679, simple_loss=0.09886, pruned_loss=0.01779, audio_tagging_loss=0.009572, over 3047519.52 frames. ], batch size: 54, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:26:05,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1355513.3333333333, ans=0.125
2023-11-21 04:26:10,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1355580.0, ans=0.1
2023-11-21 04:26:24,775 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203350
2023-11-21 04:26:26,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0
2023-11-21 04:26:48,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1355780.0, ans=0.05
2023-11-21 04:26:49,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1355780.0, ans=0.0
2023-11-21 04:26:49,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1355780.0, ans=0.2
2023-11-21 04:26:53,953 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 04:27:02,252 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11000, loss[loss=0.07933, simple_loss=0.09989, pruned_loss=0.01937, audio_tagging_loss=0.01001, over 15740.00 frames. ], tot_loss[loss=0.07659, simple_loss=0.09848, pruned_loss=0.01769, audio_tagging_loss=0.009665, over 3042403.54 frames. ], batch size: 57, lr: 3.99e-03, grad_scale: 16.0
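The scaling.py:213 lines that dominate this log print ScheduledFloat values: per-module constants such as dropout_p and skip_rate that are functions of batch_count rather than fixed numbers. The log only shows the resulting ans, but the behaviour is that of a piecewise-linear schedule, as in this toy re-implementation (the breakpoints below are invented for illustration; they are not the run's actual schedule):

    import bisect

    class ScheduledFloat:
        """Piecewise-linear value of batch_count, given (point, value) pairs."""

        def __init__(self, *points):
            self.x = [p[0] for p in points]
            self.y = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect.bisect_right(self.x, batch_count)
            t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
            return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

    # A dropout decaying from 0.3 to 0.1 over the first 20k batches would
    # print ans=0.1 by batch_count ~1.36e6, like the dropout_p lines here.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(1355913.3) == 0.1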
2023-11-21 04:27:05,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1355846.6666666667, ans=0.0
2023-11-21 04:27:10,233 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 04:27:25,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1355913.3333333333, ans=0.125
2023-11-21 04:27:29,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203400
2023-11-21 04:27:40,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0
2023-11-21 04:27:40,975 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 8.079e+01 8.776e+01 9.579e+01 1.135e+02, threshold=1.755e+02, percent-clipped=0.0
2023-11-21 04:27:46,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1356046.6666666667, ans=0.0
2023-11-21 04:27:50,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1356046.6666666667, ans=0.125
2023-11-21 04:27:52,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1356113.3333333333, ans=0.0
2023-11-21 04:28:02,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1356113.3333333333, ans=0.0
2023-11-21 04:28:06,823 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11050, loss[loss=0.07917, simple_loss=0.1029, pruned_loss=0.01728, audio_tagging_loss=0.01044, over 15614.00 frames. ], tot_loss[loss=0.07686, simple_loss=0.09893, pruned_loss=0.01774, audio_tagging_loss=0.009656, over 3048894.16 frames. ], batch size: 56, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:28:24,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0
2023-11-21 04:28:34,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203450
2023-11-21 04:28:42,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5
2023-11-21 04:28:55,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1356380.0, ans=0.1
2023-11-21 04:29:11,776 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11100, loss[loss=0.06479, simple_loss=0.07929, pruned_loss=0.01611, audio_tagging_loss=0.00903, over 14644.00 frames. ], tot_loss[loss=0.07776, simple_loss=0.1, pruned_loss=0.01805, audio_tagging_loss=0.009689, over 3050389.65 frames. ], batch size: 58, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:29:15,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1356513.3333333333, ans=0.0
2023-11-21 04:29:23,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1356580.0, ans=0.0
2023-11-21 04:29:32,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1356580.0, ans=0.0
2023-11-21 04:29:37,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1356646.6666666667, ans=0.2
2023-11-21 04:29:39,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203500
2023-11-21 04:29:45,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1356646.6666666667, ans=0.1
2023-11-21 04:29:51,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.645e+01 8.437e+01 9.142e+01 9.871e+01 1.258e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-21 04:30:01,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1356713.3333333333, ans=0.2
2023-11-21 04:30:16,966 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11150, loss[loss=0.06548, simple_loss=0.08239, pruned_loss=0.01349, audio_tagging_loss=0.01079, over 14899.00 frames. ], tot_loss[loss=0.07712, simple_loss=0.09901, pruned_loss=0.01781, audio_tagging_loss=0.009803, over 3049111.10 frames. ], batch size: 57, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:30:44,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1356980.0, ans=0.2
2023-11-21 04:30:44,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1356980.0, ans=0.0
2023-11-21 04:30:45,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203550
2023-11-21 04:31:01,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1357046.6666666667, ans=0.1
2023-11-21 04:31:23,070 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11200, loss[loss=0.0767, simple_loss=0.08962, pruned_loss=0.0212, audio_tagging_loss=0.01069, over 14176.00 frames. ], tot_loss[loss=0.07637, simple_loss=0.09799, pruned_loss=0.0175, audio_tagging_loss=0.009878, over 3045425.43 frames. ], batch size: 54, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:31:39,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1357246.6666666667, ans=0.0
2023-11-21 04:31:51,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203600
2023-11-21 04:31:52,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0
2023-11-21 04:32:02,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.264e+01 8.069e+01 8.575e+01 9.412e+01 1.593e+02, threshold=1.715e+02, percent-clipped=0.0
2023-11-21 04:32:07,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1357380.0, ans=0.1
2023-11-21 04:32:13,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1357380.0, ans=0.125
2023-11-21 04:32:29,588 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11250, loss[loss=0.07221, simple_loss=0.1012, pruned_loss=0.0136, audio_tagging_loss=0.007993, over 15437.00 frames. ], tot_loss[loss=0.0754, simple_loss=0.09675, pruned_loss=0.01715, audio_tagging_loss=0.009878, over 3047970.67 frames. ], batch size: 58, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:32:37,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1357513.3333333333, ans=0.125
2023-11-21 04:32:47,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1357580.0, ans=0.125
2023-11-21 04:32:49,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1357580.0, ans=0.07
2023-11-21 04:32:56,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203650
2023-11-21 04:33:05,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1357646.6666666667, ans=0.125
2023-11-21 04:33:23,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1357780.0, ans=0.125
2023-11-21 04:33:31,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0
2023-11-21 04:33:35,048 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11300, loss[loss=0.07728, simple_loss=0.1049, pruned_loss=0.01716, audio_tagging_loss=0.007658, over 15169.00 frames. ], tot_loss[loss=0.07541, simple_loss=0.09692, pruned_loss=0.01728, audio_tagging_loss=0.009666, over 3052196.66 frames. ], batch size: 55, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:33:40,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1357846.6666666667, ans=0.1
2023-11-21 04:33:46,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0
2023-11-21 04:33:51,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0
2023-11-21 04:34:02,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203700
2023-11-21 04:34:12,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1357980.0, ans=0.1
2023-11-21 04:34:13,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0
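The scaling.py:1022 Whitening lines report a per-module covariance "whiteness" metric against a limit; the constraint only engages when the metric exceeds the limit (e.g. metric=3.19 vs. limit=6.0 just above is inactive). The exact metric is not shown in the log; one illustrative choice with the right qualitative behaviour is num_channels * tr(C^2) / tr(C)^2, which is about 1 for an isotropic covariance and grows as the spectrum concentrates. This formula is an assumption for illustration only:

    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Assumed metric: ~1.0 for perfectly
        # white features, up to num_channels when the covariance collapses
        # onto a single direction.
        x = x - x.mean(dim=0, keepdim=True)
        c = (x.T @ x) / x.shape[0]
        d = x.shape[1]
        return (d * torch.trace(c @ c) / torch.trace(c) ** 2).item()

    x = torch.randn(4000, 256)        # roughly white activations
    assert whiteness_metric(x) < 2.0  # well under a limit like 22.5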
2023-11-21 04:34:14,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.294e+01 8.174e+01 8.923e+01 9.788e+01 1.326e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-21 04:34:27,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0
2023-11-21 04:34:29,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.74 vs. limit=15.0
2023-11-21 04:34:36,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0
2023-11-21 04:34:39,456 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11350, loss[loss=0.08487, simple_loss=0.1061, pruned_loss=0.02301, audio_tagging_loss=0.008822, over 15615.00 frames. ], tot_loss[loss=0.07535, simple_loss=0.09685, pruned_loss=0.01728, audio_tagging_loss=0.009649, over 3049112.73 frames. ], batch size: 61, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:34:44,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1358180.0, ans=0.125
2023-11-21 04:34:46,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1358180.0, ans=0.0
2023-11-21 04:35:00,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1358246.6666666667, ans=0.125
2023-11-21 04:35:07,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203750
2023-11-21 04:35:42,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1358446.6666666667, ans=0.0
2023-11-21 04:35:45,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5
2023-11-21 04:35:46,328 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11400, loss[loss=0.05828, simple_loss=0.07398, pruned_loss=0.01075, audio_tagging_loss=0.01054, over 15062.00 frames. ], tot_loss[loss=0.07528, simple_loss=0.09671, pruned_loss=0.0172, audio_tagging_loss=0.009725, over 3045865.21 frames. ], batch size: 56, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:35:51,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1358513.3333333333, ans=0.125
2023-11-21 04:36:13,270 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203800
2023-11-21 04:36:24,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.066e+01 8.848e+01 9.491e+01 1.619e+02, threshold=1.770e+02, percent-clipped=0.0
2023-11-21 04:36:25,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1358713.3333333333, ans=0.125
2023-11-21 04:36:36,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0
2023-11-21 04:36:43,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1358780.0, ans=0.1
2023-11-21 04:36:45,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1358780.0, ans=0.125
2023-11-21 04:36:52,320 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11450, loss[loss=0.07369, simple_loss=0.09367, pruned_loss=0.01686, audio_tagging_loss=0.009984, over 15943.00 frames. ], tot_loss[loss=0.07534, simple_loss=0.09678, pruned_loss=0.01724, audio_tagging_loss=0.009708, over 3048970.03 frames. ], batch size: 59, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:36:57,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1358846.6666666667, ans=0.0
2023-11-21 04:36:59,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1358846.6666666667, ans=0.125
2023-11-21 04:37:07,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5
2023-11-21 04:37:18,653 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203850
2023-11-21 04:37:31,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1359046.6666666667, ans=0.0
2023-11-21 04:37:56,311 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11500, loss[loss=0.07884, simple_loss=0.1028, pruned_loss=0.01937, audio_tagging_loss=0.008088, over 15084.00 frames. ], tot_loss[loss=0.07582, simple_loss=0.09761, pruned_loss=0.01739, audio_tagging_loss=0.009622, over 3049422.21 frames. ], batch size: 58, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:38:09,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1359246.6666666667, ans=0.125
2023-11-21 04:38:23,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203900
2023-11-21 04:38:35,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.131e+01 8.864e+01 9.970e+01 1.435e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-21 04:38:48,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1359446.6666666667, ans=0.0
2023-11-21 04:39:01,270 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11550, loss[loss=0.08007, simple_loss=0.1074, pruned_loss=0.01644, audio_tagging_loss=0.009936, over 15512.00 frames. ], tot_loss[loss=0.07647, simple_loss=0.09869, pruned_loss=0.01755, audio_tagging_loss=0.009572, over 3055109.34 frames. ], batch size: 56, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:39:06,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0
2023-11-21 04:39:07,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1359513.3333333333, ans=0.125
2023-11-21 04:39:07,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=22.5
2023-11-21 04:39:28,123 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 203950
2023-11-21 04:39:36,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.35 vs. limit=15.0
2023-11-21 04:39:37,825 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 04:40:04,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1359846.6666666667, ans=0.125
2023-11-21 04:40:05,778 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11600, loss[loss=0.07329, simple_loss=0.09229, pruned_loss=0.02031, audio_tagging_loss=0.006843, over 15141.00 frames. ], tot_loss[loss=0.07675, simple_loss=0.09895, pruned_loss=0.01773, audio_tagging_loss=0.00954, over 3056745.96 frames. ], batch size: 58, lr: 3.99e-03, grad_scale: 32.0
2023-11-21 04:40:07,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1359846.6666666667, ans=0.125
2023-11-21 04:40:27,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0
2023-11-21 04:40:32,513 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204000
2023-11-21 04:40:33,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1359980.0, ans=0.125
2023-11-21 04:40:40,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1359980.0, ans=0.125
2023-11-21 04:40:51,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.912e+01 8.095e+01 8.795e+01 9.597e+01 1.214e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-21 04:41:14,564 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11650, loss[loss=0.08501, simple_loss=0.1075, pruned_loss=0.0217, audio_tagging_loss=0.009572, over 14474.00 frames. ], tot_loss[loss=0.07684, simple_loss=0.09876, pruned_loss=0.0178, audio_tagging_loss=0.009658, over 3055765.97 frames. ], batch size: 53, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:41:42,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204050
2023-11-21 04:41:42,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1360313.3333333333, ans=0.0
2023-11-21 04:42:16,998 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 04:42:18,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1360513.3333333333, ans=0.0
2023-11-21 04:42:19,574 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11700, loss[loss=0.07728, simple_loss=0.1011, pruned_loss=0.0188, audio_tagging_loss=0.007928, over 15469.00 frames. ], tot_loss[loss=0.07632, simple_loss=0.09807, pruned_loss=0.01758, audio_tagging_loss=0.009701, over 3053207.12 frames. ], batch size: 56, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:42:20,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1360513.3333333333, ans=0.125
2023-11-21 04:42:35,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1360580.0, ans=0.125
2023-11-21 04:42:47,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204100
2023-11-21 04:43:00,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 7.797e+01 8.480e+01 9.195e+01 1.315e+02, threshold=1.696e+02, percent-clipped=0.0
2023-11-21 04:43:14,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1360780.0, ans=0.0
2023-11-21 04:43:24,820 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11750, loss[loss=0.07715, simple_loss=0.09708, pruned_loss=0.01657, audio_tagging_loss=0.01204, over 14987.00 frames. ], tot_loss[loss=0.07614, simple_loss=0.0976, pruned_loss=0.01756, audio_tagging_loss=0.009788, over 3055827.26 frames. ], batch size: 55, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:43:26,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5
2023-11-21 04:43:49,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1360980.0, ans=0.125
2023-11-21 04:43:50,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1360980.0, ans=0.0
2023-11-21 04:43:51,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204150
2023-11-21 04:44:01,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1360980.0, ans=0.125
2023-11-21 04:44:22,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1361113.3333333333, ans=0.125
2023-11-21 04:44:29,398 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11800, loss[loss=0.09056, simple_loss=0.1274, pruned_loss=0.0183, audio_tagging_loss=0.008541, over 15496.00 frames. ], tot_loss[loss=0.07642, simple_loss=0.09776, pruned_loss=0.01774, audio_tagging_loss=0.009799, over 3051023.66 frames. ], batch size: 55, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:44:44,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1361246.6666666667, ans=0.0
2023-11-21 04:44:56,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204200
2023-11-21 04:45:11,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.252e+01 9.002e+01 9.752e+01 1.410e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-21 04:45:13,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1361380.0, ans=0.2
2023-11-21 04:45:24,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1361446.6666666667, ans=0.125
2023-11-21 04:45:33,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1361513.3333333333, ans=0.0
2023-11-21 04:45:33,866 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11850, loss[loss=0.06874, simple_loss=0.09151, pruned_loss=0.01407, audio_tagging_loss=0.00892, over 15443.00 frames. ], tot_loss[loss=0.0766, simple_loss=0.09807, pruned_loss=0.01776, audio_tagging_loss=0.009803, over 3044917.89 frames. ], batch size: 56, lr: 3.99e-03, grad_scale: 16.0
2023-11-21 04:45:46,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1361580.0, ans=0.0
2023-11-21 04:45:49,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0
2023-11-21 04:46:02,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204250
2023-11-21 04:46:21,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1361713.3333333333, ans=0.125
2023-11-21 04:46:39,221 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11900, loss[loss=0.07212, simple_loss=0.08136, pruned_loss=0.01524, audio_tagging_loss=0.0162, over 15878.00 frames. ], tot_loss[loss=0.07654, simple_loss=0.09804, pruned_loss=0.01765, audio_tagging_loss=0.009867, over 3052784.81 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 16.0
2023-11-21 04:46:48,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1361846.6666666667, ans=0.0
2023-11-21 04:46:56,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1361913.3333333333, ans=0.0
2023-11-21 04:46:59,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1361913.3333333333, ans=0.0
2023-11-21 04:47:05,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0
2023-11-21 04:47:06,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204300
2023-11-21 04:47:06,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1361980.0, ans=0.125
2023-11-21 04:47:08,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0
2023-11-21 04:47:18,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1362046.6666666667, ans=0.2
2023-11-21 04:47:20,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.041e+01 8.628e+01 9.158e+01 1.228e+02, threshold=1.726e+02, percent-clipped=0.0
2023-11-21 04:47:23,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1362046.6666666667, ans=0.0
2023-11-21 04:47:25,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2023-11-21 04:47:29,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0
2023-11-21 04:47:35,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1362113.3333333333, ans=0.0
2023-11-21 04:47:41,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5
2023-11-21 04:47:44,167 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 11950, loss[loss=0.0891, simple_loss=0.1171, pruned_loss=0.02481, audio_tagging_loss=0.005733, over 15700.00 frames. ], tot_loss[loss=0.07698, simple_loss=0.09869, pruned_loss=0.01774, audio_tagging_loss=0.009898, over 3055953.21 frames. ], batch size: 56, lr: 3.98e-03, grad_scale: 16.0
2023-11-21 04:47:45,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1362180.0, ans=0.0
2023-11-21 04:47:49,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1362180.0, ans=0.95
2023-11-21 04:47:56,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1362246.6666666667, ans=0.0
2023-11-21 04:47:59,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=22.5
2023-11-21 04:48:04,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0
2023-11-21 04:48:10,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204350
2023-11-21 04:48:28,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=12.0
2023-11-21 04:48:35,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1362446.6666666667, ans=0.0
2023-11-21 04:48:45,537 INFO [train_asr.py:1221] (1/4) Epoch 17, batch 12000, loss[loss=0.06784, simple_loss=0.08683, pruned_loss=0.01403, audio_tagging_loss=0.0104, over 14336.00 frames. ], tot_loss[loss=0.07669, simple_loss=0.09816, pruned_loss=0.0176, audio_tagging_loss=0.01001, over 3054912.65 frames. ], batch size: 54, lr: 3.98e-03, grad_scale: 32.0
2023-11-21 04:48:45,538 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 04:49:26,095 INFO [train_asr.py:1253] (1/4) Epoch 17, validation: loss=0.06069, simple_loss=0.05267, pruned_loss=0.005387, audio_tagging_loss=0.02896, over 4681554.00 frames.
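At batch 12000 the script pauses for a full validation pass (train_asr.py:1244/1253) over the 4,681,554-frame validation set, then reports the peak GPU memory seen so far. The MB figure reads like a straight conversion of torch.cuda.max_memory_allocated, which is a plausible though unconfirmed source:

    import torch

    def report_peak_memory(device: str = "cuda:1") -> None:
        # Assumed implementation of the "Maximum memory allocated" line.
        if torch.cuda.is_available():
            peak = torch.cuda.max_memory_allocated(torch.device(device))
            print(f"Maximum memory allocated so far is {peak // (1024 * 1024)}MB")

    report_peak_memory()  # would print e.g. "... is 25607MB" on this run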
2023-11-21 04:49:26,096 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 04:49:36,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1362513.3333333333, ans=0.125
2023-11-21 04:49:42,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1362580.0, ans=0.2
2023-11-21 04:49:51,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204400
2023-11-21 04:50:33,274 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 0, loss[loss=0.0656, simple_loss=0.06452, pruned_loss=0.009335, audio_tagging_loss=0.02401, over 15423.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.06452, pruned_loss=0.009335, audio_tagging_loss=0.02401, over 15423.00 frames. ], batch size: 58, lr: 3.87e-03, grad_scale: 32.0
2023-11-21 04:50:33,275 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 04:51:08,562 INFO [train_asr.py:1253] (1/4) Epoch 18, validation: loss=0.05959, simple_loss=0.05266, pruned_loss=0.005405, audio_tagging_loss=0.02786, over 4681554.00 frames.
2023-11-21 04:51:08,563 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 04:51:19,304 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.498e+01 8.070e+01 8.803e+01 9.675e+01 1.246e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-21 04:51:27,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1362733.3333333333, ans=0.125
2023-11-21 04:51:34,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0
2023-11-21 04:51:38,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1362800.0, ans=0.2
2023-11-21 04:51:50,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1362866.6666666667, ans=0.125
2023-11-21 04:51:54,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1362866.6666666667, ans=0.2
2023-11-21 04:51:54,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1362866.6666666667, ans=0.0
2023-11-21 04:52:09,458 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204450
2023-11-21 04:52:11,800 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 50, loss[loss=0.07149, simple_loss=0.07206, pruned_loss=0.01409, audio_tagging_loss=0.02137, over 15749.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.09921, pruned_loss=0.01695, audio_tagging_loss=0.01895, over 686879.32 frames. ], batch size: 58, lr: 3.87e-03, grad_scale: 32.0
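Two things change at the epoch boundary visible above. First, tot_loss restarts: at "Epoch 18, batch 0" it equals the single batch's loss, and the audio_tagging_loss component starts high (0.02401) before the per-epoch running average settles back toward ~0.01 over the following batches. Second, the learning rate steps from 3.99e-03/3.98e-03 down to 3.87e-03, which matches an Eden-style schedule with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, provided the epoch term counts completed epochs (an assumption that makes the logged values line up):

    def eden_lr(batch: int, epoch: float, base_lr: float = 0.045,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # lr = base_lr * ((b^2+B^2)/B^2)^-0.25 * ((e^2+E^2)/E^2)^-0.25
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(f"{eden_lr(202700, 16):.2e}")  # ~4.00e-03, within epoch 17 above
    print(f"{eden_lr(204400, 17):.2e}")  # ~3.87e-03, at the start of epoch 18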
2023-11-21 04:52:20,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1363000.0, ans=0.0
2023-11-21 04:52:36,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1363133.3333333333, ans=0.0
2023-11-21 04:53:13,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204500
2023-11-21 04:53:13,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1363266.6666666667, ans=0.125
2023-11-21 04:53:15,789 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 100, loss[loss=0.07744, simple_loss=0.09397, pruned_loss=0.01479, audio_tagging_loss=0.01566, over 15857.00 frames. ], tot_loss[loss=0.08469, simple_loss=0.09862, pruned_loss=0.01717, audio_tagging_loss=0.01821, over 1211004.47 frames. ], batch size: 58, lr: 3.87e-03, grad_scale: 32.0
2023-11-21 04:53:28,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.544e+01 9.245e+01 1.008e+02 1.490e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-21 04:53:36,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=15.0
2023-11-21 04:53:38,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=22.5
2023-11-21 04:53:39,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1363400.0, ans=0.0
2023-11-21 04:54:07,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1363600.0, ans=0.125
2023-11-21 04:54:18,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204550
2023-11-21 04:54:21,057 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 150, loss[loss=0.06918, simple_loss=0.07305, pruned_loss=0.01861, audio_tagging_loss=0.01405, over 14952.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.09722, pruned_loss=0.01691, audio_tagging_loss=0.01638, over 1615343.92 frames. ], batch size: 58, lr: 3.87e-03, grad_scale: 32.0
2023-11-21 04:54:22,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0
2023-11-21 04:54:31,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1363666.6666666667, ans=0.125
2023-11-21 04:54:42,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1363733.3333333333, ans=0.2
2023-11-21 04:54:42,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1363733.3333333333, ans=0.0
2023-11-21 04:54:44,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1363733.3333333333, ans=0.125
2023-11-21 04:55:04,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0
2023-11-21 04:55:23,555 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204600
2023-11-21 04:55:26,218 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 200, loss[loss=0.0589, simple_loss=0.07414, pruned_loss=0.0121, audio_tagging_loss=0.009733, over 15591.00 frames. ], tot_loss[loss=0.07942, simple_loss=0.09622, pruned_loss=0.01693, audio_tagging_loss=0.01438, over 1935734.59 frames. ], batch size: 61, lr: 3.87e-03, grad_scale: 32.0
2023-11-21 04:55:28,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1364000.0, ans=0.035
2023-11-21 04:55:37,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.028e+01 8.648e+01 9.287e+01 1.143e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-21 04:55:41,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.95 vs. limit=10.0
2023-11-21 04:55:46,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1364066.6666666667, ans=0.125
2023-11-21 04:55:50,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1364133.3333333333, ans=0.125
2023-11-21 04:56:26,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204650
2023-11-21 04:56:29,157 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 250, loss[loss=0.09235, simple_loss=0.1181, pruned_loss=0.02426, audio_tagging_loss=0.009022, over 15773.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.09753, pruned_loss=0.01704, audio_tagging_loss=0.01296, over 2191591.81 frames. ], batch size: 56, lr: 3.87e-03, grad_scale: 16.0
2023-11-21 04:56:30,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1364333.3333333333, ans=0.125
2023-11-21 04:56:37,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1364333.3333333333, ans=0.125
2023-11-21 04:56:39,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1364333.3333333333, ans=0.0
2023-11-21 04:56:39,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0
2023-11-21 04:57:20,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1364600.0, ans=0.125
2023-11-21 04:57:31,687 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204700
2023-11-21 04:57:34,631 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 300, loss[loss=0.07553, simple_loss=0.09249, pruned_loss=0.01813, audio_tagging_loss=0.01116, over 15206.00 frames. ], tot_loss[loss=0.07909, simple_loss=0.0991, pruned_loss=0.01751, audio_tagging_loss=0.01203, over 2378259.42 frames. ], batch size: 56, lr: 3.87e-03, grad_scale: 16.0
2023-11-21 04:57:43,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1364666.6666666667, ans=0.2
2023-11-21 04:57:46,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.596e+01 8.107e+01 8.931e+01 9.351e+01 1.204e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-21 04:58:12,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1364866.6666666667, ans=0.0
2023-11-21 04:58:14,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1364866.6666666667, ans=0.0
2023-11-21 04:58:25,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1364933.3333333333, ans=0.1
2023-11-21 04:58:35,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204750
2023-11-21 04:58:37,519 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 350, loss[loss=0.08981, simple_loss=0.1192, pruned_loss=0.0224, audio_tagging_loss=0.007792, over 14926.00 frames. ], tot_loss[loss=0.07843, simple_loss=0.09918, pruned_loss=0.01755, audio_tagging_loss=0.01129, over 2529411.57 frames. ], batch size: 54, lr: 3.87e-03, grad_scale: 16.0
2023-11-21 04:58:45,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1365000.0, ans=0.125
2023-11-21 04:59:09,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1365133.3333333333, ans=0.125
2023-11-21 04:59:19,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1365200.0, ans=0.2
2023-11-21 04:59:25,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1365200.0, ans=0.125
2023-11-21 04:59:29,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1365266.6666666667, ans=0.125
2023-11-21 04:59:39,554 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204800
2023-11-21 04:59:42,280 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 400, loss[loss=0.07001, simple_loss=0.08997, pruned_loss=0.01386, audio_tagging_loss=0.01117, over 14824.00 frames. ], tot_loss[loss=0.0771, simple_loss=0.09779, pruned_loss=0.01729, audio_tagging_loss=0.01091, over 2643526.56 frames. ], batch size: 58, lr: 3.87e-03, grad_scale: 32.0
2023-11-21 04:59:55,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.042e+01 8.695e+01 9.547e+01 1.167e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-21 05:00:28,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1365533.3333333333, ans=0.125
2023-11-21 05:00:44,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1365600.0, ans=0.04949747468305833
2023-11-21 05:00:45,034 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204850
2023-11-21 05:00:47,472 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 450, loss[loss=0.0692, simple_loss=0.08929, pruned_loss=0.01673, audio_tagging_loss=0.007828, over 15191.00 frames. ], tot_loss[loss=0.07589, simple_loss=0.09643, pruned_loss=0.01708, audio_tagging_loss=0.0106, over 2732784.70 frames. ], batch size: 57, lr: 3.87e-03, grad_scale: 32.0
2023-11-21 05:01:08,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=22.5
2023-11-21 05:01:26,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1365866.6666666667, ans=0.1
2023-11-21 05:01:26,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1365866.6666666667, ans=0.1
2023-11-21 05:01:42,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0
2023-11-21 05:01:44,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1365933.3333333333, ans=0.0
2023-11-21 05:01:50,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204900
2023-11-21 05:01:50,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1365933.3333333333, ans=0.125
2023-11-21 05:01:52,753 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 500, loss[loss=0.07105, simple_loss=0.09115, pruned_loss=0.01589, audio_tagging_loss=0.009588, over 15382.00 frames. ], tot_loss[loss=0.07559, simple_loss=0.09612, pruned_loss=0.01711, audio_tagging_loss=0.01042, over 2792237.56 frames. ], batch size: 59, lr: 3.86e-03, grad_scale: 32.0
2023-11-21 05:01:57,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1366000.0, ans=15.0
2023-11-21 05:02:00,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1366000.0, ans=0.1
2023-11-21 05:02:06,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1366066.6666666667, ans=0.125
2023-11-21 05:02:07,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.867e+01 8.144e+01 8.900e+01 9.756e+01 1.828e+02, threshold=1.780e+02, percent-clipped=1.0
2023-11-21 05:02:12,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1366066.6666666667, ans=0.125
2023-11-21 05:02:23,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5
limit=22.5 2023-11-21 05:02:30,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1366133.3333333333, ans=0.125 2023-11-21 05:02:47,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1366266.6666666667, ans=0.0 2023-11-21 05:02:51,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1366266.6666666667, ans=0.125 2023-11-21 05:02:55,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 204950 2023-11-21 05:02:57,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2023-11-21 05:02:57,716 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 550, loss[loss=0.07182, simple_loss=0.0892, pruned_loss=0.01826, audio_tagging_loss=0.008954, over 14581.00 frames. ], tot_loss[loss=0.07559, simple_loss=0.09625, pruned_loss=0.01712, audio_tagging_loss=0.01034, over 2848044.03 frames. ], batch size: 58, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:03:18,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1366400.0, ans=0.0 2023-11-21 05:03:26,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1366466.6666666667, ans=0.125 2023-11-21 05:03:30,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1366466.6666666667, ans=0.125 2023-11-21 05:03:57,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1366600.0, ans=0.125 2023-11-21 05:03:59,741 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205000 2023-11-21 05:04:02,510 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 600, loss[loss=0.08053, simple_loss=0.09938, pruned_loss=0.01759, audio_tagging_loss=0.01326, over 15409.00 frames. ], tot_loss[loss=0.07579, simple_loss=0.09679, pruned_loss=0.01727, audio_tagging_loss=0.01012, over 2895406.87 frames. ], batch size: 55, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:04:03,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-21 05:04:16,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.655e+01 8.071e+01 8.678e+01 9.459e+01 1.222e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-21 05:04:18,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1366733.3333333333, ans=0.125 2023-11-21 05:05:00,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-11-21 05:05:05,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205050 2023-11-21 05:05:05,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2023-11-21 05:05:07,383 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 650, loss[loss=0.07945, simple_loss=0.09385, pruned_loss=0.01978, audio_tagging_loss=0.01275, over 14764.00 frames. 
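Each optim.py "Clipping_scale" line reports five order statistics (min, 25%, median, 75%, max) of recent gradient norms, and in every entry the threshold equals Clipping_scale times the logged median (2.0 * 8.931e+01 = 1.786e+02 in the first entry of this stretch); percent-clipped only rises when a batch's norm exceeds that threshold, as in the 05:02:07 entry where the max 1.828e+02 tops the 1.780e+02 threshold. A minimal sketch of maintaining such a median-relative clip, with an assumed history length and illustrative names:

import numpy as np
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 100):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []        # recent total gradient norms
        self.num_seen = 0
        self.num_clipped = 0

    def clip_(self, parameters):
        parameters = [p for p in parameters if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.cat([p.grad.detach().flatten() for p in parameters])).item()
        self.norms = (self.norms + [norm])[-self.history:]
        quartiles = np.quantile(self.norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = self.clipping_scale * quartiles[2]  # 2.0 * median, as logged
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in parameters:
                p.grad.mul_(threshold / norm)
        return quartiles, threshold, 100.0 * self.num_clipped / self.num_seen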
], tot_loss[loss=0.07601, simple_loss=0.09721, pruned_loss=0.0173, audio_tagging_loss=0.0101, over 2929906.68 frames. ], batch size: 57, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:05:08,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1367000.0, ans=0.125 2023-11-21 05:05:11,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1367000.0, ans=0.125 2023-11-21 05:05:17,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-11-21 05:06:00,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1367266.6666666667, ans=0.04949747468305833 2023-11-21 05:06:05,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1367266.6666666667, ans=0.125 2023-11-21 05:06:10,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205100 2023-11-21 05:06:12,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2023-11-21 05:06:12,496 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 700, loss[loss=0.08189, simple_loss=0.1102, pruned_loss=0.01772, audio_tagging_loss=0.009084, over 15694.00 frames. ], tot_loss[loss=0.0756, simple_loss=0.09676, pruned_loss=0.01714, audio_tagging_loss=0.01008, over 2948555.07 frames. ], batch size: 58, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:06:19,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1367333.3333333333, ans=0.125 2023-11-21 05:06:22,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1367333.3333333333, ans=0.125 2023-11-21 05:06:27,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.155e+01 8.827e+01 9.700e+01 1.144e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-21 05:06:50,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1367466.6666666667, ans=0.0 2023-11-21 05:06:56,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1367533.3333333333, ans=0.1 2023-11-21 05:07:16,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205150 2023-11-21 05:07:19,602 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 750, loss[loss=0.06127, simple_loss=0.0769, pruned_loss=0.01388, audio_tagging_loss=0.008948, over 15271.00 frames. ], tot_loss[loss=0.07579, simple_loss=0.09703, pruned_loss=0.01724, audio_tagging_loss=0.01003, over 2975402.82 frames. ], batch size: 58, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:07:22,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1367666.6666666667, ans=0.0 2023-11-21 05:07:30,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1367666.6666666667, ans=0.0 2023-11-21 05:07:32,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.53 vs. 
limit=15.0 2023-11-21 05:07:34,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1367733.3333333333, ans=0.1 2023-11-21 05:07:39,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1367733.3333333333, ans=0.125 2023-11-21 05:07:42,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2023-11-21 05:07:52,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1367800.0, ans=22.5 2023-11-21 05:08:03,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1367866.6666666667, ans=0.2 2023-11-21 05:08:09,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1367866.6666666667, ans=15.0 2023-11-21 05:08:22,907 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205200 2023-11-21 05:08:25,705 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 800, loss[loss=0.07348, simple_loss=0.09514, pruned_loss=0.01714, audio_tagging_loss=0.008766, over 14748.00 frames. ], tot_loss[loss=0.07633, simple_loss=0.09776, pruned_loss=0.01749, audio_tagging_loss=0.00997, over 2996315.61 frames. ], batch size: 57, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:08:39,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.168e+01 8.753e+01 9.823e+01 1.363e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-21 05:08:46,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2023-11-21 05:08:47,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1368066.6666666667, ans=0.0 2023-11-21 05:08:51,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1368133.3333333333, ans=0.125 2023-11-21 05:08:55,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-11-21 05:08:57,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1368133.3333333333, ans=0.1 2023-11-21 05:09:27,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2023-11-21 05:09:28,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205250 2023-11-21 05:09:28,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2023-11-21 05:09:30,504 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 850, loss[loss=0.05415, simple_loss=0.06413, pruned_loss=0.01006, audio_tagging_loss=0.01202, over 15097.00 frames. ], tot_loss[loss=0.07622, simple_loss=0.09733, pruned_loss=0.0175, audio_tagging_loss=0.01005, over 3014232.53 frames. 
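The scaling.py "ScheduledFloat" lines record hyper-parameters inside the encoder modules (dropout probabilities, skip rates, balancer probs, bypass scale floors) that are not constants but functions of the global batch count; each entry prints the value (`ans`) in effect at that `batch_count`. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented for illustration:

class ScheduledFloatSketch:
    """A float that is a piecewise-linear function of the batch count."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs; flat beyond the endpoints.
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Hypothetical schedule: a dropout decaying from 0.3 to 0.1 over the first
# 20k batches, flat afterwards -- so at batch_count=1364933.33 it reads 0.1,
# like the dropout_p entries logged above.
dropout_p = ScheduledFloatSketch((0, 0.3), (20000, 0.1))
assert dropout_p.value(1364933.33) == 0.1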
], batch size: 61, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:09:32,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1368333.3333333333, ans=0.0 2023-11-21 05:09:32,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=12.0 2023-11-21 05:09:33,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1368333.3333333333, ans=0.1 2023-11-21 05:09:37,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2023-11-21 05:09:50,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1368400.0, ans=0.125 2023-11-21 05:10:17,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1368533.3333333333, ans=0.2 2023-11-21 05:10:24,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1368600.0, ans=0.0 2023-11-21 05:10:24,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1368600.0, ans=0.1 2023-11-21 05:10:32,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205300 2023-11-21 05:10:35,123 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 900, loss[loss=0.06918, simple_loss=0.08623, pruned_loss=0.01411, audio_tagging_loss=0.01195, over 15505.00 frames. ], tot_loss[loss=0.0762, simple_loss=0.09737, pruned_loss=0.01743, audio_tagging_loss=0.01008, over 3020562.12 frames. ], batch size: 57, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:10:38,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1368666.6666666667, ans=0.07 2023-11-21 05:10:40,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1368666.6666666667, ans=0.0 2023-11-21 05:10:50,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.117e+01 8.694e+01 9.614e+01 1.237e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 05:11:18,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1368866.6666666667, ans=0.0 2023-11-21 05:11:21,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0 2023-11-21 05:11:35,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1368933.3333333333, ans=0.125 2023-11-21 05:11:39,302 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205350 2023-11-21 05:11:41,861 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 950, loss[loss=0.04738, simple_loss=0.0479, pruned_loss=0.01255, audio_tagging_loss=0.01088, over 13843.00 frames. ], tot_loss[loss=0.07575, simple_loss=0.09708, pruned_loss=0.01724, audio_tagging_loss=0.00997, over 3019607.67 frames. 
], batch size: 57, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:12:28,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1369200.0, ans=0.1 2023-11-21 05:12:43,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205400 2023-11-21 05:12:46,431 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1000, loss[loss=0.08094, simple_loss=0.103, pruned_loss=0.01897, audio_tagging_loss=0.01049, over 15784.00 frames. ], tot_loss[loss=0.0753, simple_loss=0.09671, pruned_loss=0.01705, audio_tagging_loss=0.009887, over 3028244.96 frames. ], batch size: 60, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:12:46,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1369333.3333333333, ans=0.0 2023-11-21 05:12:57,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1369400.0, ans=0.125 2023-11-21 05:13:01,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.289e+01 9.122e+01 9.845e+01 1.226e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-21 05:13:14,392 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 05:13:17,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1369466.6666666667, ans=0.0 2023-11-21 05:13:21,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1369466.6666666667, ans=0.1 2023-11-21 05:13:36,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-21 05:13:48,374 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205450 2023-11-21 05:13:50,711 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1050, loss[loss=0.07314, simple_loss=0.08997, pruned_loss=0.01933, audio_tagging_loss=0.008821, over 14181.00 frames. ], tot_loss[loss=0.07533, simple_loss=0.09653, pruned_loss=0.01727, audio_tagging_loss=0.009796, over 3028532.74 frames. ], batch size: 54, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:13:54,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1369666.6666666667, ans=0.125 2023-11-21 05:14:12,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1369733.3333333333, ans=0.125 2023-11-21 05:14:55,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205500 2023-11-21 05:14:57,809 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1100, loss[loss=0.07286, simple_loss=0.09619, pruned_loss=0.01517, audio_tagging_loss=0.00959, over 14985.00 frames. ], tot_loss[loss=0.07478, simple_loss=0.09573, pruned_loss=0.01717, audio_tagging_loss=0.009739, over 3025582.77 frames. 
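The WARNING above explains itself once the numbers are compared: this AudioSet cut is a 1-second clip whose placeholder transcript ("Dummy text added as a place holder. ...") tokenizes to 24 BPE tokens, but its 100 feature frames shrink to 23 after the ~4x convolutional subsampling, and the transducer recipe cannot train on an utterance with more target tokens than encoder frames, so the cut is dropped. A sketch of that admissibility check; the subsampling formula is an assumption chosen to match 100 -> 23:

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed conv front-end arithmetic (overall factor ~4); it maps the
    # warning's 100 input frames to 23 output frames.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Require at least one encoder frame per target token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # the excluded cut above: 23 frames < 24 tokens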
], batch size: 54, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:15:01,447 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 05:15:06,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1370000.0, ans=0.125 2023-11-21 05:15:06,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1370000.0, ans=0.0 2023-11-21 05:15:12,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.447e+01 9.123e+01 9.824e+01 1.304e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-21 05:15:14,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1370066.6666666667, ans=0.0 2023-11-21 05:15:44,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1370200.0, ans=0.1 2023-11-21 05:15:53,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1370266.6666666667, ans=0.125 2023-11-21 05:15:59,774 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205550 2023-11-21 05:16:02,096 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1150, loss[loss=0.08792, simple_loss=0.109, pruned_loss=0.02296, audio_tagging_loss=0.01048, over 15034.00 frames. ], tot_loss[loss=0.075, simple_loss=0.09585, pruned_loss=0.01731, audio_tagging_loss=0.009758, over 3032014.43 frames. ], batch size: 56, lr: 3.86e-03, grad_scale: 16.0 2023-11-21 05:16:38,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1370466.6666666667, ans=0.0 2023-11-21 05:17:04,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205600 2023-11-21 05:17:07,011 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1200, loss[loss=0.07528, simple_loss=0.0898, pruned_loss=0.01679, audio_tagging_loss=0.01359, over 14848.00 frames. ], tot_loss[loss=0.0749, simple_loss=0.09588, pruned_loss=0.01729, audio_tagging_loss=0.009674, over 3038239.28 frames. ], batch size: 54, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:17:23,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.058e+01 8.835e+01 9.727e+01 1.276e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-21 05:17:31,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1370733.3333333333, ans=0.0 2023-11-21 05:17:39,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-21 05:17:39,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1370800.0, ans=0.125 2023-11-21 05:17:52,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.43 vs. 
limit=22.5 2023-11-21 05:17:54,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1370866.6666666667, ans=0.2 2023-11-21 05:17:55,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1370866.6666666667, ans=0.0 2023-11-21 05:18:08,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1370933.3333333333, ans=0.125 2023-11-21 05:18:10,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205650 2023-11-21 05:18:11,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1371000.0, ans=0.0 2023-11-21 05:18:11,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1371000.0, ans=0.0 2023-11-21 05:18:12,815 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1250, loss[loss=0.06188, simple_loss=0.08673, pruned_loss=0.0101, audio_tagging_loss=0.008416, over 15230.00 frames. ], tot_loss[loss=0.07479, simple_loss=0.09601, pruned_loss=0.01707, audio_tagging_loss=0.009717, over 3041126.92 frames. ], batch size: 57, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:18:13,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1371000.0, ans=0.0 2023-11-21 05:18:59,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1371200.0, ans=0.5 2023-11-21 05:19:00,529 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:19:16,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205700 2023-11-21 05:19:19,134 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1300, loss[loss=0.07069, simple_loss=0.09157, pruned_loss=0.0157, audio_tagging_loss=0.009211, over 14852.00 frames. ], tot_loss[loss=0.07483, simple_loss=0.09608, pruned_loss=0.01707, audio_tagging_loss=0.009717, over 3036656.00 frames. ], batch size: 57, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:19:26,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1371333.3333333333, ans=0.0 2023-11-21 05:19:28,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1371333.3333333333, ans=0.125 2023-11-21 05:19:33,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.568e+01 8.156e+01 8.683e+01 9.710e+01 1.341e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 05:19:36,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.42 vs. 
limit=15.0 2023-11-21 05:20:01,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1371533.3333333333, ans=0.95 2023-11-21 05:20:02,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1371533.3333333333, ans=0.0 2023-11-21 05:20:21,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205750 2023-11-21 05:20:24,026 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1350, loss[loss=0.04903, simple_loss=0.04999, pruned_loss=0.01028, audio_tagging_loss=0.01375, over 14237.00 frames. ], tot_loss[loss=0.075, simple_loss=0.09634, pruned_loss=0.01717, audio_tagging_loss=0.00966, over 3039299.46 frames. ], batch size: 55, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:20:25,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1371666.6666666667, ans=0.0 2023-11-21 05:20:35,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1371666.6666666667, ans=0.125 2023-11-21 05:20:39,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1371733.3333333333, ans=0.04949747468305833 2023-11-21 05:21:11,025 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 05:21:26,976 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205800 2023-11-21 05:21:29,767 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1400, loss[loss=0.07372, simple_loss=0.08712, pruned_loss=0.01955, audio_tagging_loss=0.01061, over 15646.00 frames. ], tot_loss[loss=0.07507, simple_loss=0.09598, pruned_loss=0.01725, audio_tagging_loss=0.009824, over 3037380.65 frames. ], batch size: 65, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:21:40,432 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:21:45,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.942e+01 8.095e+01 8.796e+01 9.675e+01 1.234e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-21 05:21:47,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1372066.6666666667, ans=0.0 2023-11-21 05:21:49,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1372066.6666666667, ans=0.125 2023-11-21 05:22:01,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1372133.3333333333, ans=0.1 2023-11-21 05:22:32,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205850 2023-11-21 05:22:35,540 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1450, loss[loss=0.06381, simple_loss=0.08187, pruned_loss=0.01101, audio_tagging_loss=0.01187, over 13966.00 frames. ], tot_loss[loss=0.07576, simple_loss=0.09697, pruned_loss=0.01743, audio_tagging_loss=0.009839, over 3037406.48 frames. 
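The grad_scale field in the loss lines is the dynamic loss-scaling factor of fp16 mixed-precision training: it halved from 32.0 to 16.0 around batch 1000 (the usual response to an inf/nan gradient) and was back at 32.0 by batch 1200 once training was stable again. A minimal sketch of the standard torch.cuda.amp machinery behind such a scale (the recipe may additionally nudge the scale upward on its own schedule); model, optimizer and batch are placeholders:

import torch

scaler = torch.cuda.amp.GradScaler()   # halves its scale on overflow, regrows it when stable

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)            # fp16 forward pass (placeholder API)
    scaler.scale(loss).backward()      # scale up to avoid fp16 gradient underflow
    scaler.step(optimizer)             # unscales grads; skips the step on inf/nan
    scaler.update()                    # adjusts the scale -> the logged grad_scale
    return loss.detach()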
], batch size: 54, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:22:37,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1372333.3333333333, ans=0.2 2023-11-21 05:22:37,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1372333.3333333333, ans=0.125 2023-11-21 05:22:48,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1372400.0, ans=0.1 2023-11-21 05:22:59,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1372466.6666666667, ans=0.04949747468305833 2023-11-21 05:23:08,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=10.0 2023-11-21 05:23:10,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. limit=10.0 2023-11-21 05:23:11,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1372466.6666666667, ans=0.0 2023-11-21 05:23:31,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1372600.0, ans=0.0 2023-11-21 05:23:37,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205900 2023-11-21 05:23:40,176 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1500, loss[loss=0.06472, simple_loss=0.08666, pruned_loss=0.01079, audio_tagging_loss=0.0106, over 14622.00 frames. ], tot_loss[loss=0.07577, simple_loss=0.09658, pruned_loss=0.01759, audio_tagging_loss=0.009894, over 3032136.26 frames. 
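The scaling.py "Whitening" lines fire when a module checks how white (close to an isotropic covariance) its activations are: a metric of 1.0 would mean a perfectly white per-group feature covariance, the logged limit is the point past which a corrective penalty engages, and most entries here (e.g. 8.58 vs. limit=10.0, 12.52 vs. limit=22.5) stay within bounds. A sketch of one such whiteness measure, assuming a trace-normalised second moment; the exact formula in scaling.py may differ:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # >= 1.0, with equality iff each group's feature covariance is a
    # multiple of the identity (illustrative measure, see note above).
    num_frames, num_channels = x.shape
    g = num_channels // num_groups
    metrics = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        cov = (xg.t() @ xg) / num_frames            # (g, g) second moment
        metrics.append((cov * cov).sum() * g / cov.trace() ** 2)
    return torch.stack(metrics).mean().item()

# White noise sits near 1.0, far inside limits like 10.0 or 22.5:
assert whitening_metric(torch.randn(10000, 512)) < 1.5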
], batch size: 56, lr: 3.86e-03, grad_scale: 32.0 2023-11-21 05:23:45,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1372666.6666666667, ans=0.1 2023-11-21 05:23:49,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1372666.6666666667, ans=0.2 2023-11-21 05:23:50,748 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:23:50,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1372666.6666666667, ans=0.125 2023-11-21 05:23:55,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.193e+01 8.878e+01 9.448e+01 1.243e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-21 05:23:58,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1372733.3333333333, ans=0.125 2023-11-21 05:24:00,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1372733.3333333333, ans=0.1 2023-11-21 05:24:17,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1372800.0, ans=0.1 2023-11-21 05:24:20,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1372866.6666666667, ans=0.0 2023-11-21 05:24:36,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1372933.3333333333, ans=0.1 2023-11-21 05:24:43,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 205950 2023-11-21 05:24:45,762 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1550, loss[loss=0.08286, simple_loss=0.1018, pruned_loss=0.0217, audio_tagging_loss=0.01025, over 14407.00 frames. ], tot_loss[loss=0.07653, simple_loss=0.09758, pruned_loss=0.01779, audio_tagging_loss=0.009949, over 3034686.56 frames. ], batch size: 56, lr: 3.85e-03, grad_scale: 32.0 2023-11-21 05:24:50,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1373000.0, ans=0.0 2023-11-21 05:25:00,253 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:25:06,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1373066.6666666667, ans=0.0 2023-11-21 05:25:09,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1373066.6666666667, ans=0.0 2023-11-21 05:25:22,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1373133.3333333333, ans=0.0 2023-11-21 05:25:22,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. 
limit=15.0 2023-11-21 05:25:25,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1373200.0, ans=0.125 2023-11-21 05:25:27,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1373200.0, ans=0.035 2023-11-21 05:25:44,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1373266.6666666667, ans=0.0 2023-11-21 05:25:48,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206000 2023-11-21 05:25:51,191 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1600, loss[loss=0.09508, simple_loss=0.1311, pruned_loss=0.02113, audio_tagging_loss=0.008412, over 15096.00 frames. ], tot_loss[loss=0.07687, simple_loss=0.09804, pruned_loss=0.01787, audio_tagging_loss=0.009979, over 3049153.23 frames. ], batch size: 54, lr: 3.85e-03, grad_scale: 32.0 2023-11-21 05:26:07,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.599e+01 8.284e+01 8.965e+01 9.783e+01 1.317e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-21 05:26:11,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1373400.0, ans=0.125 2023-11-21 05:26:33,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1373533.3333333333, ans=0.0 2023-11-21 05:26:43,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1373600.0, ans=0.125 2023-11-21 05:26:44,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1373600.0, ans=0.07 2023-11-21 05:26:47,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1373600.0, ans=0.125 2023-11-21 05:26:55,236 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206050 2023-11-21 05:26:55,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2023-11-21 05:26:57,585 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1650, loss[loss=0.09977, simple_loss=0.1257, pruned_loss=0.02684, audio_tagging_loss=0.01008, over 15138.00 frames. ], tot_loss[loss=0.07642, simple_loss=0.09748, pruned_loss=0.01769, audio_tagging_loss=0.009994, over 3048158.59 frames. 
], batch size: 55, lr: 3.85e-03, grad_scale: 32.0 2023-11-21 05:27:09,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1373733.3333333333, ans=0.0 2023-11-21 05:27:13,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1373733.3333333333, ans=0.1 2023-11-21 05:27:29,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1373800.0, ans=0.125 2023-11-21 05:27:52,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1373933.3333333333, ans=0.2 2023-11-21 05:28:00,639 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206100 2023-11-21 05:28:03,617 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1700, loss[loss=0.07795, simple_loss=0.09081, pruned_loss=0.01982, audio_tagging_loss=0.01273, over 16298.00 frames. ], tot_loss[loss=0.07616, simple_loss=0.09712, pruned_loss=0.0175, audio_tagging_loss=0.01009, over 3056970.85 frames. ], batch size: 63, lr: 3.85e-03, grad_scale: 32.0 2023-11-21 05:28:16,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=15.0 2023-11-21 05:28:18,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.627e+01 8.074e+01 8.560e+01 9.664e+01 1.224e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-21 05:28:42,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-11-21 05:28:52,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1374200.0, ans=0.125 2023-11-21 05:29:06,275 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206150 2023-11-21 05:29:08,572 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1750, loss[loss=0.08195, simple_loss=0.1033, pruned_loss=0.02065, audio_tagging_loss=0.009643, over 14504.00 frames. ], tot_loss[loss=0.07544, simple_loss=0.09628, pruned_loss=0.01735, audio_tagging_loss=0.009949, over 3050498.55 frames. 
], batch size: 54, lr: 3.85e-03, grad_scale: 32.0 2023-11-21 05:29:08,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1374333.3333333333, ans=0.0 2023-11-21 05:29:12,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1374333.3333333333, ans=0.2 2023-11-21 05:29:23,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1374400.0, ans=0.04949747468305833 2023-11-21 05:29:23,915 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:29:42,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1374466.6666666667, ans=0.125 2023-11-21 05:29:45,259 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:29:53,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1374533.3333333333, ans=0.125 2023-11-21 05:29:55,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1374533.3333333333, ans=0.125 2023-11-21 05:29:57,181 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:30:10,787 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206200 2023-11-21 05:30:13,634 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1800, loss[loss=0.0992, simple_loss=0.1258, pruned_loss=0.02623, audio_tagging_loss=0.01007, over 15713.00 frames. ], tot_loss[loss=0.07593, simple_loss=0.09693, pruned_loss=0.01756, audio_tagging_loss=0.009905, over 3055248.67 frames. ], batch size: 61, lr: 3.85e-03, grad_scale: 32.0 2023-11-21 05:30:29,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.350e+01 8.891e+01 9.779e+01 1.328e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-21 05:30:55,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1374866.6666666667, ans=0.1 2023-11-21 05:31:11,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1374933.3333333333, ans=0.1 2023-11-21 05:31:13,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1374933.3333333333, ans=0.125 2023-11-21 05:31:16,118 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206250 2023-11-21 05:31:18,491 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1850, loss[loss=0.07299, simple_loss=0.0931, pruned_loss=0.01854, audio_tagging_loss=0.007898, over 15666.00 frames. ], tot_loss[loss=0.07618, simple_loss=0.09778, pruned_loss=0.01755, audio_tagging_loss=0.009747, over 3056266.61 frames. 
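The lr field decays smoothly as training proceeds: 3.87e-03 at batch 350, 3.86e-03 from batch 500, 3.85e-03 from batch 1550. That slow within-epoch drift is consistent with icefall's Eden schedule, which multiplies a base learning rate by inverse quarter-power factors in both the batch count and the epoch (a warm-up factor may also apply early in training). A sketch with symbolic parameters; base_lr, lr_batches and lr_epochs are placeholders, not read from this run's configuration:

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    # Smooth power-law decay in both batch and epoch; at ~200k batches into
    # epoch 18 both factors change very slowly, hence the tiny per-batch
    # drift in the logged lr.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor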
], batch size: 59, lr: 3.85e-03, grad_scale: 32.0 2023-11-21 05:31:21,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1375000.0, ans=0.125 2023-11-21 05:31:21,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1375000.0, ans=0.125 2023-11-21 05:31:22,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1375000.0, ans=0.125 2023-11-21 05:31:52,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1375133.3333333333, ans=0.125 2023-11-21 05:32:02,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2023-11-21 05:32:21,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1375266.6666666667, ans=0.0 2023-11-21 05:32:22,531 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206300 2023-11-21 05:32:24,927 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1900, loss[loss=0.0663, simple_loss=0.09116, pruned_loss=0.01431, audio_tagging_loss=0.006416, over 15651.00 frames. ], tot_loss[loss=0.07553, simple_loss=0.09699, pruned_loss=0.01733, audio_tagging_loss=0.009707, over 3052187.95 frames. ], batch size: 59, lr: 3.85e-03, grad_scale: 16.0 2023-11-21 05:32:27,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1375333.3333333333, ans=0.2 2023-11-21 05:32:32,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1375333.3333333333, ans=0.0 2023-11-21 05:32:40,735 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 7.949e+01 8.733e+01 9.558e+01 1.328e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 05:32:50,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1375466.6666666667, ans=0.125 2023-11-21 05:32:57,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1375466.6666666667, ans=0.125 2023-11-21 05:33:07,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2023-11-21 05:33:18,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1375600.0, ans=0.125 2023-11-21 05:33:20,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1375600.0, ans=0.2 2023-11-21 05:33:23,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1375600.0, ans=0.125 2023-11-21 05:33:27,134 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206350 2023-11-21 05:33:29,458 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 1950, loss[loss=0.07527, simple_loss=0.09409, pruned_loss=0.01728, audio_tagging_loss=0.01095, over 14624.00 frames. ], tot_loss[loss=0.07546, simple_loss=0.09674, pruned_loss=0.01737, audio_tagging_loss=0.00971, over 3050015.30 frames. 
], batch size: 54, lr: 3.85e-03, grad_scale: 16.0 2023-11-21 05:33:47,528 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 05:33:47,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1375733.3333333333, ans=0.1 2023-11-21 05:33:56,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1375800.0, ans=0.0 2023-11-21 05:34:17,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1375866.6666666667, ans=0.0 2023-11-21 05:34:18,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1375866.6666666667, ans=0.0 2023-11-21 05:34:21,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1375933.3333333333, ans=0.0 2023-11-21 05:34:26,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1375933.3333333333, ans=0.2 2023-11-21 05:34:30,931 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206400 2023-11-21 05:34:34,513 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2000, loss[loss=0.07458, simple_loss=0.1026, pruned_loss=0.01512, audio_tagging_loss=0.00813, over 16331.00 frames. ], tot_loss[loss=0.07512, simple_loss=0.09628, pruned_loss=0.01726, audio_tagging_loss=0.009726, over 3040768.31 frames. ], batch size: 60, lr: 3.85e-03, grad_scale: 16.0 2023-11-21 05:34:45,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1376000.0, ans=0.2 2023-11-21 05:34:51,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2023-11-21 05:34:53,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.010e+01 8.880e+01 9.739e+01 1.562e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-21 05:34:53,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1376066.6666666667, ans=0.125 2023-11-21 05:34:55,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1376066.6666666667, ans=0.0 2023-11-21 05:35:37,812 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206450 2023-11-21 05:35:40,811 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2050, loss[loss=0.05578, simple_loss=0.06861, pruned_loss=0.0109, audio_tagging_loss=0.01058, over 14630.00 frames. ], tot_loss[loss=0.07519, simple_loss=0.09642, pruned_loss=0.01728, audio_tagging_loss=0.009697, over 3035508.67 frames. ], batch size: 58, lr: 3.85e-03, grad_scale: 16.0 2023-11-21 05:35:46,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2023-11-21 05:35:47,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1376333.3333333333, ans=15.0 2023-11-21 05:35:52,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.36 vs. 
limit=22.5 2023-11-21 05:35:58,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1376400.0, ans=0.0 2023-11-21 05:36:02,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1376400.0, ans=0.125 2023-11-21 05:36:35,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1376600.0, ans=0.0 2023-11-21 05:36:42,072 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206500 2023-11-21 05:36:44,393 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2100, loss[loss=0.08027, simple_loss=0.1119, pruned_loss=0.01703, audio_tagging_loss=0.007291, over 15899.00 frames. ], tot_loss[loss=0.07507, simple_loss=0.09641, pruned_loss=0.01723, audio_tagging_loss=0.009643, over 3033242.11 frames. ], batch size: 57, lr: 3.85e-03, grad_scale: 16.0 2023-11-21 05:37:01,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.791e+01 8.008e+01 8.892e+01 9.733e+01 1.759e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-21 05:37:04,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1376733.3333333333, ans=0.1 2023-11-21 05:37:10,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1376800.0, ans=0.125 2023-11-21 05:37:10,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2023-11-21 05:37:16,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1376800.0, ans=0.125 2023-11-21 05:37:37,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1376933.3333333333, ans=0.125 2023-11-21 05:37:44,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206550 2023-11-21 05:37:46,817 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2150, loss[loss=0.07796, simple_loss=0.1002, pruned_loss=0.02007, audio_tagging_loss=0.007782, over 13997.00 frames. ], tot_loss[loss=0.07546, simple_loss=0.09699, pruned_loss=0.01736, audio_tagging_loss=0.009611, over 3032698.20 frames. ], batch size: 55, lr: 3.85e-03, grad_scale: 16.0 2023-11-21 05:37:49,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2023-11-21 05:38:01,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1377066.6666666667, ans=0.125 2023-11-21 05:38:01,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1377066.6666666667, ans=0.125 2023-11-21 05:38:15,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1377133.3333333333, ans=0.0 2023-11-21 05:38:26,395 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 05:38:29,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1377200.0, ans=0.0
2023-11-21 05:38:33,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1377200.0, ans=0.125
2023-11-21 05:38:38,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=22.5
2023-11-21 05:38:38,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1377266.6666666667, ans=0.09899494936611666
2023-11-21 05:38:43,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1377266.6666666667, ans=0.0
2023-11-21 05:38:43,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1377266.6666666667, ans=0.0
2023-11-21 05:38:48,825 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206600
2023-11-21 05:38:51,609 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2200, loss[loss=0.05171, simple_loss=0.05973, pruned_loss=0.008815, audio_tagging_loss=0.01303, over 14472.00 frames. ], tot_loss[loss=0.07462, simple_loss=0.09586, pruned_loss=0.01698, audio_tagging_loss=0.009714, over 3036171.61 frames. ], batch size: 56, lr: 3.85e-03, grad_scale: 16.0
2023-11-21 05:38:53,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1377333.3333333333, ans=0.0
2023-11-21 05:39:09,735 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.508e+01 7.959e+01 8.463e+01 9.337e+01 1.181e+02, threshold=1.693e+02, percent-clipped=0.0
2023-11-21 05:39:54,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206650
2023-11-21 05:39:57,051 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2250, loss[loss=0.06843, simple_loss=0.08758, pruned_loss=0.01491, audio_tagging_loss=0.009724, over 14788.00 frames. ], tot_loss[loss=0.07467, simple_loss=0.09585, pruned_loss=0.01703, audio_tagging_loss=0.009709, over 3030699.26 frames. ], batch size: 56, lr: 3.85e-03, grad_scale: 16.0
2023-11-21 05:40:03,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1377666.6666666667, ans=0.125
2023-11-21 05:40:30,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1377800.0, ans=0.125
2023-11-21 05:40:56,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1377933.3333333333, ans=0.07
2023-11-21 05:40:59,051 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206700
2023-11-21 05:41:01,596 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2300, loss[loss=0.05427, simple_loss=0.07107, pruned_loss=0.01033, audio_tagging_loss=0.008406, over 15081.00 frames. ], tot_loss[loss=0.07467, simple_loss=0.09597, pruned_loss=0.01696, audio_tagging_loss=0.009726, over 3033933.65 frames. ], batch size: 59, lr: 3.85e-03, grad_scale: 16.0
2023-11-21 05:41:13,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1378000.0, ans=0.125
2023-11-21 05:41:20,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.169e+01 8.944e+01 9.700e+01 1.275e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-21 05:41:21,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1378066.6666666667, ans=0.125
2023-11-21 05:41:32,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1378133.3333333333, ans=0.2
2023-11-21 05:41:42,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1378200.0, ans=0.125
2023-11-21 05:41:43,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0
2023-11-21 05:42:00,192 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 05:42:04,565 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206750
2023-11-21 05:42:05,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0
2023-11-21 05:42:07,013 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2350, loss[loss=0.1059, simple_loss=0.1448, pruned_loss=0.02689, audio_tagging_loss=0.006604, over 15227.00 frames. ], tot_loss[loss=0.0747, simple_loss=0.09577, pruned_loss=0.01701, audio_tagging_loss=0.009812, over 3030179.46 frames. ], batch size: 56, lr: 3.85e-03, grad_scale: 16.0
2023-11-21 05:42:29,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1378400.0, ans=0.0
2023-11-21 05:42:54,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1378533.3333333333, ans=0.0
2023-11-21 05:43:09,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206800
2023-11-21 05:43:12,385 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2400, loss[loss=0.06864, simple_loss=0.08146, pruned_loss=0.0145, audio_tagging_loss=0.01341, over 14287.00 frames. ], tot_loss[loss=0.0748, simple_loss=0.09589, pruned_loss=0.01697, audio_tagging_loss=0.009879, over 3029704.83 frames. ], batch size: 56, lr: 3.85e-03, grad_scale: 32.0
2023-11-21 05:43:15,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1378666.6666666667, ans=0.0
2023-11-21 05:43:16,341 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 05:43:27,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1378733.3333333333, ans=0.125
2023-11-21 05:43:29,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.194e+01 8.214e+01 8.717e+01 9.423e+01 1.179e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-21 05:43:40,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1378800.0, ans=0.07
2023-11-21 05:43:41,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1378800.0, ans=0.125
2023-11-21 05:43:52,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1378866.6666666667, ans=0.07
2023-11-21 05:44:13,302 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206850
2023-11-21 05:44:15,724 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2450, loss[loss=0.07723, simple_loss=0.1023, pruned_loss=0.01736, audio_tagging_loss=0.00871, over 14634.00 frames. ], tot_loss[loss=0.07538, simple_loss=0.09671, pruned_loss=0.01711, audio_tagging_loss=0.009917, over 3032671.84 frames. ], batch size: 55, lr: 3.85e-03, grad_scale: 32.0
2023-11-21 05:44:25,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1379000.0, ans=0.125
2023-11-21 05:44:36,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379066.6666666667, ans=0.1
2023-11-21 05:44:38,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.75 vs. limit=10.0
2023-11-21 05:44:44,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1379133.3333333333, ans=0.05
2023-11-21 05:44:54,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0
2023-11-21 05:44:56,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0
2023-11-21 05:45:02,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0
2023-11-21 05:45:12,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5
2023-11-21 05:45:16,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206900
2023-11-21 05:45:18,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1379333.3333333333, ans=0.125
2023-11-21 05:45:19,794 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2500, loss[loss=0.06136, simple_loss=0.06916, pruned_loss=0.01495, audio_tagging_loss=0.01183, over 12973.00 frames. ], tot_loss[loss=0.07648, simple_loss=0.09811, pruned_loss=0.01753, audio_tagging_loss=0.009887, over 3037090.51 frames. ], batch size: 53, lr: 3.85e-03, grad_scale: 32.0
2023-11-21 05:45:24,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1379333.3333333333, ans=0.07
2023-11-21 05:45:29,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1379333.3333333333, ans=0.125
2023-11-21 05:45:36,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1379400.0, ans=0.125
2023-11-21 05:45:38,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.201e+01 8.897e+01 9.730e+01 1.405e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-21 05:45:38,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1379400.0, ans=0.0
2023-11-21 05:45:49,616 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 05:45:50,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=15.0
2023-11-21 05:46:20,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1379600.0, ans=0.015
2023-11-21 05:46:21,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1379600.0, ans=0.125
2023-11-21 05:46:22,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 206950
2023-11-21 05:46:25,212 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2550, loss[loss=0.07305, simple_loss=0.09404, pruned_loss=0.01523, audio_tagging_loss=0.0108, over 14780.00 frames. ], tot_loss[loss=0.07641, simple_loss=0.09804, pruned_loss=0.01755, audio_tagging_loss=0.009844, over 3032071.28 frames. ], batch size: 56, lr: 3.85e-03, grad_scale: 32.0
2023-11-21 05:46:39,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0
2023-11-21 05:46:42,468 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 05:46:42,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1379733.3333333333, ans=0.125
2023-11-21 05:46:45,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0
2023-11-21 05:46:54,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0
2023-11-21 05:46:55,701 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 05:46:57,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0
2023-11-21 05:47:03,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379866.6666666667, ans=0.1
2023-11-21 05:47:05,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1379866.6666666667, ans=0.125
2023-11-21 05:47:22,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0
2023-11-21 05:47:25,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207000
2023-11-21 05:47:28,479 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2600, loss[loss=0.06298, simple_loss=0.08929, pruned_loss=0.009865, audio_tagging_loss=0.008472, over 15652.00 frames. ], tot_loss[loss=0.07514, simple_loss=0.09646, pruned_loss=0.01711, audio_tagging_loss=0.009805, over 3029556.70 frames. ], batch size: 60, lr: 3.85e-03, grad_scale: 16.0
2023-11-21 05:47:36,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1380000.0, ans=0.125
2023-11-21 05:47:47,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.166e+01 8.159e+01 8.772e+01 9.350e+01 1.184e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-21 05:48:12,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1380200.0, ans=0.125
2023-11-21 05:48:12,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1380200.0, ans=0.07
2023-11-21 05:48:20,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1380266.6666666667, ans=0.125
2023-11-21 05:48:27,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1380266.6666666667, ans=0.125
2023-11-21 05:48:28,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0
2023-11-21 05:48:29,794 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207050
2023-11-21 05:48:32,776 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2650, loss[loss=0.08312, simple_loss=0.1135, pruned_loss=0.01898, audio_tagging_loss=0.007384, over 15701.00 frames. ], tot_loss[loss=0.07559, simple_loss=0.09731, pruned_loss=0.0172, audio_tagging_loss=0.009735, over 3038749.49 frames. ], batch size: 56, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:48:33,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1380333.3333333333, ans=0.125
2023-11-21 05:48:45,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2023-11-21 05:48:50,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1380400.0, ans=0.125
2023-11-21 05:49:18,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1380533.3333333333, ans=0.0
2023-11-21 05:49:34,977 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207100
2023-11-21 05:49:35,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1380600.0, ans=0.2
2023-11-21 05:49:37,445 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2700, loss[loss=0.07287, simple_loss=0.09236, pruned_loss=0.01772, audio_tagging_loss=0.008975, over 14892.00 frames. ], tot_loss[loss=0.07507, simple_loss=0.09655, pruned_loss=0.01715, audio_tagging_loss=0.009652, over 3040581.68 frames. ], batch size: 56, lr: 3.84e-03, grad_scale: 8.0
2023-11-21 05:49:37,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1380666.6666666667, ans=0.2
2023-11-21 05:49:57,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.788e+01 7.943e+01 8.665e+01 9.544e+01 1.191e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-21 05:50:09,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1380800.0, ans=0.1
2023-11-21 05:50:40,210 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207150
2023-11-21 05:50:42,607 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2750, loss[loss=0.0976, simple_loss=0.1164, pruned_loss=0.0278, audio_tagging_loss=0.01157, over 15547.00 frames. ], tot_loss[loss=0.07524, simple_loss=0.0967, pruned_loss=0.01724, audio_tagging_loss=0.009655, over 3036934.23 frames. ], batch size: 59, lr: 3.84e-03, grad_scale: 8.0
2023-11-21 05:51:40,423 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 05:51:43,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0
2023-11-21 05:51:45,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207200
2023-11-21 05:51:46,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1381333.3333333333, ans=0.1
2023-11-21 05:51:48,296 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2800, loss[loss=0.07215, simple_loss=0.087, pruned_loss=0.01739, audio_tagging_loss=0.01127, over 14940.00 frames. ], tot_loss[loss=0.07486, simple_loss=0.09611, pruned_loss=0.01708, audio_tagging_loss=0.009726, over 3029563.32 frames. ], batch size: 56, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:51:54,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1381333.3333333333, ans=0.0
2023-11-21 05:52:09,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.379e+01 7.920e+01 8.497e+01 9.162e+01 1.236e+02, threshold=1.699e+02, percent-clipped=0.0
2023-11-21 05:52:28,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1381533.3333333333, ans=0.1
2023-11-21 05:52:28,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1381533.3333333333, ans=0.125
2023-11-21 05:52:51,572 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207250
2023-11-21 05:52:54,689 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2850, loss[loss=0.07284, simple_loss=0.08705, pruned_loss=0.01596, audio_tagging_loss=0.01336, over 16141.00 frames. ], tot_loss[loss=0.07521, simple_loss=0.09653, pruned_loss=0.01721, audio_tagging_loss=0.009731, over 3036216.34 frames. ], batch size: 60, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:53:00,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1381666.6666666667, ans=0.2
2023-11-21 05:53:18,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1381733.3333333333, ans=0.125
2023-11-21 05:53:40,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1381866.6666666667, ans=0.125
2023-11-21 05:53:47,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1381933.3333333333, ans=0.2
2023-11-21 05:53:57,023 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207300
2023-11-21 05:53:59,442 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2900, loss[loss=0.08015, simple_loss=0.1031, pruned_loss=0.01795, audio_tagging_loss=0.01064, over 15358.00 frames. ], tot_loss[loss=0.07477, simple_loss=0.09595, pruned_loss=0.01706, audio_tagging_loss=0.009738, over 3037582.13 frames. ], batch size: 59, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:54:09,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1382000.0, ans=0.0
2023-11-21 05:54:13,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1382066.6666666667, ans=0.125
2023-11-21 05:54:19,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1382066.6666666667, ans=0.125
2023-11-21 05:54:20,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.538e+01 7.887e+01 8.991e+01 9.698e+01 1.464e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-21 05:54:28,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0
2023-11-21 05:54:55,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0
2023-11-21 05:54:56,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1382266.6666666667, ans=0.1
2023-11-21 05:55:02,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207350
2023-11-21 05:55:04,497 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 2950, loss[loss=0.08359, simple_loss=0.1019, pruned_loss=0.02222, audio_tagging_loss=0.01043, over 14213.00 frames. ], tot_loss[loss=0.07508, simple_loss=0.09638, pruned_loss=0.01713, audio_tagging_loss=0.009754, over 3032256.94 frames. ], batch size: 54, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:55:10,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1382333.3333333333, ans=0.2
2023-11-21 05:55:47,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1382533.3333333333, ans=0.95
2023-11-21 05:55:50,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5
2023-11-21 05:56:07,558 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207400
2023-11-21 05:56:09,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1382666.6666666667, ans=0.1
2023-11-21 05:56:10,282 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3000, loss[loss=0.109, simple_loss=0.1398, pruned_loss=0.02798, audio_tagging_loss=0.01119, over 15735.00 frames. ], tot_loss[loss=0.07589, simple_loss=0.09734, pruned_loss=0.01737, audio_tagging_loss=0.009849, over 3041713.27 frames. ], batch size: 56, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:56:10,283 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 05:56:50,254 INFO [train_asr.py:1253] (1/4) Epoch 18, validation: loss=0.06024, simple_loss=0.05252, pruned_loss=0.00529, audio_tagging_loss=0.02869, over 4681554.00 frames.
2023-11-21 05:56:50,255 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 05:57:09,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1382733.3333333333, ans=0.125
2023-11-21 05:57:11,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.230e+01 8.309e+01 8.951e+01 9.724e+01 1.389e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-21 05:57:27,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1382800.0, ans=0.0
2023-11-21 05:57:34,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=12.0
2023-11-21 05:57:36,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1382866.6666666667, ans=0.0
2023-11-21 05:57:44,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1382933.3333333333, ans=0.0
2023-11-21 05:57:52,803 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207450
2023-11-21 05:57:55,779 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3050, loss[loss=0.07931, simple_loss=0.1051, pruned_loss=0.01772, audio_tagging_loss=0.009041, over 15705.00 frames. ], tot_loss[loss=0.07609, simple_loss=0.09752, pruned_loss=0.0175, audio_tagging_loss=0.009831, over 3046139.05 frames. ], batch size: 59, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:58:01,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1383000.0, ans=0.0
2023-11-21 05:58:07,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1383066.6666666667, ans=0.0
2023-11-21 05:58:34,416 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 05:58:37,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1383200.0, ans=0.0
2023-11-21 05:58:58,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207500
2023-11-21 05:59:01,622 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3100, loss[loss=0.08691, simple_loss=0.1122, pruned_loss=0.02138, audio_tagging_loss=0.009451, over 14490.00 frames. ], tot_loss[loss=0.07619, simple_loss=0.09761, pruned_loss=0.01753, audio_tagging_loss=0.009857, over 3054016.26 frames. ], batch size: 53, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 05:59:21,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.746e+01 8.138e+01 8.768e+01 9.517e+01 1.436e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-21 05:59:31,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1383466.6666666667, ans=0.125
2023-11-21 05:59:38,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0
2023-11-21 05:59:42,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1383533.3333333333, ans=0.0
2023-11-21 05:59:53,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.76 vs. limit=22.5
2023-11-21 06:00:03,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207550
2023-11-21 06:00:06,327 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3150, loss[loss=0.102, simple_loss=0.1342, pruned_loss=0.0255, audio_tagging_loss=0.009366, over 14775.00 frames. ], tot_loss[loss=0.07586, simple_loss=0.09716, pruned_loss=0.01731, audio_tagging_loss=0.009965, over 3049957.20 frames. ], batch size: 53, lr: 3.84e-03, grad_scale: 16.0
2023-11-21 06:00:37,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1383800.0, ans=0.0
2023-11-21 06:00:42,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1383800.0, ans=0.125
2023-11-21 06:00:53,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0
2023-11-21 06:00:59,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2023-11-21 06:01:01,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1383933.3333333333, ans=0.0
2023-11-21 06:01:07,839 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207600
2023-11-21 06:01:10,786 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3200, loss[loss=0.06876, simple_loss=0.09221, pruned_loss=0.01525, audio_tagging_loss=0.007402, over 15367.00 frames. ], tot_loss[loss=0.07555, simple_loss=0.09672, pruned_loss=0.01718, audio_tagging_loss=0.01002, over 3046686.33 frames. ], batch size: 58, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:01:28,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1384066.6666666667, ans=0.125
2023-11-21 06:01:32,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.077e+01 8.904e+01 9.561e+01 1.400e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-21 06:02:00,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1384200.0, ans=0.125
2023-11-21 06:02:08,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1384266.6666666667, ans=0.015
2023-11-21 06:02:13,601 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207650
2023-11-21 06:02:15,886 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3250, loss[loss=0.0691, simple_loss=0.08899, pruned_loss=0.01507, audio_tagging_loss=0.009538, over 14615.00 frames. ], tot_loss[loss=0.07567, simple_loss=0.09689, pruned_loss=0.01722, audio_tagging_loss=0.01002, over 3054859.71 frames. ], batch size: 55, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:02:24,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1384333.3333333333, ans=0.07
2023-11-21 06:02:29,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1384400.0, ans=0.125
2023-11-21 06:02:30,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1384400.0, ans=0.0
2023-11-21 06:03:12,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0
2023-11-21 06:03:14,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1384600.0, ans=0.125
2023-11-21 06:03:17,972 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207700
2023-11-21 06:03:20,330 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3300, loss[loss=0.05127, simple_loss=0.06118, pruned_loss=0.009679, audio_tagging_loss=0.011, over 13902.00 frames. ], tot_loss[loss=0.07494, simple_loss=0.09575, pruned_loss=0.0169, audio_tagging_loss=0.01017, over 3052003.90 frames. ], batch size: 56, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:03:24,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0
2023-11-21 06:03:40,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.701e+01 8.200e+01 8.829e+01 9.665e+01 1.279e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-21 06:03:45,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1384800.0, ans=0.125
2023-11-21 06:04:10,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1384933.3333333333, ans=0.2
2023-11-21 06:04:10,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1384933.3333333333, ans=0.09899494936611666
2023-11-21 06:04:19,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=1384933.3333333333, ans=15.0
2023-11-21 06:04:21,639 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207750
2023-11-21 06:04:24,072 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3350, loss[loss=0.05051, simple_loss=0.06139, pruned_loss=0.008847, audio_tagging_loss=0.01097, over 15004.00 frames. ], tot_loss[loss=0.07512, simple_loss=0.09626, pruned_loss=0.01698, audio_tagging_loss=0.01001, over 3054394.51 frames. ], batch size: 57, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:04:29,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1385000.0, ans=0.0
2023-11-21 06:04:36,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5
2023-11-21 06:04:42,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1385066.6666666667, ans=0.0
2023-11-21 06:04:46,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0
2023-11-21 06:04:52,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2023-11-21 06:05:00,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1385133.3333333333, ans=0.0
2023-11-21 06:05:27,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207800
2023-11-21 06:05:30,682 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3400, loss[loss=0.08473, simple_loss=0.1048, pruned_loss=0.02155, audio_tagging_loss=0.01079, over 15585.00 frames. ], tot_loss[loss=0.0751, simple_loss=0.09632, pruned_loss=0.01712, audio_tagging_loss=0.009826, over 3049513.76 frames. ], batch size: 60, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:05:45,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1385400.0, ans=0.1
2023-11-21 06:05:50,869 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.555e+01 8.344e+01 9.058e+01 9.925e+01 1.172e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-21 06:05:52,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0
2023-11-21 06:06:01,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1385466.6666666667, ans=0.125
2023-11-21 06:06:01,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1385466.6666666667, ans=0.2
2023-11-21 06:06:32,990 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207850
2023-11-21 06:06:33,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1385600.0, ans=0.0
2023-11-21 06:06:34,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=22.5
2023-11-21 06:06:35,413 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3450, loss[loss=0.07659, simple_loss=0.09489, pruned_loss=0.01836, audio_tagging_loss=0.01079, over 16165.00 frames. ], tot_loss[loss=0.0754, simple_loss=0.09692, pruned_loss=0.01719, audio_tagging_loss=0.00975, over 3045632.38 frames. ], batch size: 60, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:06:35,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1385666.6666666667, ans=0.125
2023-11-21 06:06:39,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1385666.6666666667, ans=0.0
2023-11-21 06:06:49,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1385733.3333333333, ans=0.125
2023-11-21 06:06:50,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1385733.3333333333, ans=0.1
2023-11-21 06:07:17,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1385866.6666666667, ans=0.1
2023-11-21 06:07:18,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1385866.6666666667, ans=0.1
2023-11-21 06:07:25,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0
2023-11-21 06:07:31,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1385933.3333333333, ans=0.0
2023-11-21 06:07:37,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207900
2023-11-21 06:07:39,793 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3500, loss[loss=0.07664, simple_loss=0.1133, pruned_loss=0.01191, audio_tagging_loss=0.008114, over 15189.00 frames. ], tot_loss[loss=0.07529, simple_loss=0.0967, pruned_loss=0.01726, audio_tagging_loss=0.009683, over 3046095.15 frames. ], batch size: 56, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:08:02,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.378e+01 8.049e+01 8.836e+01 9.881e+01 1.284e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-21 06:08:06,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1386133.3333333333, ans=0.1
2023-11-21 06:08:14,490 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 06:08:16,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0
2023-11-21 06:08:32,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1386266.6666666667, ans=0.05
2023-11-21 06:08:38,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1386266.6666666667, ans=0.0
2023-11-21 06:08:43,494 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 207950
2023-11-21 06:08:45,808 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3550, loss[loss=0.1035, simple_loss=0.1356, pruned_loss=0.02681, audio_tagging_loss=0.008855, over 15279.00 frames. ], tot_loss[loss=0.07529, simple_loss=0.09646, pruned_loss=0.01731, audio_tagging_loss=0.009747, over 3044382.30 frames. ], batch size: 57, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:08:48,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1386333.3333333333, ans=0.125
2023-11-21 06:09:14,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1386466.6666666667, ans=0.0
2023-11-21 06:09:41,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.17 vs. limit=22.5
2023-11-21 06:09:49,111 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208000
2023-11-21 06:09:54,967 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3600, loss[loss=0.04385, simple_loss=0.05418, pruned_loss=0.005454, audio_tagging_loss=0.01131, over 14614.00 frames. ], tot_loss[loss=0.07533, simple_loss=0.09655, pruned_loss=0.01734, audio_tagging_loss=0.009711, over 3049880.88 frames. ], batch size: 57, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:10:10,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1386733.3333333333, ans=0.0
2023-11-21 06:10:12,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1386733.3333333333, ans=0.0
2023-11-21 06:10:14,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.348e+01 7.918e+01 8.568e+01 9.474e+01 1.185e+02, threshold=1.714e+02, percent-clipped=0.0
2023-11-21 06:10:36,755 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 06:10:56,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208050
2023-11-21 06:10:58,432 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3650, loss[loss=0.06756, simple_loss=0.08551, pruned_loss=0.01778, audio_tagging_loss=0.007027, over 15673.00 frames. ], tot_loss[loss=0.07594, simple_loss=0.09747, pruned_loss=0.01756, audio_tagging_loss=0.009649, over 3048375.51 frames. ], batch size: 60, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:11:04,768 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 06:11:37,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1387200.0, ans=0.025
2023-11-21 06:11:52,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1387266.6666666667, ans=0.125
2023-11-21 06:12:01,530 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208100
2023-11-21 06:12:02,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0
2023-11-21 06:12:03,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.15 vs. limit=22.5
2023-11-21 06:12:03,994 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3700, loss[loss=0.08598, simple_loss=0.1282, pruned_loss=0.01609, audio_tagging_loss=0.005815, over 15082.00 frames. ], tot_loss[loss=0.07637, simple_loss=0.09817, pruned_loss=0.01764, audio_tagging_loss=0.009645, over 3055272.01 frames. ], batch size: 54, lr: 3.84e-03, grad_scale: 32.0
2023-11-21 06:12:05,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1387333.3333333333, ans=0.1
2023-11-21 06:12:19,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1387400.0, ans=0.125
2023-11-21 06:12:25,063 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.414e+01 8.253e+01 8.797e+01 9.684e+01 1.136e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-21 06:12:26,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0
2023-11-21 06:12:29,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=22.5
2023-11-21 06:12:50,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1387533.3333333333, ans=0.0
2023-11-21 06:13:07,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208150
2023-11-21 06:13:08,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1387666.6666666667, ans=0.1
2023-11-21 06:13:09,730 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3750, loss[loss=0.07194, simple_loss=0.08764, pruned_loss=0.01809, audio_tagging_loss=0.01002, over 14626.00 frames. ], tot_loss[loss=0.07659, simple_loss=0.09859, pruned_loss=0.01757, audio_tagging_loss=0.009723, over 3052676.85 frames. ], batch size: 56, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:13:33,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1387800.0, ans=0.125
2023-11-21 06:13:55,214 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 06:14:02,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1387933.3333333333, ans=0.1
2023-11-21 06:14:07,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1387933.3333333333, ans=0.0
2023-11-21 06:14:11,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208200
2023-11-21 06:14:14,582 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3800, loss[loss=0.08054, simple_loss=0.1094, pruned_loss=0.01831, audio_tagging_loss=0.007514, over 15242.00 frames. ], tot_loss[loss=0.07661, simple_loss=0.09857, pruned_loss=0.01756, audio_tagging_loss=0.009767, over 3056479.81 frames. ], batch size: 56, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:14:16,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1388000.0, ans=0.125
2023-11-21 06:14:21,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1388000.0, ans=0.025
2023-11-21 06:14:35,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.099e+01 8.755e+01 9.581e+01 1.226e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-21 06:14:49,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1388133.3333333333, ans=0.125
2023-11-21 06:15:02,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1388200.0, ans=0.125
2023-11-21 06:15:16,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208250
2023-11-21 06:15:19,741 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3850, loss[loss=0.08318, simple_loss=0.1121, pruned_loss=0.01747, audio_tagging_loss=0.009651, over 15704.00 frames. ], tot_loss[loss=0.07725, simple_loss=0.09953, pruned_loss=0.01774, audio_tagging_loss=0.009738, over 3063610.84 frames. ], batch size: 59, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:15:21,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1388333.3333333333, ans=0.125
2023-11-21 06:15:28,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1388333.3333333333, ans=0.125
2023-11-21 06:15:51,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1388466.6666666667, ans=0.2
2023-11-21 06:16:02,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1388533.3333333333, ans=0.0
2023-11-21 06:16:22,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208300
2023-11-21 06:16:24,945 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3900, loss[loss=0.06352, simple_loss=0.0791, pruned_loss=0.01084, audio_tagging_loss=0.01313, over 14561.00 frames. ], tot_loss[loss=0.07719, simple_loss=0.09925, pruned_loss=0.01771, audio_tagging_loss=0.009858, over 3059509.40 frames. ], batch size: 56, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:16:28,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1388666.6666666667, ans=0.0
2023-11-21 06:16:46,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2023-11-21 06:16:46,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.632e+01 8.288e+01 8.904e+01 9.716e+01 1.197e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-21 06:17:17,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1388933.3333333333, ans=0.04949747468305833
2023-11-21 06:17:27,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208350
2023-11-21 06:17:27,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1388933.3333333333, ans=0.0
2023-11-21 06:17:29,817 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 3950, loss[loss=0.07752, simple_loss=0.0963, pruned_loss=0.01913, audio_tagging_loss=0.01024, over 14675.00 frames. ], tot_loss[loss=0.07798, simple_loss=0.1003, pruned_loss=0.01804, audio_tagging_loss=0.009771, over 3053566.06 frames. ], batch size: 56, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:17:34,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1389000.0, ans=0.125
2023-11-21 06:17:48,948 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 06:18:31,490 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208400
2023-11-21 06:18:34,772 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4000, loss[loss=0.063, simple_loss=0.07991, pruned_loss=0.01195, audio_tagging_loss=0.01109, over 14944.00 frames. ], tot_loss[loss=0.07815, simple_loss=0.1004, pruned_loss=0.01809, audio_tagging_loss=0.00986, over 3048529.99 frames. ], batch size: 55, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:18:38,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0
2023-11-21 06:18:49,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1389400.0, ans=0.125
2023-11-21 06:18:54,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1389400.0, ans=0.0
2023-11-21 06:18:56,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.265e+01 8.108e+01 8.819e+01 9.792e+01 1.460e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-21 06:18:57,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1389400.0, ans=0.0
2023-11-21 06:19:11,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1389466.6666666667, ans=0.0
2023-11-21 06:19:14,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0
2023-11-21 06:19:20,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0
2023-11-21 06:19:37,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208450
2023-11-21 06:19:39,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0
2023-11-21 06:19:39,995 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4050, loss[loss=0.07506, simple_loss=0.09227, pruned_loss=0.01911, audio_tagging_loss=0.009813, over 14931.00 frames. ], tot_loss[loss=0.07758, simple_loss=0.09952, pruned_loss=0.01787, audio_tagging_loss=0.009951, over 3050305.73 frames. ], batch size: 57, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:19:43,792 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 06:19:51,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=1389733.3333333333, ans=12.0
2023-11-21 06:20:06,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1389800.0, ans=0.2
2023-11-21 06:20:13,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1389800.0, ans=0.125
2023-11-21 06:20:19,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.29 vs. limit=10.0
2023-11-21 06:20:35,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1389933.3333333333, ans=0.07
2023-11-21 06:20:36,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1389933.3333333333, ans=0.04949747468305833
2023-11-21 06:20:38,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1389933.3333333333, ans=0.2
2023-11-21 06:20:41,784 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208500
2023-11-21 06:20:44,125 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4100, loss[loss=0.08294, simple_loss=0.1085, pruned_loss=0.01962, audio_tagging_loss=0.009056, over 15000.00 frames. ], tot_loss[loss=0.0768, simple_loss=0.09856, pruned_loss=0.01765, audio_tagging_loss=0.009869, over 3044620.93 frames. ], batch size: 54, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:20:49,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1390000.0, ans=0.125
2023-11-21 06:21:04,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1390066.6666666667, ans=0.0
2023-11-21 06:21:08,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.901e+01 8.154e+01 8.779e+01 9.471e+01 2.037e+02, threshold=1.756e+02, percent-clipped=1.0
2023-11-21 06:21:31,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0
2023-11-21 06:21:32,438 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 06:21:44,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1390266.6666666667, ans=0.125
2023-11-21 06:21:46,731 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208550
2023-11-21 06:21:49,147 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4150, loss[loss=0.08762, simple_loss=0.1203, pruned_loss=0.02106, audio_tagging_loss=0.006436, over 15586.00 frames. ], tot_loss[loss=0.07617, simple_loss=0.09791, pruned_loss=0.01746, audio_tagging_loss=0.00975, over 3051663.07 frames. ], batch size: 56, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:22:02,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1390400.0, ans=0.2
2023-11-21 06:22:36,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1390533.3333333333, ans=0.125
2023-11-21 06:22:37,666 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 06:22:38,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0
2023-11-21 06:22:38,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0
2023-11-21 06:22:43,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1390600.0, ans=0.2
2023-11-21 06:22:50,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5
2023-11-21 06:22:52,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208600
2023-11-21 06:22:52,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1390600.0, ans=0.125
2023-11-21 06:22:55,639 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4200, loss[loss=0.09319, simple_loss=0.1166, pruned_loss=0.02524, audio_tagging_loss=0.009627, over 16227.00 frames. ], tot_loss[loss=0.07654, simple_loss=0.09895, pruned_loss=0.01758, audio_tagging_loss=0.009488, over 3046338.30 frames. ], batch size: 59, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:22:59,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1390666.6666666667, ans=0.125
2023-11-21 06:23:11,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0
2023-11-21 06:23:11,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.40 vs. limit=5.0
2023-11-21 06:23:17,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.548e+01 7.993e+01 8.905e+01 1.014e+02 1.723e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-21 06:23:21,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1390800.0, ans=0.0
2023-11-21 06:23:50,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1390933.3333333333, ans=0.125
2023-11-21 06:23:57,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208650
2023-11-21 06:23:59,933 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4250, loss[loss=0.07289, simple_loss=0.0879, pruned_loss=0.01712, audio_tagging_loss=0.01183, over 14633.00 frames. ], tot_loss[loss=0.07711, simple_loss=0.09971, pruned_loss=0.01783, audio_tagging_loss=0.009424, over 3052591.44 frames. ], batch size: 57, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:24:32,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1391133.3333333333, ans=0.0
2023-11-21 06:24:35,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0
2023-11-21 06:24:45,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1391200.0, ans=0.04949747468305833
2023-11-21 06:25:01,433 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 06:25:02,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208700
2023-11-21 06:25:04,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1391333.3333333333, ans=0.1
2023-11-21 06:25:05,628 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4300, loss[loss=0.07871, simple_loss=0.09943, pruned_loss=0.01585, audio_tagging_loss=0.01315, over 15514.00 frames. ], tot_loss[loss=0.07663, simple_loss=0.09877, pruned_loss=0.01777, audio_tagging_loss=0.009472, over 3049653.01 frames. ], batch size: 58, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:25:30,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.945e+01 8.185e+01 9.001e+01 9.859e+01 1.198e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-21 06:25:36,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1391466.6666666667, ans=0.035
2023-11-21 06:25:39,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1391466.6666666667, ans=0.0
2023-11-21 06:25:52,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5
2023-11-21 06:25:53,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1391533.3333333333, ans=0.1
2023-11-21 06:26:09,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208750
2023-11-21 06:26:12,087 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4350, loss[loss=0.07545, simple_loss=0.09558, pruned_loss=0.01685, audio_tagging_loss=0.0108, over 15765.00 frames. ], tot_loss[loss=0.07677, simple_loss=0.09899, pruned_loss=0.0178, audio_tagging_loss=0.009469, over 3042951.34 frames. ], batch size: 59, lr: 3.83e-03, grad_scale: 16.0
2023-11-21 06:26:27,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0
2023-11-21 06:26:33,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1391733.3333333333, ans=10.0
2023-11-21 06:26:33,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1391733.3333333333, ans=0.0
2023-11-21 06:27:10,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1391933.3333333333, ans=0.0
2023-11-21 06:27:14,188 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208800
2023-11-21 06:27:16,881 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4400, loss[loss=0.07218, simple_loss=0.09487, pruned_loss=0.01227, audio_tagging_loss=0.01248, over 14761.00 frames. ], tot_loss[loss=0.07666, simple_loss=0.09897, pruned_loss=0.0177, audio_tagging_loss=0.009474, over 3045431.28 frames. ], batch size: 54, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:27:40,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 8.121e+01 8.675e+01 9.482e+01 1.551e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-21 06:27:40,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1392066.6666666667, ans=0.125
2023-11-21 06:27:40,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1392066.6666666667, ans=0.0
2023-11-21 06:27:45,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=12.0
2023-11-21 06:27:55,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1392200.0, ans=0.2
2023-11-21 06:28:07,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0
2023-11-21 06:28:18,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208850
2023-11-21 06:28:21,405 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4450, loss[loss=0.06801, simple_loss=0.0892, pruned_loss=0.01546, audio_tagging_loss=0.007948, over 15425.00 frames. ], tot_loss[loss=0.07674, simple_loss=0.09905, pruned_loss=0.01776, audio_tagging_loss=0.009458, over 3050971.03 frames. ], batch size: 57, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:28:24,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1392333.3333333333, ans=0.04949747468305833
2023-11-21 06:28:49,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5
2023-11-21 06:28:51,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1392466.6666666667, ans=0.09899494936611666
2023-11-21 06:28:59,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1392533.3333333333, ans=0.1
2023-11-21 06:29:11,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0
2023-11-21 06:29:23,800 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208900
2023-11-21 06:29:26,784 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4500, loss[loss=0.08439, simple_loss=0.1128, pruned_loss=0.02016, audio_tagging_loss=0.007835, over 16640.00 frames. ], tot_loss[loss=0.07685, simple_loss=0.09942, pruned_loss=0.0177, audio_tagging_loss=0.009439, over 3053341.41 frames. ], batch size: 58, lr: 3.83e-03, grad_scale: 32.0
2023-11-21 06:29:35,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1392666.6666666667, ans=0.2
2023-11-21 06:29:49,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1392733.3333333333, ans=0.0
2023-11-21 06:29:49,975 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.149e+01 8.678e+01 9.442e+01 1.327e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-21 06:29:57,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1392800.0, ans=0.05
2023-11-21 06:30:16,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1392866.6666666667, ans=0.125
2023-11-21 06:30:29,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 208950
2023-11-21 06:30:32,275 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4550, loss[loss=0.09285, simple_loss=0.1229, pruned_loss=0.02604, audio_tagging_loss=0.005334, over 14474.00 frames. ], tot_loss[loss=0.07621, simple_loss=0.09841, pruned_loss=0.01754, audio_tagging_loss=0.009464, over 3046459.62 frames.
], batch size: 53, lr: 3.83e-03, grad_scale: 32.0 2023-11-21 06:30:56,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1393133.3333333333, ans=0.125 2023-11-21 06:30:59,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0 2023-11-21 06:31:13,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1393200.0, ans=0.125 2023-11-21 06:31:17,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1393200.0, ans=0.05 2023-11-21 06:31:23,166 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 06:31:33,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1393266.6666666667, ans=0.125 2023-11-21 06:31:34,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209000 2023-11-21 06:31:37,001 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4600, loss[loss=0.06044, simple_loss=0.07138, pruned_loss=0.01326, audio_tagging_loss=0.01149, over 14112.00 frames. ], tot_loss[loss=0.076, simple_loss=0.09796, pruned_loss=0.01752, audio_tagging_loss=0.009496, over 3040768.08 frames. ], batch size: 56, lr: 3.83e-03, grad_scale: 32.0 2023-11-21 06:31:41,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1393333.3333333333, ans=0.0 2023-11-21 06:31:50,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1393400.0, ans=0.0 2023-11-21 06:31:50,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=12.0 2023-11-21 06:32:01,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.053e+01 8.667e+01 9.436e+01 2.145e+02, threshold=1.733e+02, percent-clipped=1.0 2023-11-21 06:32:12,907 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:32:18,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-21 06:32:27,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1393600.0, ans=0.015 2023-11-21 06:32:39,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209050 2023-11-21 06:32:41,476 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4650, loss[loss=0.07573, simple_loss=0.08638, pruned_loss=0.01942, audio_tagging_loss=0.01312, over 14515.00 frames. ], tot_loss[loss=0.07597, simple_loss=0.09778, pruned_loss=0.01748, audio_tagging_loss=0.009605, over 3045433.22 frames. 
], batch size: 57, lr: 3.83e-03, grad_scale: 32.0 2023-11-21 06:32:51,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1393666.6666666667, ans=0.125 2023-11-21 06:32:59,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1393733.3333333333, ans=0.125 2023-11-21 06:33:03,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1393733.3333333333, ans=0.1 2023-11-21 06:33:20,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1393866.6666666667, ans=0.0 2023-11-21 06:33:27,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1393866.6666666667, ans=0.2 2023-11-21 06:33:30,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-21 06:33:42,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.30 vs. limit=8.0 2023-11-21 06:33:44,837 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209100 2023-11-21 06:33:47,248 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4700, loss[loss=0.08838, simple_loss=0.1089, pruned_loss=0.02338, audio_tagging_loss=0.01054, over 15072.00 frames. ], tot_loss[loss=0.07606, simple_loss=0.09766, pruned_loss=0.01756, audio_tagging_loss=0.009678, over 3054329.98 frames. ], batch size: 58, lr: 3.83e-03, grad_scale: 32.0 2023-11-21 06:33:51,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1394000.0, ans=0.125 2023-11-21 06:34:06,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1394066.6666666667, ans=0.05 2023-11-21 06:34:10,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.569e+01 8.149e+01 8.884e+01 9.885e+01 2.074e+02, threshold=1.777e+02, percent-clipped=1.0 2023-11-21 06:34:24,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1394200.0, ans=0.0 2023-11-21 06:34:48,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209150 2023-11-21 06:34:51,272 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4750, loss[loss=0.05978, simple_loss=0.0676, pruned_loss=0.01557, audio_tagging_loss=0.0104, over 15605.00 frames. ], tot_loss[loss=0.07546, simple_loss=0.09654, pruned_loss=0.01738, audio_tagging_loss=0.00981, over 3051768.07 frames. ], batch size: 62, lr: 3.83e-03, grad_scale: 16.0 2023-11-21 06:34:55,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. 
limit=15.0 2023-11-21 06:34:59,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1394333.3333333333, ans=0.125 2023-11-21 06:35:06,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1394400.0, ans=22.5 2023-11-21 06:35:15,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-21 06:35:23,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1394466.6666666667, ans=0.125 2023-11-21 06:35:25,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2023-11-21 06:35:29,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1394466.6666666667, ans=0.0 2023-11-21 06:35:54,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209200 2023-11-21 06:35:57,089 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4800, loss[loss=0.08878, simple_loss=0.1085, pruned_loss=0.02558, audio_tagging_loss=0.008929, over 15760.00 frames. ], tot_loss[loss=0.07563, simple_loss=0.09671, pruned_loss=0.01734, audio_tagging_loss=0.009939, over 3055350.04 frames. ], batch size: 61, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:35:57,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0 2023-11-21 06:35:58,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1394666.6666666667, ans=15.0 2023-11-21 06:36:14,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1394733.3333333333, ans=0.125 2023-11-21 06:36:15,739 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:36:24,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.238e+01 8.873e+01 9.665e+01 1.246e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-21 06:36:25,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1394800.0, ans=0.2 2023-11-21 06:36:33,386 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:36:44,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1394866.6666666667, ans=0.1 2023-11-21 06:36:47,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1394866.6666666667, ans=0.0 2023-11-21 06:37:01,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209250 2023-11-21 06:37:03,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1395000.0, ans=0.2 2023-11-21 06:37:04,153 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4850, loss[loss=0.08357, simple_loss=0.1123, pruned_loss=0.01787, audio_tagging_loss=0.009538, over 14905.00 frames. 
], tot_loss[loss=0.07591, simple_loss=0.09715, pruned_loss=0.01732, audio_tagging_loss=0.01001, over 3050393.42 frames. ], batch size: 55, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:37:07,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2023-11-21 06:37:54,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1395200.0, ans=0.125 2023-11-21 06:38:06,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209300 2023-11-21 06:38:08,807 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4900, loss[loss=0.08303, simple_loss=0.1075, pruned_loss=0.02075, audio_tagging_loss=0.00852, over 16320.00 frames. ], tot_loss[loss=0.07585, simple_loss=0.09709, pruned_loss=0.01732, audio_tagging_loss=0.009981, over 3049842.90 frames. ], batch size: 60, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:38:15,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1395333.3333333333, ans=0.1 2023-11-21 06:38:24,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-21 06:38:34,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.862e+01 8.171e+01 8.744e+01 9.651e+01 1.565e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-21 06:38:35,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1395466.6666666667, ans=0.04949747468305833 2023-11-21 06:38:46,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1395466.6666666667, ans=0.0 2023-11-21 06:39:10,560 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209350 2023-11-21 06:39:13,579 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 4950, loss[loss=0.06467, simple_loss=0.08642, pruned_loss=0.01176, audio_tagging_loss=0.009694, over 14871.00 frames. ], tot_loss[loss=0.07492, simple_loss=0.0959, pruned_loss=0.01712, audio_tagging_loss=0.00985, over 3044128.78 frames. ], batch size: 57, lr: 3.82e-03, grad_scale: 8.0 2023-11-21 06:39:44,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1395800.0, ans=0.0 2023-11-21 06:39:59,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1395866.6666666667, ans=0.125 2023-11-21 06:39:59,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-21 06:40:17,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209400 2023-11-21 06:40:20,251 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5000, loss[loss=0.09178, simple_loss=0.1132, pruned_loss=0.02223, audio_tagging_loss=0.01297, over 14485.00 frames. ], tot_loss[loss=0.07504, simple_loss=0.09628, pruned_loss=0.01722, audio_tagging_loss=0.009685, over 3037061.86 frames. ], batch size: 53, lr: 3.82e-03, grad_scale: 8.0 2023-11-21 06:40:41,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. 
limit=12.0 2023-11-21 06:40:42,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1396066.6666666667, ans=0.125 2023-11-21 06:40:47,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.108e+01 8.729e+01 9.428e+01 1.080e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-21 06:40:56,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1396133.3333333333, ans=0.1 2023-11-21 06:41:03,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1396200.0, ans=0.125 2023-11-21 06:41:03,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.68 vs. limit=10.0 2023-11-21 06:41:15,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1396266.6666666667, ans=0.125 2023-11-21 06:41:23,021 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209450 2023-11-21 06:41:25,441 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5050, loss[loss=0.06779, simple_loss=0.08417, pruned_loss=0.01618, audio_tagging_loss=0.00952, over 14446.00 frames. ], tot_loss[loss=0.07566, simple_loss=0.09751, pruned_loss=0.01736, audio_tagging_loss=0.009541, over 3038611.46 frames. ], batch size: 57, lr: 3.82e-03, grad_scale: 8.0 2023-11-21 06:41:27,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-21 06:41:28,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2023-11-21 06:41:29,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1396333.3333333333, ans=0.125 2023-11-21 06:41:46,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1396400.0, ans=0.1 2023-11-21 06:41:49,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1396466.6666666667, ans=0.0 2023-11-21 06:42:16,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1396600.0, ans=0.125 2023-11-21 06:42:21,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2023-11-21 06:42:27,035 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209500 2023-11-21 06:42:28,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1396666.6666666667, ans=0.0 2023-11-21 06:42:29,374 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5100, loss[loss=0.06621, simple_loss=0.08668, pruned_loss=0.01235, audio_tagging_loss=0.01052, over 14817.00 frames. ], tot_loss[loss=0.07587, simple_loss=0.09788, pruned_loss=0.01744, audio_tagging_loss=0.009493, over 3039790.52 frames. 
], batch size: 56, lr: 3.82e-03, grad_scale: 8.0 2023-11-21 06:42:51,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1396733.3333333333, ans=0.125 2023-11-21 06:42:56,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1396800.0, ans=0.125 2023-11-21 06:42:57,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.138e+01 8.643e+01 9.208e+01 2.160e+02, threshold=1.729e+02, percent-clipped=1.0 2023-11-21 06:43:05,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1396800.0, ans=0.125 2023-11-21 06:43:12,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1396866.6666666667, ans=0.1 2023-11-21 06:43:16,661 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:43:23,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1396933.3333333333, ans=0.125 2023-11-21 06:43:28,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1396933.3333333333, ans=0.125 2023-11-21 06:43:32,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209550 2023-11-21 06:43:35,665 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5150, loss[loss=0.0799, simple_loss=0.102, pruned_loss=0.01779, audio_tagging_loss=0.0111, over 16487.00 frames. ], tot_loss[loss=0.07539, simple_loss=0.09702, pruned_loss=0.01731, audio_tagging_loss=0.009571, over 3042874.48 frames. ], batch size: 61, lr: 3.82e-03, grad_scale: 8.0 2023-11-21 06:43:35,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1397000.0, ans=0.025 2023-11-21 06:43:43,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1397000.0, ans=0.125 2023-11-21 06:43:44,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1397000.0, ans=0.05 2023-11-21 06:43:46,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1397000.0, ans=0.1 2023-11-21 06:43:53,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1397066.6666666667, ans=0.125 2023-11-21 06:43:53,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1397066.6666666667, ans=0.04949747468305833 2023-11-21 06:44:06,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1397133.3333333333, ans=0.125 2023-11-21 06:44:06,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. 
limit=15.0 2023-11-21 06:44:38,007 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209600 2023-11-21 06:44:40,760 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5200, loss[loss=0.06043, simple_loss=0.0842, pruned_loss=0.01094, audio_tagging_loss=0.007383, over 15887.00 frames. ], tot_loss[loss=0.0762, simple_loss=0.09838, pruned_loss=0.0175, audio_tagging_loss=0.009515, over 3047028.54 frames. ], batch size: 58, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:44:54,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2023-11-21 06:45:07,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.190e+01 8.762e+01 9.315e+01 1.275e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-21 06:45:17,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1397466.6666666667, ans=0.125 2023-11-21 06:45:31,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2023-11-21 06:45:34,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1397600.0, ans=0.0 2023-11-21 06:45:41,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1397600.0, ans=0.1 2023-11-21 06:45:42,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209650 2023-11-21 06:45:45,295 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5250, loss[loss=0.07954, simple_loss=0.1004, pruned_loss=0.02031, audio_tagging_loss=0.009041, over 15521.00 frames. ], tot_loss[loss=0.07616, simple_loss=0.09835, pruned_loss=0.01754, audio_tagging_loss=0.009444, over 3044879.71 frames. ], batch size: 59, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:45:45,646 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:45:48,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1397666.6666666667, ans=0.125 2023-11-21 06:45:54,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1397666.6666666667, ans=0.0 2023-11-21 06:45:56,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1397666.6666666667, ans=0.0 2023-11-21 06:46:12,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.10 vs. 
limit=12.0 2023-11-21 06:46:22,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1397866.6666666667, ans=0.0 2023-11-21 06:46:28,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1397866.6666666667, ans=0.125 2023-11-21 06:46:41,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1397933.3333333333, ans=0.125 2023-11-21 06:46:47,586 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209700 2023-11-21 06:46:50,786 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5300, loss[loss=0.06794, simple_loss=0.08205, pruned_loss=0.01553, audio_tagging_loss=0.01138, over 15527.00 frames. ], tot_loss[loss=0.07659, simple_loss=0.09902, pruned_loss=0.01762, audio_tagging_loss=0.009453, over 3044811.44 frames. ], batch size: 61, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:47:08,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1398066.6666666667, ans=0.125 2023-11-21 06:47:17,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.345e+01 8.152e+01 8.842e+01 9.695e+01 1.214e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-21 06:47:35,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1398200.0, ans=0.0 2023-11-21 06:47:52,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209750 2023-11-21 06:47:55,000 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5350, loss[loss=0.09104, simple_loss=0.1198, pruned_loss=0.02467, audio_tagging_loss=0.006453, over 15694.00 frames. ], tot_loss[loss=0.07603, simple_loss=0.09793, pruned_loss=0.0175, audio_tagging_loss=0.009561, over 3041542.43 frames. ], batch size: 55, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:48:03,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-21 06:48:09,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1398400.0, ans=0.125 2023-11-21 06:48:22,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-21 06:48:25,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1398466.6666666667, ans=0.1 2023-11-21 06:48:39,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=12.0 2023-11-21 06:48:41,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1398533.3333333333, ans=0.125 2023-11-21 06:48:45,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1398533.3333333333, ans=0.125 2023-11-21 06:48:50,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. 
limit=22.5 2023-11-21 06:48:53,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1398600.0, ans=0.0 2023-11-21 06:48:56,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1398600.0, ans=0.1 2023-11-21 06:48:57,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209800 2023-11-21 06:49:00,608 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5400, loss[loss=0.05993, simple_loss=0.07509, pruned_loss=0.009968, audio_tagging_loss=0.01242, over 15520.00 frames. ], tot_loss[loss=0.07587, simple_loss=0.09716, pruned_loss=0.01755, audio_tagging_loss=0.009743, over 3038953.30 frames. ], batch size: 59, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:49:28,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.049e+01 8.809e+01 9.277e+01 1.117e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-21 06:50:02,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209850 2023-11-21 06:50:02,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1398933.3333333333, ans=0.0 2023-11-21 06:50:04,917 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5450, loss[loss=0.04819, simple_loss=0.0706, pruned_loss=0.005995, audio_tagging_loss=0.006892, over 14065.00 frames. ], tot_loss[loss=0.07637, simple_loss=0.09765, pruned_loss=0.01774, audio_tagging_loss=0.00981, over 3039139.24 frames. ], batch size: 55, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:50:32,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0 2023-11-21 06:50:40,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1399133.3333333333, ans=0.0 2023-11-21 06:50:42,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-11-21 06:50:42,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1399200.0, ans=0.125 2023-11-21 06:50:48,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1399200.0, ans=0.125 2023-11-21 06:51:00,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1399266.6666666667, ans=0.125 2023-11-21 06:51:08,100 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209900 2023-11-21 06:51:10,472 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5500, loss[loss=0.07241, simple_loss=0.09531, pruned_loss=0.0139, audio_tagging_loss=0.01086, over 15231.00 frames. ], tot_loss[loss=0.07629, simple_loss=0.09746, pruned_loss=0.01773, audio_tagging_loss=0.009834, over 3035645.86 frames. 
], batch size: 56, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:51:14,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1399333.3333333333, ans=0.0 2023-11-21 06:51:14,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1399333.3333333333, ans=0.1 2023-11-21 06:51:31,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1399400.0, ans=0.0 2023-11-21 06:51:31,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1399400.0, ans=0.2 2023-11-21 06:51:37,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.332e+01 7.977e+01 8.798e+01 9.625e+01 1.160e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-21 06:51:41,502 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:51:55,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2023-11-21 06:52:02,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1399600.0, ans=0.125 2023-11-21 06:52:08,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1399600.0, ans=0.0 2023-11-21 06:52:11,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 209950 2023-11-21 06:52:12,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=22.5 2023-11-21 06:52:13,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1399666.6666666667, ans=0.1 2023-11-21 06:52:14,288 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5550, loss[loss=0.06882, simple_loss=0.08676, pruned_loss=0.01537, audio_tagging_loss=0.01008, over 15611.00 frames. ], tot_loss[loss=0.07686, simple_loss=0.09834, pruned_loss=0.01782, audio_tagging_loss=0.009874, over 3041546.03 frames. ], batch size: 60, lr: 3.82e-03, grad_scale: 16.0 2023-11-21 06:52:16,927 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:52:17,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1399666.6666666667, ans=0.125 2023-11-21 06:52:42,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0 2023-11-21 06:53:17,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210000 2023-11-21 06:53:20,215 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5600, loss[loss=0.0759, simple_loss=0.1068, pruned_loss=0.0146, audio_tagging_loss=0.007883, over 15222.00 frames. ], tot_loss[loss=0.07682, simple_loss=0.0985, pruned_loss=0.01771, audio_tagging_loss=0.009856, over 3036952.67 frames. 
], batch size: 55, lr: 3.82e-03, grad_scale: 32.0 2023-11-21 06:53:26,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1400000.0, ans=0.125 2023-11-21 06:53:33,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1400066.6666666667, ans=0.2 2023-11-21 06:53:33,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1400066.6666666667, ans=0.1 2023-11-21 06:53:38,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1400066.6666666667, ans=0.0 2023-11-21 06:53:47,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.103e+01 7.991e+01 8.686e+01 9.530e+01 1.132e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 06:54:07,378 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 06:54:23,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210050 2023-11-21 06:54:25,746 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5650, loss[loss=0.07667, simple_loss=0.09916, pruned_loss=0.01703, audio_tagging_loss=0.01006, over 15474.00 frames. ], tot_loss[loss=0.07676, simple_loss=0.09839, pruned_loss=0.01772, audio_tagging_loss=0.009845, over 3053687.82 frames. ], batch size: 57, lr: 3.82e-03, grad_scale: 32.0 2023-11-21 06:54:33,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1400333.3333333333, ans=0.125 2023-11-21 06:54:48,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1400400.0, ans=0.0 2023-11-21 06:54:50,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400466.6666666667, ans=0.1 2023-11-21 06:55:16,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1400600.0, ans=0.125 2023-11-21 06:55:21,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2023-11-21 06:55:26,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210100 2023-11-21 06:55:27,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400666.6666666667, ans=0.1 2023-11-21 06:55:28,799 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5700, loss[loss=0.07891, simple_loss=0.09176, pruned_loss=0.02211, audio_tagging_loss=0.01093, over 15805.00 frames. ], tot_loss[loss=0.07657, simple_loss=0.09805, pruned_loss=0.01771, audio_tagging_loss=0.009825, over 3052671.77 frames. 
], batch size: 62, lr: 3.82e-03, grad_scale: 32.0 2023-11-21 06:55:35,254 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 06:55:37,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1400666.6666666667, ans=0.0 2023-11-21 06:55:57,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.120e+01 8.868e+01 9.513e+01 1.273e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-21 06:55:58,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1400800.0, ans=0.0 2023-11-21 06:56:13,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1400866.6666666667, ans=0.025 2023-11-21 06:56:30,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210150 2023-11-21 06:56:33,765 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5750, loss[loss=0.07867, simple_loss=0.0966, pruned_loss=0.01962, audio_tagging_loss=0.01074, over 15357.00 frames. ], tot_loss[loss=0.07657, simple_loss=0.09825, pruned_loss=0.01776, audio_tagging_loss=0.009688, over 3050770.49 frames. ], batch size: 57, lr: 3.82e-03, grad_scale: 32.0 2023-11-21 06:57:18,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=15.0 2023-11-21 06:57:18,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1401200.0, ans=0.125 2023-11-21 06:57:36,960 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210200 2023-11-21 06:57:37,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2023-11-21 06:57:39,668 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5800, loss[loss=0.06183, simple_loss=0.06937, pruned_loss=0.01673, audio_tagging_loss=0.01042, over 13870.00 frames. ], tot_loss[loss=0.07575, simple_loss=0.09689, pruned_loss=0.01762, audio_tagging_loss=0.009682, over 3042849.88 frames. 
], batch size: 53, lr: 3.82e-03, grad_scale: 32.0 2023-11-21 06:57:48,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1401333.3333333333, ans=0.125 2023-11-21 06:57:54,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1401400.0, ans=0.0 2023-11-21 06:58:01,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1401400.0, ans=0.2 2023-11-21 06:58:05,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.333e+01 8.998e+01 9.692e+01 1.329e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-21 06:58:06,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1401466.6666666667, ans=0.125 2023-11-21 06:58:11,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1401466.6666666667, ans=0.125 2023-11-21 06:58:29,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1401533.3333333333, ans=0.1 2023-11-21 06:58:41,996 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210250 2023-11-21 06:58:44,341 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5850, loss[loss=0.08934, simple_loss=0.1229, pruned_loss=0.0193, audio_tagging_loss=0.008604, over 15301.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09612, pruned_loss=0.01744, audio_tagging_loss=0.00965, over 3042418.54 frames. ], batch size: 56, lr: 3.82e-03, grad_scale: 32.0 2023-11-21 06:59:05,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1401733.3333333333, ans=0.1 2023-11-21 06:59:16,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1401800.0, ans=0.1 2023-11-21 06:59:21,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1401800.0, ans=0.125 2023-11-21 06:59:43,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1401933.3333333333, ans=0.2 2023-11-21 06:59:45,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210300 2023-11-21 06:59:47,875 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5900, loss[loss=0.1087, simple_loss=0.1413, pruned_loss=0.0308, audio_tagging_loss=0.007219, over 15223.00 frames. ], tot_loss[loss=0.0748, simple_loss=0.09588, pruned_loss=0.01724, audio_tagging_loss=0.009609, over 3036889.14 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:00:17,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.132e+01 8.306e+01 8.999e+01 9.965e+01 1.195e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-21 07:00:28,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1402200.0, ans=0.125 2023-11-21 07:00:41,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. 
limit=10.0 2023-11-21 07:00:44,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1402266.6666666667, ans=0.125 2023-11-21 07:00:51,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210350 2023-11-21 07:00:54,587 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 5950, loss[loss=0.06137, simple_loss=0.07934, pruned_loss=0.009989, audio_tagging_loss=0.01171, over 14457.00 frames. ], tot_loss[loss=0.07463, simple_loss=0.09577, pruned_loss=0.01714, audio_tagging_loss=0.009604, over 3036269.56 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:00:57,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=15.0 2023-11-21 07:01:02,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=12.0 2023-11-21 07:01:03,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1402333.3333333333, ans=0.0 2023-11-21 07:01:16,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1402400.0, ans=0.125 2023-11-21 07:01:38,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2023-11-21 07:01:43,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1402533.3333333333, ans=0.09899494936611666 2023-11-21 07:01:44,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1402600.0, ans=0.1 2023-11-21 07:01:55,466 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210400 2023-11-21 07:01:58,240 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6000, loss[loss=0.07449, simple_loss=0.09557, pruned_loss=0.01758, audio_tagging_loss=0.009124, over 14559.00 frames. ], tot_loss[loss=0.0759, simple_loss=0.09766, pruned_loss=0.01755, audio_tagging_loss=0.009521, over 3045654.26 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:01:58,241 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 07:02:22,271 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.8846, 4.1556, 3.8036, 3.0849], device='cuda:1') 2023-11-21 07:02:34,036 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.9011, 4.0589, 3.8189, 3.1331], device='cuda:1') 2023-11-21 07:02:40,594 INFO [train_asr.py:1253] (1/4) Epoch 18, validation: loss=0.0604, simple_loss=0.05257, pruned_loss=0.00537, audio_tagging_loss=0.02874, over 4681554.00 frames. 2023-11-21 07:02:40,595 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 07:02:40,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1402666.6666666667, ans=0.2 2023-11-21 07:03:08,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.77 vs. 
limit=8.0 2023-11-21 07:03:09,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 7.991e+01 8.851e+01 9.572e+01 1.607e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-21 07:03:27,307 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 07:03:33,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1402933.3333333333, ans=0.07 2023-11-21 07:03:44,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210450 2023-11-21 07:03:47,027 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6050, loss[loss=0.08266, simple_loss=0.1053, pruned_loss=0.02176, audio_tagging_loss=0.008265, over 13885.00 frames. ], tot_loss[loss=0.07654, simple_loss=0.09867, pruned_loss=0.01773, audio_tagging_loss=0.009479, over 3047454.32 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:04:18,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1403133.3333333333, ans=10.0 2023-11-21 07:04:20,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=22.5 2023-11-21 07:04:25,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1403200.0, ans=0.0 2023-11-21 07:04:31,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1403200.0, ans=0.125 2023-11-21 07:04:33,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1403200.0, ans=0.1 2023-11-21 07:04:45,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1403266.6666666667, ans=0.0 2023-11-21 07:04:47,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1403266.6666666667, ans=0.0 2023-11-21 07:04:48,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210500 2023-11-21 07:04:51,006 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6100, loss[loss=0.06446, simple_loss=0.08247, pruned_loss=0.01248, audio_tagging_loss=0.01075, over 15778.00 frames. ], tot_loss[loss=0.07642, simple_loss=0.09852, pruned_loss=0.01765, audio_tagging_loss=0.009507, over 3048323.87 frames. ], batch size: 63, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:04:57,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1403333.3333333333, ans=22.5 2023-11-21 07:05:02,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.48 vs. 
limit=12.0 2023-11-21 07:05:20,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.632e+01 8.223e+01 8.598e+01 9.128e+01 4.094e+02, threshold=1.720e+02, percent-clipped=1.0 2023-11-21 07:05:33,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1403533.3333333333, ans=0.125 2023-11-21 07:05:35,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1403533.3333333333, ans=0.0 2023-11-21 07:05:37,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1403533.3333333333, ans=0.07 2023-11-21 07:05:50,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1403600.0, ans=0.1 2023-11-21 07:05:53,446 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210550 2023-11-21 07:05:53,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1403600.0, ans=0.95 2023-11-21 07:05:55,822 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6150, loss[loss=0.09452, simple_loss=0.1201, pruned_loss=0.0247, audio_tagging_loss=0.009785, over 16454.00 frames. ], tot_loss[loss=0.07669, simple_loss=0.09879, pruned_loss=0.01773, audio_tagging_loss=0.00957, over 3054317.53 frames. ], batch size: 59, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:05:56,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1403666.6666666667, ans=0.95 2023-11-21 07:06:15,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1403733.3333333333, ans=0.1 2023-11-21 07:06:31,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1403800.0, ans=0.95 2023-11-21 07:06:42,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1403866.6666666667, ans=0.125 2023-11-21 07:06:43,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1403866.6666666667, ans=0.2 2023-11-21 07:06:43,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-11-21 07:06:50,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-21 07:06:53,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1403933.3333333333, ans=0.125 2023-11-21 07:07:00,654 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210600 2023-11-21 07:07:03,403 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6200, loss[loss=0.06042, simple_loss=0.07783, pruned_loss=0.01171, audio_tagging_loss=0.009789, over 15316.00 frames. ], tot_loss[loss=0.07626, simple_loss=0.09802, pruned_loss=0.01762, audio_tagging_loss=0.009635, over 3054510.82 frames. 
], batch size: 56, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:07:03,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1404000.0, ans=0.125 2023-11-21 07:07:03,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1404000.0, ans=0.1 2023-11-21 07:07:25,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1404066.6666666667, ans=0.125 2023-11-21 07:07:30,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2023-11-21 07:07:31,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.071e+01 8.570e+01 9.490e+01 1.156e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-21 07:07:40,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2023-11-21 07:08:06,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210650 2023-11-21 07:08:08,587 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6250, loss[loss=0.0656, simple_loss=0.08673, pruned_loss=0.01464, audio_tagging_loss=0.007592, over 15388.00 frames. ], tot_loss[loss=0.07565, simple_loss=0.09718, pruned_loss=0.01734, audio_tagging_loss=0.009717, over 3054298.87 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:08:13,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1404333.3333333333, ans=0.05 2023-11-21 07:08:16,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1404333.3333333333, ans=0.2 2023-11-21 07:08:18,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1404333.3333333333, ans=0.125 2023-11-21 07:08:26,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1404400.0, ans=0.025 2023-11-21 07:08:30,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1404400.0, ans=0.125 2023-11-21 07:08:35,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5 2023-11-21 07:08:41,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1404466.6666666667, ans=15.0 2023-11-21 07:08:44,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1404466.6666666667, ans=0.125 2023-11-21 07:08:44,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2023-11-21 07:09:10,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210700 2023-11-21 07:09:12,474 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6300, loss[loss=0.07738, simple_loss=0.09627, pruned_loss=0.0179, audio_tagging_loss=0.01134, over 15986.00 frames. 
], tot_loss[loss=0.07602, simple_loss=0.09754, pruned_loss=0.0175, audio_tagging_loss=0.009752, over 3056111.78 frames. ], batch size: 59, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:09:20,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1404666.6666666667, ans=0.125 2023-11-21 07:09:22,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1404666.6666666667, ans=0.0 2023-11-21 07:09:25,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1404733.3333333333, ans=0.04949747468305833 2023-11-21 07:09:32,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.0 2023-11-21 07:09:41,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.208e+01 8.088e+01 8.800e+01 9.638e+01 1.138e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-21 07:09:53,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1404866.6666666667, ans=0.125 2023-11-21 07:09:56,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1404866.6666666667, ans=0.2 2023-11-21 07:09:58,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-11-21 07:10:10,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1404933.3333333333, ans=0.125 2023-11-21 07:10:15,439 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210750 2023-11-21 07:10:15,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1404933.3333333333, ans=0.1 2023-11-21 07:10:18,537 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6350, loss[loss=0.07942, simple_loss=0.107, pruned_loss=0.01566, audio_tagging_loss=0.01024, over 15360.00 frames. ], tot_loss[loss=0.07545, simple_loss=0.09667, pruned_loss=0.01723, audio_tagging_loss=0.009885, over 3048974.21 frames. ], batch size: 58, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:10:30,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. 
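
The recurring optim.py lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus a clipping threshold, and in each entry above the threshold equals Clipping_scale times the logged median (for the entry just above, 2.0 * 8.800e+01 = 1.760e+02). A plausible reconstruction of that summary, with illustrative names rather than the actual optim.py internals:

```python
import torch

# Hedged reconstruction of the "Clipping_scale=2.0, grad-norm quartiles ..."
# summaries above. The five logged values behave like the 0/25/50/75/100%
# quantiles of recent gradient norms, and threshold = clipping_scale * median.
def clipping_summary(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale the median
    # e.g. "percent-clipped=1.0" would mean 1% of recent norms exceeded it
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped
```
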
limit=6.0 2023-11-21 07:10:45,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1405133.3333333333, ans=0.125 2023-11-21 07:10:45,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1405133.3333333333, ans=0.125 2023-11-21 07:11:05,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1405200.0, ans=0.125 2023-11-21 07:11:13,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1405266.6666666667, ans=0.0 2023-11-21 07:11:20,646 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210800 2023-11-21 07:11:23,323 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6400, loss[loss=0.09144, simple_loss=0.1202, pruned_loss=0.0238, audio_tagging_loss=0.007536, over 15235.00 frames. ], tot_loss[loss=0.07461, simple_loss=0.0956, pruned_loss=0.01687, audio_tagging_loss=0.009938, over 3059739.77 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:11:48,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1405466.6666666667, ans=0.0 2023-11-21 07:11:55,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.546e+01 8.103e+01 8.679e+01 9.633e+01 1.173e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-21 07:12:01,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1405466.6666666667, ans=0.0 2023-11-21 07:12:07,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2023-11-21 07:12:25,972 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210850 2023-11-21 07:12:28,236 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6450, loss[loss=0.09508, simple_loss=0.1211, pruned_loss=0.02299, audio_tagging_loss=0.01155, over 15043.00 frames. ], tot_loss[loss=0.07546, simple_loss=0.09658, pruned_loss=0.01722, audio_tagging_loss=0.009951, over 3055280.84 frames. ], batch size: 58, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:12:28,577 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:12:30,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2023-11-21 07:12:41,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1405733.3333333333, ans=0.125 2023-11-21 07:12:44,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405733.3333333333, ans=0.1 2023-11-21 07:13:32,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210900 2023-11-21 07:13:34,497 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6500, loss[loss=0.07302, simple_loss=0.1, pruned_loss=0.01436, audio_tagging_loss=0.008645, over 14309.00 frames. ], tot_loss[loss=0.0754, simple_loss=0.09656, pruned_loss=0.0172, audio_tagging_loss=0.00992, over 3052512.76 frames. 
], batch size: 56, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:13:42,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1406000.0, ans=0.2 2023-11-21 07:13:49,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1406066.6666666667, ans=0.125 2023-11-21 07:14:05,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.017e+01 8.785e+01 9.599e+01 1.242e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-21 07:14:37,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 210950 2023-11-21 07:14:39,621 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6550, loss[loss=0.07238, simple_loss=0.09556, pruned_loss=0.01692, audio_tagging_loss=0.007681, over 16204.00 frames. ], tot_loss[loss=0.07566, simple_loss=0.09739, pruned_loss=0.01727, audio_tagging_loss=0.009699, over 3049254.45 frames. ], batch size: 61, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:14:48,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1406333.3333333333, ans=0.125 2023-11-21 07:14:55,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1406400.0, ans=0.125 2023-11-21 07:15:01,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1406400.0, ans=0.05 2023-11-21 07:15:24,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1406533.3333333333, ans=0.0 2023-11-21 07:15:31,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1406600.0, ans=0.0 2023-11-21 07:15:39,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1406600.0, ans=0.2 2023-11-21 07:15:41,954 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211000 2023-11-21 07:15:43,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1406666.6666666667, ans=0.125 2023-11-21 07:15:44,656 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6600, loss[loss=0.09672, simple_loss=0.1327, pruned_loss=0.02179, audio_tagging_loss=0.008566, over 15328.00 frames. ], tot_loss[loss=0.0759, simple_loss=0.09811, pruned_loss=0.01723, audio_tagging_loss=0.009608, over 3052877.93 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:15:46,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1406666.6666666667, ans=0.125 2023-11-21 07:15:57,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1406733.3333333333, ans=22.5 2023-11-21 07:16:16,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.176e+01 8.712e+01 9.397e+01 1.234e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-21 07:16:19,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. 
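
Each loss[...] / tot_loss[...] line decomposes into the transducer and auxiliary terms, and the logged totals are consistent with a 0.5-weighted simple (linear) transducer loss plus the pruned RNN-T loss plus the audio tagging loss. A worked check against the batch 6550 entry above:

```python
# The per-batch totals above are consistent with
#   loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss;
# checking against the batch 6550 entry in the log:
simple_loss, pruned_loss, audio_tagging_loss = 0.09556, 0.01692, 0.007681
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.07238, matching the logged loss=0.07238
```
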
limit=15.0 2023-11-21 07:16:41,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1406933.3333333333, ans=0.1 2023-11-21 07:16:47,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211050 2023-11-21 07:16:50,690 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6650, loss[loss=0.06826, simple_loss=0.08775, pruned_loss=0.01751, audio_tagging_loss=0.006883, over 15090.00 frames. ], tot_loss[loss=0.07504, simple_loss=0.09703, pruned_loss=0.01704, audio_tagging_loss=0.009481, over 3050029.18 frames. ], batch size: 59, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:17:18,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-21 07:17:43,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1407266.6666666667, ans=0.04949747468305833 2023-11-21 07:17:43,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1407266.6666666667, ans=0.0 2023-11-21 07:17:53,706 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211100 2023-11-21 07:17:55,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1407333.3333333333, ans=0.125 2023-11-21 07:17:56,137 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6700, loss[loss=0.09624, simple_loss=0.1295, pruned_loss=0.02582, audio_tagging_loss=0.005653, over 15758.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.09798, pruned_loss=0.01734, audio_tagging_loss=0.009379, over 3050301.89 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:18:27,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.795e+01 8.113e+01 8.605e+01 9.333e+01 1.253e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-21 07:18:27,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1407466.6666666667, ans=0.2 2023-11-21 07:18:41,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1407533.3333333333, ans=0.125 2023-11-21 07:18:43,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1407533.3333333333, ans=0.07 2023-11-21 07:18:44,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1407533.3333333333, ans=0.125 2023-11-21 07:18:46,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1407533.3333333333, ans=0.0 2023-11-21 07:18:52,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2023-11-21 07:18:58,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211150 2023-11-21 07:19:00,702 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6750, loss[loss=0.07361, simple_loss=0.09105, pruned_loss=0.01893, audio_tagging_loss=0.009156, over 15165.00 frames. ], tot_loss[loss=0.07484, simple_loss=0.09672, pruned_loss=0.01712, audio_tagging_loss=0.009368, over 3048495.39 frames. 
], batch size: 57, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:19:28,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1407800.0, ans=0.2 2023-11-21 07:19:46,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1407866.6666666667, ans=0.2 2023-11-21 07:19:47,768 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:20:03,513 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211200 2023-11-21 07:20:04,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1408000.0, ans=0.0 2023-11-21 07:20:06,270 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6800, loss[loss=0.07772, simple_loss=0.1126, pruned_loss=0.01578, audio_tagging_loss=0.005652, over 14985.00 frames. ], tot_loss[loss=0.07478, simple_loss=0.09654, pruned_loss=0.01707, audio_tagging_loss=0.00943, over 3043229.79 frames. ], batch size: 54, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:20:09,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0 2023-11-21 07:20:11,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1408000.0, ans=0.125 2023-11-21 07:20:17,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1408000.0, ans=0.125 2023-11-21 07:20:37,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.078e+01 8.657e+01 9.285e+01 1.100e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-21 07:20:42,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1408133.3333333333, ans=0.0 2023-11-21 07:20:53,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1408200.0, ans=0.5 2023-11-21 07:20:54,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1408200.0, ans=10.0 2023-11-21 07:20:57,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1408266.6666666667, ans=0.125 2023-11-21 07:21:03,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1408266.6666666667, ans=0.07 2023-11-21 07:21:06,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1408266.6666666667, ans=0.125 2023-11-21 07:21:09,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211250 2023-11-21 07:21:11,850 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6850, loss[loss=0.05, simple_loss=0.06837, pruned_loss=0.006611, audio_tagging_loss=0.009204, over 15049.00 frames. ], tot_loss[loss=0.07459, simple_loss=0.09642, pruned_loss=0.01698, audio_tagging_loss=0.009397, over 3032160.40 frames. 
], batch size: 57, lr: 3.81e-03, grad_scale: 32.0 2023-11-21 07:21:19,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1408333.3333333333, ans=0.125 2023-11-21 07:21:21,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1408333.3333333333, ans=0.0 2023-11-21 07:21:33,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2023-11-21 07:21:54,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1408533.3333333333, ans=0.2 2023-11-21 07:21:56,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1408533.3333333333, ans=0.2 2023-11-21 07:22:02,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1408533.3333333333, ans=0.125 2023-11-21 07:22:08,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1408600.0, ans=0.0 2023-11-21 07:22:14,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211300 2023-11-21 07:22:16,899 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6900, loss[loss=0.07498, simple_loss=0.09469, pruned_loss=0.01769, audio_tagging_loss=0.009939, over 15718.00 frames. ], tot_loss[loss=0.07466, simple_loss=0.09632, pruned_loss=0.017, audio_tagging_loss=0.009505, over 3029843.57 frames. ], batch size: 59, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:22:17,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1408666.6666666667, ans=0.1 2023-11-21 07:22:22,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1408666.6666666667, ans=0.125 2023-11-21 07:22:28,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1408733.3333333333, ans=0.125 2023-11-21 07:22:50,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.236e+01 8.770e+01 9.429e+01 1.247e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-21 07:23:06,764 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 07:23:16,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1408933.3333333333, ans=0.125 2023-11-21 07:23:18,962 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211350 2023-11-21 07:23:21,807 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 6950, loss[loss=0.06709, simple_loss=0.09158, pruned_loss=0.0118, audio_tagging_loss=0.009494, over 15590.00 frames. 
], tot_loss[loss=0.07562, simple_loss=0.09761, pruned_loss=0.01735, audio_tagging_loss=0.009464, over 3036750.49 frames. ], batch size: 58, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:23:35,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=12.0 2023-11-21 07:24:14,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1409266.6666666667, ans=0.07 2023-11-21 07:24:25,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211400 2023-11-21 07:24:25,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1409266.6666666667, ans=0.125 2023-11-21 07:24:27,833 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7000, loss[loss=0.08058, simple_loss=0.1079, pruned_loss=0.01852, audio_tagging_loss=0.008114, over 14858.00 frames. ], tot_loss[loss=0.07537, simple_loss=0.09725, pruned_loss=0.01728, audio_tagging_loss=0.009454, over 3033851.66 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 16.0 2023-11-21 07:24:34,347 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:24:48,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1409400.0, ans=0.2 2023-11-21 07:24:53,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1409466.6666666667, ans=0.125 2023-11-21 07:24:59,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.727e+01 7.925e+01 8.553e+01 9.217e+01 1.576e+02, threshold=1.711e+02, percent-clipped=0.0 2023-11-21 07:25:05,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1409533.3333333333, ans=0.0 2023-11-21 07:25:06,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1409533.3333333333, ans=0.125 2023-11-21 07:25:12,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2023-11-21 07:25:23,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1409600.0, ans=0.125 2023-11-21 07:25:30,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211450 2023-11-21 07:25:32,873 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7050, loss[loss=0.06372, simple_loss=0.08174, pruned_loss=0.01306, audio_tagging_loss=0.009779, over 16438.00 frames. ], tot_loss[loss=0.07567, simple_loss=0.0977, pruned_loss=0.01733, audio_tagging_loss=0.00949, over 3027677.07 frames. ], batch size: 62, lr: 3.80e-03, grad_scale: 16.0 2023-11-21 07:25:42,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1409666.6666666667, ans=15.0 2023-11-21 07:25:50,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. 
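
The ScheduledFloat entries report module hyperparameters (dropout rates, skip rates, balancer probabilities, whitening limits) whose values are scheduled against the global batch count and logged as ans=... A minimal sketch of such a piecewise-linear schedule; the breakpoints below are illustrative, not the ones used in this run:

```python
# Minimal sketch of a batch-count-keyed piecewise-linear schedule like the
# ScheduledFloat values logged above; breakpoints are illustrative.
def scheduled_float(batch_count: float,
                    schedule=((0.0, 0.3), (20000.0, 0.1), (50000.0, 0.07))):
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for (xa, ya), (xb, yb) in zip(schedule, schedule[1:]):
        if batch_count <= xb:
            return ya + (yb - ya) * (batch_count - xa) / (xb - xa)
    return schedule[-1][1]  # constant past the last breakpoint

# Far past the last breakpoint the value no longer moves, e.g. the
# bypass.skip_rate entry above showing ans=0.07 at batch_count ~ 1.4e6:
print(scheduled_float(1409266.67))  # 0.07
```
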
limit=6.0 2023-11-21 07:26:06,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1409800.0, ans=0.0 2023-11-21 07:26:06,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-21 07:26:15,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1409866.6666666667, ans=0.125 2023-11-21 07:26:24,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-21 07:26:34,595 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211500 2023-11-21 07:26:36,959 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7100, loss[loss=0.06519, simple_loss=0.08287, pruned_loss=0.01455, audio_tagging_loss=0.009199, over 15363.00 frames. ], tot_loss[loss=0.07548, simple_loss=0.09739, pruned_loss=0.01725, audio_tagging_loss=0.009532, over 3040265.67 frames. ], batch size: 57, lr: 3.80e-03, grad_scale: 16.0 2023-11-21 07:26:51,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1410066.6666666667, ans=0.125 2023-11-21 07:26:51,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1410066.6666666667, ans=0.125 2023-11-21 07:27:03,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-11-21 07:27:10,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.406e+01 8.252e+01 8.974e+01 1.013e+02 1.228e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-21 07:27:16,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1410200.0, ans=0.125 2023-11-21 07:27:20,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. limit=10.0 2023-11-21 07:27:23,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1410200.0, ans=0.125 2023-11-21 07:27:33,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.39 vs. limit=10.0 2023-11-21 07:27:41,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211550 2023-11-21 07:27:43,607 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7150, loss[loss=0.08138, simple_loss=0.1146, pruned_loss=0.01628, audio_tagging_loss=0.007814, over 15149.00 frames. ], tot_loss[loss=0.07673, simple_loss=0.099, pruned_loss=0.01765, audio_tagging_loss=0.009577, over 3039212.28 frames. ], batch size: 55, lr: 3.80e-03, grad_scale: 16.0 2023-11-21 07:27:49,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. 
limit=6.0 2023-11-21 07:28:09,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1410466.6666666667, ans=0.125 2023-11-21 07:28:18,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1410466.6666666667, ans=0.125 2023-11-21 07:28:24,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2023-11-21 07:28:39,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1410600.0, ans=0.0 2023-11-21 07:28:45,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211600 2023-11-21 07:28:48,040 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7200, loss[loss=0.08643, simple_loss=0.1007, pruned_loss=0.02268, audio_tagging_loss=0.0134, over 15673.00 frames. ], tot_loss[loss=0.07687, simple_loss=0.09896, pruned_loss=0.01769, audio_tagging_loss=0.009698, over 3035992.66 frames. ], batch size: 59, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:28:57,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2023-11-21 07:29:01,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1410733.3333333333, ans=0.125 2023-11-21 07:29:01,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1410733.3333333333, ans=0.125 2023-11-21 07:29:11,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1410733.3333333333, ans=0.0 2023-11-21 07:29:11,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1410733.3333333333, ans=0.0 2023-11-21 07:29:17,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1410800.0, ans=0.2 2023-11-21 07:29:19,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1410800.0, ans=0.1 2023-11-21 07:29:20,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 7.997e+01 8.820e+01 9.312e+01 1.213e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-21 07:29:35,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1410866.6666666667, ans=0.0 2023-11-21 07:29:37,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1410866.6666666667, ans=0.125 2023-11-21 07:29:50,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211650 2023-11-21 07:29:52,671 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7250, loss[loss=0.0757, simple_loss=0.09589, pruned_loss=0.01855, audio_tagging_loss=0.009203, over 13916.00 frames. ], tot_loss[loss=0.0771, simple_loss=0.09899, pruned_loss=0.01776, audio_tagging_loss=0.009848, over 3034143.08 frames. 
], batch size: 53, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:30:10,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1411066.6666666667, ans=0.0 2023-11-21 07:30:37,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=22.5 2023-11-21 07:30:44,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1411266.6666666667, ans=0.0 2023-11-21 07:30:55,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211700 2023-11-21 07:30:58,996 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7300, loss[loss=0.06098, simple_loss=0.0689, pruned_loss=0.01381, audio_tagging_loss=0.01272, over 14749.00 frames. ], tot_loss[loss=0.07675, simple_loss=0.09874, pruned_loss=0.01765, audio_tagging_loss=0.009731, over 3030682.20 frames. ], batch size: 58, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:31:00,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1411333.3333333333, ans=0.125 2023-11-21 07:31:07,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1411333.3333333333, ans=0.0 2023-11-21 07:31:18,275 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:31:19,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1411400.0, ans=0.125 2023-11-21 07:31:25,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1411466.6666666667, ans=0.125 2023-11-21 07:31:30,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.648e+01 8.326e+01 8.844e+01 9.526e+01 1.717e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-21 07:31:36,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1411533.3333333333, ans=0.025 2023-11-21 07:32:01,367 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211750 2023-11-21 07:32:03,688 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7350, loss[loss=0.08375, simple_loss=0.1148, pruned_loss=0.01904, audio_tagging_loss=0.007283, over 14908.00 frames. ], tot_loss[loss=0.07659, simple_loss=0.09895, pruned_loss=0.01759, audio_tagging_loss=0.009523, over 3034842.93 frames. ], batch size: 55, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:32:03,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1411666.6666666667, ans=0.125 2023-11-21 07:32:47,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2023-11-21 07:32:53,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. 
limit=15.0 2023-11-21 07:32:54,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1411933.3333333333, ans=0.0 2023-11-21 07:32:55,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1411933.3333333333, ans=0.125 2023-11-21 07:33:04,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211800 2023-11-21 07:33:07,571 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7400, loss[loss=0.06267, simple_loss=0.08246, pruned_loss=0.01267, audio_tagging_loss=0.008765, over 14934.00 frames. ], tot_loss[loss=0.07574, simple_loss=0.09788, pruned_loss=0.01731, audio_tagging_loss=0.009488, over 3036966.58 frames. ], batch size: 55, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:33:40,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.375e+01 7.896e+01 8.684e+01 9.314e+01 1.199e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 07:34:05,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2023-11-21 07:34:07,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=12.0 2023-11-21 07:34:09,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211850 2023-11-21 07:34:12,024 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7450, loss[loss=0.07703, simple_loss=0.1043, pruned_loss=0.01585, audio_tagging_loss=0.009052, over 15472.00 frames. ], tot_loss[loss=0.07566, simple_loss=0.09788, pruned_loss=0.01732, audio_tagging_loss=0.009392, over 3038281.43 frames. ], batch size: 60, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:34:21,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2023-11-21 07:34:28,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1412400.0, ans=0.0 2023-11-21 07:34:41,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1412466.6666666667, ans=0.2 2023-11-21 07:34:47,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1412466.6666666667, ans=0.125 2023-11-21 07:34:53,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1412533.3333333333, ans=0.0 2023-11-21 07:34:57,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2023-11-21 07:35:04,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.16 vs. 
limit=15.0 2023-11-21 07:35:06,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1412600.0, ans=0.1 2023-11-21 07:35:13,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211900 2023-11-21 07:35:14,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1412666.6666666667, ans=0.125 2023-11-21 07:35:15,892 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7500, loss[loss=0.08042, simple_loss=0.1089, pruned_loss=0.01875, audio_tagging_loss=0.007225, over 14987.00 frames. ], tot_loss[loss=0.07521, simple_loss=0.09695, pruned_loss=0.01727, audio_tagging_loss=0.009467, over 3038133.04 frames. ], batch size: 57, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:35:21,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1412666.6666666667, ans=0.1 2023-11-21 07:35:27,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1412733.3333333333, ans=0.125 2023-11-21 07:35:29,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1412733.3333333333, ans=0.125 2023-11-21 07:35:33,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1412733.3333333333, ans=0.125 2023-11-21 07:35:37,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2023-11-21 07:35:47,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.064e+01 8.532e+01 9.428e+01 1.202e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-21 07:36:17,960 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 211950 2023-11-21 07:36:20,295 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7550, loss[loss=0.07546, simple_loss=0.09232, pruned_loss=0.01952, audio_tagging_loss=0.009776, over 15262.00 frames. ], tot_loss[loss=0.07485, simple_loss=0.09639, pruned_loss=0.01714, audio_tagging_loss=0.009513, over 3036977.79 frames. ], batch size: 56, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:36:33,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1413066.6666666667, ans=0.0 2023-11-21 07:36:35,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1413066.6666666667, ans=10.0 2023-11-21 07:36:38,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1413066.6666666667, ans=10.0 2023-11-21 07:36:40,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1413066.6666666667, ans=0.125 2023-11-21 07:37:05,876 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:37:06,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.72 vs. 
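
The Whitening entries fire when a feature-covariance statistic exceeds its limit. One plausible form of that statistic (hedged: the exact scaling.py definition may differ) is the ratio of the mean squared eigenvalue of the covariance to the squared mean eigenvalue, which equals 1.0 for perfectly whitened features and grows as the spectrum becomes lopsided; a corrective penalty would apply only when the measured metric exceeds the logged limit.

```python
import torch

# Hedged sketch of a whitening metric like the "metric=... vs. limit=..."
# entries above; 1.0 means all covariance eigenvalues are equal (whitened).
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), a single group for simplicity
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 256)   # nearly white input...
print(whitening_metric(x))   # ...gives a metric only slightly above 1.0
```
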
limit=12.0 2023-11-21 07:37:15,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1413266.6666666667, ans=0.2 2023-11-21 07:37:17,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1413266.6666666667, ans=0.1 2023-11-21 07:37:22,419 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212000 2023-11-21 07:37:28,039 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7600, loss[loss=0.07709, simple_loss=0.1027, pruned_loss=0.01707, audio_tagging_loss=0.008681, over 15451.00 frames. ], tot_loss[loss=0.07449, simple_loss=0.09585, pruned_loss=0.01702, audio_tagging_loss=0.009547, over 3040537.28 frames. ], batch size: 56, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:37:33,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-21 07:37:45,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1413400.0, ans=0.125 2023-11-21 07:38:00,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.727e+01 8.299e+01 8.757e+01 9.275e+01 1.161e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-21 07:38:30,242 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212050 2023-11-21 07:38:32,603 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7650, loss[loss=0.06915, simple_loss=0.09099, pruned_loss=0.01544, audio_tagging_loss=0.008211, over 14832.00 frames. ], tot_loss[loss=0.07454, simple_loss=0.09597, pruned_loss=0.01699, audio_tagging_loss=0.009563, over 3042933.75 frames. ], batch size: 55, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:38:54,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1413733.3333333333, ans=0.0 2023-11-21 07:39:06,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-21 07:39:34,879 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212100 2023-11-21 07:39:36,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1414000.0, ans=0.125 2023-11-21 07:39:37,889 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7700, loss[loss=0.09686, simple_loss=0.121, pruned_loss=0.02631, audio_tagging_loss=0.01004, over 14508.00 frames. ], tot_loss[loss=0.07486, simple_loss=0.09655, pruned_loss=0.0171, audio_tagging_loss=0.009484, over 3050993.43 frames. ], batch size: 54, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:39:52,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.56 vs. limit=10.0 2023-11-21 07:40:10,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.127e+01 8.808e+01 9.594e+01 1.135e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-21 07:40:13,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-21 07:40:14,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=12.0 2023-11-21 07:40:39,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1414266.6666666667, ans=0.125 2023-11-21 07:40:40,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212150 2023-11-21 07:40:42,654 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7750, loss[loss=0.07231, simple_loss=0.09414, pruned_loss=0.0155, audio_tagging_loss=0.009745, over 15891.00 frames. ], tot_loss[loss=0.07483, simple_loss=0.09669, pruned_loss=0.01696, audio_tagging_loss=0.009528, over 3045851.78 frames. ], batch size: 60, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:40:51,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1414333.3333333333, ans=0.07 2023-11-21 07:40:57,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1414400.0, ans=0.0 2023-11-21 07:40:57,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.83 vs. limit=15.0 2023-11-21 07:40:59,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1414400.0, ans=0.2 2023-11-21 07:41:46,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212200 2023-11-21 07:41:49,590 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7800, loss[loss=0.06803, simple_loss=0.08187, pruned_loss=0.01586, audio_tagging_loss=0.01124, over 14532.00 frames. ], tot_loss[loss=0.07516, simple_loss=0.09703, pruned_loss=0.0171, audio_tagging_loss=0.009543, over 3046374.40 frames. ], batch size: 55, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:42:21,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.253e+01 8.940e+01 9.769e+01 1.668e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-21 07:42:34,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1414866.6666666667, ans=0.125 2023-11-21 07:42:47,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1414933.3333333333, ans=0.125 2023-11-21 07:42:51,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212250 2023-11-21 07:42:53,723 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7850, loss[loss=0.07483, simple_loss=0.0969, pruned_loss=0.01629, audio_tagging_loss=0.0101, over 13937.00 frames. ], tot_loss[loss=0.07545, simple_loss=0.09737, pruned_loss=0.01717, audio_tagging_loss=0.009599, over 3037294.45 frames. ], batch size: 54, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:42:57,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2023-11-21 07:42:57,713 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:42:59,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1415000.0, ans=0.0 2023-11-21 07:43:14,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.19 vs. 
limit=15.0 2023-11-21 07:43:15,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1415066.6666666667, ans=0.125 2023-11-21 07:43:56,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212300 2023-11-21 07:43:59,270 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7900, loss[loss=0.06623, simple_loss=0.09473, pruned_loss=0.009688, audio_tagging_loss=0.009178, over 14811.00 frames. ], tot_loss[loss=0.07555, simple_loss=0.09719, pruned_loss=0.01717, audio_tagging_loss=0.009787, over 3033599.95 frames. ], batch size: 56, lr: 3.80e-03, grad_scale: 16.0 2023-11-21 07:44:07,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1415333.3333333333, ans=0.125 2023-11-21 07:44:08,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-21 07:44:29,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1415466.6666666667, ans=0.125 2023-11-21 07:44:31,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.631e+01 8.051e+01 8.734e+01 9.440e+01 1.312e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 07:44:56,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1415600.0, ans=0.2 2023-11-21 07:44:58,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1415600.0, ans=0.2 2023-11-21 07:44:59,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0 2023-11-21 07:45:01,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212350 2023-11-21 07:45:03,396 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 7950, loss[loss=0.05946, simple_loss=0.07627, pruned_loss=0.01175, audio_tagging_loss=0.009566, over 14200.00 frames. ], tot_loss[loss=0.07528, simple_loss=0.09648, pruned_loss=0.01711, audio_tagging_loss=0.009933, over 3037843.99 frames. ], batch size: 55, lr: 3.80e-03, grad_scale: 16.0 2023-11-21 07:45:17,050 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 07:45:19,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1415733.3333333333, ans=0.125 2023-11-21 07:45:41,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1415866.6666666667, ans=0.125 2023-11-21 07:45:50,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1415866.6666666667, ans=0.125 2023-11-21 07:46:00,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1415933.3333333333, ans=0.2 2023-11-21 07:46:01,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1415933.3333333333, ans=0.0 2023-11-21 07:46:04,032 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212400 2023-11-21 07:46:06,784 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8000, loss[loss=0.08487, simple_loss=0.1127, pruned_loss=0.01939, audio_tagging_loss=0.009125, over 15431.00 frames. ], tot_loss[loss=0.07454, simple_loss=0.09531, pruned_loss=0.01682, audio_tagging_loss=0.01007, over 3036325.49 frames. ], batch size: 57, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:46:16,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1416000.0, ans=0.5 2023-11-21 07:46:23,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1416066.6666666667, ans=0.0 2023-11-21 07:46:40,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.432e+01 7.990e+01 8.488e+01 9.256e+01 1.198e+02, threshold=1.698e+02, percent-clipped=0.0 2023-11-21 07:46:50,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-11-21 07:47:07,387 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212450 2023-11-21 07:47:09,799 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8050, loss[loss=0.06968, simple_loss=0.1031, pruned_loss=0.01078, audio_tagging_loss=0.007374, over 14671.00 frames. ], tot_loss[loss=0.07441, simple_loss=0.09496, pruned_loss=0.01677, audio_tagging_loss=0.01015, over 3031391.43 frames. ], batch size: 55, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:47:18,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1416333.3333333333, ans=0.2 2023-11-21 07:47:51,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1416533.3333333333, ans=0.125 2023-11-21 07:48:03,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0 2023-11-21 07:48:12,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. 
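
The grad_scale field in the loss lines tracks the dynamic fp16 loss scale, which backs off on overflowing steps and grows again after a run of clean ones; the alternation between 32.0 and 16.0 across the batches above is consistent with that behavior. A sketch using the stock PyTorch scaler (the training loop wraps its own variant, so treat the parameter values as illustrative):

```python
import torch

# Analogy for the grad_scale swings logged above, using the standard
# torch.cuda.amp.GradScaler; parameter values here are illustrative.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the logged grad_scale
    backoff_factor=0.5,    # halve on a step with inf/nan grads: 32 -> 16
    growth_factor=2.0,     # double after enough clean steps: 16 -> 32
    growth_interval=2000,  # how many clean steps count as "enough"
)
# Per step: scaler.scale(loss).backward(); scaler.step(optimizer);
# scaler.update()  # applies the backoff/growth logic described above
```
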
limit=6.0 2023-11-21 07:48:14,151 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212500 2023-11-21 07:48:14,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1416600.0, ans=0.1 2023-11-21 07:48:16,504 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8100, loss[loss=0.06235, simple_loss=0.07507, pruned_loss=0.01281, audio_tagging_loss=0.01201, over 14661.00 frames. ], tot_loss[loss=0.07395, simple_loss=0.09445, pruned_loss=0.01667, audio_tagging_loss=0.01005, over 3028306.27 frames. ], batch size: 57, lr: 3.80e-03, grad_scale: 32.0 2023-11-21 07:48:18,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1416666.6666666667, ans=0.125 2023-11-21 07:48:28,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1416733.3333333333, ans=0.0 2023-11-21 07:48:48,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.732e+01 8.164e+01 8.831e+01 9.512e+01 1.309e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-21 07:49:08,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1416933.3333333333, ans=0.1 2023-11-21 07:49:18,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212550 2023-11-21 07:49:20,832 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8150, loss[loss=0.0796, simple_loss=0.1015, pruned_loss=0.01954, audio_tagging_loss=0.009328, over 14908.00 frames. ], tot_loss[loss=0.07441, simple_loss=0.09544, pruned_loss=0.01677, audio_tagging_loss=0.009912, over 3036655.75 frames. ], batch size: 58, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:49:47,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1417133.3333333333, ans=0.0 2023-11-21 07:50:21,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212600 2023-11-21 07:50:24,230 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8200, loss[loss=0.05878, simple_loss=0.07892, pruned_loss=0.01221, audio_tagging_loss=0.007113, over 13921.00 frames. ], tot_loss[loss=0.07454, simple_loss=0.09592, pruned_loss=0.01682, audio_tagging_loss=0.009759, over 3040749.82 frames. ], batch size: 53, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:50:24,288 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 07:50:44,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1417400.0, ans=0.125 2023-11-21 07:50:50,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5 2023-11-21 07:50:53,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. 
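
The tot_loss[... over N frames] summaries behave like frame-weighted running averages of recent per-batch losses: each batch contributes loss times its frame count, normalized by the total frames seen. A minimal sketch under that assumption (the accumulator below, including when it resets, is illustrative, not the actual tracker):

```python
# Hedged sketch of the tot_loss[... over N frames] summaries as a
# frame-weighted running average over recent batches.
class LossTracker:
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames

    def average(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = LossTracker()
tracker.update(0.06235, 14661.0)  # e.g. the batch 8100 entry above
print(f"{tracker.average():.5f} over {tracker.frames} frames")
```
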
limit=15.0 2023-11-21 07:50:58,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.391e+01 9.092e+01 9.919e+01 1.241e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-21 07:51:00,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1417466.6666666667, ans=0.125 2023-11-21 07:51:11,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5 2023-11-21 07:51:23,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1417600.0, ans=0.125 2023-11-21 07:51:26,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212650 2023-11-21 07:51:29,797 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8250, loss[loss=0.1136, simple_loss=0.155, pruned_loss=0.03287, audio_tagging_loss=0.003289, over 15800.00 frames. ], tot_loss[loss=0.07462, simple_loss=0.09607, pruned_loss=0.01694, audio_tagging_loss=0.009643, over 3045900.51 frames. ], batch size: 55, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:51:30,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1417666.6666666667, ans=0.2 2023-11-21 07:51:45,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1417733.3333333333, ans=0.125 2023-11-21 07:52:32,639 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212700 2023-11-21 07:52:35,018 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8300, loss[loss=0.06098, simple_loss=0.07731, pruned_loss=0.0123, audio_tagging_loss=0.01002, over 15360.00 frames. ], tot_loss[loss=0.07472, simple_loss=0.09655, pruned_loss=0.01687, audio_tagging_loss=0.009577, over 3057857.44 frames. ], batch size: 60, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:52:41,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1418000.0, ans=0.2 2023-11-21 07:52:50,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=1418066.6666666667, ans=8.0 2023-11-21 07:52:55,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2023-11-21 07:53:06,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-21 07:53:10,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.350e+01 9.034e+01 9.751e+01 1.328e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-21 07:53:21,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-21 07:53:33,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1418266.6666666667, ans=0.1 2023-11-21 07:53:37,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212750 2023-11-21 07:53:39,697 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8350, loss[loss=0.07021, simple_loss=0.09791, pruned_loss=0.012, audio_tagging_loss=0.009259, over 15168.00 frames. 
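The loss lines above break the objective into its parts: simple_loss and pruned_loss come from the pruned RNN-T objective, audio_tagging_loss from the audio-tagging (BEATs-distillation) head; loss is the weighted total for that batch, and tot_loss is the same quantity smoothed over the ~3M frames of recent batches. With simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 from the startup config, a plain weighted sum reproduces the logged totals; the helper below is an illustrative reconstruction (the function name is mine, and icefall's train_asr.py additionally ramps the simple/pruned weights during warm-up, long past at batch ~212k).

    # Illustrative reconstruction of the logged per-batch total.
    def combine_losses(simple_loss: float,
                       pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        transducer = simple_loss_scale * simple_loss + pruned_loss
        return transducer + audio_tagging_loss_scale * audio_tagging_loss

    # The "batch 8300" tot_loss above: 0.5*0.09655 + 0.01687 + 0.009577
    print(combine_losses(0.09655, 0.01687, 0.009577))  # ~0.07472, as logged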
], tot_loss[loss=0.07497, simple_loss=0.09718, pruned_loss=0.01685, audio_tagging_loss=0.009539, over 3058894.57 frames. ], batch size: 58, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 07:54:11,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1418466.6666666667, ans=0.0 2023-11-21 07:54:12,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1418466.6666666667, ans=0.125 2023-11-21 07:54:15,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1418466.6666666667, ans=0.1 2023-11-21 07:54:21,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1418533.3333333333, ans=0.125 2023-11-21 07:54:27,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1418533.3333333333, ans=0.5 2023-11-21 07:54:35,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.59 vs. limit=15.0 2023-11-21 07:54:39,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1418600.0, ans=0.1 2023-11-21 07:54:42,458 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212800 2023-11-21 07:54:45,235 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8400, loss[loss=0.08778, simple_loss=0.1183, pruned_loss=0.02049, audio_tagging_loss=0.008114, over 14921.00 frames. ], tot_loss[loss=0.07504, simple_loss=0.09714, pruned_loss=0.01702, audio_tagging_loss=0.009452, over 3056462.00 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:54:50,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1418666.6666666667, ans=0.2 2023-11-21 07:54:55,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2023-11-21 07:54:57,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1418666.6666666667, ans=0.2 2023-11-21 07:55:06,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1418733.3333333333, ans=0.125 2023-11-21 07:55:11,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1418800.0, ans=0.0 2023-11-21 07:55:18,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1418800.0, ans=0.1 2023-11-21 07:55:21,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.837e+01 7.810e+01 8.454e+01 9.398e+01 1.177e+02, threshold=1.691e+02, percent-clipped=0.0 2023-11-21 07:55:32,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1418866.6666666667, ans=0.125 2023-11-21 07:55:49,258 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212850 2023-11-21 07:55:51,686 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8450, loss[loss=0.07256, simple_loss=0.09409, pruned_loss=0.01438, audio_tagging_loss=0.01114, over 15990.00 frames. 
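Each optim.py "Clipping_scale" line prints five ascending grad-norm statistics, apparently min/25%/median/75%/max over a recent window, plus the active threshold; in every entry here the threshold equals Clipping_scale times the logged median (e.g. 2.0 x 8.454e+01 ~ 1.691e+02 just above), so clipping adapts to the recent norm distribution, and percent-clipped reports how often the threshold was actually exceeded. A minimal sketch of that policy; the class name and window size are assumptions, not icefall's optim.py:

    import collections
    import torch

    class MedianGradClipper:
        """Illustrative: clip the grad norm at clipping_scale * running median."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def __call__(self, parameters) -> None:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            q = sorted(self.norms)
            quartiles = [q[int(f * (len(q) - 1))] for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
            threshold = self.clipping_scale * quartiles[2]  # scale * median
            torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
            print(f"grad-norm quartiles {quartiles}, threshold={threshold:.4g}")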
], tot_loss[loss=0.07494, simple_loss=0.09666, pruned_loss=0.01705, audio_tagging_loss=0.009555, over 3056617.11 frames. ], batch size: 57, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:55:54,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1419000.0, ans=0.125 2023-11-21 07:55:54,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2023-11-21 07:56:24,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1419133.3333333333, ans=0.125 2023-11-21 07:56:32,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1419200.0, ans=0.0 2023-11-21 07:56:39,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0 2023-11-21 07:56:53,415 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212900 2023-11-21 07:56:55,736 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8500, loss[loss=0.07362, simple_loss=0.09075, pruned_loss=0.01587, audio_tagging_loss=0.01237, over 15507.00 frames. ], tot_loss[loss=0.0751, simple_loss=0.09652, pruned_loss=0.01717, audio_tagging_loss=0.009672, over 3070237.08 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:56:57,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1419333.3333333333, ans=0.2 2023-11-21 07:57:13,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1419400.0, ans=0.125 2023-11-21 07:57:14,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1419400.0, ans=0.2 2023-11-21 07:57:32,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.883e+01 8.044e+01 8.694e+01 9.368e+01 1.136e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 07:57:54,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1419600.0, ans=0.0 2023-11-21 07:57:54,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1419600.0, ans=0.2 2023-11-21 07:57:58,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 212950 2023-11-21 07:57:58,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1419600.0, ans=0.125 2023-11-21 07:58:01,291 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8550, loss[loss=0.06328, simple_loss=0.07714, pruned_loss=0.01426, audio_tagging_loss=0.01045, over 14725.00 frames. ], tot_loss[loss=0.07542, simple_loss=0.09695, pruned_loss=0.01724, audio_tagging_loss=0.009711, over 3067812.02 frames. 
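Most scaling.py lines are ScheduledFloat reports: a hyperparameter (a dropout_p, a skip rate, a balancer prob, a bypass scale_min, ...) whose value ("ans") is a function of batch_count, typically piecewise linear. By ~1.4M batches every schedule here has reached its final flat segment, which is why the same values recur (0.125, 0.1, 0.2, 0.0). A sketch of the interpolation idea, in the spirit of icefall's ScheduledFloat rather than its actual class:

    from bisect import bisect_right

    class PiecewiseSchedule:
        """Illustrative piecewise-linear schedule keyed on batch_count."""
        def __init__(self, *points):  # points: sorted (batch_count, value) pairs
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            i = bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]          # final flat segment
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout that anneals 0.3 -> 0.1 and then stays there:
    dropout_p = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(1417666.0))   # 0.1, matching the "ans=0.1" lines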
], batch size: 57, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:58:16,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1419733.3333333333, ans=0.125 2023-11-21 07:58:33,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1419800.0, ans=0.0 2023-11-21 07:59:03,687 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213000 2023-11-21 07:59:06,326 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8600, loss[loss=0.08573, simple_loss=0.1126, pruned_loss=0.01828, audio_tagging_loss=0.01113, over 15502.00 frames. ], tot_loss[loss=0.07615, simple_loss=0.09802, pruned_loss=0.01743, audio_tagging_loss=0.009716, over 3059045.30 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 07:59:06,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1420000.0, ans=0.2 2023-11-21 07:59:16,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:59:17,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1420000.0, ans=0.0 2023-11-21 07:59:18,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2023-11-21 07:59:24,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5 2023-11-21 07:59:26,957 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 07:59:36,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1420133.3333333333, ans=0.2 2023-11-21 07:59:38,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1420133.3333333333, ans=0.0 2023-11-21 07:59:40,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.787e+01 8.296e+01 8.926e+01 9.744e+01 1.392e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-21 07:59:50,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. limit=10.0 2023-11-21 07:59:53,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1420200.0, ans=0.125 2023-11-21 08:00:08,187 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213050 2023-11-21 08:00:10,635 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8650, loss[loss=0.06912, simple_loss=0.09276, pruned_loss=0.01446, audio_tagging_loss=0.008278, over 15063.00 frames. ], tot_loss[loss=0.07608, simple_loss=0.09782, pruned_loss=0.01736, audio_tagging_loss=0.00981, over 3053845.45 frames. 
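The grad_scale field in the loss lines is the mixed-precision loss scale (use_fp16=True in the config): it halves when a batch produces inf/nan gradients and grows back after a run of clean steps, which is consistent with the 32.0/16.0 swings around batches 8350-8650 here. For reference, the standard torch.cuda.amp pattern; the step function and loss are schematic placeholders:

    import torch

    scaler = torch.cuda.amp.GradScaler()  # backoff_factor=0.5 by default

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch).mean()    # placeholder loss computation
        scaler.scale(loss).backward()
        scaler.step(optimizer)            # skipped if grads contain inf/nan
        scaler.update()                   # halves the scale, or grows it slowly
        return scaler.get_scale()         # the "grad_scale" a log would report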
], batch size: 57, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:00:26,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1420400.0, ans=0.0 2023-11-21 08:00:40,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1420466.6666666667, ans=0.0 2023-11-21 08:00:46,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=22.5 2023-11-21 08:00:50,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1420533.3333333333, ans=0.125 2023-11-21 08:00:56,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1420533.3333333333, ans=0.025 2023-11-21 08:01:04,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1420600.0, ans=0.2 2023-11-21 08:01:12,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213100 2023-11-21 08:01:15,608 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8700, loss[loss=0.07995, simple_loss=0.1045, pruned_loss=0.01915, audio_tagging_loss=0.00857, over 15226.00 frames. ], tot_loss[loss=0.07657, simple_loss=0.09836, pruned_loss=0.01759, audio_tagging_loss=0.009796, over 3053778.10 frames. ], batch size: 57, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:01:28,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1420733.3333333333, ans=0.125 2023-11-21 08:01:52,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.710e+01 8.075e+01 9.054e+01 1.029e+02 1.299e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-21 08:02:12,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-21 08:02:13,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1420933.3333333333, ans=0.125 2023-11-21 08:02:16,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1420933.3333333333, ans=0.0 2023-11-21 08:02:18,919 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213150 2023-11-21 08:02:19,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2023-11-21 08:02:21,299 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8750, loss[loss=0.07224, simple_loss=0.09352, pruned_loss=0.01716, audio_tagging_loss=0.008312, over 14639.00 frames. ], tot_loss[loss=0.07579, simple_loss=0.09741, pruned_loss=0.01723, audio_tagging_loss=0.009851, over 3045572.88 frames. ], batch size: 57, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:02:23,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. 
limit=8.0 2023-11-21 08:02:43,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1421066.6666666667, ans=0.125 2023-11-21 08:03:23,202 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213200 2023-11-21 08:03:26,485 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8800, loss[loss=0.05791, simple_loss=0.07963, pruned_loss=0.01063, audio_tagging_loss=0.007474, over 14769.00 frames. ], tot_loss[loss=0.07508, simple_loss=0.09653, pruned_loss=0.01682, audio_tagging_loss=0.009992, over 3038931.43 frames. ], batch size: 55, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 08:03:30,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1421333.3333333333, ans=0.1 2023-11-21 08:03:58,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1421466.6666666667, ans=0.125 2023-11-21 08:04:01,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1421466.6666666667, ans=0.125 2023-11-21 08:04:03,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.706e+01 8.113e+01 8.751e+01 9.814e+01 1.127e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-21 08:04:17,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2023-11-21 08:04:27,674 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213250 2023-11-21 08:04:29,982 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8850, loss[loss=0.07811, simple_loss=0.1017, pruned_loss=0.01742, audio_tagging_loss=0.009857, over 14591.00 frames. ], tot_loss[loss=0.07489, simple_loss=0.09647, pruned_loss=0.0167, audio_tagging_loss=0.009945, over 3045582.72 frames. ], batch size: 54, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:04:43,103 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 08:04:53,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:05:16,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1421866.6666666667, ans=0.0 2023-11-21 08:05:33,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213300 2023-11-21 08:05:35,835 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8900, loss[loss=0.08059, simple_loss=0.1161, pruned_loss=0.01437, audio_tagging_loss=0.008173, over 15263.00 frames. ], tot_loss[loss=0.07512, simple_loss=0.09706, pruned_loss=0.01673, audio_tagging_loss=0.00986, over 3044535.45 frames. ], batch size: 55, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:06:04,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.11 vs. 
limit=15.0 2023-11-21 08:06:08,850 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:06:12,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.536e+01 8.059e+01 8.546e+01 9.604e+01 1.318e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-21 08:06:27,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-11-21 08:06:34,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1422266.6666666667, ans=0.1 2023-11-21 08:06:37,977 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213350 2023-11-21 08:06:40,314 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 8950, loss[loss=0.07039, simple_loss=0.09961, pruned_loss=0.0121, audio_tagging_loss=0.008488, over 14747.00 frames. ], tot_loss[loss=0.07531, simple_loss=0.09753, pruned_loss=0.01683, audio_tagging_loss=0.009717, over 3044952.26 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:07:42,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213400 2023-11-21 08:07:45,032 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9000, loss[loss=0.08824, simple_loss=0.1098, pruned_loss=0.02415, audio_tagging_loss=0.009173, over 15382.00 frames. ], tot_loss[loss=0.07517, simple_loss=0.09713, pruned_loss=0.01692, audio_tagging_loss=0.009686, over 3040575.18 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:07:45,033 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 08:08:25,288 INFO [train_asr.py:1253] (1/4) Epoch 18, validation: loss=0.06098, simple_loss=0.05248, pruned_loss=0.005341, audio_tagging_loss=0.02939, over 4681554.00 frames. 2023-11-21 08:08:25,289 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 08:08:30,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1422666.6666666667, ans=0.0 2023-11-21 08:08:30,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1422666.6666666667, ans=0.1 2023-11-21 08:08:32,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1422666.6666666667, ans=0.125 2023-11-21 08:08:48,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1422733.3333333333, ans=0.0 2023-11-21 08:08:50,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. 
limit=22.5 2023-11-21 08:08:53,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1422800.0, ans=0.125 2023-11-21 08:09:01,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.253e+01 9.144e+01 1.038e+02 1.327e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-21 08:09:03,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1422866.6666666667, ans=0.2 2023-11-21 08:09:26,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213450 2023-11-21 08:09:28,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2023-11-21 08:09:29,104 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9050, loss[loss=0.08414, simple_loss=0.1133, pruned_loss=0.02113, audio_tagging_loss=0.006368, over 15012.00 frames. ], tot_loss[loss=0.07568, simple_loss=0.09797, pruned_loss=0.01708, audio_tagging_loss=0.009613, over 3046167.59 frames. ], batch size: 54, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:09:32,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2023-11-21 08:09:38,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1423000.0, ans=0.125 2023-11-21 08:09:42,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=22.5 2023-11-21 08:09:54,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1423133.3333333333, ans=0.0 2023-11-21 08:10:02,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-21 08:10:06,416 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:10:07,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1423200.0, ans=10.0 2023-11-21 08:10:10,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1423200.0, ans=0.05 2023-11-21 08:10:10,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-11-21 08:10:31,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213500 2023-11-21 08:10:33,718 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9100, loss[loss=0.06546, simple_loss=0.08869, pruned_loss=0.0127, audio_tagging_loss=0.008414, over 13453.00 frames. ], tot_loss[loss=0.07529, simple_loss=0.09779, pruned_loss=0.01693, audio_tagging_loss=0.009461, over 3050717.30 frames. 
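The "Computing validation loss" block at batch 9000 above runs the same objective over a held-out set (note the much larger audio_tagging_loss there, 0.02939; the same 0.5/1.0 weighting reproduces the 0.06098 total) and then reports peak GPU memory from CUDA's allocator statistics. A minimal sketch of both pieces; the loop body and the model.loss helper are schematic assumptions:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, device="cuda"):
        model.eval()
        tot, frames = 0.0, 0
        for batch in valid_loader:
            loss, num_frames = model.loss(batch)   # assumed helper
            tot += loss.item() * num_frames
            frames += num_frames
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot / frames:.5f}")
        print(f"Maximum memory allocated so far is {mb}MB")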
], batch size: 53, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:10:35,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1423333.3333333333, ans=0.125 2023-11-21 08:10:41,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-21 08:11:04,069 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:11:11,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.606e+01 8.196e+01 8.647e+01 9.263e+01 1.186e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-21 08:11:27,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1423600.0, ans=0.125 2023-11-21 08:11:36,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213550 2023-11-21 08:11:40,099 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9150, loss[loss=0.07363, simple_loss=0.08978, pruned_loss=0.02112, audio_tagging_loss=0.007621, over 14830.00 frames. ], tot_loss[loss=0.07516, simple_loss=0.09749, pruned_loss=0.01698, audio_tagging_loss=0.009441, over 3046752.23 frames. ], batch size: 54, lr: 3.79e-03, grad_scale: 16.0 2023-11-21 08:12:12,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2023-11-21 08:12:19,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1423866.6666666667, ans=0.0 2023-11-21 08:12:19,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1423866.6666666667, ans=0.09899494936611666 2023-11-21 08:12:34,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1423933.3333333333, ans=0.125 2023-11-21 08:12:38,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-11-21 08:12:42,562 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213600 2023-11-21 08:12:45,262 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9200, loss[loss=0.08332, simple_loss=0.11, pruned_loss=0.01802, audio_tagging_loss=0.01029, over 15953.00 frames. ], tot_loss[loss=0.07497, simple_loss=0.09751, pruned_loss=0.01681, audio_tagging_loss=0.009401, over 3049061.28 frames. ], batch size: 59, lr: 3.79e-03, grad_scale: 32.0 2023-11-21 08:12:48,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1424000.0, ans=0.125 2023-11-21 08:13:05,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2023-11-21 08:13:14,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1424133.3333333333, ans=0.125 2023-11-21 08:13:23,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 7.966e+01 8.700e+01 9.351e+01 1.190e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 08:13:24,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1424200.0, ans=0.1 2023-11-21 08:13:25,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1424200.0, ans=0.0 2023-11-21 08:13:40,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-21 08:13:47,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213650 2023-11-21 08:13:49,888 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9250, loss[loss=0.06679, simple_loss=0.08163, pruned_loss=0.01309, audio_tagging_loss=0.01288, over 15013.00 frames. ], tot_loss[loss=0.07474, simple_loss=0.09702, pruned_loss=0.0168, audio_tagging_loss=0.009434, over 3051989.60 frames. ], batch size: 57, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:14:07,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2023-11-21 08:14:13,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1424400.0, ans=0.125 2023-11-21 08:14:22,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-11-21 08:14:35,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1424533.3333333333, ans=0.2 2023-11-21 08:14:39,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1424533.3333333333, ans=0.1 2023-11-21 08:14:49,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1424600.0, ans=0.0 2023-11-21 08:14:52,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213700 2023-11-21 08:14:54,991 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9300, loss[loss=0.08231, simple_loss=0.1082, pruned_loss=0.01624, audio_tagging_loss=0.01197, over 14717.00 frames. ], tot_loss[loss=0.07477, simple_loss=0.09734, pruned_loss=0.0167, audio_tagging_loss=0.009393, over 3049257.84 frames. 
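The Whitening lines compare a per-module statistic against a limit (the limit itself can be a ScheduledFloat, as the whitening_limit entries show); only when the metric exceeds the limit does the Whiten module in scaling.py push gradients toward a more isotropic feature covariance, so entries like "metric=21.72 vs. limit=22.5" mark modules close to triggering. As I understand the metric, it is the mean squared eigenvalue of the feature covariance over the squared mean eigenvalue, equal to 1.0 for perfectly white features; a sketch under that assumption (per-group handling omitted):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns >= 1.0; 1.0 means all
        covariance eigenvalues are equal, i.e. the features are 'white'."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        c = cov.shape[0]
        # trace(cov @ cov) == sum(cov**2) for a symmetric matrix
        return c * (cov * cov).sum() / (cov.diag().sum() ** 2)

    x = torch.randn(1000, 192)                     # near-white features
    print(whitening_metric(x))                     # close to 1.0
    print(whitening_metric(x * torch.rand(192)))   # anisotropic -> larger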
], batch size: 55, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:15:02,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1424666.6666666667, ans=0.1 2023-11-21 08:15:12,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1424733.3333333333, ans=0.125 2023-11-21 08:15:23,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1424800.0, ans=0.125 2023-11-21 08:15:31,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.285e+01 7.980e+01 8.564e+01 9.426e+01 1.371e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-21 08:15:36,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1424866.6666666667, ans=0.125 2023-11-21 08:15:40,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1424866.6666666667, ans=0.125 2023-11-21 08:15:56,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1424933.3333333333, ans=0.125 2023-11-21 08:15:57,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213750 2023-11-21 08:15:59,933 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9350, loss[loss=0.07427, simple_loss=0.09786, pruned_loss=0.0172, audio_tagging_loss=0.008143, over 14742.00 frames. ], tot_loss[loss=0.07494, simple_loss=0.09752, pruned_loss=0.01682, audio_tagging_loss=0.009355, over 3047652.30 frames. ], batch size: 56, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:16:01,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1425000.0, ans=0.125 2023-11-21 08:16:20,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1425066.6666666667, ans=0.125 2023-11-21 08:16:28,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1425133.3333333333, ans=0.0 2023-11-21 08:16:41,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1425200.0, ans=0.125 2023-11-21 08:16:50,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1425200.0, ans=0.0 2023-11-21 08:17:02,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213800 2023-11-21 08:17:02,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1425266.6666666667, ans=0.125 2023-11-21 08:17:05,130 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9400, loss[loss=0.06035, simple_loss=0.07474, pruned_loss=0.01294, audio_tagging_loss=0.01003, over 15777.00 frames. ], tot_loss[loss=0.07555, simple_loss=0.09793, pruned_loss=0.0171, audio_tagging_loss=0.009483, over 3052224.08 frames. 
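The learning rate ticks down slowly across this span (3.80e-03 at batch 8000, 3.78e-03 by batch 9250) because icefall's Eden scheduler decays in both batches and epochs. With base_lr=0.045, lr_batches=7500, lr_epochs=3.5 from the startup config, the usual Eden form lands in the logged range; treat the sketch below as the standard published formula, not a dump of optim.py (it omits details such as ref_duration-based step scaling):

    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden schedule: smooth inverse-quartic decay in steps and epochs."""
        batch_factor = ((step / lr_batches) ** 2 + 1) ** -0.25
        epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, step=213000, epoch=18))  # ~3.7e-03 vs 3.78e-03 logged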
], batch size: 60, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:17:06,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1425333.3333333333, ans=0.125 2023-11-21 08:17:10,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1425333.3333333333, ans=0.125 2023-11-21 08:17:14,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=12.0 2023-11-21 08:17:23,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-11-21 08:17:27,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1425400.0, ans=0.125 2023-11-21 08:17:40,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.45 vs. limit=15.0 2023-11-21 08:17:43,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.689e+01 8.186e+01 8.902e+01 9.704e+01 1.344e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-21 08:17:43,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1425533.3333333333, ans=0.125 2023-11-21 08:17:47,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1425533.3333333333, ans=0.1 2023-11-21 08:17:48,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.96 vs. limit=22.5 2023-11-21 08:17:52,518 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:18:08,504 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 08:18:08,567 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213850 2023-11-21 08:18:10,868 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9450, loss[loss=0.05807, simple_loss=0.07319, pruned_loss=0.01071, audio_tagging_loss=0.01076, over 15728.00 frames. ], tot_loss[loss=0.07496, simple_loss=0.09697, pruned_loss=0.01694, audio_tagging_loss=0.009541, over 3054812.81 frames. ], batch size: 59, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:18:50,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=22.5 2023-11-21 08:18:55,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2023-11-21 08:19:06,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.77 vs. 
limit=12.0 2023-11-21 08:19:13,866 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213900 2023-11-21 08:19:16,320 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9500, loss[loss=0.07568, simple_loss=0.09402, pruned_loss=0.01901, audio_tagging_loss=0.009658, over 15920.00 frames. ], tot_loss[loss=0.07589, simple_loss=0.09789, pruned_loss=0.01734, audio_tagging_loss=0.009607, over 3050272.40 frames. ], batch size: 60, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:19:19,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1426000.0, ans=0.125 2023-11-21 08:19:19,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1426000.0, ans=0.2 2023-11-21 08:19:20,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1426000.0, ans=0.0 2023-11-21 08:19:38,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1426066.6666666667, ans=0.0 2023-11-21 08:19:42,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-21 08:19:44,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1426133.3333333333, ans=0.125 2023-11-21 08:19:53,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.280e+01 9.020e+01 9.623e+01 1.395e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-21 08:20:00,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1426200.0, ans=0.95 2023-11-21 08:20:17,997 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 213950 2023-11-21 08:20:20,261 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9550, loss[loss=0.07888, simple_loss=0.1053, pruned_loss=0.01341, audio_tagging_loss=0.0128, over 15364.00 frames. ], tot_loss[loss=0.07607, simple_loss=0.09832, pruned_loss=0.01729, audio_tagging_loss=0.009614, over 3051804.34 frames. ], batch size: 56, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:20:35,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5 2023-11-21 08:20:45,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1426400.0, ans=0.2 2023-11-21 08:20:52,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1426466.6666666667, ans=0.0 2023-11-21 08:21:02,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1426533.3333333333, ans=0.125 2023-11-21 08:21:20,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1426600.0, ans=0.035 2023-11-21 08:21:22,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214000 2023-11-21 08:21:25,344 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9600, loss[loss=0.07668, simple_loss=0.1134, pruned_loss=0.01362, audio_tagging_loss=0.006353, over 15677.00 frames. ], tot_loss[loss=0.07622, simple_loss=0.09854, pruned_loss=0.01731, audio_tagging_loss=0.009648, over 3048964.02 frames. 
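The balancer entries (min_positive, max_positive, min_abs, max_abs, prob) belong to activation balancers: regularizers that, with scheduled probability prob per batch (0.125 here), inspect per-channel activation statistics and inject a small corrective gradient when a channel drifts out of range, e.g. fewer than min_positive=0.025 positive activations or a mean absolute value above max_abs=10.0. A natural way to express "identity forward, modified backward" is a custom autograd function; this is a conceptual sketch, not icefall's Balancer:

    import torch

    class NudgeOutOfRange(torch.autograd.Function):
        """Identity forward; backward adds a tiny push for channels whose
        mean |activation| lies outside [min_abs, max_abs]."""
        @staticmethod
        def forward(ctx, x, min_abs=0.2, max_abs=10.0, strength=1e-4):
            stats = x.detach().abs().mean(dim=0)   # per-channel mean |x|
            ctx.save_for_backward(x.detach(), stats)
            ctx.cfg = (min_abs, max_abs, strength)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            x, stats = ctx.saved_tensors
            min_abs, max_abs, strength = ctx.cfg
            push = torch.zeros_like(stats)
            push[stats < min_abs] = -strength      # grow small magnitudes
            push[stats > max_abs] = +strength      # shrink large magnitudes
            return grad_out + push * x.sign(), None, None, None

    y = NudgeOutOfRange.apply(torch.randn(16, 192, requires_grad=True))
    y.sum().backward()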
], batch size: 56, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:21:26,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1426666.6666666667, ans=0.125 2023-11-21 08:21:26,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1426666.6666666667, ans=0.2 2023-11-21 08:21:41,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-11-21 08:22:02,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.086e+01 8.937e+01 9.718e+01 1.374e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-21 08:22:03,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1426866.6666666667, ans=0.125 2023-11-21 08:22:03,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1426866.6666666667, ans=0.0 2023-11-21 08:22:11,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=1426866.6666666667, ans=12.0 2023-11-21 08:22:25,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1426933.3333333333, ans=0.0 2023-11-21 08:22:29,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214050 2023-11-21 08:22:31,861 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9650, loss[loss=0.09158, simple_loss=0.1246, pruned_loss=0.02192, audio_tagging_loss=0.00737, over 16971.00 frames. ], tot_loss[loss=0.07555, simple_loss=0.09736, pruned_loss=0.01717, audio_tagging_loss=0.00971, over 3050415.19 frames. ], batch size: 63, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:22:45,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1427066.6666666667, ans=0.125 2023-11-21 08:23:13,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1427200.0, ans=0.125 2023-11-21 08:23:24,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1427266.6666666667, ans=0.02 2023-11-21 08:23:34,027 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214100 2023-11-21 08:23:36,415 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9700, loss[loss=0.0677, simple_loss=0.08713, pruned_loss=0.01368, audio_tagging_loss=0.01046, over 15141.00 frames. ], tot_loss[loss=0.07519, simple_loss=0.09699, pruned_loss=0.0171, audio_tagging_loss=0.009594, over 3045375.69 frames. 
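The bypass.scale_min and bypass_mid.scale_min values (ans=0.2 here) govern Zipformer's bypass connections: each encoder stack mixes its input and output with a learned per-channel scale clamped to at least scale_min, so a block can never fully fade out its own output early in training. Roughly, as a sketch of the idea (channel count chosen to match the logged encoder_dim of one stack):

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        """out = x_in + scale * (x_out - x_in), with the per-channel scale
        clamped to [scale_min, 1.0]; scale_min itself is scheduled (0.2 here)."""
        def __init__(self, channels: int, scale_min: float = 0.2):
            super().__init__()
            self.scale = nn.Parameter(torch.full((channels,), 0.5))
            self.scale_min = scale_min

        def forward(self, x_in, x_out):
            s = self.scale.clamp(min=self.scale_min, max=1.0)
            return x_in + s * (x_out - x_in)

    bp = Bypass(channels=384)   # e.g. the encoder_dim=384 stacks
    x_in, x_out = torch.randn(2, 10, 384), torch.randn(2, 10, 384)
    print(bp(x_in, x_out).shape)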
], batch size: 58, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:23:57,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1427400.0, ans=0.125 2023-11-21 08:24:10,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1427466.6666666667, ans=0.125 2023-11-21 08:24:13,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1427466.6666666667, ans=0.0 2023-11-21 08:24:16,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.525e+01 8.096e+01 8.550e+01 9.320e+01 1.224e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-21 08:24:21,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5 2023-11-21 08:24:35,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1427600.0, ans=0.125 2023-11-21 08:24:38,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214150 2023-11-21 08:24:39,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-21 08:24:41,418 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9750, loss[loss=0.08606, simple_loss=0.1147, pruned_loss=0.01981, audio_tagging_loss=0.008895, over 14876.00 frames. ], tot_loss[loss=0.07532, simple_loss=0.09721, pruned_loss=0.01724, audio_tagging_loss=0.009472, over 3046217.33 frames. ], batch size: 55, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:24:58,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1427733.3333333333, ans=0.0 2023-11-21 08:25:08,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1427800.0, ans=0.125 2023-11-21 08:25:19,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1427866.6666666667, ans=0.125 2023-11-21 08:25:32,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1427933.3333333333, ans=0.125 2023-11-21 08:25:39,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2023-11-21 08:25:45,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214200 2023-11-21 08:25:47,813 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9800, loss[loss=0.07594, simple_loss=0.0862, pruned_loss=0.02156, audio_tagging_loss=0.01128, over 16080.00 frames. ], tot_loss[loss=0.07566, simple_loss=0.09758, pruned_loss=0.01734, audio_tagging_loss=0.009529, over 3050383.80 frames. 
], batch size: 60, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:25:48,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1428000.0, ans=15.0 2023-11-21 08:26:08,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1428066.6666666667, ans=0.2 2023-11-21 08:26:11,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1428066.6666666667, ans=0.0 2023-11-21 08:26:22,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2023-11-21 08:26:26,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.204e+01 8.669e+01 9.345e+01 1.280e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-21 08:26:27,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1428200.0, ans=0.125 2023-11-21 08:26:27,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1428200.0, ans=0.025 2023-11-21 08:26:44,290 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 08:26:50,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214250 2023-11-21 08:26:52,998 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9850, loss[loss=0.06743, simple_loss=0.08729, pruned_loss=0.01344, audio_tagging_loss=0.01035, over 15722.00 frames. ], tot_loss[loss=0.07568, simple_loss=0.09814, pruned_loss=0.01718, audio_tagging_loss=0.009429, over 3053106.58 frames. ], batch size: 58, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:27:29,961 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.258e-02 2023-11-21 08:27:45,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1428600.0, ans=0.1 2023-11-21 08:27:49,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=8.0 2023-11-21 08:27:52,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1428600.0, ans=0.125 2023-11-21 08:27:55,016 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214300 2023-11-21 08:27:57,965 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9900, loss[loss=0.06501, simple_loss=0.08686, pruned_loss=0.01132, audio_tagging_loss=0.01026, over 15438.00 frames. ], tot_loss[loss=0.07595, simple_loss=0.09838, pruned_loss=0.01735, audio_tagging_loss=0.009414, over 3053201.88 frames. ], batch size: 58, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:28:09,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.11 vs. 
limit=15.0 2023-11-21 08:28:28,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1428800.0, ans=0.125 2023-11-21 08:28:37,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.117e+01 8.820e+01 9.379e+01 1.222e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-21 08:29:00,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1428933.3333333333, ans=0.125 2023-11-21 08:29:01,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214350 2023-11-21 08:29:03,561 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 9950, loss[loss=0.05457, simple_loss=0.07029, pruned_loss=0.01069, audio_tagging_loss=0.008733, over 14092.00 frames. ], tot_loss[loss=0.07609, simple_loss=0.0984, pruned_loss=0.01734, audio_tagging_loss=0.009546, over 3054236.38 frames. ], batch size: 57, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:29:06,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1429000.0, ans=0.125 2023-11-21 08:29:23,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1429066.6666666667, ans=0.2 2023-11-21 08:29:30,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1429133.3333333333, ans=0.0 2023-11-21 08:29:42,289 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:29:47,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1429200.0, ans=0.125 2023-11-21 08:30:06,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214400 2023-11-21 08:30:08,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-21 08:30:08,806 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10000, loss[loss=0.06481, simple_loss=0.07768, pruned_loss=0.01848, audio_tagging_loss=0.007485, over 15079.00 frames. ], tot_loss[loss=0.07592, simple_loss=0.09824, pruned_loss=0.01725, audio_tagging_loss=0.009554, over 3062455.33 frames. ], batch size: 57, lr: 3.78e-03, grad_scale: 32.0 2023-11-21 08:30:27,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2023-11-21 08:30:47,791 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.908e+01 8.046e+01 8.852e+01 9.481e+01 1.142e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-21 08:31:04,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1429600.0, ans=10.0 2023-11-21 08:31:10,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214450 2023-11-21 08:31:12,991 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10050, loss[loss=0.08276, simple_loss=0.1155, pruned_loss=0.0167, audio_tagging_loss=0.008307, over 14624.00 frames. ], tot_loss[loss=0.076, simple_loss=0.09876, pruned_loss=0.01715, audio_tagging_loss=0.009476, over 3059317.44 frames. 
], batch size: 54, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:31:35,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1429733.3333333333, ans=0.125 2023-11-21 08:31:39,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1429800.0, ans=0.0 2023-11-21 08:31:42,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1429800.0, ans=0.1 2023-11-21 08:31:47,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1429800.0, ans=0.1 2023-11-21 08:32:01,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1429866.6666666667, ans=0.125 2023-11-21 08:32:15,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1429933.3333333333, ans=0.125 2023-11-21 08:32:16,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214500 2023-11-21 08:32:19,102 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10100, loss[loss=0.08284, simple_loss=0.1068, pruned_loss=0.0163, audio_tagging_loss=0.01312, over 16127.00 frames. ], tot_loss[loss=0.07615, simple_loss=0.09849, pruned_loss=0.01725, audio_tagging_loss=0.009653, over 3058026.47 frames. ], batch size: 58, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:32:55,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1430133.3333333333, ans=0.0 2023-11-21 08:32:58,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.203e+01 8.739e+01 9.472e+01 1.226e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-21 08:33:06,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2023-11-21 08:33:08,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1430200.0, ans=0.2 2023-11-21 08:33:09,627 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 08:33:20,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214550 2023-11-21 08:33:20,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1430266.6666666667, ans=0.125 2023-11-21 08:33:22,985 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10150, loss[loss=0.09075, simple_loss=0.1179, pruned_loss=0.02155, audio_tagging_loss=0.01027, over 15319.00 frames. ], tot_loss[loss=0.07623, simple_loss=0.09843, pruned_loss=0.0173, audio_tagging_loss=0.009716, over 3057851.77 frames. 
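The recurring WARNING lines document the guard that drops AudioSet placeholder cuts: after the convolutional front end and subsampling, a 1-second cut yields only 23 encoder frames, fewer than its 24 BPE tokens, and the recipe requires at least as many frames as tokens. The arithmetic below reproduces the logged 100 -> 23 mapping (the formula matches the subsampling comment in icefall's zipformer recipes as I recall it; the helper names are mine):

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv front end plus subsampling factor 4: T' = ((T - 7) // 2 + 1) // 2
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as in the WARNING lines
    print(keep_cut(100, 24))              # False -> the cut is excluded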
], batch size: 56, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:33:23,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1430333.3333333333, ans=0.125 2023-11-21 08:33:32,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1430333.3333333333, ans=0.125 2023-11-21 08:33:51,631 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 08:33:52,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2023-11-21 08:34:07,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1430533.3333333333, ans=0.125 2023-11-21 08:34:15,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1430600.0, ans=0.125 2023-11-21 08:34:18,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1430600.0, ans=0.125 2023-11-21 08:34:25,581 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214600 2023-11-21 08:34:28,277 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10200, loss[loss=0.0822, simple_loss=0.09038, pruned_loss=0.02262, audio_tagging_loss=0.01439, over 14579.00 frames. ], tot_loss[loss=0.07618, simple_loss=0.09791, pruned_loss=0.01734, audio_tagging_loss=0.009882, over 3049311.72 frames. ], batch size: 57, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:34:44,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1430733.3333333333, ans=0.0 2023-11-21 08:34:51,437 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 08:34:54,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. 
limit=6.0 2023-11-21 08:34:55,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1430800.0, ans=0.125 2023-11-21 08:35:08,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.283e+01 8.036e+01 8.614e+01 9.487e+01 2.302e+02, threshold=1.723e+02, percent-clipped=1.0 2023-11-21 08:35:17,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1430866.6666666667, ans=0.0 2023-11-21 08:35:17,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1430866.6666666667, ans=0.2 2023-11-21 08:35:25,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1430933.3333333333, ans=0.125 2023-11-21 08:35:27,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1430933.3333333333, ans=0.125 2023-11-21 08:35:31,187 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214650 2023-11-21 08:35:33,557 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10250, loss[loss=0.07192, simple_loss=0.1001, pruned_loss=0.01287, audio_tagging_loss=0.009015, over 15257.00 frames. ], tot_loss[loss=0.07619, simple_loss=0.09769, pruned_loss=0.01741, audio_tagging_loss=0.009932, over 3048666.02 frames. ], batch size: 55, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:35:47,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1431066.6666666667, ans=10.0 2023-11-21 08:36:06,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1431133.3333333333, ans=15.0 2023-11-21 08:36:36,433 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214700 2023-11-21 08:36:38,820 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10300, loss[loss=0.06885, simple_loss=0.08074, pruned_loss=0.01554, audio_tagging_loss=0.01294, over 14380.00 frames. ], tot_loss[loss=0.07595, simple_loss=0.09678, pruned_loss=0.01752, audio_tagging_loss=0.01004, over 3045059.60 frames. ], batch size: 56, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:36:44,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-21 08:37:07,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1431466.6666666667, ans=0.2 2023-11-21 08:37:19,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.199e+01 8.798e+01 9.773e+01 1.568e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-21 08:37:22,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1431533.3333333333, ans=0.125 2023-11-21 08:37:24,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1431533.3333333333, ans=0.1 2023-11-21 08:37:41,027 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214750 2023-11-21 08:37:43,443 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10350, loss[loss=0.0591, simple_loss=0.07971, pruned_loss=0.01125, audio_tagging_loss=0.007998, over 14253.00 frames. 
], tot_loss[loss=0.0764, simple_loss=0.09765, pruned_loss=0.01753, audio_tagging_loss=0.01005, over 3042738.02 frames. ], batch size: 56, lr: 3.78e-03, grad_scale: 16.0 2023-11-21 08:37:43,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1431666.6666666667, ans=0.125 2023-11-21 08:37:43,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=22.5 2023-11-21 08:37:49,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1431666.6666666667, ans=0.125 2023-11-21 08:37:57,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1431733.3333333333, ans=0.0 2023-11-21 08:38:11,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1431800.0, ans=15.0 2023-11-21 08:38:13,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1431800.0, ans=0.0 2023-11-21 08:38:24,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1431866.6666666667, ans=0.125 2023-11-21 08:38:28,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1431866.6666666667, ans=0.05 2023-11-21 08:38:28,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1431866.6666666667, ans=0.125 2023-11-21 08:38:44,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1431933.3333333333, ans=0.125 2023-11-21 08:38:44,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1431933.3333333333, ans=0.125 2023-11-21 08:38:46,815 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214800 2023-11-21 08:38:49,878 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10400, loss[loss=0.08396, simple_loss=0.1118, pruned_loss=0.01995, audio_tagging_loss=0.008093, over 14258.00 frames. ], tot_loss[loss=0.077, simple_loss=0.09849, pruned_loss=0.0177, audio_tagging_loss=0.01005, over 3044151.61 frames. ], batch size: 54, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 08:38:51,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2023-11-21 08:39:31,164 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.104e+01 8.738e+01 9.653e+01 1.281e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-21 08:39:52,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214850 2023-11-21 08:39:53,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1432333.3333333333, ans=0.1 2023-11-21 08:39:54,694 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10450, loss[loss=0.06513, simple_loss=0.08247, pruned_loss=0.01239, audio_tagging_loss=0.0115, over 15911.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09628, pruned_loss=0.01697, audio_tagging_loss=0.01004, over 3043995.07 frames. 
], batch size: 60, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:39:56,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1432333.3333333333, ans=0.125 2023-11-21 08:40:00,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-21 08:40:05,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1432333.3333333333, ans=0.125 2023-11-21 08:40:11,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1432400.0, ans=0.2 2023-11-21 08:40:13,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1432400.0, ans=0.0 2023-11-21 08:40:14,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1432400.0, ans=0.0 2023-11-21 08:40:25,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1432466.6666666667, ans=0.125 2023-11-21 08:40:26,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-21 08:40:27,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1432466.6666666667, ans=10.0 2023-11-21 08:40:39,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1432533.3333333333, ans=0.125 2023-11-21 08:40:40,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1432533.3333333333, ans=0.04949747468305833 2023-11-21 08:40:44,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0 2023-11-21 08:40:50,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1432600.0, ans=0.125 2023-11-21 08:40:50,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1432600.0, ans=0.2 2023-11-21 08:40:56,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214900 2023-11-21 08:40:59,129 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10500, loss[loss=0.06611, simple_loss=0.08845, pruned_loss=0.01164, audio_tagging_loss=0.01025, over 13687.00 frames. ], tot_loss[loss=0.07497, simple_loss=0.09654, pruned_loss=0.01687, audio_tagging_loss=0.00983, over 3037467.98 frames. ], batch size: 51, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:41:03,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1432666.6666666667, ans=0.0 2023-11-21 08:41:19,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1432733.3333333333, ans=0.125 2023-11-21 08:41:23,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. 
limit=15.0 2023-11-21 08:41:26,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1432800.0, ans=0.125 2023-11-21 08:41:40,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 7.879e+01 8.434e+01 9.185e+01 1.259e+02, threshold=1.687e+02, percent-clipped=0.0 2023-11-21 08:42:01,019 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 214950 2023-11-21 08:42:01,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1432933.3333333333, ans=0.125 2023-11-21 08:42:03,528 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10550, loss[loss=0.07415, simple_loss=0.09802, pruned_loss=0.01741, audio_tagging_loss=0.007731, over 16107.00 frames. ], tot_loss[loss=0.07517, simple_loss=0.09727, pruned_loss=0.01689, audio_tagging_loss=0.009646, over 3036155.65 frames. ], batch size: 57, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:42:27,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1433066.6666666667, ans=0.1 2023-11-21 08:42:36,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-21 08:42:47,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=15.0 2023-11-21 08:43:06,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215000 2023-11-21 08:43:07,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1433333.3333333333, ans=0.1 2023-11-21 08:43:08,940 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10600, loss[loss=0.0707, simple_loss=0.09163, pruned_loss=0.0149, audio_tagging_loss=0.009984, over 15432.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09706, pruned_loss=0.017, audio_tagging_loss=0.009617, over 3040965.02 frames. ], batch size: 58, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:43:16,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1433333.3333333333, ans=0.125 2023-11-21 08:43:18,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1433333.3333333333, ans=0.125 2023-11-21 08:43:29,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1433400.0, ans=0.125 2023-11-21 08:43:32,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.25 vs. 
limit=22.5 2023-11-21 08:43:37,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1433466.6666666667, ans=0.05 2023-11-21 08:43:37,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1433466.6666666667, ans=0.2 2023-11-21 08:43:38,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1433466.6666666667, ans=0.05 2023-11-21 08:43:47,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1433533.3333333333, ans=0.125 2023-11-21 08:43:50,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1433533.3333333333, ans=0.2 2023-11-21 08:43:51,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.541e+01 8.274e+01 8.887e+01 9.806e+01 1.375e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-21 08:44:03,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1433600.0, ans=0.2 2023-11-21 08:44:07,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1433600.0, ans=0.09899494936611666 2023-11-21 08:44:09,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215050 2023-11-21 08:44:11,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-21 08:44:12,278 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10650, loss[loss=0.07577, simple_loss=0.09546, pruned_loss=0.01948, audio_tagging_loss=0.008558, over 15524.00 frames. ], tot_loss[loss=0.07528, simple_loss=0.09737, pruned_loss=0.01702, audio_tagging_loss=0.00958, over 3047633.84 frames. ], batch size: 59, lr: 3.77e-03, grad_scale: 8.0 2023-11-21 08:44:12,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1433666.6666666667, ans=0.2 2023-11-21 08:44:12,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.33 vs. 
limit=10.0 2023-11-21 08:44:18,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1433666.6666666667, ans=0.125 2023-11-21 08:44:20,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1433666.6666666667, ans=0.2 2023-11-21 08:44:23,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1433733.3333333333, ans=0.125 2023-11-21 08:44:26,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1433733.3333333333, ans=0.125 2023-11-21 08:45:09,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1433933.3333333333, ans=0.0 2023-11-21 08:45:14,995 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215100 2023-11-21 08:45:17,248 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10700, loss[loss=0.06774, simple_loss=0.09128, pruned_loss=0.01446, audio_tagging_loss=0.007636, over 15585.00 frames. ], tot_loss[loss=0.07548, simple_loss=0.09748, pruned_loss=0.01716, audio_tagging_loss=0.009574, over 3039936.59 frames. ], batch size: 59, lr: 3.77e-03, grad_scale: 8.0 2023-11-21 08:45:18,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1434000.0, ans=0.0 2023-11-21 08:45:33,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1434066.6666666667, ans=0.0 2023-11-21 08:46:00,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.074e+01 8.707e+01 9.700e+01 1.286e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-21 08:46:14,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1434266.6666666667, ans=0.125 2023-11-21 08:46:19,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1434266.6666666667, ans=0.125 2023-11-21 08:46:21,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215150 2023-11-21 08:46:23,784 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10750, loss[loss=0.08118, simple_loss=0.0998, pruned_loss=0.01962, audio_tagging_loss=0.01166, over 15317.00 frames. ], tot_loss[loss=0.07447, simple_loss=0.09587, pruned_loss=0.01698, audio_tagging_loss=0.009552, over 3037844.30 frames. ], batch size: 58, lr: 3.77e-03, grad_scale: 8.0 2023-11-21 08:46:35,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1434400.0, ans=0.0 2023-11-21 08:46:58,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1434466.6666666667, ans=0.0 2023-11-21 08:47:20,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1434600.0, ans=0.1 2023-11-21 08:47:23,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.07 vs. 
limit=15.0 2023-11-21 08:47:25,621 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215200 2023-11-21 08:47:28,307 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10800, loss[loss=0.05956, simple_loss=0.07603, pruned_loss=0.01141, audio_tagging_loss=0.01013, over 14946.00 frames. ], tot_loss[loss=0.0748, simple_loss=0.09668, pruned_loss=0.017, audio_tagging_loss=0.009465, over 3046637.80 frames. ], batch size: 58, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:47:31,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1434666.6666666667, ans=0.2 2023-11-21 08:47:34,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2023-11-21 08:47:46,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1434733.3333333333, ans=0.0 2023-11-21 08:47:55,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5 2023-11-21 08:47:56,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1434800.0, ans=0.125 2023-11-21 08:48:11,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.512e+01 8.195e+01 8.985e+01 9.543e+01 1.112e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-21 08:48:11,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-21 08:48:28,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1434933.3333333333, ans=0.1 2023-11-21 08:48:29,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215250 2023-11-21 08:48:32,527 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10850, loss[loss=0.06677, simple_loss=0.08958, pruned_loss=0.01284, audio_tagging_loss=0.009142, over 14733.00 frames. ], tot_loss[loss=0.07437, simple_loss=0.0958, pruned_loss=0.01694, audio_tagging_loss=0.009526, over 3044349.01 frames. 
], batch size: 57, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:48:39,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1435000.0, ans=0.1 2023-11-21 08:48:58,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1435133.3333333333, ans=0.1 2023-11-21 08:49:06,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1435133.3333333333, ans=0.125 2023-11-21 08:49:08,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1435133.3333333333, ans=0.125 2023-11-21 08:49:17,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1435200.0, ans=0.2 2023-11-21 08:49:20,435 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:49:23,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1435266.6666666667, ans=0.2 2023-11-21 08:49:25,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1435266.6666666667, ans=0.0 2023-11-21 08:49:33,035 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 08:49:35,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215300 2023-11-21 08:49:38,437 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10900, loss[loss=0.08961, simple_loss=0.1127, pruned_loss=0.02126, audio_tagging_loss=0.01202, over 14971.00 frames. ], tot_loss[loss=0.07445, simple_loss=0.09587, pruned_loss=0.01693, audio_tagging_loss=0.009583, over 3037900.35 frames. ], batch size: 54, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:49:41,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1435333.3333333333, ans=0.2 2023-11-21 08:49:52,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.68 vs. 
limit=15.0 2023-11-21 08:50:05,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1435466.6666666667, ans=0.125 2023-11-21 08:50:05,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1435466.6666666667, ans=0.125 2023-11-21 08:50:07,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1435466.6666666667, ans=0.05 2023-11-21 08:50:20,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.007e+01 7.936e+01 8.755e+01 9.807e+01 1.231e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-21 08:50:24,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-11-21 08:50:38,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1435600.0, ans=0.125 2023-11-21 08:50:39,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1435600.0, ans=0.125 2023-11-21 08:50:40,613 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215350 2023-11-21 08:50:42,931 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 10950, loss[loss=0.08177, simple_loss=0.09322, pruned_loss=0.02317, audio_tagging_loss=0.012, over 15747.00 frames. ], tot_loss[loss=0.0741, simple_loss=0.09534, pruned_loss=0.01677, audio_tagging_loss=0.009657, over 3039856.86 frames. ], batch size: 59, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:50:51,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1435666.6666666667, ans=0.125 2023-11-21 08:50:56,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1435733.3333333333, ans=0.125 2023-11-21 08:51:14,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1435800.0, ans=0.0 2023-11-21 08:51:44,470 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215400 2023-11-21 08:51:47,121 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11000, loss[loss=0.06555, simple_loss=0.08529, pruned_loss=0.01248, audio_tagging_loss=0.01043, over 16421.00 frames. ], tot_loss[loss=0.07446, simple_loss=0.09583, pruned_loss=0.0169, audio_tagging_loss=0.009648, over 3045985.16 frames. ], batch size: 61, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:51:56,387 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 08:52:07,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1436066.6666666667, ans=0.2 2023-11-21 08:52:11,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1436066.6666666667, ans=0.09899494936611666 2023-11-21 08:52:30,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 8.054e+01 8.831e+01 9.756e+01 2.281e+02, threshold=1.766e+02, percent-clipped=1.0 2023-11-21 08:52:50,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215450 2023-11-21 08:52:52,530 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11050, loss[loss=0.06882, simple_loss=0.08137, pruned_loss=0.0169, audio_tagging_loss=0.01124, over 15801.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09493, pruned_loss=0.0167, audio_tagging_loss=0.009813, over 3040936.73 frames. ], batch size: 62, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:52:55,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1436333.3333333333, ans=0.125 2023-11-21 08:53:20,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1436466.6666666667, ans=0.125 2023-11-21 08:53:25,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1436466.6666666667, ans=0.125 2023-11-21 08:53:29,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1436533.3333333333, ans=0.125 2023-11-21 08:53:40,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1436533.3333333333, ans=0.2 2023-11-21 08:53:42,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-11-21 08:53:45,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1436600.0, ans=0.2 2023-11-21 08:53:54,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215500 2023-11-21 08:53:54,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-11-21 08:53:56,843 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11100, loss[loss=0.1033, simple_loss=0.1373, pruned_loss=0.02651, audio_tagging_loss=0.00807, over 14517.00 frames. ], tot_loss[loss=0.07457, simple_loss=0.09526, pruned_loss=0.01696, audio_tagging_loss=0.009984, over 3041378.63 frames. 
], batch size: 53, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:53:57,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1436666.6666666667, ans=0.1 2023-11-21 08:53:58,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1436666.6666666667, ans=0.125 2023-11-21 08:53:59,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1436666.6666666667, ans=0.2 2023-11-21 08:53:59,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1436666.6666666667, ans=0.125 2023-11-21 08:54:06,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1436666.6666666667, ans=0.035 2023-11-21 08:54:09,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1436733.3333333333, ans=0.0 2023-11-21 08:54:18,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1436733.3333333333, ans=0.1 2023-11-21 08:54:18,375 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 08:54:39,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.330e+01 8.983e+01 9.923e+01 1.426e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-21 08:54:43,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1436866.6666666667, ans=0.2 2023-11-21 08:54:55,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1436933.3333333333, ans=0.125 2023-11-21 08:54:58,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215550 2023-11-21 08:55:00,449 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11150, loss[loss=0.07041, simple_loss=0.0907, pruned_loss=0.01268, audio_tagging_loss=0.01238, over 15442.00 frames. ], tot_loss[loss=0.0745, simple_loss=0.09507, pruned_loss=0.01692, audio_tagging_loss=0.01005, over 3039942.60 frames. ], batch size: 57, lr: 3.77e-03, grad_scale: 16.0 2023-11-21 08:55:02,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1437000.0, ans=0.125 2023-11-21 08:55:22,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1437066.6666666667, ans=0.125 2023-11-21 08:55:36,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1437133.3333333333, ans=0.0 2023-11-21 08:55:40,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1437200.0, ans=0.1 2023-11-21 08:55:54,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1437266.6666666667, ans=0.1 2023-11-21 08:56:02,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215600 2023-11-21 08:56:06,622 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11200, loss[loss=0.06956, simple_loss=0.09121, pruned_loss=0.01264, audio_tagging_loss=0.01131, over 16515.00 frames. 
], tot_loss[loss=0.07547, simple_loss=0.09674, pruned_loss=0.01712, audio_tagging_loss=0.009986, over 3045201.11 frames. ], batch size: 62, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 08:56:16,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2023-11-21 08:56:43,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1437533.3333333333, ans=0.125 2023-11-21 08:56:45,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2023-11-21 08:56:48,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.237e+01 8.761e+01 9.554e+01 1.513e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-21 08:57:02,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2023-11-21 08:57:08,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215650 2023-11-21 08:57:10,699 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11250, loss[loss=0.08916, simple_loss=0.09812, pruned_loss=0.0283, audio_tagging_loss=0.0118, over 16525.00 frames. ], tot_loss[loss=0.07554, simple_loss=0.09703, pruned_loss=0.01709, audio_tagging_loss=0.009936, over 3050636.14 frames. ], batch size: 63, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 08:57:15,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1437666.6666666667, ans=0.0 2023-11-21 08:57:26,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1437733.3333333333, ans=0.0 2023-11-21 08:57:35,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.70 vs. limit=15.0 2023-11-21 08:58:00,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2023-11-21 08:58:01,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1437933.3333333333, ans=0.125 2023-11-21 08:58:12,871 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215700 2023-11-21 08:58:14,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1438000.0, ans=0.125 2023-11-21 08:58:14,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1438000.0, ans=0.125 2023-11-21 08:58:15,348 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11300, loss[loss=0.0687, simple_loss=0.09093, pruned_loss=0.0156, audio_tagging_loss=0.00764, over 15972.00 frames. ], tot_loss[loss=0.07537, simple_loss=0.09687, pruned_loss=0.01708, audio_tagging_loss=0.00985, over 3049780.06 frames. ], batch size: 60, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 08:58:40,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.51 vs. 
limit=15.0 2023-11-21 08:58:46,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1438133.3333333333, ans=0.0 2023-11-21 08:58:57,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.881e+01 8.122e+01 8.636e+01 9.244e+01 1.479e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-21 08:58:58,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1438200.0, ans=0.1 2023-11-21 08:58:58,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1438200.0, ans=0.2 2023-11-21 08:59:14,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1438266.6666666667, ans=0.125 2023-11-21 08:59:17,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215750 2023-11-21 08:59:20,441 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11350, loss[loss=0.07399, simple_loss=0.09868, pruned_loss=0.01714, audio_tagging_loss=0.007513, over 16547.00 frames. ], tot_loss[loss=0.07547, simple_loss=0.09708, pruned_loss=0.01725, audio_tagging_loss=0.009674, over 3041078.31 frames. ], batch size: 60, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 08:59:24,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1438333.3333333333, ans=0.125 2023-11-21 08:59:29,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1438333.3333333333, ans=0.125 2023-11-21 08:59:34,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1438400.0, ans=0.05 2023-11-21 08:59:36,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1438400.0, ans=0.125 2023-11-21 08:59:58,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1438533.3333333333, ans=0.2 2023-11-21 09:00:03,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1438533.3333333333, ans=0.125 2023-11-21 09:00:05,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1438533.3333333333, ans=0.5 2023-11-21 09:00:06,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1438533.3333333333, ans=0.0 2023-11-21 09:00:17,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1438600.0, ans=0.125 2023-11-21 09:00:22,139 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215800 2023-11-21 09:00:23,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2023-11-21 09:00:24,848 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11400, loss[loss=0.1011, simple_loss=0.1259, pruned_loss=0.03076, audio_tagging_loss=0.0074, over 14600.00 frames. ], tot_loss[loss=0.07514, simple_loss=0.09659, pruned_loss=0.01725, audio_tagging_loss=0.009593, over 3034445.52 frames. 
], batch size: 54, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 09:00:43,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1438733.3333333333, ans=0.05 2023-11-21 09:00:45,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1438733.3333333333, ans=0.125 2023-11-21 09:00:53,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0 2023-11-21 09:01:07,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.283e+01 9.027e+01 9.488e+01 1.297e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-21 09:01:26,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215850 2023-11-21 09:01:29,064 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11450, loss[loss=0.059, simple_loss=0.07655, pruned_loss=0.01135, audio_tagging_loss=0.009378, over 15292.00 frames. ], tot_loss[loss=0.07518, simple_loss=0.09672, pruned_loss=0.01734, audio_tagging_loss=0.009489, over 3035327.93 frames. ], batch size: 59, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 09:01:45,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1439066.6666666667, ans=0.0 2023-11-21 09:02:32,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215900 2023-11-21 09:02:34,705 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11500, loss[loss=0.09759, simple_loss=0.1281, pruned_loss=0.02543, audio_tagging_loss=0.008089, over 14785.00 frames. ], tot_loss[loss=0.07476, simple_loss=0.0963, pruned_loss=0.01707, audio_tagging_loss=0.009545, over 3036169.82 frames. ], batch size: 53, lr: 3.77e-03, grad_scale: 32.0 2023-11-21 09:03:05,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1439466.6666666667, ans=0.125 2023-11-21 09:03:15,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1439533.3333333333, ans=0.0 2023-11-21 09:03:17,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 7.996e+01 8.548e+01 9.424e+01 1.295e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-21 09:03:25,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2023-11-21 09:03:37,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 215950 2023-11-21 09:03:40,088 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11550, loss[loss=0.07601, simple_loss=0.09894, pruned_loss=0.01856, audio_tagging_loss=0.007975, over 15506.00 frames. ], tot_loss[loss=0.07414, simple_loss=0.09551, pruned_loss=0.0169, audio_tagging_loss=0.009485, over 3041570.30 frames. ], batch size: 59, lr: 3.76e-03, grad_scale: 32.0 2023-11-21 09:03:40,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. 
limit=22.5 2023-11-21 09:04:00,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1439733.3333333333, ans=0.2 2023-11-21 09:04:06,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1439800.0, ans=0.0 2023-11-21 09:04:18,430 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 09:04:19,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1439866.6666666667, ans=0.125 2023-11-21 09:04:33,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1439933.3333333333, ans=0.125 2023-11-21 09:04:42,038 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216000 2023-11-21 09:04:47,532 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11600, loss[loss=0.08336, simple_loss=0.1088, pruned_loss=0.02161, audio_tagging_loss=0.007346, over 16120.00 frames. ], tot_loss[loss=0.07438, simple_loss=0.09561, pruned_loss=0.0171, audio_tagging_loss=0.009479, over 3050792.42 frames. ], batch size: 60, lr: 3.76e-03, grad_scale: 32.0 2023-11-21 09:04:55,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1440000.0, ans=0.0 2023-11-21 09:04:58,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1440000.0, ans=0.125 2023-11-21 09:04:59,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1440066.6666666667, ans=0.125 2023-11-21 09:05:01,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2023-11-21 09:05:29,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=15.0 2023-11-21 09:05:29,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.268e+01 8.853e+01 9.480e+01 1.180e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-21 09:05:40,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2023-11-21 09:05:48,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216050 2023-11-21 09:05:51,734 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11650, loss[loss=0.08112, simple_loss=0.1155, pruned_loss=0.01657, audio_tagging_loss=0.006822, over 14721.00 frames. ], tot_loss[loss=0.07428, simple_loss=0.09535, pruned_loss=0.01701, audio_tagging_loss=0.009594, over 3049325.77 frames. 
], batch size: 55, lr: 3.76e-03, grad_scale: 32.0 2023-11-21 09:05:54,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1440333.3333333333, ans=0.125 2023-11-21 09:06:05,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1440400.0, ans=15.0 2023-11-21 09:06:16,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1440466.6666666667, ans=0.1 2023-11-21 09:06:17,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-11-21 09:06:41,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1440533.3333333333, ans=0.125 2023-11-21 09:06:48,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2023-11-21 09:06:54,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216100 2023-11-21 09:06:56,684 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11700, loss[loss=0.07671, simple_loss=0.09597, pruned_loss=0.01962, audio_tagging_loss=0.009105, over 15563.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.09526, pruned_loss=0.01686, audio_tagging_loss=0.009633, over 3050277.35 frames. ], batch size: 58, lr: 3.76e-03, grad_scale: 32.0 2023-11-21 09:07:03,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1440666.6666666667, ans=0.125 2023-11-21 09:07:25,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1440800.0, ans=0.125 2023-11-21 09:07:29,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1440800.0, ans=0.125 2023-11-21 09:07:38,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.159e+01 8.719e+01 9.346e+01 1.199e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-21 09:07:40,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1440866.6666666667, ans=0.125 2023-11-21 09:07:41,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1440866.6666666667, ans=0.0 2023-11-21 09:07:52,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1440933.3333333333, ans=0.0 2023-11-21 09:07:56,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1440933.3333333333, ans=0.1 2023-11-21 09:07:57,252 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216150 2023-11-21 09:07:59,715 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11750, loss[loss=0.0766, simple_loss=0.09425, pruned_loss=0.01716, audio_tagging_loss=0.01231, over 16047.00 frames. ], tot_loss[loss=0.07432, simple_loss=0.09558, pruned_loss=0.01691, audio_tagging_loss=0.009618, over 3058909.97 frames. 
], batch size: 59, lr: 3.76e-03, grad_scale: 32.0 2023-11-21 09:08:05,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=15.0 2023-11-21 09:08:12,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2023-11-21 09:08:32,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1441133.3333333333, ans=0.125 2023-11-21 09:08:44,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1441200.0, ans=0.0 2023-11-21 09:09:01,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216200 2023-11-21 09:09:02,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1441333.3333333333, ans=0.2 2023-11-21 09:09:04,079 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11800, loss[loss=0.06913, simple_loss=0.09042, pruned_loss=0.01278, audio_tagging_loss=0.01114, over 16390.00 frames. ], tot_loss[loss=0.07441, simple_loss=0.09541, pruned_loss=0.01703, audio_tagging_loss=0.009685, over 3053192.78 frames. ], batch size: 59, lr: 3.76e-03, grad_scale: 16.0 2023-11-21 09:09:08,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1441333.3333333333, ans=0.0 2023-11-21 09:09:20,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1441400.0, ans=0.0 2023-11-21 09:09:24,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1441400.0, ans=0.125 2023-11-21 09:09:24,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1441400.0, ans=0.125 2023-11-21 09:09:32,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1441466.6666666667, ans=0.125 2023-11-21 09:09:42,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1441533.3333333333, ans=0.125 2023-11-21 09:09:48,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.548e+01 8.145e+01 8.665e+01 9.541e+01 1.453e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-21 09:10:05,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1441600.0, ans=0.125 2023-11-21 09:10:07,907 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216250 2023-11-21 09:10:10,310 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11850, loss[loss=0.07353, simple_loss=0.08764, pruned_loss=0.01829, audio_tagging_loss=0.01142, over 15613.00 frames. ], tot_loss[loss=0.07478, simple_loss=0.0961, pruned_loss=0.01701, audio_tagging_loss=0.009721, over 3052638.93 frames. ], batch size: 60, lr: 3.76e-03, grad_scale: 16.0 2023-11-21 09:10:10,552 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 09:10:12,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. 
limit=5.0 2023-11-21 09:10:30,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1441733.3333333333, ans=0.125 2023-11-21 09:10:40,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1441800.0, ans=0.1 2023-11-21 09:10:46,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1441866.6666666667, ans=0.0 2023-11-21 09:11:04,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0 2023-11-21 09:11:11,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216300 2023-11-21 09:11:14,257 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11900, loss[loss=0.06836, simple_loss=0.08363, pruned_loss=0.01466, audio_tagging_loss=0.01189, over 15689.00 frames. ], tot_loss[loss=0.07484, simple_loss=0.096, pruned_loss=0.01702, audio_tagging_loss=0.009825, over 3051666.47 frames. ], batch size: 59, lr: 3.76e-03, grad_scale: 16.0 2023-11-21 09:11:52,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1442200.0, ans=0.0 2023-11-21 09:11:55,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1442200.0, ans=0.1 2023-11-21 09:11:58,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.187e+01 8.956e+01 9.688e+01 1.192e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-21 09:11:59,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1442200.0, ans=0.05 2023-11-21 09:12:08,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1442266.6666666667, ans=0.0 2023-11-21 09:12:12,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1442266.6666666667, ans=0.125 2023-11-21 09:12:15,551 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216350 2023-11-21 09:12:17,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1442333.3333333333, ans=0.2 2023-11-21 09:12:17,996 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 11950, loss[loss=0.0823, simple_loss=0.1057, pruned_loss=0.01979, audio_tagging_loss=0.009671, over 15774.00 frames. ], tot_loss[loss=0.07494, simple_loss=0.09634, pruned_loss=0.01689, audio_tagging_loss=0.009884, over 3044245.80 frames. ], batch size: 60, lr: 3.76e-03, grad_scale: 16.0 2023-11-21 09:12:18,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1442333.3333333333, ans=0.1 2023-11-21 09:12:22,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=12.0 2023-11-21 09:12:23,736 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 09:12:29,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. 
2023-11-21 09:12:34,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1442400.0, ans=0.0
2023-11-21 09:12:42,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1442400.0, ans=0.0
2023-11-21 09:12:57,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1442533.3333333333, ans=0.0
2023-11-21 09:12:59,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1442533.3333333333, ans=0.2
2023-11-21 09:13:02,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1442533.3333333333, ans=0.0
2023-11-21 09:13:07,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1442600.0, ans=0.2
2023-11-21 09:13:17,287 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216400
2023-11-21 09:13:19,964 INFO [train_asr.py:1221] (1/4) Epoch 18, batch 12000, loss[loss=0.1145, simple_loss=0.1357, pruned_loss=0.03653, audio_tagging_loss=0.01006, over 14854.00 frames. ], tot_loss[loss=0.07512, simple_loss=0.09683, pruned_loss=0.01688, audio_tagging_loss=0.009825, over 3046832.87 frames. ], batch size: 56, lr: 3.76e-03, grad_scale: 32.0
2023-11-21 09:13:19,965 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 09:14:01,460 INFO [train_asr.py:1253] (1/4) Epoch 18, validation: loss=0.06063, simple_loss=0.05245, pruned_loss=0.005328, audio_tagging_loss=0.02908, over 4681554.00 frames.
2023-11-21 09:14:01,460 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 09:15:03,517 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 0, loss[loss=0.08284, simple_loss=0.08627, pruned_loss=0.01371, audio_tagging_loss=0.02599, over 15318.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.08627, pruned_loss=0.01371, audio_tagging_loss=0.02599, over 15318.00 frames. ], batch size: 58, lr: 3.66e-03, grad_scale: 32.0
2023-11-21 09:15:03,518 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 09:15:38,999 INFO [train_asr.py:1253] (1/4) Epoch 19, validation: loss=0.05975, simple_loss=0.05244, pruned_loss=0.005316, audio_tagging_loss=0.02822, over 4681554.00 frames.
2023-11-21 09:15:39,001 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
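At batch 12000 (and again at the first batch of the new epoch) the trainer pauses to evaluate on a fixed held-out set; the constant 4681554.00-frame denominator in both validation lines shows the same set is reused, and the components obey the same combination as the training lines (0.5 * 0.05245 + 0.005328 + 0.02908 = 0.06063). A hedged sketch of such a validation pass (the model's return signature is an assumption):

```python
import torch

def compute_validation_loss(model, valid_loader) -> dict:
    model.eval()
    totals: dict = {}
    frames = 0.0
    with torch.no_grad():
        for batch in valid_loader:
            # assumed to return per-component losses and the frame count
            losses, num_frames = model(batch)
            for name, value in losses.items():
                totals[name] = totals.get(name, 0.0) + value.item()
            frames += num_frames
    model.train()
    # Frame-averaged components, mirroring the "validation: loss=..." line.
    return {name: total / frames for name, total in totals.items()}

# Peak memory, as in "Maximum memory allocated so far is ...MB":
max_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
```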
2023-11-21 09:15:45,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1442820.0, ans=0.125
2023-11-21 09:15:52,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.140e+01 8.995e+01 9.841e+01 1.386e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-21 09:16:08,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1442953.3333333333, ans=0.125
2023-11-21 09:16:09,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1442953.3333333333, ans=0.1
2023-11-21 09:16:11,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216450
2023-11-21 09:16:12,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1442953.3333333333, ans=0.1
2023-11-21 09:16:35,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1443086.6666666667, ans=0.125
2023-11-21 09:16:38,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1443086.6666666667, ans=0.035
2023-11-21 09:16:38,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1443086.6666666667, ans=0.0
2023-11-21 09:16:43,266 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 50, loss[loss=0.09419, simple_loss=0.1124, pruned_loss=0.02191, audio_tagging_loss=0.0161, over 15245.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.1014, pruned_loss=0.01769, audio_tagging_loss=0.01834, over 688624.64 frames. ], batch size: 56, lr: 3.66e-03, grad_scale: 32.0
2023-11-21 09:16:49,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=1443153.3333333333, ans=12.0
2023-11-21 09:17:04,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1443220.0, ans=0.5
2023-11-21 09:17:07,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=22.5
2023-11-21 09:17:07,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0
2023-11-21 09:17:08,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1443286.6666666667, ans=0.05
2023-11-21 09:17:09,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.06 vs. limit=15.0
2023-11-21 09:17:15,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216500
2023-11-21 09:17:20,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5
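The learning rate in the batch lines drops from 3.76e-03 to 3.66e-03 exactly at the epoch-18/19 boundary and nowhere in between, which is consistent with an Eden-style schedule (as published with Zipformer recipes), where the base LR is discounted by both the step index and the epoch index. Treating the use of Eden here as an assumption, a sketch of the formula with illustrative constants:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden schedule: smooth decay in the step index, plus a stepwise-visible
    # decay in the epoch index (hence the jump at an epoch boundary).
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```

At step counts around 216,000 the batch factor changes very slowly, so within an epoch the printed lr looks constant to three digits, while each epoch increment produces the visible drop.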
2023-11-21 09:17:29,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1443353.3333333333, ans=0.2
2023-11-21 09:17:39,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1443420.0, ans=0.125
2023-11-21 09:17:47,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1443486.6666666667, ans=0.0
2023-11-21 09:17:49,144 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 100, loss[loss=0.07873, simple_loss=0.09945, pruned_loss=0.01582, audio_tagging_loss=0.01319, over 15087.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.09753, pruned_loss=0.01684, audio_tagging_loss=0.0179, over 1200416.74 frames. ], batch size: 55, lr: 3.66e-03, grad_scale: 32.0
2023-11-21 09:17:59,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1443486.6666666667, ans=0.2
2023-11-21 09:18:02,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.866e+01 9.575e+01 1.050e+02 1.381e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-21 09:18:15,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1443620.0, ans=0.1
2023-11-21 09:18:19,843 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216550
2023-11-21 09:18:21,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1443620.0, ans=0.2
2023-11-21 09:18:40,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5
2023-11-21 09:18:44,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5
2023-11-21 09:18:49,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1443753.3333333333, ans=0.0
2023-11-21 09:18:52,659 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 150, loss[loss=0.08305, simple_loss=0.1088, pruned_loss=0.01809, audio_tagging_loss=0.01056, over 16092.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.09802, pruned_loss=0.01687, audio_tagging_loss=0.01579, over 1612403.38 frames. ], batch size: 58, lr: 3.66e-03, grad_scale: 32.0
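The scaling.py:1022 "Whitening" lines are periodic diagnostics: each module tracks how far the covariance of its activations is from isotropic, and the logged metric is compared against a per-module limit. A rough sketch of such a metric (modeled on what these lines report; the exact definition in scaling.py is assumed, not quoted):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """>= 1.0; equals 1.0 when the per-group feature covariance is
    proportional to the identity ('white'), and grows when a few
    directions dominate -- the quantity logged as `metric`."""
    num_channels = x.shape[-1]
    x = x.reshape(-1, num_channels)
    num_frames = x.shape[0]
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    # Per-group covariance over frames.
    cov = torch.einsum("ngi,ngj->gij", x, x) / num_frames
    d = cov.shape[-1]
    # mean squared eigenvalue / squared mean eigenvalue, averaged over groups:
    num = (cov * cov).sum(dim=(1, 2)) / d               # trace(cov @ cov) / d
    den = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / d) ** 2
    return (num / den).mean().item()
```

Entries well under their limit (e.g. 2.98 vs. limit=15.0) indicate healthy activations; values pressing against the limit (e.g. 22.53 vs. 22.5 later in this log) are where the whitening penalty starts to act.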
2023-11-21 09:19:12,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1443886.6666666667, ans=0.125
2023-11-21 09:19:17,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1443953.3333333333, ans=0.2
2023-11-21 09:19:19,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1443953.3333333333, ans=0.025
2023-11-21 09:19:25,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216600
2023-11-21 09:19:32,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1444020.0, ans=0.125
2023-11-21 09:19:32,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1444020.0, ans=0.125
2023-11-21 09:19:33,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1444020.0, ans=0.125
2023-11-21 09:19:36,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1444020.0, ans=0.0
2023-11-21 09:19:39,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0
2023-11-21 09:19:41,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444020.0, ans=0.1
2023-11-21 09:19:47,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=12.0
2023-11-21 09:19:51,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1444086.6666666667, ans=0.0
2023-11-21 09:19:56,864 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 200, loss[loss=0.06285, simple_loss=0.08051, pruned_loss=0.0118, audio_tagging_loss=0.01079, over 15247.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.09834, pruned_loss=0.01695, audio_tagging_loss=0.014, over 1933668.97 frames. ], batch size: 56, lr: 3.66e-03, grad_scale: 32.0
2023-11-21 09:20:12,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.223e+01 8.596e+01 9.461e+01 1.153e+02, threshold=1.719e+02, percent-clipped=0.0
2023-11-21 09:20:26,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0
2023-11-21 09:20:29,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216650
2023-11-21 09:20:37,313 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 09:20:59,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444420.0, ans=0.1
2023-11-21 09:21:02,343 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 250, loss[loss=0.08391, simple_loss=0.1128, pruned_loss=0.02138, audio_tagging_loss=0.006104, over 15327.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.09864, pruned_loss=0.01721, audio_tagging_loss=0.01273, over 2178678.18 frames. ], batch size: 57, lr: 3.66e-03, grad_scale: 32.0
2023-11-21 09:21:33,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216700
2023-11-21 09:21:38,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1444620.0, ans=0.1
2023-11-21 09:21:43,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1444686.6666666667, ans=0.0
2023-11-21 09:22:00,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5
2023-11-21 09:22:06,602 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 300, loss[loss=0.06676, simple_loss=0.07797, pruned_loss=0.01733, audio_tagging_loss=0.01045, over 14091.00 frames. ], tot_loss[loss=0.0778, simple_loss=0.0978, pruned_loss=0.01704, audio_tagging_loss=0.01187, over 2370769.55 frames. ], batch size: 56, lr: 3.66e-03, grad_scale: 32.0
2023-11-21 09:22:16,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1444820.0, ans=0.1
2023-11-21 09:22:19,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.255e+01 8.343e+01 8.887e+01 9.711e+01 1.312e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-21 09:22:29,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0
2023-11-21 09:22:38,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216750
2023-11-21 09:22:40,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1444953.3333333333, ans=0.125
2023-11-21 09:22:47,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1445020.0, ans=0.0
2023-11-21 09:22:59,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0
2023-11-21 09:23:09,895 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 350, loss[loss=0.08041, simple_loss=0.1044, pruned_loss=0.02025, audio_tagging_loss=0.007945, over 15730.00 frames. ], tot_loss[loss=0.07688, simple_loss=0.09748, pruned_loss=0.01695, audio_tagging_loss=0.01118, over 2523063.25 frames. ], batch size: 60, lr: 3.66e-03, grad_scale: 32.0
2023-11-21 09:23:24,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1445220.0, ans=0.05
2023-11-21 09:23:26,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1445220.0, ans=0.2
2023-11-21 09:23:43,000 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216800
2023-11-21 09:23:43,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1445286.6666666667, ans=6.0
2023-11-21 09:23:44,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1445286.6666666667, ans=0.07
2023-11-21 09:24:15,661 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 400, loss[loss=0.06217, simple_loss=0.08417, pruned_loss=0.01138, audio_tagging_loss=0.0087, over 14467.00 frames. ], tot_loss[loss=0.07635, simple_loss=0.09733, pruned_loss=0.01691, audio_tagging_loss=0.01078, over 2646592.06 frames. ], batch size: 54, lr: 3.66e-03, grad_scale: 32.0
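On the optim.py:476 lines: the five numbers are quartiles (min/25%/median/75%/max) of recent parameter-gradient norms, and in every such line in this section the threshold equals Clipping_scale times the logged median, e.g. 2.0 * 8.887e+01 = 1.777e+02 just above. A hedged sketch of a median-based dynamic clipping rule consistent with that pattern (not the verbatim optimizer code):

```python
import torch

def dynamic_clip_threshold(recent_norms: list[float],
                           clipping_scale: float = 2.0):
    """Quartiles of recent gradient norms plus a median-based threshold."""
    norms = torch.tensor(sorted(recent_norms))
    quartiles = [norms[int(q * (len(norms) - 1))].item()
                 for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * quartiles[2]   # 2.0 x median, as in the log
    return quartiles, threshold

def clip_factor(grad_norm: float, threshold: float) -> float:
    # Scale applied to a gradient whose norm exceeds the threshold;
    # "percent-clipped" is the fraction of recent steps where this is < 1.
    return min(1.0, threshold / (grad_norm + 1.0e-20))
```

With percent-clipped=0.0 almost everywhere here, gradients are staying comfortably under twice their running median.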
2023-11-21 09:24:29,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.137e+01 8.063e+01 8.732e+01 9.531e+01 1.428e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-21 09:24:35,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1445553.3333333333, ans=0.2
2023-11-21 09:24:47,462 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216850
2023-11-21 09:24:57,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1445686.6666666667, ans=0.04949747468305833
2023-11-21 09:25:14,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0
2023-11-21 09:25:19,746 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 450, loss[loss=0.0841, simple_loss=0.1086, pruned_loss=0.0194, audio_tagging_loss=0.0104, over 15188.00 frames. ], tot_loss[loss=0.07591, simple_loss=0.09724, pruned_loss=0.01686, audio_tagging_loss=0.01042, over 2733717.54 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:25:22,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1445820.0, ans=0.125
2023-11-21 09:25:22,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1445820.0, ans=0.2
2023-11-21 09:25:24,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1445820.0, ans=0.125
2023-11-21 09:25:33,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1445886.6666666667, ans=0.0
2023-11-21 09:25:35,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1445886.6666666667, ans=0.1
2023-11-21 09:25:39,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1445886.6666666667, ans=0.2
2023-11-21 09:25:43,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1445886.6666666667, ans=0.125
2023-11-21 09:25:53,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216900
2023-11-21 09:25:59,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1446020.0, ans=0.1
2023-11-21 09:26:05,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0
2023-11-21 09:26:14,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.19 vs. limit=22.5
2023-11-21 09:26:24,332 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 500, loss[loss=0.07325, simple_loss=0.09718, pruned_loss=0.0156, audio_tagging_loss=0.009063, over 15417.00 frames. ], tot_loss[loss=0.07555, simple_loss=0.09677, pruned_loss=0.01692, audio_tagging_loss=0.01024, over 2803198.86 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:26:41,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.147e+01 8.789e+01 9.630e+01 1.210e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-21 09:26:44,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1446220.0, ans=0.1
2023-11-21 09:26:53,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1446286.6666666667, ans=0.0
2023-11-21 09:26:57,004 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 216950
2023-11-21 09:27:00,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1446286.6666666667, ans=0.0
2023-11-21 09:27:09,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1446353.3333333333, ans=0.0
2023-11-21 09:27:12,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1446353.3333333333, ans=0.125
2023-11-21 09:27:24,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0
2023-11-21 09:27:29,707 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 550, loss[loss=0.08675, simple_loss=0.1049, pruned_loss=0.02395, audio_tagging_loss=0.01032, over 14318.00 frames. ], tot_loss[loss=0.0755, simple_loss=0.09665, pruned_loss=0.01707, audio_tagging_loss=0.01011, over 2847265.41 frames. ], batch size: 56, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:27:56,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1446620.0, ans=0.025
2023-11-21 09:27:56,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5
2023-11-21 09:28:00,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217000
2023-11-21 09:28:33,557 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 600, loss[loss=0.07543, simple_loss=0.1018, pruned_loss=0.01681, audio_tagging_loss=0.007728, over 15427.00 frames. ], tot_loss[loss=0.07546, simple_loss=0.09711, pruned_loss=0.01696, audio_tagging_loss=0.009949, over 2902244.23 frames. ], batch size: 58, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:28:40,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2023-11-21 09:28:42,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1446820.0, ans=0.02
2023-11-21 09:28:49,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.367e+01 7.902e+01 8.372e+01 9.097e+01 1.242e+02, threshold=1.674e+02, percent-clipped=0.0
2023-11-21 09:28:51,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1446886.6666666667, ans=0.125
2023-11-21 09:28:56,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0
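The grad_scale field in the batch lines oscillates between 32.0 and 16.0, the usual fp16 dynamic loss-scaling pattern: the scale halves when a step produces inf/nan gradients and grows back after a run of good steps. A sketch using the standard torch.cuda.amp API (its use here, and the model/optimizer interface, are assumptions based on the logged values):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)

def train_step(model, batch, optimizer) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)             # assumed to return a scalar loss
    scaler.scale(loss).backward()       # backward on the scaled loss
    scaler.step(optimizer)              # skips the step and halves the scale on inf/nan
    scaler.update()                     # doubles the scale after enough clean steps
    return scaler.get_scale()           # the "grad_scale" value seen in the log
```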
2023-11-21 09:28:59,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=15.0
2023-11-21 09:29:07,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217050
2023-11-21 09:29:09,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1446953.3333333333, ans=0.07
2023-11-21 09:29:28,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1447086.6666666667, ans=0.0
2023-11-21 09:29:33,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1447086.6666666667, ans=0.125
2023-11-21 09:29:38,236 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 650, loss[loss=0.06461, simple_loss=0.07912, pruned_loss=0.01186, audio_tagging_loss=0.01319, over 15704.00 frames. ], tot_loss[loss=0.07458, simple_loss=0.09565, pruned_loss=0.01675, audio_tagging_loss=0.009997, over 2936462.68 frames. ], batch size: 58, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:29:47,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1447153.3333333333, ans=0.2
2023-11-21 09:30:03,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1447286.6666666667, ans=0.125
2023-11-21 09:30:10,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217100
2023-11-21 09:30:22,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1447353.3333333333, ans=0.125
2023-11-21 09:30:24,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1447353.3333333333, ans=0.125
2023-11-21 09:30:32,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1447420.0, ans=0.0
2023-11-21 09:30:43,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1447486.6666666667, ans=0.125
2023-11-21 09:30:44,263 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 700, loss[loss=0.09523, simple_loss=0.1325, pruned_loss=0.02251, audio_tagging_loss=0.006453, over 15378.00 frames. ], tot_loss[loss=0.07489, simple_loss=0.09611, pruned_loss=0.01695, audio_tagging_loss=0.009884, over 2962945.69 frames. ], batch size: 54, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:30:44,531 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 09:30:55,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1447553.3333333333, ans=0.1
2023-11-21 09:30:58,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.746e+01 8.199e+01 8.920e+01 9.625e+01 1.204e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-21 09:31:01,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1447553.3333333333, ans=0.125
2023-11-21 09:31:13,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1447620.0, ans=0.025
2023-11-21 09:31:14,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217150
2023-11-21 09:31:22,978 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 09:31:26,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1447686.6666666667, ans=0.125
2023-11-21 09:31:30,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1447686.6666666667, ans=0.0
2023-11-21 09:31:46,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1447820.0, ans=0.5
2023-11-21 09:31:47,776 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 750, loss[loss=0.08817, simple_loss=0.1124, pruned_loss=0.02185, audio_tagging_loss=0.01013, over 15371.00 frames. ], tot_loss[loss=0.07483, simple_loss=0.09637, pruned_loss=0.01685, audio_tagging_loss=0.009798, over 2988856.63 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:32:02,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1447886.6666666667, ans=0.1
2023-11-21 09:32:20,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217200
2023-11-21 09:32:50,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0
2023-11-21 09:32:51,812 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 800, loss[loss=0.07084, simple_loss=0.08891, pruned_loss=0.01603, audio_tagging_loss=0.01035, over 14654.00 frames. ], tot_loss[loss=0.0754, simple_loss=0.09713, pruned_loss=0.01698, audio_tagging_loss=0.009853, over 2999256.85 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:32:59,921 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 09:33:01,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1448153.3333333333, ans=0.125
2023-11-21 09:33:08,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.219e+01 8.917e+01 9.925e+01 1.296e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-21 09:33:22,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0
2023-11-21 09:33:24,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217250
2023-11-21 09:33:28,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1448286.6666666667, ans=0.1
2023-11-21 09:33:57,381 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 850, loss[loss=0.08686, simple_loss=0.1022, pruned_loss=0.02285, audio_tagging_loss=0.0129, over 14547.00 frames. ], tot_loss[loss=0.0749, simple_loss=0.09643, pruned_loss=0.0168, audio_tagging_loss=0.009887, over 2998442.86 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:33:59,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1448486.6666666667, ans=0.0
2023-11-21 09:33:59,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0
2023-11-21 09:34:07,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.10 vs. limit=5.0
2023-11-21 09:34:11,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1448553.3333333333, ans=0.125
2023-11-21 09:34:28,590 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217300
2023-11-21 09:34:33,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1448686.6666666667, ans=0.2
2023-11-21 09:34:33,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1448686.6666666667, ans=0.125
2023-11-21 09:34:36,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1448686.6666666667, ans=0.125
2023-11-21 09:34:39,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1448686.6666666667, ans=0.0
2023-11-21 09:34:42,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5
2023-11-21 09:34:50,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2023-11-21 09:34:52,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1448753.3333333333, ans=0.2
2023-11-21 09:34:52,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1448753.3333333333, ans=0.1
2023-11-21 09:34:53,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1448753.3333333333, ans=0.125
2023-11-21 09:35:01,270 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 900, loss[loss=0.06326, simple_loss=0.0845, pruned_loss=0.01243, audio_tagging_loss=0.008588, over 15641.00 frames. ], tot_loss[loss=0.07463, simple_loss=0.09596, pruned_loss=0.01666, audio_tagging_loss=0.009988, over 3011346.89 frames. ], batch size: 58, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:35:15,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.060e+01 7.979e+01 8.801e+01 9.598e+01 1.201e+02, threshold=1.760e+02, percent-clipped=0.0
2023-11-21 09:35:33,194 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217350
2023-11-21 09:35:41,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1449020.0, ans=0.1
2023-11-21 09:35:52,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1449086.6666666667, ans=0.0
2023-11-21 09:36:04,292 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 950, loss[loss=0.06666, simple_loss=0.08815, pruned_loss=0.01419, audio_tagging_loss=0.0084, over 14951.00 frames. ], tot_loss[loss=0.07506, simple_loss=0.09703, pruned_loss=0.0167, audio_tagging_loss=0.009847, over 3021139.29 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:36:09,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1449153.3333333333, ans=0.0
2023-11-21 09:36:17,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1449220.0, ans=0.04949747468305833
2023-11-21 09:36:17,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0
2023-11-21 09:36:36,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217400
2023-11-21 09:36:40,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1449286.6666666667, ans=0.2
2023-11-21 09:36:57,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1449420.0, ans=0.125
2023-11-21 09:37:08,046 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1000, loss[loss=0.06826, simple_loss=0.08467, pruned_loss=0.01643, audio_tagging_loss=0.009494, over 16185.00 frames. ], tot_loss[loss=0.07442, simple_loss=0.0963, pruned_loss=0.01655, audio_tagging_loss=0.009721, over 3019858.87 frames. ], batch size: 63, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:37:24,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1449553.3333333333, ans=0.0
2023-11-21 09:37:25,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.132e+01 7.890e+01 8.682e+01 9.562e+01 1.074e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-21 09:37:25,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1449553.3333333333, ans=0.0
2023-11-21 09:37:32,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0
2023-11-21 09:37:32,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1449620.0, ans=0.125
2023-11-21 09:37:35,084 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
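The WARNING above drops an AudioSet cut whose features are too short for its placeholder transcript: 100 input frames subsample to 23 encoder frames, fewer than the 24 BPE tokens, and a transducer alignment needs at least as many encoder frames as output tokens. A sketch of such a filter (the exact subsampled-length formula is an assumption; it reproduces the 100 -> 23 figure in the warning):

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose subsampled length cannot cover the token sequence."""
    # Rough post-subsampling length for a ~4x convolutional frontend
    # (assumed formula, chosen to match the logged 100 -> 23 example).
    t = (num_frames - 7) // 4
    return t >= num_tokens

# The excluded cut above: 100 frames -> 23 < 24 tokens, so it is skipped.
assert keep_cut(100, 24) is False
```

These dummy transcripts mark audio-tagging-only cuts, so skipping them costs no usable ASR supervision.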
2023-11-21 09:37:39,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217450
2023-11-21 09:38:01,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1449753.3333333333, ans=0.125
2023-11-21 09:38:08,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1449753.3333333333, ans=0.125
2023-11-21 09:38:12,093 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1050, loss[loss=0.06561, simple_loss=0.07588, pruned_loss=0.01662, audio_tagging_loss=0.01105, over 14449.00 frames. ], tot_loss[loss=0.07365, simple_loss=0.09517, pruned_loss=0.01637, audio_tagging_loss=0.009695, over 3015103.78 frames. ], batch size: 54, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:38:28,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1449886.6666666667, ans=0.0
2023-11-21 09:38:28,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1449886.6666666667, ans=0.125
2023-11-21 09:38:40,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1449953.3333333333, ans=0.125
2023-11-21 09:38:43,239 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217500
2023-11-21 09:38:58,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1450020.0, ans=0.125
2023-11-21 09:39:14,695 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1100, loss[loss=0.08374, simple_loss=0.1092, pruned_loss=0.01855, audio_tagging_loss=0.01059, over 15438.00 frames. ], tot_loss[loss=0.07379, simple_loss=0.09527, pruned_loss=0.01655, audio_tagging_loss=0.009607, over 3023530.17 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:39:18,231 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 09:39:28,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1450220.0, ans=0.125
2023-11-21 09:39:31,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.941e+01 8.109e+01 8.788e+01 9.339e+01 1.148e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-21 09:39:47,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217550
2023-11-21 09:39:59,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1450353.3333333333, ans=0.125
2023-11-21 09:40:18,392 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1150, loss[loss=0.08095, simple_loss=0.1094, pruned_loss=0.01969, audio_tagging_loss=0.006566, over 15081.00 frames. ], tot_loss[loss=0.07508, simple_loss=0.09701, pruned_loss=0.01701, audio_tagging_loss=0.009564, over 3026355.47 frames. ], batch size: 56, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:40:22,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1450486.6666666667, ans=0.2
2023-11-21 09:40:36,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1450553.3333333333, ans=0.035
2023-11-21 09:40:39,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1450553.3333333333, ans=0.0
2023-11-21 09:40:48,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1450620.0, ans=0.0
2023-11-21 09:40:51,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217600
2023-11-21 09:41:22,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1450753.3333333333, ans=0.125
2023-11-21 09:41:24,283 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1200, loss[loss=0.0795, simple_loss=0.1071, pruned_loss=0.01875, audio_tagging_loss=0.007194, over 15145.00 frames. ], tot_loss[loss=0.07487, simple_loss=0.09696, pruned_loss=0.01692, audio_tagging_loss=0.009465, over 3032289.45 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:41:40,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.126e+01 8.918e+01 9.759e+01 1.159e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-21 09:41:41,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.73 vs. limit=10.0
2023-11-21 09:41:55,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217650
2023-11-21 09:42:12,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1451020.0, ans=0.0
2023-11-21 09:42:18,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1451086.6666666667, ans=0.125
2023-11-21 09:42:28,266 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1250, loss[loss=0.07777, simple_loss=0.1137, pruned_loss=0.01274, audio_tagging_loss=0.008156, over 15714.00 frames. ], tot_loss[loss=0.07485, simple_loss=0.09692, pruned_loss=0.01692, audio_tagging_loss=0.009462, over 3038268.71 frames. ], batch size: 56, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:43:00,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217700
2023-11-21 09:43:09,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1451353.3333333333, ans=0.0
2023-11-21 09:43:12,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1451353.3333333333, ans=0.1
2023-11-21 09:43:14,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1451353.3333333333, ans=0.125
2023-11-21 09:43:31,364 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1300, loss[loss=0.07853, simple_loss=0.1014, pruned_loss=0.01992, audio_tagging_loss=0.007902, over 14113.00 frames. ], tot_loss[loss=0.07491, simple_loss=0.09701, pruned_loss=0.01698, audio_tagging_loss=0.009426, over 3035802.12 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:43:48,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.540e+01 8.140e+01 8.673e+01 9.454e+01 1.388e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-21 09:43:54,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1451553.3333333333, ans=0.125
2023-11-21 09:44:03,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217750
2023-11-21 09:44:35,294 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1350, loss[loss=0.07865, simple_loss=0.1092, pruned_loss=0.01621, audio_tagging_loss=0.007839, over 15162.00 frames. ], tot_loss[loss=0.07518, simple_loss=0.09725, pruned_loss=0.01711, audio_tagging_loss=0.009448, over 3039464.95 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:44:41,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0
2023-11-21 09:44:48,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1451886.6666666667, ans=0.1
2023-11-21 09:45:06,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217800
2023-11-21 09:45:21,524 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 09:45:28,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-11-21 09:45:36,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1452086.6666666667, ans=0.0
2023-11-21 09:45:38,987 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1400, loss[loss=0.06531, simple_loss=0.07954, pruned_loss=0.01678, audio_tagging_loss=0.008754, over 15144.00 frames. ], tot_loss[loss=0.0754, simple_loss=0.09735, pruned_loss=0.01715, audio_tagging_loss=0.009574, over 3047598.56 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:45:56,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.285e+01 9.066e+01 9.966e+01 1.256e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-21 09:46:05,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=22.5
2023-11-21 09:46:08,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1452286.6666666667, ans=0.125
2023-11-21 09:46:10,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217850
2023-11-21 09:46:15,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1452353.3333333333, ans=0.125
2023-11-21 09:46:22,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1452353.3333333333, ans=0.0
2023-11-21 09:46:39,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1452420.0, ans=0.0
2023-11-21 09:46:41,622 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1450, loss[loss=0.08417, simple_loss=0.1091, pruned_loss=0.01939, audio_tagging_loss=0.01021, over 15993.00 frames. ], tot_loss[loss=0.07526, simple_loss=0.09721, pruned_loss=0.0171, audio_tagging_loss=0.009551, over 3052370.73 frames. ], batch size: 60, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:46:44,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0
2023-11-21 09:46:54,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1452553.3333333333, ans=0.2
2023-11-21 09:46:55,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1452553.3333333333, ans=0.125
2023-11-21 09:47:07,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1452620.0, ans=0.2
2023-11-21 09:47:14,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217900
2023-11-21 09:47:46,026 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1500, loss[loss=0.06923, simple_loss=0.08498, pruned_loss=0.01606, audio_tagging_loss=0.01068, over 14088.00 frames. ], tot_loss[loss=0.07519, simple_loss=0.09705, pruned_loss=0.01701, audio_tagging_loss=0.009655, over 3050109.53 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:47:48,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1452820.0, ans=10.0
2023-11-21 09:48:03,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.762e+01 8.061e+01 8.622e+01 9.502e+01 1.412e+02, threshold=1.724e+02, percent-clipped=0.0
2023-11-21 09:48:17,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 217950
2023-11-21 09:48:48,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.43 vs. limit=15.0
2023-11-21 09:48:49,045 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1550, loss[loss=0.08302, simple_loss=0.1073, pruned_loss=0.0206, audio_tagging_loss=0.008754, over 14534.00 frames. ], tot_loss[loss=0.07446, simple_loss=0.09581, pruned_loss=0.01675, audio_tagging_loss=0.009802, over 3050742.61 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 16.0
2023-11-21 09:48:58,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1453153.3333333333, ans=0.125
2023-11-21 09:49:04,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1453220.0, ans=0.0
2023-11-21 09:49:04,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1453220.0, ans=0.125
2023-11-21 09:49:17,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0
2023-11-21 09:49:21,845 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218000
2023-11-21 09:49:35,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1453353.3333333333, ans=0.0
2023-11-21 09:49:46,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1453420.0, ans=0.125
2023-11-21 09:49:50,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1453420.0, ans=0.125
2023-11-21 09:49:53,169 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1600, loss[loss=0.06827, simple_loss=0.08768, pruned_loss=0.013, audio_tagging_loss=0.01143, over 15184.00 frames. ], tot_loss[loss=0.07504, simple_loss=0.09635, pruned_loss=0.01689, audio_tagging_loss=0.009973, over 3044873.26 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 32.0
2023-11-21 09:50:11,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.287e+01 9.004e+01 9.532e+01 2.614e+02, threshold=1.801e+02, percent-clipped=1.0
2023-11-21 09:50:11,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1453553.3333333333, ans=0.0
2023-11-21 09:50:24,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1453620.0, ans=0.035
2023-11-21 09:50:25,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218050
2023-11-21 09:50:32,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1453686.6666666667, ans=0.95
2023-11-21 09:50:32,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1453686.6666666667, ans=0.125
2023-11-21 09:50:33,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1453686.6666666667, ans=0.125
2023-11-21 09:50:49,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1453753.3333333333, ans=0.125
2023-11-21 09:50:57,624 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1650, loss[loss=0.07875, simple_loss=0.09157, pruned_loss=0.02016, audio_tagging_loss=0.01281, over 14482.00 frames. ], tot_loss[loss=0.07516, simple_loss=0.09649, pruned_loss=0.01696, audio_tagging_loss=0.009953, over 3043082.21 frames. ], batch size: 54, lr: 3.64e-03, grad_scale: 32.0
2023-11-21 09:50:57,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1453820.0, ans=0.2
2023-11-21 09:50:58,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1453820.0, ans=0.125
2023-11-21 09:51:15,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0
2023-11-21 09:51:21,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1453953.3333333333, ans=0.1
2023-11-21 09:51:23,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1453953.3333333333, ans=0.1
2023-11-21 09:51:24,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1453953.3333333333, ans=0.2
2023-11-21 09:51:29,052 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218100
2023-11-21 09:51:50,604 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 09:52:01,485 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1700, loss[loss=0.06573, simple_loss=0.07651, pruned_loss=0.0158, audio_tagging_loss=0.01167, over 13738.00 frames. ], tot_loss[loss=0.07439, simple_loss=0.09547, pruned_loss=0.01666, audio_tagging_loss=0.009988, over 3044712.21 frames. ], batch size: 57, lr: 3.64e-03, grad_scale: 32.0
2023-11-21 09:52:11,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1454153.3333333333, ans=0.2
2023-11-21 09:52:15,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2023-11-21 09:52:19,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.104e+01 8.594e+01 9.360e+01 1.130e+02, threshold=1.719e+02, percent-clipped=0.0
2023-11-21 09:52:34,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218150
2023-11-21 09:52:49,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1454353.3333333333, ans=0.2
2023-11-21 09:53:05,407 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1750, loss[loss=0.05803, simple_loss=0.07729, pruned_loss=0.01044, audio_tagging_loss=0.008952, over 14876.00 frames. ], tot_loss[loss=0.07443, simple_loss=0.09588, pruned_loss=0.01667, audio_tagging_loss=0.00982, over 3047050.64 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 32.0
2023-11-21 09:53:08,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1454486.6666666667, ans=0.2
2023-11-21 09:53:09,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1454486.6666666667, ans=0.125
2023-11-21 09:53:31,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1454620.0, ans=0.0
2023-11-21 09:53:37,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218200
2023-11-21 09:53:55,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1454753.3333333333, ans=0.125
2023-11-21 09:54:04,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1454753.3333333333, ans=0.09899494936611666
2023-11-21 09:54:09,248 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1800, loss[loss=0.0826, simple_loss=0.1093, pruned_loss=0.01983, audio_tagging_loss=0.00813, over 15402.00 frames. ], tot_loss[loss=0.07439, simple_loss=0.09611, pruned_loss=0.01659, audio_tagging_loss=0.009743, over 3046518.64 frames. ], batch size: 60, lr: 3.64e-03, grad_scale: 16.0
2023-11-21 09:54:28,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 7.963e+01 8.409e+01 9.319e+01 2.355e+02, threshold=1.682e+02, percent-clipped=0.0
2023-11-21 09:54:29,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0
2023-11-21 09:54:30,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1454886.6666666667, ans=0.125
2023-11-21 09:54:41,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218250
2023-11-21 09:55:10,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0
2023-11-21 09:55:13,321 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1850, loss[loss=0.08708, simple_loss=0.1157, pruned_loss=0.02068, audio_tagging_loss=0.008558, over 15576.00 frames. ], tot_loss[loss=0.07479, simple_loss=0.09675, pruned_loss=0.01683, audio_tagging_loss=0.009588, over 3047001.25 frames. ], batch size: 54, lr: 3.64e-03, grad_scale: 16.0
2023-11-21 09:55:24,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1455220.0, ans=0.2
2023-11-21 09:55:41,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1455286.6666666667, ans=0.125
2023-11-21 09:55:45,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218300
2023-11-21 09:55:52,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1455353.3333333333, ans=0.1
2023-11-21 09:56:11,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5
limit=22.5 2023-11-21 09:56:16,174 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1900, loss[loss=0.05629, simple_loss=0.07259, pruned_loss=0.01133, audio_tagging_loss=0.008662, over 14274.00 frames. ], tot_loss[loss=0.07458, simple_loss=0.09625, pruned_loss=0.01684, audio_tagging_loss=0.009611, over 3035037.68 frames. ], batch size: 55, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 09:56:16,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1455486.6666666667, ans=15.0 2023-11-21 09:56:35,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.992e+01 8.400e+01 8.910e+01 9.667e+01 1.226e+02, threshold=1.782e+02, percent-clipped=1.0 2023-11-21 09:56:38,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1455553.3333333333, ans=0.2 2023-11-21 09:56:45,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1455620.0, ans=0.125 2023-11-21 09:56:48,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218350 2023-11-21 09:56:53,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1455686.6666666667, ans=0.125 2023-11-21 09:56:58,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1455686.6666666667, ans=0.1 2023-11-21 09:57:08,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1455753.3333333333, ans=0.0 2023-11-21 09:57:10,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1455753.3333333333, ans=0.125 2023-11-21 09:57:20,916 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 1950, loss[loss=0.1096, simple_loss=0.145, pruned_loss=0.02895, audio_tagging_loss=0.008128, over 15024.00 frames. ], tot_loss[loss=0.07445, simple_loss=0.0961, pruned_loss=0.01683, audio_tagging_loss=0.009573, over 3031345.97 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 09:57:41,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1455886.6666666667, ans=0.2 2023-11-21 09:57:52,280 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218400 2023-11-21 09:57:53,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1455953.3333333333, ans=0.0 2023-11-21 09:58:11,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=22.5 2023-11-21 09:58:14,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-11-21 09:58:18,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-21 09:58:25,237 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2000, loss[loss=0.0691, simple_loss=0.08881, pruned_loss=0.01557, audio_tagging_loss=0.009128, over 16234.00 frames. ], tot_loss[loss=0.07396, simple_loss=0.09543, pruned_loss=0.01659, audio_tagging_loss=0.009654, over 3039745.03 frames. 
], batch size: 63, lr: 3.64e-03, grad_scale: 32.0 2023-11-21 09:58:43,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.117e+01 8.612e+01 9.115e+01 1.173e+02, threshold=1.722e+02, percent-clipped=0.0 2023-11-21 09:58:43,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1456220.0, ans=0.125 2023-11-21 09:58:57,079 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218450 2023-11-21 09:59:06,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1456353.3333333333, ans=0.125 2023-11-21 09:59:17,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1456420.0, ans=0.1 2023-11-21 09:59:22,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1456420.0, ans=0.125 2023-11-21 09:59:24,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-11-21 09:59:24,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1456420.0, ans=0.125 2023-11-21 09:59:25,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1456420.0, ans=0.0 2023-11-21 09:59:28,346 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2050, loss[loss=0.08865, simple_loss=0.1121, pruned_loss=0.02037, audio_tagging_loss=0.01222, over 15180.00 frames. ], tot_loss[loss=0.07457, simple_loss=0.09655, pruned_loss=0.01677, audio_tagging_loss=0.009523, over 3043380.29 frames. ], batch size: 57, lr: 3.64e-03, grad_scale: 32.0 2023-11-21 09:59:31,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1456486.6666666667, ans=0.2 2023-11-21 09:59:38,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1456486.6666666667, ans=0.1 2023-11-21 09:59:46,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.30 vs. limit=22.5 2023-11-21 09:59:52,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1456553.3333333333, ans=0.0 2023-11-21 09:59:53,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1456620.0, ans=0.0 2023-11-21 09:59:58,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.45 vs. 
limit=15.0 2023-11-21 10:00:00,931 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218500 2023-11-21 10:00:09,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1456686.6666666667, ans=0.125 2023-11-21 10:00:18,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1456753.3333333333, ans=0.0 2023-11-21 10:00:21,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1456753.3333333333, ans=0.125 2023-11-21 10:00:31,905 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2100, loss[loss=0.06193, simple_loss=0.08059, pruned_loss=0.01238, audio_tagging_loss=0.00925, over 14559.00 frames. ], tot_loss[loss=0.07445, simple_loss=0.09641, pruned_loss=0.01666, audio_tagging_loss=0.009579, over 3045203.55 frames. ], batch size: 55, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:00:48,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1456886.6666666667, ans=0.125 2023-11-21 10:00:52,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.577e+01 8.022e+01 8.596e+01 9.289e+01 1.125e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-21 10:01:04,077 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218550 2023-11-21 10:01:04,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1456953.3333333333, ans=0.1 2023-11-21 10:01:22,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1457086.6666666667, ans=0.0 2023-11-21 10:01:25,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1457086.6666666667, ans=0.125 2023-11-21 10:01:29,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1457086.6666666667, ans=0.2 2023-11-21 10:01:31,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1457086.6666666667, ans=0.2 2023-11-21 10:01:36,276 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2150, loss[loss=0.06135, simple_loss=0.08434, pruned_loss=0.008935, audio_tagging_loss=0.01025, over 13750.00 frames. ], tot_loss[loss=0.07537, simple_loss=0.09766, pruned_loss=0.01704, audio_tagging_loss=0.009502, over 3046847.79 frames. 
], batch size: 54, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:01:52,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1457220.0, ans=0.2 2023-11-21 10:01:56,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1457220.0, ans=0.0 2023-11-21 10:01:57,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1457220.0, ans=0.09899494936611666 2023-11-21 10:02:01,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1457286.6666666667, ans=0.1 2023-11-21 10:02:06,991 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218600 2023-11-21 10:02:13,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1457353.3333333333, ans=0.5 2023-11-21 10:02:14,336 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:02:20,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2023-11-21 10:02:35,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1457420.0, ans=0.0 2023-11-21 10:02:39,315 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2200, loss[loss=0.05665, simple_loss=0.06602, pruned_loss=0.01275, audio_tagging_loss=0.01089, over 14920.00 frames. ], tot_loss[loss=0.07448, simple_loss=0.09628, pruned_loss=0.01672, audio_tagging_loss=0.009614, over 3046757.97 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:02:42,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1457486.6666666667, ans=0.125 2023-11-21 10:02:50,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=22.5 2023-11-21 10:02:59,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.151e+01 8.877e+01 9.686e+01 1.220e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-21 10:03:02,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1457553.3333333333, ans=0.125 2023-11-21 10:03:12,507 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218650 2023-11-21 10:03:27,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-21 10:03:43,230 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2250, loss[loss=0.06049, simple_loss=0.07784, pruned_loss=0.01354, audio_tagging_loss=0.008033, over 15891.00 frames. ], tot_loss[loss=0.0746, simple_loss=0.09623, pruned_loss=0.01684, audio_tagging_loss=0.009648, over 3041171.34 frames. 
], batch size: 60, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:03:46,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1457820.0, ans=0.1 2023-11-21 10:04:16,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218700 2023-11-21 10:04:28,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1458020.0, ans=0.0 2023-11-21 10:04:28,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1458020.0, ans=0.04949747468305833 2023-11-21 10:04:49,110 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2300, loss[loss=0.05052, simple_loss=0.06024, pruned_loss=0.007542, audio_tagging_loss=0.01286, over 16237.00 frames. ], tot_loss[loss=0.07407, simple_loss=0.09544, pruned_loss=0.01662, audio_tagging_loss=0.009733, over 3044985.06 frames. ], batch size: 64, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:04:53,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=22.5 2023-11-21 10:05:02,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1458220.0, ans=0.2 2023-11-21 10:05:09,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1458220.0, ans=0.125 2023-11-21 10:05:10,244 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.131e+01 8.990e+01 9.981e+01 1.492e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-21 10:05:20,292 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218750 2023-11-21 10:05:20,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0 2023-11-21 10:05:32,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-21 10:05:39,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0 2023-11-21 10:05:45,473 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:05:50,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=22.5 2023-11-21 10:05:52,957 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2350, loss[loss=0.08283, simple_loss=0.1055, pruned_loss=0.02167, audio_tagging_loss=0.008397, over 15103.00 frames. ], tot_loss[loss=0.07464, simple_loss=0.09605, pruned_loss=0.01696, audio_tagging_loss=0.009657, over 3036841.95 frames. 
], batch size: 56, lr: 3.64e-03, grad_scale: 8.0 2023-11-21 10:06:02,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2023-11-21 10:06:09,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1458553.3333333333, ans=0.0 2023-11-21 10:06:26,176 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218800 2023-11-21 10:06:41,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1458686.6666666667, ans=0.2 2023-11-21 10:06:43,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1458686.6666666667, ans=0.125 2023-11-21 10:06:49,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1458753.3333333333, ans=0.0 2023-11-21 10:06:57,866 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2400, loss[loss=0.07537, simple_loss=0.09818, pruned_loss=0.0157, audio_tagging_loss=0.01058, over 15502.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09671, pruned_loss=0.01705, audio_tagging_loss=0.009746, over 3040338.29 frames. ], batch size: 56, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:07:20,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 7.990e+01 8.660e+01 9.489e+01 1.183e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 10:07:30,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218850 2023-11-21 10:07:39,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1459020.0, ans=0.0 2023-11-21 10:08:03,731 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2450, loss[loss=0.083, simple_loss=0.1082, pruned_loss=0.01852, audio_tagging_loss=0.01036, over 15997.00 frames. ], tot_loss[loss=0.0747, simple_loss=0.096, pruned_loss=0.01686, audio_tagging_loss=0.009842, over 3033137.84 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:08:14,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1459153.3333333333, ans=0.0 2023-11-21 10:08:35,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218900 2023-11-21 10:08:36,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-21 10:08:39,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1459286.6666666667, ans=0.125 2023-11-21 10:09:08,363 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2500, loss[loss=0.08607, simple_loss=0.1179, pruned_loss=0.02069, audio_tagging_loss=0.006411, over 14877.00 frames. ], tot_loss[loss=0.07431, simple_loss=0.09561, pruned_loss=0.01658, audio_tagging_loss=0.009925, over 3034917.02 frames. 
], batch size: 53, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:09:29,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.472e+01 8.078e+01 8.779e+01 9.710e+01 1.406e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-21 10:09:33,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1459620.0, ans=0.125 2023-11-21 10:09:34,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1459620.0, ans=0.125 2023-11-21 10:09:41,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 218950 2023-11-21 10:09:45,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1459620.0, ans=0.125 2023-11-21 10:09:53,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1459686.6666666667, ans=0.09899494936611666 2023-11-21 10:10:11,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1459820.0, ans=0.05 2023-11-21 10:10:12,687 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2550, loss[loss=0.06225, simple_loss=0.07938, pruned_loss=0.01511, audio_tagging_loss=0.00745, over 14211.00 frames. ], tot_loss[loss=0.07416, simple_loss=0.09523, pruned_loss=0.01667, audio_tagging_loss=0.009871, over 3035296.64 frames. ], batch size: 55, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:10:45,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219000 2023-11-21 10:10:51,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1460020.0, ans=0.07 2023-11-21 10:11:17,685 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2600, loss[loss=0.07395, simple_loss=0.08984, pruned_loss=0.02023, audio_tagging_loss=0.008798, over 15100.00 frames. ], tot_loss[loss=0.07394, simple_loss=0.09493, pruned_loss=0.01673, audio_tagging_loss=0.009739, over 3036617.42 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:11:32,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1460220.0, ans=0.125 2023-11-21 10:11:35,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1460220.0, ans=0.1 2023-11-21 10:11:39,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.043e+01 8.700e+01 9.465e+01 1.335e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 10:11:48,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1460286.6666666667, ans=0.125 2023-11-21 10:11:49,696 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219050 2023-11-21 10:11:55,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1460353.3333333333, ans=0.07 2023-11-21 10:12:22,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1460486.6666666667, ans=0.0 2023-11-21 10:12:22,988 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2650, loss[loss=0.06443, simple_loss=0.08496, pruned_loss=0.01355, audio_tagging_loss=0.008403, over 15857.00 frames. 
], tot_loss[loss=0.07371, simple_loss=0.09497, pruned_loss=0.01664, audio_tagging_loss=0.00959, over 3041035.55 frames. ], batch size: 59, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:12:55,061 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219100 2023-11-21 10:12:59,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1460620.0, ans=0.125 2023-11-21 10:13:01,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1460686.6666666667, ans=0.95 2023-11-21 10:13:04,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1460686.6666666667, ans=0.0 2023-11-21 10:13:26,506 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2700, loss[loss=0.08344, simple_loss=0.11, pruned_loss=0.02026, audio_tagging_loss=0.0082, over 15867.00 frames. ], tot_loss[loss=0.07399, simple_loss=0.09557, pruned_loss=0.01676, audio_tagging_loss=0.00945, over 3052695.59 frames. ], batch size: 60, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:13:44,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2023-11-21 10:13:48,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.184e+01 8.729e+01 9.389e+01 1.078e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-21 10:13:49,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-11-21 10:13:55,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1460953.3333333333, ans=0.04949747468305833 2023-11-21 10:13:58,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219150 2023-11-21 10:14:31,249 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2750, loss[loss=0.09125, simple_loss=0.1098, pruned_loss=0.0251, audio_tagging_loss=0.01123, over 14162.00 frames. ], tot_loss[loss=0.0748, simple_loss=0.09642, pruned_loss=0.01713, audio_tagging_loss=0.009461, over 3045334.41 frames. ], batch size: 55, lr: 3.64e-03, grad_scale: 16.0 2023-11-21 10:14:39,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. limit=10.0 2023-11-21 10:15:02,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219200 2023-11-21 10:15:26,984 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:15:27,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1461420.0, ans=0.025 2023-11-21 10:15:35,657 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2800, loss[loss=0.07511, simple_loss=0.09764, pruned_loss=0.01656, audio_tagging_loss=0.009733, over 14231.00 frames. 
], tot_loss[loss=0.07461, simple_loss=0.09595, pruned_loss=0.01716, audio_tagging_loss=0.009471, over 3045620.12 frames. ], batch size: 56, lr: 3.64e-03, grad_scale: 32.0 2023-11-21 10:15:42,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1461486.6666666667, ans=0.125 2023-11-21 10:15:47,530 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:15:56,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.792e+01 8.334e+01 9.153e+01 1.021e+02 1.703e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-21 10:16:07,016 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:16:08,008 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219250 2023-11-21 10:16:18,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1461686.6666666667, ans=0.125 2023-11-21 10:16:37,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1461753.3333333333, ans=0.125 2023-11-21 10:16:39,938 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2850, loss[loss=0.07548, simple_loss=0.09576, pruned_loss=0.0197, audio_tagging_loss=0.007902, over 15541.00 frames. ], tot_loss[loss=0.07423, simple_loss=0.09563, pruned_loss=0.01699, audio_tagging_loss=0.009419, over 3048619.88 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:16:55,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1461886.6666666667, ans=0.1 2023-11-21 10:16:58,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=22.5 2023-11-21 10:17:12,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219300 2023-11-21 10:17:12,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1461953.3333333333, ans=0.035 2023-11-21 10:17:33,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1462086.6666666667, ans=0.0 2023-11-21 10:17:38,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1462086.6666666667, ans=0.2 2023-11-21 10:17:45,351 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2900, loss[loss=0.07939, simple_loss=0.1124, pruned_loss=0.01423, audio_tagging_loss=0.008958, over 14873.00 frames. ], tot_loss[loss=0.075, simple_loss=0.09663, pruned_loss=0.01722, audio_tagging_loss=0.009466, over 3048804.19 frames. ], batch size: 54, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:17:48,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1462153.3333333333, ans=0.95 2023-11-21 10:17:59,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1462220.0, ans=0.125 2023-11-21 10:18:03,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. 
limit=15.0 2023-11-21 10:18:08,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.282e+01 8.695e+01 9.389e+01 1.119e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 10:18:17,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219350 2023-11-21 10:18:22,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2023-11-21 10:18:25,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1462353.3333333333, ans=0.1 2023-11-21 10:18:30,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1462353.3333333333, ans=0.0 2023-11-21 10:18:39,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1462420.0, ans=0.0 2023-11-21 10:18:49,475 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 2950, loss[loss=0.07722, simple_loss=0.09577, pruned_loss=0.01995, audio_tagging_loss=0.009389, over 14439.00 frames. ], tot_loss[loss=0.07554, simple_loss=0.09746, pruned_loss=0.01733, audio_tagging_loss=0.009476, over 3043847.74 frames. ], batch size: 54, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:19:17,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1462620.0, ans=0.125 2023-11-21 10:19:21,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219400 2023-11-21 10:19:47,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1462753.3333333333, ans=0.0 2023-11-21 10:19:47,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1462753.3333333333, ans=0.125 2023-11-21 10:19:49,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1462753.3333333333, ans=0.125 2023-11-21 10:19:53,253 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3000, loss[loss=0.08278, simple_loss=0.1081, pruned_loss=0.02008, audio_tagging_loss=0.008636, over 16527.00 frames. ], tot_loss[loss=0.07599, simple_loss=0.09796, pruned_loss=0.01745, audio_tagging_loss=0.009562, over 3045062.73 frames. ], batch size: 61, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:19:53,253 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 10:20:15,790 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9424, 3.7768, 4.8269, 4.4388], device='cuda:1') 2023-11-21 10:20:32,674 INFO [train_asr.py:1253] (1/4) Epoch 19, validation: loss=0.05961, simple_loss=0.05235, pruned_loss=0.005214, audio_tagging_loss=0.02822, over 4681554.00 frames. 
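The recurring train_asr.py:1221 entries above report each batch's loss components normalized per frame ("loss[..., over N frames.]") next to a running aggregate ("tot_loss[..., over M frames.]"), and the validation entry just above is normalized the same way over the full dev set. Below is a minimal sketch of that kind of frame-weighted bookkeeping; the class name LossTracker, the exponential decay, and the sample numbers are illustrative assumptions, not the actual icefall implementation that produced this log.

    # Sketch: frame-weighted running averages of loss components, mirroring
    # the "loss[..., over N frames]" / "tot_loss[..., over M frames]" pattern
    # in the entries above. Names and the decay constant are assumptions.
    from collections import defaultdict

    class LossTracker:
        def __init__(self, decay: float = 0.99):
            self.sums = defaultdict(float)   # per-component loss sums (decayed)
            self.frames = 0.0                # total frame count (decayed)
            self.decay = decay

        def update(self, losses: dict, num_frames: int) -> None:
            # Decay the history so the aggregate tracks recent batches
            # while staying weighted by each batch's frame count.
            self.frames = self.frames * self.decay + num_frames
            for name, value in losses.items():
                # `value` is the batch loss already summed over its frames.
                self.sums[name] = self.sums[name] * self.decay + value

        def averages(self) -> dict:
            # Per-frame averages, matching the normalization in the log.
            return {k: v / self.frames for k, v in self.sums.items()}

    tracker = LossTracker()
    tracker.update(
        {"simple_loss": 0.096 * 15000,
         "pruned_loss": 0.017 * 15000,
         "audio_tagging_loss": 0.0096 * 15000},
        num_frames=15000,
    )
    print(tracker.averages())  # ~{simple_loss: 0.096, pruned_loss: 0.017, ...}

Under these assumptions, the decayed frame total would explain why the tot_loss window hovers near a roughly constant frame count (about 3.0M frames) from batch to batch; the real accounting in train_asr.py may use a different reset or windowing scheme.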
2023-11-21 10:20:32,675 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 10:20:44,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1462886.6666666667, ans=0.0 2023-11-21 10:20:44,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1462886.6666666667, ans=0.125 2023-11-21 10:20:52,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=15.0 2023-11-21 10:20:55,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.194e+01 8.910e+01 9.878e+01 1.451e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-21 10:21:03,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219450 2023-11-21 10:21:05,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1462953.3333333333, ans=0.0 2023-11-21 10:21:19,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1463020.0, ans=0.1 2023-11-21 10:21:28,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1463086.6666666667, ans=0.1 2023-11-21 10:21:33,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2023-11-21 10:21:36,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1463153.3333333333, ans=0.2 2023-11-21 10:21:37,033 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3050, loss[loss=0.07469, simple_loss=0.09427, pruned_loss=0.01744, audio_tagging_loss=0.01012, over 16412.00 frames. ], tot_loss[loss=0.07526, simple_loss=0.09662, pruned_loss=0.01733, audio_tagging_loss=0.009625, over 3046721.74 frames. ], batch size: 61, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:21:42,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-21 10:21:44,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1463153.3333333333, ans=0.0 2023-11-21 10:21:49,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1463220.0, ans=0.125 2023-11-21 10:22:02,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1463286.6666666667, ans=0.125 2023-11-21 10:22:09,622 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219500 2023-11-21 10:22:12,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1463286.6666666667, ans=0.0 2023-11-21 10:22:15,124 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 10:22:21,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2023-11-21 10:22:29,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1463420.0, ans=0.125 2023-11-21 10:22:40,800 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3100, loss[loss=0.07644, simple_loss=0.09578, pruned_loss=0.01784, audio_tagging_loss=0.01071, over 15714.00 frames. ], tot_loss[loss=0.07564, simple_loss=0.09715, pruned_loss=0.01737, audio_tagging_loss=0.00969, over 3052669.97 frames. ], batch size: 58, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:23:03,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1463553.3333333333, ans=0.125 2023-11-21 10:23:04,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1463553.3333333333, ans=0.125 2023-11-21 10:23:05,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=12.0 2023-11-21 10:23:05,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.851e+01 8.202e+01 8.849e+01 9.632e+01 1.395e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-21 10:23:14,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219550 2023-11-21 10:23:17,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2023-11-21 10:23:44,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1463820.0, ans=0.125 2023-11-21 10:23:46,191 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3150, loss[loss=0.06711, simple_loss=0.086, pruned_loss=0.01345, audio_tagging_loss=0.01066, over 14659.00 frames. ], tot_loss[loss=0.07593, simple_loss=0.09767, pruned_loss=0.0174, audio_tagging_loss=0.009685, over 3058023.49 frames. ], batch size: 54, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:24:10,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.56 vs. limit=10.0 2023-11-21 10:24:17,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219600 2023-11-21 10:24:17,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-11-21 10:24:24,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1464020.0, ans=22.5 2023-11-21 10:24:29,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2023-11-21 10:24:49,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=22.5 2023-11-21 10:24:50,999 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3200, loss[loss=0.06372, simple_loss=0.07971, pruned_loss=0.01174, audio_tagging_loss=0.01212, over 14711.00 frames. 
], tot_loss[loss=0.07556, simple_loss=0.09704, pruned_loss=0.01723, audio_tagging_loss=0.009811, over 3056682.55 frames. ], batch size: 55, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:24:52,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1464153.3333333333, ans=0.125 2023-11-21 10:24:52,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1464153.3333333333, ans=0.0 2023-11-21 10:24:55,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1464153.3333333333, ans=0.125 2023-11-21 10:24:55,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5 2023-11-21 10:24:58,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2023-11-21 10:24:58,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1464153.3333333333, ans=0.125 2023-11-21 10:25:03,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1464220.0, ans=0.0 2023-11-21 10:25:06,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-11-21 10:25:13,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.202e+01 8.879e+01 9.509e+01 1.631e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-21 10:25:23,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219650 2023-11-21 10:25:53,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464420.0, ans=0.1 2023-11-21 10:25:55,611 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3250, loss[loss=0.06818, simple_loss=0.08475, pruned_loss=0.01429, audio_tagging_loss=0.01152, over 14449.00 frames. ], tot_loss[loss=0.07609, simple_loss=0.09788, pruned_loss=0.01727, audio_tagging_loss=0.009882, over 3053514.93 frames. ], batch size: 55, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:26:09,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1464553.3333333333, ans=0.0 2023-11-21 10:26:15,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1464553.3333333333, ans=0.125 2023-11-21 10:26:19,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.89 vs. limit=15.0 2023-11-21 10:26:28,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219700 2023-11-21 10:26:28,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1464620.0, ans=10.0 2023-11-21 10:26:36,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.12 vs. 
limit=22.5 2023-11-21 10:26:38,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1464686.6666666667, ans=0.2 2023-11-21 10:26:44,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.66 vs. limit=22.5 2023-11-21 10:26:49,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1464753.3333333333, ans=0.125 2023-11-21 10:26:58,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-21 10:26:58,994 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3300, loss[loss=0.05669, simple_loss=0.06958, pruned_loss=0.01063, audio_tagging_loss=0.01127, over 15444.00 frames. ], tot_loss[loss=0.07587, simple_loss=0.09756, pruned_loss=0.01713, audio_tagging_loss=0.009957, over 3054172.07 frames. ], batch size: 60, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:27:24,141 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.952e+01 8.128e+01 8.736e+01 9.388e+01 1.641e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 10:27:32,985 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219750 2023-11-21 10:28:02,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1465086.6666666667, ans=0.0 2023-11-21 10:28:06,492 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3350, loss[loss=0.0669, simple_loss=0.08459, pruned_loss=0.01742, audio_tagging_loss=0.007185, over 14392.00 frames. ], tot_loss[loss=0.07616, simple_loss=0.09801, pruned_loss=0.01725, audio_tagging_loss=0.009902, over 3055970.63 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:28:07,997 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:28:13,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1465153.3333333333, ans=0.1 2023-11-21 10:28:24,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1465220.0, ans=0.0 2023-11-21 10:28:37,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219800 2023-11-21 10:28:45,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1465353.3333333333, ans=0.1 2023-11-21 10:28:47,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1465353.3333333333, ans=0.0 2023-11-21 10:28:51,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1465353.3333333333, ans=0.0 2023-11-21 10:29:11,214 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3400, loss[loss=0.0616, simple_loss=0.08325, pruned_loss=0.01257, audio_tagging_loss=0.007408, over 15287.00 frames. ], tot_loss[loss=0.076, simple_loss=0.09802, pruned_loss=0.01722, audio_tagging_loss=0.009774, over 3053877.32 frames. 
], batch size: 59, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:29:16,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1465486.6666666667, ans=0.125 2023-11-21 10:29:17,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1465486.6666666667, ans=10.0 2023-11-21 10:29:34,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1465553.3333333333, ans=0.025 2023-11-21 10:29:35,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.502e+01 9.051e+01 9.796e+01 1.205e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-21 10:29:45,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219850 2023-11-21 10:29:55,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-21 10:29:57,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1465686.6666666667, ans=0.2 2023-11-21 10:29:57,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1465686.6666666667, ans=0.125 2023-11-21 10:30:14,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1465820.0, ans=0.1 2023-11-21 10:30:15,862 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3450, loss[loss=0.05762, simple_loss=0.06711, pruned_loss=0.01403, audio_tagging_loss=0.01003, over 14940.00 frames. ], tot_loss[loss=0.07566, simple_loss=0.09766, pruned_loss=0.01715, audio_tagging_loss=0.009684, over 3049913.48 frames. ], batch size: 58, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:30:22,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1465820.0, ans=0.0 2023-11-21 10:30:26,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0 2023-11-21 10:30:43,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1465953.3333333333, ans=0.025 2023-11-21 10:30:49,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219900 2023-11-21 10:31:22,081 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3500, loss[loss=0.0961, simple_loss=0.1276, pruned_loss=0.02496, audio_tagging_loss=0.007346, over 15670.00 frames. ], tot_loss[loss=0.07481, simple_loss=0.09648, pruned_loss=0.01691, audio_tagging_loss=0.009661, over 3049465.73 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:31:28,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.77 vs. 
limit=22.5 2023-11-21 10:31:35,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1466220.0, ans=0.125 2023-11-21 10:31:43,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1466220.0, ans=0.125 2023-11-21 10:31:44,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.587e+01 8.045e+01 8.820e+01 9.717e+01 1.244e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-21 10:31:52,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 219950 2023-11-21 10:31:53,996 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:31:57,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2023-11-21 10:32:00,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1466353.3333333333, ans=0.125 2023-11-21 10:32:05,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.32 vs. limit=12.0 2023-11-21 10:32:06,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1466353.3333333333, ans=0.0 2023-11-21 10:32:26,278 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3550, loss[loss=0.05407, simple_loss=0.06729, pruned_loss=0.01248, audio_tagging_loss=0.007941, over 15528.00 frames. ], tot_loss[loss=0.07459, simple_loss=0.09634, pruned_loss=0.01678, audio_tagging_loss=0.00964, over 3045360.70 frames. ], batch size: 59, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:32:34,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-21 10:32:40,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1466553.3333333333, ans=0.0 2023-11-21 10:32:46,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1466553.3333333333, ans=0.0 2023-11-21 10:32:55,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-21 10:32:57,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2023-11-21 10:32:58,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220000 2023-11-21 10:33:34,053 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3600, loss[loss=0.07865, simple_loss=0.1094, pruned_loss=0.01677, audio_tagging_loss=0.007177, over 16000.00 frames. ], tot_loss[loss=0.07413, simple_loss=0.09585, pruned_loss=0.01663, audio_tagging_loss=0.009567, over 3041967.44 frames. 
], batch size: 59, lr: 3.63e-03, grad_scale: 32.0 2023-11-21 10:33:41,007 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:33:50,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. limit=6.0 2023-11-21 10:33:57,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0 2023-11-21 10:34:00,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 7.919e+01 8.702e+01 9.494e+01 1.216e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 10:34:07,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220050 2023-11-21 10:34:24,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1467020.0, ans=0.125 2023-11-21 10:34:33,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=12.0 2023-11-21 10:34:38,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1467086.6666666667, ans=0.1 2023-11-21 10:34:40,911 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3650, loss[loss=0.1078, simple_loss=0.1446, pruned_loss=0.02752, audio_tagging_loss=0.007988, over 16092.00 frames. ], tot_loss[loss=0.07516, simple_loss=0.09726, pruned_loss=0.01699, audio_tagging_loss=0.009536, over 3050153.73 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:34:51,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1467153.3333333333, ans=0.1 2023-11-21 10:34:53,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1467220.0, ans=0.0 2023-11-21 10:35:00,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1467220.0, ans=0.125 2023-11-21 10:35:06,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1467286.6666666667, ans=0.125 2023-11-21 10:35:07,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2023-11-21 10:35:12,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220100 2023-11-21 10:35:15,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. 
limit=15.0 2023-11-21 10:35:30,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1467353.3333333333, ans=0.2 2023-11-21 10:35:34,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1467420.0, ans=0.125 2023-11-21 10:35:41,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1467420.0, ans=0.0 2023-11-21 10:35:41,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1467420.0, ans=0.125 2023-11-21 10:35:44,969 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3700, loss[loss=0.06698, simple_loss=0.08036, pruned_loss=0.01455, audio_tagging_loss=0.01225, over 14268.00 frames. ], tot_loss[loss=0.07523, simple_loss=0.09715, pruned_loss=0.01704, audio_tagging_loss=0.009617, over 3047043.39 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:35:57,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1467553.3333333333, ans=0.0 2023-11-21 10:36:08,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.463e+01 8.151e+01 8.696e+01 9.414e+01 1.695e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 10:36:16,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1467620.0, ans=0.125 2023-11-21 10:36:17,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220150 2023-11-21 10:36:24,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1467686.6666666667, ans=0.125 2023-11-21 10:36:27,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=15.0 2023-11-21 10:36:37,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1467753.3333333333, ans=0.125 2023-11-21 10:36:42,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=12.0 2023-11-21 10:36:49,681 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3750, loss[loss=0.09309, simple_loss=0.1268, pruned_loss=0.02171, audio_tagging_loss=0.007995, over 15047.00 frames. ], tot_loss[loss=0.07586, simple_loss=0.09789, pruned_loss=0.01727, audio_tagging_loss=0.009639, over 3042277.97 frames. ], batch size: 55, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:36:58,988 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:37:05,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1467886.6666666667, ans=0.0 2023-11-21 10:37:18,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1467953.3333333333, ans=0.125 2023-11-21 10:37:23,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220200 2023-11-21 10:37:35,333 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:37:35,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1468020.0, ans=0.95 2023-11-21 10:37:35,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5 2023-11-21 10:37:57,227 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3800, loss[loss=0.07305, simple_loss=0.0938, pruned_loss=0.01643, audio_tagging_loss=0.009723, over 15405.00 frames. ], tot_loss[loss=0.07564, simple_loss=0.09763, pruned_loss=0.01709, audio_tagging_loss=0.009735, over 3049055.24 frames. ], batch size: 58, lr: 3.63e-03, grad_scale: 8.0 2023-11-21 10:38:06,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1468153.3333333333, ans=0.07 2023-11-21 10:38:10,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1468220.0, ans=0.0 2023-11-21 10:38:12,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1468220.0, ans=0.2 2023-11-21 10:38:14,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2023-11-21 10:38:19,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1468220.0, ans=0.125 2023-11-21 10:38:22,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.633e+01 8.173e+01 8.825e+01 9.624e+01 1.863e+02, threshold=1.765e+02, percent-clipped=1.0 2023-11-21 10:38:28,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220250 2023-11-21 10:38:42,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-11-21 10:38:48,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1468420.0, ans=0.125 2023-11-21 10:39:01,411 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3850, loss[loss=0.04163, simple_loss=0.04966, pruned_loss=0.006397, audio_tagging_loss=0.01041, over 14668.00 frames. ], tot_loss[loss=0.07596, simple_loss=0.09808, pruned_loss=0.01713, audio_tagging_loss=0.009797, over 3050118.76 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 8.0 2023-11-21 10:39:12,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. 
limit=15.0 2023-11-21 10:39:25,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1468620.0, ans=0.125 2023-11-21 10:39:31,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1468620.0, ans=0.125 2023-11-21 10:39:34,126 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220300 2023-11-21 10:39:35,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1468620.0, ans=0.1 2023-11-21 10:39:35,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1468620.0, ans=0.125 2023-11-21 10:39:44,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1468686.6666666667, ans=0.1 2023-11-21 10:39:59,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1468753.3333333333, ans=0.1 2023-11-21 10:40:05,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1468820.0, ans=0.95 2023-11-21 10:40:06,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. limit=10.0 2023-11-21 10:40:06,827 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3900, loss[loss=0.06656, simple_loss=0.08902, pruned_loss=0.01257, audio_tagging_loss=0.009479, over 16134.00 frames. ], tot_loss[loss=0.07611, simple_loss=0.09817, pruned_loss=0.01727, audio_tagging_loss=0.009758, over 3048579.07 frames. ], batch size: 62, lr: 3.63e-03, grad_scale: 8.0 2023-11-21 10:40:34,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.413e+01 7.996e+01 8.695e+01 9.569e+01 1.427e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 10:40:40,604 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220350 2023-11-21 10:41:01,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1469086.6666666667, ans=0.125 2023-11-21 10:41:13,880 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 3950, loss[loss=0.08383, simple_loss=0.1123, pruned_loss=0.0178, audio_tagging_loss=0.009887, over 15741.00 frames. ], tot_loss[loss=0.07578, simple_loss=0.09739, pruned_loss=0.01718, audio_tagging_loss=0.009904, over 3046908.74 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 8.0 2023-11-21 10:41:24,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1469153.3333333333, ans=0.1 2023-11-21 10:41:45,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. 
limit=15.0 2023-11-21 10:41:46,260 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220400 2023-11-21 10:41:46,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1469286.6666666667, ans=0.125 2023-11-21 10:42:09,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1469420.0, ans=0.125 2023-11-21 10:42:19,409 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4000, loss[loss=0.08562, simple_loss=0.1119, pruned_loss=0.02059, audio_tagging_loss=0.009059, over 15907.00 frames. ], tot_loss[loss=0.07692, simple_loss=0.09884, pruned_loss=0.01759, audio_tagging_loss=0.009911, over 3045652.28 frames. ], batch size: 58, lr: 3.63e-03, grad_scale: 16.0 2023-11-21 10:42:44,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.568e+01 8.206e+01 8.898e+01 9.673e+01 1.616e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-21 10:42:46,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1469620.0, ans=0.5 2023-11-21 10:42:52,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220450 2023-11-21 10:42:52,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1469620.0, ans=0.0 2023-11-21 10:42:52,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1469620.0, ans=0.0 2023-11-21 10:43:16,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1469753.3333333333, ans=0.015 2023-11-21 10:43:17,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1469753.3333333333, ans=0.125 2023-11-21 10:43:24,157 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4050, loss[loss=0.08006, simple_loss=0.1027, pruned_loss=0.01987, audio_tagging_loss=0.008857, over 15980.00 frames. ], tot_loss[loss=0.07712, simple_loss=0.09948, pruned_loss=0.01758, audio_tagging_loss=0.009795, over 3050837.64 frames. ], batch size: 62, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:43:26,792 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:43:56,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220500 2023-11-21 10:44:19,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1470086.6666666667, ans=0.2 2023-11-21 10:44:28,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1470153.3333333333, ans=0.2 2023-11-21 10:44:29,551 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4100, loss[loss=0.09234, simple_loss=0.1204, pruned_loss=0.02208, audio_tagging_loss=0.01007, over 13785.00 frames. ], tot_loss[loss=0.07707, simple_loss=0.09933, pruned_loss=0.01763, audio_tagging_loss=0.009782, over 3047519.61 frames. 
], batch size: 50, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:44:55,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.123e+01 8.556e+01 9.611e+01 1.349e+02, threshold=1.711e+02, percent-clipped=0.0 2023-11-21 10:45:01,792 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220550 2023-11-21 10:45:15,019 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:45:34,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. limit=10.0 2023-11-21 10:45:35,362 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4150, loss[loss=0.06256, simple_loss=0.06843, pruned_loss=0.01813, audio_tagging_loss=0.01021, over 14053.00 frames. ], tot_loss[loss=0.07686, simple_loss=0.09931, pruned_loss=0.01752, audio_tagging_loss=0.009679, over 3044647.87 frames. ], batch size: 58, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:45:35,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1470486.6666666667, ans=10.0 2023-11-21 10:45:50,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1470553.3333333333, ans=0.125 2023-11-21 10:45:52,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1470553.3333333333, ans=0.1 2023-11-21 10:45:55,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-21 10:45:59,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1470620.0, ans=0.125 2023-11-21 10:46:07,506 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220600 2023-11-21 10:46:09,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1470620.0, ans=0.0 2023-11-21 10:46:11,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2023-11-21 10:46:22,589 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:46:39,650 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4200, loss[loss=0.07453, simple_loss=0.1002, pruned_loss=0.0158, audio_tagging_loss=0.00863, over 15684.00 frames. ], tot_loss[loss=0.07622, simple_loss=0.09877, pruned_loss=0.01731, audio_tagging_loss=0.009524, over 3046710.53 frames. 
], batch size: 56, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:46:46,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1470820.0, ans=0.2 2023-11-21 10:46:58,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1470886.6666666667, ans=0.125 2023-11-21 10:47:00,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1470886.6666666667, ans=0.125 2023-11-21 10:47:06,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.111e+01 8.714e+01 9.250e+01 1.135e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-21 10:47:12,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220650 2023-11-21 10:47:15,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1470953.3333333333, ans=0.09899494936611666 2023-11-21 10:47:31,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2023-11-21 10:47:44,739 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4250, loss[loss=0.08911, simple_loss=0.1228, pruned_loss=0.02171, audio_tagging_loss=0.005977, over 15012.00 frames. ], tot_loss[loss=0.0759, simple_loss=0.09867, pruned_loss=0.01719, audio_tagging_loss=0.009384, over 3053342.14 frames. ], batch size: 54, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:47:44,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1471153.3333333333, ans=0.125 2023-11-21 10:48:01,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-11-21 10:48:17,304 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220700 2023-11-21 10:48:31,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-21 10:48:36,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1471420.0, ans=0.125 2023-11-21 10:48:50,614 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4300, loss[loss=0.09134, simple_loss=0.1225, pruned_loss=0.02013, audio_tagging_loss=0.009971, over 15702.00 frames. ], tot_loss[loss=0.0762, simple_loss=0.09891, pruned_loss=0.01733, audio_tagging_loss=0.009411, over 3049954.51 frames. 
], batch size: 59, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:48:50,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1471486.6666666667, ans=0.0 2023-11-21 10:49:04,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1471553.3333333333, ans=0.125 2023-11-21 10:49:05,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1471553.3333333333, ans=0.125 2023-11-21 10:49:06,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1471553.3333333333, ans=0.125 2023-11-21 10:49:08,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-11-21 10:49:09,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1471553.3333333333, ans=0.125 2023-11-21 10:49:12,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1471553.3333333333, ans=0.2 2023-11-21 10:49:15,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.061e+01 8.689e+01 9.437e+01 1.140e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-21 10:49:22,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220750 2023-11-21 10:49:35,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1471686.6666666667, ans=0.0 2023-11-21 10:49:42,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1471753.3333333333, ans=0.125 2023-11-21 10:49:54,728 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4350, loss[loss=0.08616, simple_loss=0.1151, pruned_loss=0.0188, audio_tagging_loss=0.009794, over 15399.00 frames. ], tot_loss[loss=0.07574, simple_loss=0.09831, pruned_loss=0.01718, audio_tagging_loss=0.009402, over 3049252.82 frames. ], batch size: 57, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:49:55,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.44 vs. 
limit=15.0 2023-11-21 10:50:18,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1471886.6666666667, ans=0.0 2023-11-21 10:50:24,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1471953.3333333333, ans=0.0 2023-11-21 10:50:28,497 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220800 2023-11-21 10:50:29,947 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:50:35,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1472020.0, ans=0.04949747468305833 2023-11-21 10:50:59,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1472153.3333333333, ans=0.125 2023-11-21 10:51:00,749 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4400, loss[loss=0.05339, simple_loss=0.06305, pruned_loss=0.008435, audio_tagging_loss=0.01343, over 14392.00 frames. ], tot_loss[loss=0.07552, simple_loss=0.09793, pruned_loss=0.01706, audio_tagging_loss=0.009497, over 3048084.59 frames. ], batch size: 56, lr: 3.62e-03, grad_scale: 32.0 2023-11-21 10:51:12,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1472153.3333333333, ans=0.2 2023-11-21 10:51:27,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.481e+01 8.151e+01 8.586e+01 9.427e+01 1.147e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-21 10:51:33,388 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220850 2023-11-21 10:51:37,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2023-11-21 10:51:58,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.34 vs. limit=10.0 2023-11-21 10:52:06,468 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4450, loss[loss=0.08386, simple_loss=0.1123, pruned_loss=0.01825, audio_tagging_loss=0.009458, over 13870.00 frames. ], tot_loss[loss=0.07561, simple_loss=0.09808, pruned_loss=0.01714, audio_tagging_loss=0.009422, over 3049866.44 frames. ], batch size: 57, lr: 3.62e-03, grad_scale: 32.0 2023-11-21 10:52:08,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1472486.6666666667, ans=0.0 2023-11-21 10:52:34,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2023-11-21 10:52:34,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-21 10:52:38,093 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220900 2023-11-21 10:52:42,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=22.5 2023-11-21 10:53:11,301 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4500, loss[loss=0.07856, simple_loss=0.1128, pruned_loss=0.01432, audio_tagging_loss=0.007859, over 15309.00 frames. 
], tot_loss[loss=0.07566, simple_loss=0.09828, pruned_loss=0.01716, audio_tagging_loss=0.009365, over 3046397.82 frames. ], batch size: 54, lr: 3.62e-03, grad_scale: 32.0 2023-11-21 10:53:23,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1472886.6666666667, ans=0.0 2023-11-21 10:53:29,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-21 10:53:38,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.114e+01 8.781e+01 9.396e+01 1.182e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-21 10:53:45,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 220950 2023-11-21 10:53:52,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0 2023-11-21 10:54:01,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1473020.0, ans=0.125 2023-11-21 10:54:12,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1473086.6666666667, ans=0.125 2023-11-21 10:54:15,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1473153.3333333333, ans=0.125 2023-11-21 10:54:16,250 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4550, loss[loss=0.05393, simple_loss=0.06295, pruned_loss=0.009133, audio_tagging_loss=0.01333, over 16017.00 frames. ], tot_loss[loss=0.07543, simple_loss=0.09766, pruned_loss=0.01707, audio_tagging_loss=0.009537, over 3049886.01 frames. ], batch size: 62, lr: 3.62e-03, grad_scale: 32.0 2023-11-21 10:54:34,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1473220.0, ans=0.025 2023-11-21 10:54:44,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1473286.6666666667, ans=0.0 2023-11-21 10:54:49,111 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221000 2023-11-21 10:54:49,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1473286.6666666667, ans=0.0 2023-11-21 10:54:53,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2023-11-21 10:54:54,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1473353.3333333333, ans=0.125 2023-11-21 10:54:59,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-21 10:55:00,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1473353.3333333333, ans=0.0 2023-11-21 10:55:01,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1473353.3333333333, ans=0.125 2023-11-21 10:55:05,204 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 10:55:05,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-11-21 10:55:10,328 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:55:22,536 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4600, loss[loss=0.07642, simple_loss=0.08996, pruned_loss=0.02347, audio_tagging_loss=0.007967, over 14678.00 frames. ], tot_loss[loss=0.07481, simple_loss=0.09665, pruned_loss=0.01688, audio_tagging_loss=0.009598, over 3052147.23 frames. ], batch size: 56, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:55:48,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.454e+01 8.163e+01 8.661e+01 9.433e+01 1.192e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 10:55:53,764 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221050 2023-11-21 10:55:56,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1473620.0, ans=0.0 2023-11-21 10:55:56,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1473620.0, ans=0.125 2023-11-21 10:56:04,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1473686.6666666667, ans=0.035 2023-11-21 10:56:07,368 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 10:56:17,416 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.336e-02 2023-11-21 10:56:27,005 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4650, loss[loss=0.05975, simple_loss=0.07392, pruned_loss=0.01283, audio_tagging_loss=0.009959, over 14302.00 frames. ], tot_loss[loss=0.07459, simple_loss=0.09615, pruned_loss=0.01676, audio_tagging_loss=0.009753, over 3048297.58 frames. ], batch size: 58, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:56:59,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221100 2023-11-21 10:57:18,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=15.0 2023-11-21 10:57:20,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1474086.6666666667, ans=0.2 2023-11-21 10:57:30,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1474153.3333333333, ans=0.1 2023-11-21 10:57:31,368 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4700, loss[loss=0.09859, simple_loss=0.1282, pruned_loss=0.02273, audio_tagging_loss=0.01176, over 15137.00 frames. ], tot_loss[loss=0.07497, simple_loss=0.09671, pruned_loss=0.01688, audio_tagging_loss=0.009731, over 3050929.92 frames. 
], batch size: 56, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:57:39,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1474153.3333333333, ans=0.0 2023-11-21 10:57:45,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1474220.0, ans=0.2 2023-11-21 10:57:57,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-21 10:57:58,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1474286.6666666667, ans=0.1 2023-11-21 10:57:59,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.726e+01 8.412e+01 8.839e+01 9.940e+01 1.268e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-21 10:58:04,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221150 2023-11-21 10:58:07,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=1474286.6666666667, ans=12.0 2023-11-21 10:58:07,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2023-11-21 10:58:17,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1474353.3333333333, ans=0.125 2023-11-21 10:58:27,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1474420.0, ans=0.1 2023-11-21 10:58:30,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1474420.0, ans=0.2 2023-11-21 10:58:37,194 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4750, loss[loss=0.07558, simple_loss=0.09722, pruned_loss=0.01789, audio_tagging_loss=0.009075, over 16398.00 frames. ], tot_loss[loss=0.07464, simple_loss=0.09633, pruned_loss=0.01681, audio_tagging_loss=0.009662, over 3049245.36 frames. ], batch size: 61, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 10:59:07,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=12.0 2023-11-21 10:59:09,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221200 2023-11-21 10:59:43,337 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4800, loss[loss=0.07616, simple_loss=0.08542, pruned_loss=0.02028, audio_tagging_loss=0.01318, over 15111.00 frames. ], tot_loss[loss=0.07488, simple_loss=0.09649, pruned_loss=0.01688, audio_tagging_loss=0.009747, over 3057461.12 frames. 
], batch size: 58, lr: 3.62e-03, grad_scale: 32.0 2023-11-21 10:59:46,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1474820.0, ans=0.0 2023-11-21 10:59:54,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1474886.6666666667, ans=0.2 2023-11-21 10:59:57,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1474886.6666666667, ans=0.2 2023-11-21 10:59:59,892 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:00:01,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2023-11-21 11:00:10,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.162e+01 8.841e+01 9.779e+01 1.301e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-21 11:00:15,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221250 2023-11-21 11:00:24,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1475020.0, ans=0.0 2023-11-21 11:00:41,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1475086.6666666667, ans=0.125 2023-11-21 11:00:47,741 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4850, loss[loss=0.06619, simple_loss=0.07772, pruned_loss=0.01567, audio_tagging_loss=0.01166, over 15519.00 frames. ], tot_loss[loss=0.07507, simple_loss=0.09657, pruned_loss=0.01688, audio_tagging_loss=0.009903, over 3053161.14 frames. ], batch size: 57, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 11:00:57,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1475153.3333333333, ans=0.0 2023-11-21 11:01:05,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0 2023-11-21 11:01:21,613 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221300 2023-11-21 11:01:36,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1475353.3333333333, ans=0.125 2023-11-21 11:01:52,952 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4900, loss[loss=0.06911, simple_loss=0.08734, pruned_loss=0.01485, audio_tagging_loss=0.0106, over 14895.00 frames. ], tot_loss[loss=0.0747, simple_loss=0.09608, pruned_loss=0.01682, audio_tagging_loss=0.009844, over 3049221.58 frames. ], batch size: 56, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 11:02:07,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.66 vs. 
limit=22.5 2023-11-21 11:02:09,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1475553.3333333333, ans=0.125 2023-11-21 11:02:19,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1475620.0, ans=0.0 2023-11-21 11:02:20,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 8.034e+01 8.624e+01 9.399e+01 1.492e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-21 11:02:24,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221350 2023-11-21 11:02:57,363 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 4950, loss[loss=0.09674, simple_loss=0.1289, pruned_loss=0.02498, audio_tagging_loss=0.007286, over 16530.00 frames. ], tot_loss[loss=0.07484, simple_loss=0.09653, pruned_loss=0.01687, audio_tagging_loss=0.009702, over 3049954.90 frames. ], batch size: 58, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 11:03:05,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1475820.0, ans=0.1 2023-11-21 11:03:13,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1475886.6666666667, ans=0.125 2023-11-21 11:03:16,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1475886.6666666667, ans=0.125 2023-11-21 11:03:18,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1475886.6666666667, ans=0.0 2023-11-21 11:03:29,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221400 2023-11-21 11:03:43,391 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:03:49,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.29 vs. limit=10.0 2023-11-21 11:03:54,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1476086.6666666667, ans=0.125 2023-11-21 11:04:02,081 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5000, loss[loss=0.07619, simple_loss=0.1029, pruned_loss=0.01727, audio_tagging_loss=0.00747, over 15969.00 frames. ], tot_loss[loss=0.0757, simple_loss=0.09787, pruned_loss=0.01732, audio_tagging_loss=0.009453, over 3051406.64 frames. ], batch size: 62, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 11:04:27,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. 
limit=15.0 2023-11-21 11:04:31,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.334e+01 8.366e+01 9.036e+01 1.004e+02 1.239e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-21 11:04:35,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221450 2023-11-21 11:04:43,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1476353.3333333333, ans=0.1 2023-11-21 11:04:43,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1476353.3333333333, ans=0.1 2023-11-21 11:04:52,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1476353.3333333333, ans=0.0 2023-11-21 11:05:02,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1476420.0, ans=0.125 2023-11-21 11:05:07,106 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5050, loss[loss=0.05928, simple_loss=0.07384, pruned_loss=0.01378, audio_tagging_loss=0.008581, over 14256.00 frames. ], tot_loss[loss=0.07543, simple_loss=0.0975, pruned_loss=0.01728, audio_tagging_loss=0.009401, over 3046555.55 frames. ], batch size: 55, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 11:05:39,939 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221500 2023-11-21 11:05:48,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1476686.6666666667, ans=0.1 2023-11-21 11:06:01,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2023-11-21 11:06:07,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1476753.3333333333, ans=0.125 2023-11-21 11:06:12,996 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5100, loss[loss=0.05641, simple_loss=0.05412, pruned_loss=0.01424, audio_tagging_loss=0.01511, over 14146.00 frames. ], tot_loss[loss=0.07492, simple_loss=0.09662, pruned_loss=0.01715, audio_tagging_loss=0.009453, over 3049991.15 frames. ], batch size: 54, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 11:06:31,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1476886.6666666667, ans=0.125 2023-11-21 11:06:34,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1476886.6666666667, ans=0.125 2023-11-21 11:06:40,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.039e+01 8.516e+01 9.144e+01 1.339e+02, threshold=1.703e+02, percent-clipped=0.0 2023-11-21 11:06:42,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.28 vs. 
limit=12.0 2023-11-21 11:06:44,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1476953.3333333333, ans=0.2 2023-11-21 11:06:45,280 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221550 2023-11-21 11:06:50,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1476953.3333333333, ans=0.035 2023-11-21 11:07:00,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477020.0, ans=0.1 2023-11-21 11:07:04,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1477086.6666666667, ans=0.1 2023-11-21 11:07:06,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2023-11-21 11:07:09,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1477086.6666666667, ans=0.0 2023-11-21 11:07:18,247 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5150, loss[loss=0.07589, simple_loss=0.1021, pruned_loss=0.01603, audio_tagging_loss=0.008815, over 15585.00 frames. ], tot_loss[loss=0.07426, simple_loss=0.09576, pruned_loss=0.01693, audio_tagging_loss=0.009453, over 3041792.06 frames. ], batch size: 57, lr: 3.62e-03, grad_scale: 16.0 2023-11-21 11:07:26,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-21 11:07:28,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-11-21 11:07:51,174 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221600 2023-11-21 11:07:52,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1477286.6666666667, ans=0.95 2023-11-21 11:08:23,353 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5200, loss[loss=0.07017, simple_loss=0.09115, pruned_loss=0.01176, audio_tagging_loss=0.01284, over 16048.00 frames. ], tot_loss[loss=0.07439, simple_loss=0.09626, pruned_loss=0.01681, audio_tagging_loss=0.009452, over 3043007.30 frames. ], batch size: 60, lr: 3.62e-03, grad_scale: 32.0 2023-11-21 11:08:41,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1477553.3333333333, ans=0.125 2023-11-21 11:08:43,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1477553.3333333333, ans=0.0 2023-11-21 11:08:45,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1477553.3333333333, ans=0.1 2023-11-21 11:08:50,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1477620.0, ans=0.125 2023-11-21 11:08:51,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.50 vs. 
limit=15.0 2023-11-21 11:08:51,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.310e+01 8.141e+01 8.886e+01 9.437e+01 1.227e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-21 11:08:55,943 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221650 2023-11-21 11:09:04,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1477686.6666666667, ans=0.0 2023-11-21 11:09:10,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1477686.6666666667, ans=0.2 2023-11-21 11:09:15,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=22.5 2023-11-21 11:09:28,811 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5250, loss[loss=0.06906, simple_loss=0.09004, pruned_loss=0.01343, audio_tagging_loss=0.01061, over 14038.00 frames. ], tot_loss[loss=0.07502, simple_loss=0.09696, pruned_loss=0.01708, audio_tagging_loss=0.009464, over 3045369.84 frames. ], batch size: 55, lr: 3.62e-03, grad_scale: 32.0 2023-11-21 11:09:34,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1477820.0, ans=0.125 2023-11-21 11:09:57,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477953.3333333333, ans=0.1 2023-11-21 11:10:00,078 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221700 2023-11-21 11:10:14,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1478020.0, ans=0.95 2023-11-21 11:10:15,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1478020.0, ans=0.1 2023-11-21 11:10:22,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1478086.6666666667, ans=0.125 2023-11-21 11:10:25,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.96 vs. limit=10.0 2023-11-21 11:10:32,511 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5300, loss[loss=0.08292, simple_loss=0.11, pruned_loss=0.01756, audio_tagging_loss=0.01038, over 14738.00 frames. ], tot_loss[loss=0.07559, simple_loss=0.09768, pruned_loss=0.01731, audio_tagging_loss=0.009442, over 3043151.75 frames. ], batch size: 54, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:10:58,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. 
limit=15.0 2023-11-21 11:11:02,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.119e+01 8.657e+01 9.101e+01 1.165e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-21 11:11:06,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221750 2023-11-21 11:11:08,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1478286.6666666667, ans=0.0 2023-11-21 11:11:25,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1478420.0, ans=0.2 2023-11-21 11:11:29,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1478420.0, ans=0.125 2023-11-21 11:11:36,904 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5350, loss[loss=0.09726, simple_loss=0.1337, pruned_loss=0.02301, audio_tagging_loss=0.00739, over 15201.00 frames. ], tot_loss[loss=0.07564, simple_loss=0.09766, pruned_loss=0.0174, audio_tagging_loss=0.009415, over 3045881.77 frames. ], batch size: 56, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:11:59,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2023-11-21 11:12:00,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1478553.3333333333, ans=0.0 2023-11-21 11:12:09,659 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221800 2023-11-21 11:12:17,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1478686.6666666667, ans=0.0 2023-11-21 11:12:38,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1478753.3333333333, ans=0.0 2023-11-21 11:12:42,761 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5400, loss[loss=0.06774, simple_loss=0.083, pruned_loss=0.01668, audio_tagging_loss=0.00956, over 15050.00 frames. ], tot_loss[loss=0.07582, simple_loss=0.098, pruned_loss=0.01736, audio_tagging_loss=0.009453, over 3048528.12 frames. ], batch size: 59, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:12:46,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1478820.0, ans=0.125 2023-11-21 11:12:46,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1478820.0, ans=0.125 2023-11-21 11:13:04,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1478886.6666666667, ans=0.125 2023-11-21 11:13:11,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.155e+01 8.706e+01 9.355e+01 1.764e+02, threshold=1.741e+02, percent-clipped=1.0 2023-11-21 11:13:14,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221850 2023-11-21 11:13:25,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1479020.0, ans=0.07 2023-11-21 11:13:30,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. 
limit=15.0 2023-11-21 11:13:43,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1479086.6666666667, ans=0.125 2023-11-21 11:13:43,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1479086.6666666667, ans=0.0 2023-11-21 11:13:43,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-11-21 11:13:46,680 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5450, loss[loss=0.08604, simple_loss=0.1161, pruned_loss=0.01971, audio_tagging_loss=0.008268, over 16038.00 frames. ], tot_loss[loss=0.07645, simple_loss=0.09874, pruned_loss=0.01762, audio_tagging_loss=0.00946, over 3048444.01 frames. ], batch size: 61, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:13:59,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1479220.0, ans=0.0 2023-11-21 11:14:04,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1479220.0, ans=0.125 2023-11-21 11:14:09,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1479220.0, ans=0.0 2023-11-21 11:14:14,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1479286.6666666667, ans=0.0 2023-11-21 11:14:19,849 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221900 2023-11-21 11:14:42,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1479420.0, ans=0.2 2023-11-21 11:14:50,950 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5500, loss[loss=0.08665, simple_loss=0.1186, pruned_loss=0.01882, audio_tagging_loss=0.008553, over 14882.00 frames. ], tot_loss[loss=0.07642, simple_loss=0.09909, pruned_loss=0.01745, audio_tagging_loss=0.009419, over 3047691.08 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:14:59,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1479486.6666666667, ans=0.125 2023-11-21 11:15:04,572 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:15:04,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1479553.3333333333, ans=0.1 2023-11-21 11:15:21,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.123e+01 9.057e+01 9.951e+01 1.227e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-21 11:15:23,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 221950 2023-11-21 11:15:47,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1479753.3333333333, ans=0.0 2023-11-21 11:15:54,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1479753.3333333333, ans=0.125 2023-11-21 11:15:56,238 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5550, loss[loss=0.06118, simple_loss=0.0807, pruned_loss=0.01265, audio_tagging_loss=0.008183, over 15055.00 frames. 
], tot_loss[loss=0.07651, simple_loss=0.09923, pruned_loss=0.01737, audio_tagging_loss=0.009522, over 3041271.04 frames. ], batch size: 56, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:16:27,024 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222000 2023-11-21 11:16:29,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1479953.3333333333, ans=0.0 2023-11-21 11:16:30,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1479953.3333333333, ans=0.125 2023-11-21 11:16:37,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2023-11-21 11:16:50,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1480086.6666666667, ans=0.125 2023-11-21 11:16:54,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2023-11-21 11:17:00,043 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5600, loss[loss=0.07114, simple_loss=0.08616, pruned_loss=0.01721, audio_tagging_loss=0.01085, over 14731.00 frames. ], tot_loss[loss=0.076, simple_loss=0.09838, pruned_loss=0.01717, audio_tagging_loss=0.009632, over 3041459.12 frames. ], batch size: 57, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:17:01,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1480153.3333333333, ans=0.2 2023-11-21 11:17:11,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1480220.0, ans=0.2 2023-11-21 11:17:14,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1480220.0, ans=0.125 2023-11-21 11:17:18,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1480220.0, ans=0.0 2023-11-21 11:17:30,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.739e+01 7.938e+01 8.660e+01 9.316e+01 1.150e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 11:17:31,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222050 2023-11-21 11:17:32,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1480286.6666666667, ans=0.125 2023-11-21 11:17:44,706 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 11:17:47,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-21 11:18:02,764 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5650, loss[loss=0.07696, simple_loss=0.09776, pruned_loss=0.01607, audio_tagging_loss=0.01201, over 15651.00 frames. 
], tot_loss[loss=0.07531, simple_loss=0.0972, pruned_loss=0.01692, audio_tagging_loss=0.009786, over 3035402.92 frames. ], batch size: 57, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:18:24,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2023-11-21 11:18:35,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222100 2023-11-21 11:18:39,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1480620.0, ans=0.125 2023-11-21 11:18:54,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-21 11:19:07,294 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5700, loss[loss=0.1017, simple_loss=0.1358, pruned_loss=0.02569, audio_tagging_loss=0.008119, over 15102.00 frames. ], tot_loss[loss=0.07578, simple_loss=0.0976, pruned_loss=0.01721, audio_tagging_loss=0.009767, over 3036041.21 frames. ], batch size: 54, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:19:15,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1480820.0, ans=0.125 2023-11-21 11:19:23,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1480886.6666666667, ans=0.0 2023-11-21 11:19:35,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1480953.3333333333, ans=0.125 2023-11-21 11:19:37,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.400e+01 9.034e+01 9.970e+01 1.319e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-21 11:19:38,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222150 2023-11-21 11:20:08,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2023-11-21 11:20:11,594 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5750, loss[loss=0.0655, simple_loss=0.07937, pruned_loss=0.01848, audio_tagging_loss=0.007336, over 14944.00 frames. ], tot_loss[loss=0.07549, simple_loss=0.09718, pruned_loss=0.01723, audio_tagging_loss=0.00967, over 3041942.30 frames. ], batch size: 59, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:20:36,504 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:20:43,237 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222200 2023-11-21 11:21:03,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1481420.0, ans=0.1 2023-11-21 11:21:07,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1481420.0, ans=0.0 2023-11-21 11:21:14,841 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5800, loss[loss=0.08761, simple_loss=0.1223, pruned_loss=0.02075, audio_tagging_loss=0.00571, over 15244.00 frames. ], tot_loss[loss=0.0748, simple_loss=0.09648, pruned_loss=0.017, audio_tagging_loss=0.009558, over 3043924.50 frames. 
], batch size: 57, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:21:15,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1481486.6666666667, ans=0.125 2023-11-21 11:21:41,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-21 11:21:46,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.053e+01 8.570e+01 9.375e+01 1.179e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-21 11:21:47,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222250 2023-11-21 11:21:58,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1481686.6666666667, ans=0.0 2023-11-21 11:22:01,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1481686.6666666667, ans=0.125 2023-11-21 11:22:02,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1481686.6666666667, ans=0.0 2023-11-21 11:22:18,586 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5850, loss[loss=0.08993, simple_loss=0.132, pruned_loss=0.02013, audio_tagging_loss=0.003784, over 15035.00 frames. ], tot_loss[loss=0.07482, simple_loss=0.09666, pruned_loss=0.01696, audio_tagging_loss=0.009533, over 3047536.34 frames. ], batch size: 55, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:22:23,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. limit=10.0 2023-11-21 11:22:49,117 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:22:50,186 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222300 2023-11-21 11:22:56,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1482020.0, ans=0.5 2023-11-21 11:23:02,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1482020.0, ans=0.0 2023-11-21 11:23:04,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1482020.0, ans=0.2 2023-11-21 11:23:20,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1482086.6666666667, ans=0.0 2023-11-21 11:23:22,441 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5900, loss[loss=0.08391, simple_loss=0.1044, pruned_loss=0.02388, audio_tagging_loss=0.007832, over 15922.00 frames. ], tot_loss[loss=0.07477, simple_loss=0.09665, pruned_loss=0.01696, audio_tagging_loss=0.009481, over 3046933.32 frames. ], batch size: 59, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:23:34,853 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:23:37,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.65 vs. 
limit=10.0 2023-11-21 11:23:40,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1482220.0, ans=0.035 2023-11-21 11:23:51,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.554e+01 8.184e+01 8.891e+01 9.441e+01 1.396e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-21 11:23:52,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222350 2023-11-21 11:23:59,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1482353.3333333333, ans=0.1 2023-11-21 11:24:06,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1482353.3333333333, ans=0.1 2023-11-21 11:24:22,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1482420.0, ans=0.125 2023-11-21 11:24:24,459 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 5950, loss[loss=0.06833, simple_loss=0.08375, pruned_loss=0.01909, audio_tagging_loss=0.007364, over 15044.00 frames. ], tot_loss[loss=0.07447, simple_loss=0.09634, pruned_loss=0.0168, audio_tagging_loss=0.0095, over 3045566.91 frames. ], batch size: 57, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:24:29,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1482486.6666666667, ans=0.0 2023-11-21 11:24:31,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1482486.6666666667, ans=0.125 2023-11-21 11:24:47,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1482553.3333333333, ans=0.05 2023-11-21 11:24:57,004 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222400 2023-11-21 11:25:04,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1482686.6666666667, ans=0.0 2023-11-21 11:25:12,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1482686.6666666667, ans=0.125 2023-11-21 11:25:28,514 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6000, loss[loss=0.08714, simple_loss=0.1251, pruned_loss=0.01821, audio_tagging_loss=0.006399, over 16834.00 frames. ], tot_loss[loss=0.07421, simple_loss=0.09592, pruned_loss=0.01675, audio_tagging_loss=0.009502, over 3046707.23 frames. ], batch size: 60, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:25:28,515 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 11:25:52,022 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6697, 4.3113, 4.6319, 4.1117], device='cuda:1') 2023-11-21 11:26:10,009 INFO [train_asr.py:1253] (1/4) Epoch 19, validation: loss=0.05983, simple_loss=0.05236, pruned_loss=0.005293, audio_tagging_loss=0.02835, over 4681554.00 frames. 
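Two recurring entry types in this log are worth unpacking. First, the optim.py lines (e.g. "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=...") summarize the distribution of gradient norms over the last logging window: five quantiles (min, 25%, median, 75%, max), a clipping threshold, and the fraction of batches that were clipped. The logged numbers are consistent with the threshold being Clipping_scale times the median (e.g. 1.732e+02 = 2.0 x 8.660e+01 at batch 5600, and 1.807e+02 = 2.0 x 9.034e+01 at batch 5700). The sketch below is a minimal reconstruction of that bookkeeping under exactly that assumed rule; the real logic lives in icefall's optim.py and may differ in detail.

    import torch

    def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Quantiles at 0%, 25%, 50%, 75%, 100% of the recorded grad norms,
        # as printed in the "grad-norm quartiles" log lines.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # assumed rule: Clipping_scale * median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

    # The quartiles reported at batch 5600 reproduce the logged threshold:
    norms = torch.tensor([67.39, 79.38, 86.60, 93.16, 115.0])
    q, thr, pct = clipping_report(norms)
    print(q.tolist(), float(thr), float(pct))  # thr == 173.2, pct == 0.0

Second, the train_asr.py WARNING entries such as the one at 11:17:44 above ("Exclude cut with ID unbalanced/... Number of frames (after subsampling): 23 ... Number of tokens: 24") show the short-utterance filter at work: these AudioSet clips carry a 1-second dummy transcript, and after the roughly 4x convolutional subsampling a 100-frame cut keeps only 23 encoder frames, fewer than its 24 BPE tokens, so the transducer loss cannot be computed and the cut is dropped. Below is a hedged sketch of that check; the subsampling arithmetic is inferred from the 100 -> 23 figures in the warning, not copied from train_asr.py.

    def frames_after_subsampling(num_frames: int) -> int:
        # Output length of a Conv2dSubsampling-style front end with an
        # overall factor of ~4; chosen so that 100 input frames -> 23
        # output frames, matching the warning above.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A cut is unusable for the transducer loss when it has fewer
        # encoder frames than BPE tokens, as with the dummy-text cuts.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # 23 frames < 24 tokens -> excluded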
2023-11-21 11:26:10,010 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 11:26:21,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1482886.6666666667, ans=0.0 2023-11-21 11:26:24,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1482886.6666666667, ans=0.0 2023-11-21 11:26:29,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1482886.6666666667, ans=0.2 2023-11-21 11:26:36,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1482953.3333333333, ans=0.0 2023-11-21 11:26:40,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.474e+01 7.915e+01 8.568e+01 9.449e+01 1.142e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-21 11:26:41,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-21 11:26:41,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222450 2023-11-21 11:26:49,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1483020.0, ans=0.0 2023-11-21 11:26:55,894 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 11:27:12,933 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6050, loss[loss=0.1083, simple_loss=0.147, pruned_loss=0.02878, audio_tagging_loss=0.005973, over 15360.00 frames. ], tot_loss[loss=0.07467, simple_loss=0.09683, pruned_loss=0.01683, audio_tagging_loss=0.009421, over 3050847.90 frames. ], batch size: 57, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:27:17,012 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:27:32,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1483220.0, ans=0.125 2023-11-21 11:27:45,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222500 2023-11-21 11:27:50,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1483353.3333333333, ans=0.1 2023-11-21 11:28:16,883 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6100, loss[loss=0.08076, simple_loss=0.1148, pruned_loss=0.01706, audio_tagging_loss=0.006302, over 16089.00 frames. ], tot_loss[loss=0.0748, simple_loss=0.09692, pruned_loss=0.01692, audio_tagging_loss=0.009428, over 3048987.90 frames. 
], batch size: 57, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:28:17,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1483486.6666666667, ans=0.0 2023-11-21 11:28:35,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1483553.3333333333, ans=0.125 2023-11-21 11:28:47,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.060e+01 8.165e+01 8.686e+01 9.448e+01 1.325e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 11:28:48,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222550 2023-11-21 11:28:56,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1483686.6666666667, ans=0.0 2023-11-21 11:29:17,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1483753.3333333333, ans=0.125 2023-11-21 11:29:21,032 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6150, loss[loss=0.1277, simple_loss=0.1671, pruned_loss=0.03442, audio_tagging_loss=0.009691, over 16534.00 frames. ], tot_loss[loss=0.0747, simple_loss=0.09689, pruned_loss=0.01675, audio_tagging_loss=0.009505, over 3045754.06 frames. ], batch size: 57, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:29:26,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1483820.0, ans=0.0 2023-11-21 11:29:39,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1483886.6666666667, ans=0.1 2023-11-21 11:29:52,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222600 2023-11-21 11:30:00,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1484020.0, ans=0.125 2023-11-21 11:30:05,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1484020.0, ans=0.125 2023-11-21 11:30:19,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1484086.6666666667, ans=0.0 2023-11-21 11:30:25,222 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6200, loss[loss=0.05852, simple_loss=0.07133, pruned_loss=0.01342, audio_tagging_loss=0.009435, over 14547.00 frames. ], tot_loss[loss=0.07434, simple_loss=0.09593, pruned_loss=0.01672, audio_tagging_loss=0.009656, over 3045499.66 frames. ], batch size: 56, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:30:39,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1484220.0, ans=10.0 2023-11-21 11:30:55,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.122e+01 8.795e+01 9.552e+01 1.328e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-21 11:30:57,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222650 2023-11-21 11:31:05,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. 
limit=12.0 2023-11-21 11:31:10,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1484353.3333333333, ans=0.1 2023-11-21 11:31:12,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1484353.3333333333, ans=0.04949747468305833 2023-11-21 11:31:28,746 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6250, loss[loss=0.06745, simple_loss=0.08708, pruned_loss=0.01341, audio_tagging_loss=0.0105, over 15060.00 frames. ], tot_loss[loss=0.07395, simple_loss=0.09519, pruned_loss=0.01666, audio_tagging_loss=0.009705, over 3039286.23 frames. ], batch size: 57, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:31:34,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1484486.6666666667, ans=0.0 2023-11-21 11:31:34,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-21 11:31:56,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1484620.0, ans=0.1 2023-11-21 11:32:00,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1484620.0, ans=0.125 2023-11-21 11:32:01,114 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222700 2023-11-21 11:32:06,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1484686.6666666667, ans=0.125 2023-11-21 11:32:14,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=22.5 2023-11-21 11:32:25,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1484753.3333333333, ans=0.0 2023-11-21 11:32:33,430 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6300, loss[loss=0.07631, simple_loss=0.0882, pruned_loss=0.01865, audio_tagging_loss=0.01356, over 13365.00 frames. ], tot_loss[loss=0.07372, simple_loss=0.09478, pruned_loss=0.01651, audio_tagging_loss=0.009823, over 3038848.15 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:32:35,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5 2023-11-21 11:32:38,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2023-11-21 11:33:02,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1484953.3333333333, ans=0.0 2023-11-21 11:33:03,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.093e+01 8.788e+01 9.540e+01 1.292e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-21 11:33:05,081 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222750 2023-11-21 11:33:18,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.55 vs. 
limit=12.0 2023-11-21 11:33:21,489 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:33:37,369 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6350, loss[loss=0.07468, simple_loss=0.09562, pruned_loss=0.01609, audio_tagging_loss=0.01078, over 17387.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09475, pruned_loss=0.01646, audio_tagging_loss=0.009858, over 3042087.66 frames. ], batch size: 67, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:33:51,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1485220.0, ans=0.0 2023-11-21 11:34:03,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1485286.6666666667, ans=0.125 2023-11-21 11:34:09,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.03 vs. limit=15.0 2023-11-21 11:34:09,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222800 2023-11-21 11:34:15,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1485353.3333333333, ans=0.5 2023-11-21 11:34:31,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1485420.0, ans=15.0 2023-11-21 11:34:41,964 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6400, loss[loss=0.08053, simple_loss=0.09371, pruned_loss=0.02218, audio_tagging_loss=0.0115, over 15613.00 frames. ], tot_loss[loss=0.07435, simple_loss=0.09567, pruned_loss=0.01663, audio_tagging_loss=0.009889, over 3043184.68 frames. ], batch size: 63, lr: 3.61e-03, grad_scale: 32.0 2023-11-21 11:34:47,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1485486.6666666667, ans=0.1 2023-11-21 11:34:59,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1485553.3333333333, ans=0.0 2023-11-21 11:35:12,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.327e+01 9.131e+01 1.015e+02 1.534e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-21 11:35:13,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222850 2023-11-21 11:35:40,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1485753.3333333333, ans=0.2 2023-11-21 11:35:46,464 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6450, loss[loss=0.08873, simple_loss=0.1179, pruned_loss=0.02352, audio_tagging_loss=0.006245, over 14249.00 frames. ], tot_loss[loss=0.07511, simple_loss=0.0966, pruned_loss=0.01693, audio_tagging_loss=0.009875, over 3041543.38 frames. 
], batch size: 54, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:36:18,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222900 2023-11-21 11:36:18,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1485953.3333333333, ans=0.125 2023-11-21 11:36:19,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1485953.3333333333, ans=0.0 2023-11-21 11:36:45,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1486086.6666666667, ans=0.0 2023-11-21 11:36:50,036 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6500, loss[loss=0.05961, simple_loss=0.07222, pruned_loss=0.01146, audio_tagging_loss=0.01205, over 13471.00 frames. ], tot_loss[loss=0.07516, simple_loss=0.09682, pruned_loss=0.01694, audio_tagging_loss=0.009812, over 3042666.41 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 16.0 2023-11-21 11:36:59,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1486153.3333333333, ans=0.0 2023-11-21 11:37:16,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1486286.6666666667, ans=0.035 2023-11-21 11:37:23,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.393e+01 8.000e+01 8.623e+01 9.429e+01 1.650e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-21 11:37:23,184 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 222950 2023-11-21 11:37:29,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1486353.3333333333, ans=0.125 2023-11-21 11:37:34,294 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:37:34,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1486353.3333333333, ans=0.04949747468305833 2023-11-21 11:37:36,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1486353.3333333333, ans=0.125 2023-11-21 11:37:43,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1486420.0, ans=0.125 2023-11-21 11:37:54,457 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6550, loss[loss=0.07583, simple_loss=0.09482, pruned_loss=0.01773, audio_tagging_loss=0.01069, over 15319.00 frames. ], tot_loss[loss=0.07536, simple_loss=0.09752, pruned_loss=0.01701, audio_tagging_loss=0.009599, over 3042519.53 frames. 
], batch size: 57, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:38:08,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1486553.3333333333, ans=0.125 2023-11-21 11:38:17,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1486553.3333333333, ans=0.125 2023-11-21 11:38:26,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1486620.0, ans=0.125 2023-11-21 11:38:27,328 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223000 2023-11-21 11:38:27,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1486620.0, ans=0.1 2023-11-21 11:38:28,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486620.0, ans=0.1 2023-11-21 11:39:00,748 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6600, loss[loss=0.05815, simple_loss=0.07974, pruned_loss=0.007354, audio_tagging_loss=0.01093, over 14309.00 frames. ], tot_loss[loss=0.07473, simple_loss=0.09673, pruned_loss=0.01685, audio_tagging_loss=0.009517, over 3046026.44 frames. ], batch size: 53, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:39:01,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.28 vs. limit=15.0 2023-11-21 11:39:12,202 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:39:13,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1486886.6666666667, ans=0.125 2023-11-21 11:39:20,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2023-11-21 11:39:24,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.04 vs. limit=22.5 2023-11-21 11:39:32,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.360e+01 8.932e+01 9.626e+01 1.286e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-21 11:39:32,300 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223050 2023-11-21 11:39:33,787 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:39:44,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1487020.0, ans=0.09899494936611666 2023-11-21 11:39:52,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-21 11:39:59,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=22.5 2023-11-21 11:40:04,835 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6650, loss[loss=0.06439, simple_loss=0.08983, pruned_loss=0.01026, audio_tagging_loss=0.009213, over 15007.00 frames. ], tot_loss[loss=0.07436, simple_loss=0.0963, pruned_loss=0.01675, audio_tagging_loss=0.009461, over 3047464.66 frames. 
], batch size: 55, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:40:18,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1487220.0, ans=0.0 2023-11-21 11:40:21,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1487220.0, ans=0.125 2023-11-21 11:40:21,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1487220.0, ans=0.125 2023-11-21 11:40:32,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. limit=5.0 2023-11-21 11:40:37,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223100 2023-11-21 11:40:38,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1487286.6666666667, ans=0.0 2023-11-21 11:40:54,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1487353.3333333333, ans=0.1 2023-11-21 11:40:57,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-21 11:41:09,293 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6700, loss[loss=0.09578, simple_loss=0.1293, pruned_loss=0.02023, audio_tagging_loss=0.01091, over 16052.00 frames. ], tot_loss[loss=0.07407, simple_loss=0.0958, pruned_loss=0.0167, audio_tagging_loss=0.009475, over 3043391.58 frames. ], batch size: 58, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:41:30,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1487553.3333333333, ans=0.125 2023-11-21 11:41:42,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.137e+01 8.757e+01 9.597e+01 1.374e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-21 11:41:42,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223150 2023-11-21 11:42:07,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-11-21 11:42:10,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1487753.3333333333, ans=0.125 2023-11-21 11:42:14,856 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6750, loss[loss=0.07907, simple_loss=0.101, pruned_loss=0.01735, audio_tagging_loss=0.01121, over 14733.00 frames. ], tot_loss[loss=0.0739, simple_loss=0.09542, pruned_loss=0.01667, audio_tagging_loss=0.009517, over 3037576.43 frames. 
], batch size: 56, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:42:18,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1487820.0, ans=0.125 2023-11-21 11:42:19,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1487820.0, ans=0.0 2023-11-21 11:42:21,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1487820.0, ans=0.0 2023-11-21 11:42:22,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1487820.0, ans=0.0 2023-11-21 11:42:36,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.81 vs. limit=10.0 2023-11-21 11:42:46,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223200 2023-11-21 11:42:58,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1488020.0, ans=0.125 2023-11-21 11:43:15,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1488086.6666666667, ans=0.125 2023-11-21 11:43:20,358 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6800, loss[loss=0.05862, simple_loss=0.07213, pruned_loss=0.01128, audio_tagging_loss=0.01128, over 16594.00 frames. ], tot_loss[loss=0.07414, simple_loss=0.09552, pruned_loss=0.01686, audio_tagging_loss=0.009512, over 3038710.83 frames. ], batch size: 63, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:43:24,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=22.5 2023-11-21 11:43:33,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1488220.0, ans=0.125 2023-11-21 11:43:38,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1488220.0, ans=0.09899494936611666 2023-11-21 11:43:38,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1488220.0, ans=0.0 2023-11-21 11:43:41,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1488220.0, ans=0.1 2023-11-21 11:43:51,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.809e+01 8.121e+01 8.832e+01 9.588e+01 1.152e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-21 11:43:51,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223250 2023-11-21 11:43:52,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.84 vs. limit=15.0 2023-11-21 11:43:56,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. 
limit=15.0 2023-11-21 11:43:57,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1488353.3333333333, ans=0.125 2023-11-21 11:44:23,447 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6850, loss[loss=0.0759, simple_loss=0.09863, pruned_loss=0.01751, audio_tagging_loss=0.009075, over 15719.00 frames. ], tot_loss[loss=0.07428, simple_loss=0.09586, pruned_loss=0.0168, audio_tagging_loss=0.009551, over 3038238.40 frames. ], batch size: 58, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:44:31,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1488486.6666666667, ans=0.0 2023-11-21 11:44:32,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-21 11:44:42,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1488553.3333333333, ans=0.1 2023-11-21 11:44:45,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1488553.3333333333, ans=0.125 2023-11-21 11:44:56,700 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223300 2023-11-21 11:45:04,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1488686.6666666667, ans=0.0 2023-11-21 11:45:14,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1488753.3333333333, ans=0.125 2023-11-21 11:45:22,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1488753.3333333333, ans=0.125 2023-11-21 11:45:27,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1488820.0, ans=0.1 2023-11-21 11:45:29,253 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6900, loss[loss=0.07759, simple_loss=0.09681, pruned_loss=0.0208, audio_tagging_loss=0.008392, over 13789.00 frames. ], tot_loss[loss=0.07441, simple_loss=0.09639, pruned_loss=0.01676, audio_tagging_loss=0.009454, over 3042273.44 frames. ], batch size: 54, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:45:30,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1488820.0, ans=0.125 2023-11-21 11:45:37,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.07 vs. 
limit=15.0 2023-11-21 11:45:49,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1488886.6666666667, ans=0.1 2023-11-21 11:45:52,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1488886.6666666667, ans=0.0 2023-11-21 11:45:53,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1488953.3333333333, ans=0.2 2023-11-21 11:46:00,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.015e+01 8.658e+01 9.630e+01 1.322e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 11:46:00,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223350 2023-11-21 11:46:08,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1489020.0, ans=0.0 2023-11-21 11:46:19,850 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 11:46:20,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1489086.6666666667, ans=0.125 2023-11-21 11:46:34,100 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 6950, loss[loss=0.07595, simple_loss=0.1026, pruned_loss=0.01559, audio_tagging_loss=0.00908, over 15210.00 frames. ], tot_loss[loss=0.07459, simple_loss=0.09694, pruned_loss=0.01667, audio_tagging_loss=0.009453, over 3047408.00 frames. ], batch size: 57, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:46:37,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1489153.3333333333, ans=0.125 2023-11-21 11:46:53,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1489220.0, ans=0.125 2023-11-21 11:47:00,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1489286.6666666667, ans=0.1 2023-11-21 11:47:06,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223400 2023-11-21 11:47:15,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1489353.3333333333, ans=0.125 2023-11-21 11:47:25,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-11-21 11:47:38,462 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7000, loss[loss=0.06567, simple_loss=0.08304, pruned_loss=0.01343, audio_tagging_loss=0.01071, over 15542.00 frames. ], tot_loss[loss=0.07526, simple_loss=0.09784, pruned_loss=0.01686, audio_tagging_loss=0.009483, over 3045364.00 frames. 
], batch size: 59, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:47:51,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1489553.3333333333, ans=0.125 2023-11-21 11:48:06,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1489620.0, ans=0.125 2023-11-21 11:48:10,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 7.954e+01 8.531e+01 9.123e+01 1.113e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-21 11:48:10,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223450 2023-11-21 11:48:20,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1489686.6666666667, ans=0.0 2023-11-21 11:48:29,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1489753.3333333333, ans=0.125 2023-11-21 11:48:42,243 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7050, loss[loss=0.05526, simple_loss=0.06713, pruned_loss=0.0104, audio_tagging_loss=0.01129, over 14253.00 frames. ], tot_loss[loss=0.07419, simple_loss=0.0961, pruned_loss=0.01657, audio_tagging_loss=0.009572, over 3036899.01 frames. ], batch size: 56, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:48:44,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1489820.0, ans=0.2 2023-11-21 11:49:00,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1489886.6666666667, ans=0.0 2023-11-21 11:49:09,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1489953.3333333333, ans=0.1 2023-11-21 11:49:13,857 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223500 2023-11-21 11:49:15,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2023-11-21 11:49:46,551 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7100, loss[loss=0.08508, simple_loss=0.1209, pruned_loss=0.01841, audio_tagging_loss=0.006202, over 15154.00 frames. ], tot_loss[loss=0.07357, simple_loss=0.09516, pruned_loss=0.01632, audio_tagging_loss=0.009667, over 3042940.26 frames. ], batch size: 54, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:49:52,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1490153.3333333333, ans=0.07 2023-11-21 11:50:00,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2023-11-21 11:50:09,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1490286.6666666667, ans=0.125 2023-11-21 11:50:10,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1490286.6666666667, ans=0.1 2023-11-21 11:50:11,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.99 vs. 
limit=15.0 2023-11-21 11:50:15,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1490286.6666666667, ans=0.125 2023-11-21 11:50:17,503 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223550 2023-11-21 11:50:19,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.463e+01 8.140e+01 8.741e+01 9.619e+01 1.480e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-21 11:50:19,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1490286.6666666667, ans=0.04949747468305833 2023-11-21 11:50:31,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1490353.3333333333, ans=0.0 2023-11-21 11:50:36,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1490420.0, ans=0.125 2023-11-21 11:50:37,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1490420.0, ans=0.2 2023-11-21 11:50:38,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1490420.0, ans=0.2 2023-11-21 11:50:46,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2023-11-21 11:50:49,607 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7150, loss[loss=0.07769, simple_loss=0.1065, pruned_loss=0.01653, audio_tagging_loss=0.007912, over 15664.00 frames. ], tot_loss[loss=0.07443, simple_loss=0.09644, pruned_loss=0.01656, audio_tagging_loss=0.009652, over 3040389.70 frames. ], batch size: 57, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:51:02,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1490553.3333333333, ans=0.125 2023-11-21 11:51:22,526 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223600 2023-11-21 11:51:50,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1490753.3333333333, ans=0.125 2023-11-21 11:51:53,994 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7200, loss[loss=0.06531, simple_loss=0.08944, pruned_loss=0.01255, audio_tagging_loss=0.008033, over 15705.00 frames. ], tot_loss[loss=0.07424, simple_loss=0.09615, pruned_loss=0.01644, audio_tagging_loss=0.009725, over 3046928.91 frames. ], batch size: 58, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:52:05,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5 2023-11-21 11:52:19,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1490953.3333333333, ans=0.125 2023-11-21 11:52:26,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223650 2023-11-21 11:52:26,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2023-11-21 11:52:27,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. 
limit=15.0 2023-11-21 11:52:27,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.414e+01 8.953e+01 9.739e+01 1.219e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-21 11:52:31,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1491020.0, ans=0.1 2023-11-21 11:52:58,683 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7250, loss[loss=0.08136, simple_loss=0.09623, pruned_loss=0.02241, audio_tagging_loss=0.01083, over 14526.00 frames. ], tot_loss[loss=0.07486, simple_loss=0.09673, pruned_loss=0.01677, audio_tagging_loss=0.009728, over 3043579.14 frames. ], batch size: 55, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:53:07,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1491153.3333333333, ans=0.125 2023-11-21 11:53:07,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1491153.3333333333, ans=0.125 2023-11-21 11:53:08,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1491153.3333333333, ans=0.125 2023-11-21 11:53:10,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1491220.0, ans=0.1 2023-11-21 11:53:17,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1491220.0, ans=0.0 2023-11-21 11:53:30,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223700 2023-11-21 11:53:40,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1491353.3333333333, ans=0.1 2023-11-21 11:53:56,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1491420.0, ans=0.125 2023-11-21 11:54:02,169 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7300, loss[loss=0.1027, simple_loss=0.1483, pruned_loss=0.02485, audio_tagging_loss=0.003664, over 16180.00 frames. ], tot_loss[loss=0.0744, simple_loss=0.09638, pruned_loss=0.01661, audio_tagging_loss=0.00961, over 3046943.26 frames. ], batch size: 56, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:54:21,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1491553.3333333333, ans=0.0 2023-11-21 11:54:34,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223750 2023-11-21 11:54:35,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.733e+01 8.195e+01 8.693e+01 9.496e+01 1.146e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 11:54:37,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.37 vs. 
limit=22.5 2023-11-21 11:54:38,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1491620.0, ans=0.0 2023-11-21 11:54:45,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1491686.6666666667, ans=0.125 2023-11-21 11:55:01,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1491753.3333333333, ans=0.125 2023-11-21 11:55:05,915 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7350, loss[loss=0.06994, simple_loss=0.09124, pruned_loss=0.01638, audio_tagging_loss=0.007939, over 14724.00 frames. ], tot_loss[loss=0.07495, simple_loss=0.09728, pruned_loss=0.01688, audio_tagging_loss=0.009432, over 3044708.02 frames. ], batch size: 53, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 11:55:11,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1491820.0, ans=0.125 2023-11-21 11:55:16,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1491820.0, ans=0.125 2023-11-21 11:55:20,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1491886.6666666667, ans=0.125 2023-11-21 11:55:27,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1491886.6666666667, ans=0.125 2023-11-21 11:55:30,619 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 11:55:38,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223800 2023-11-21 11:55:42,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1491953.3333333333, ans=0.0 2023-11-21 11:55:54,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1492020.0, ans=0.125 2023-11-21 11:55:59,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1492086.6666666667, ans=0.2 2023-11-21 11:56:11,500 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7400, loss[loss=0.07008, simple_loss=0.0875, pruned_loss=0.01525, audio_tagging_loss=0.01108, over 13834.00 frames. ], tot_loss[loss=0.07429, simple_loss=0.09639, pruned_loss=0.0167, audio_tagging_loss=0.009393, over 3040561.36 frames. ], batch size: 52, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:56:15,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1492153.3333333333, ans=0.125 2023-11-21 11:56:43,069 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223850 2023-11-21 11:56:45,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 7.994e+01 8.788e+01 9.602e+01 2.162e+02, threshold=1.758e+02, percent-clipped=1.0 2023-11-21 11:56:46,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.59 vs. 
limit=15.0 2023-11-21 11:56:58,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1492353.3333333333, ans=0.1 2023-11-21 11:57:12,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-11-21 11:57:15,933 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7450, loss[loss=0.07964, simple_loss=0.1051, pruned_loss=0.01738, audio_tagging_loss=0.009736, over 14975.00 frames. ], tot_loss[loss=0.07419, simple_loss=0.09621, pruned_loss=0.0167, audio_tagging_loss=0.009382, over 3033849.98 frames. ], batch size: 55, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:57:19,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1492486.6666666667, ans=0.125 2023-11-21 11:57:29,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1492553.3333333333, ans=0.0 2023-11-21 11:57:39,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=15.0 2023-11-21 11:57:47,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1492620.0, ans=0.125 2023-11-21 11:57:48,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223900 2023-11-21 11:58:11,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1492753.3333333333, ans=0.0 2023-11-21 11:58:19,354 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7500, loss[loss=0.06372, simple_loss=0.07989, pruned_loss=0.01297, audio_tagging_loss=0.0108, over 14006.00 frames. ], tot_loss[loss=0.07423, simple_loss=0.09623, pruned_loss=0.01674, audio_tagging_loss=0.009386, over 3036013.64 frames. ], batch size: 53, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:58:25,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1492820.0, ans=0.125 2023-11-21 11:58:34,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1492886.6666666667, ans=0.1 2023-11-21 11:58:37,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1492886.6666666667, ans=0.2 2023-11-21 11:58:40,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1492886.6666666667, ans=0.125 2023-11-21 11:58:51,679 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 223950 2023-11-21 11:58:54,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.004e+01 8.893e+01 9.495e+01 1.256e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-21 11:59:16,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1493086.6666666667, ans=0.125 2023-11-21 11:59:24,178 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7550, loss[loss=0.07529, simple_loss=0.09846, pruned_loss=0.01629, audio_tagging_loss=0.009772, over 15866.00 frames. ], tot_loss[loss=0.07331, simple_loss=0.09479, pruned_loss=0.01646, audio_tagging_loss=0.009453, over 3039614.97 frames. 
], batch size: 57, lr: 3.60e-03, grad_scale: 16.0 2023-11-21 11:59:44,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1493220.0, ans=0.0 2023-11-21 11:59:45,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1493220.0, ans=0.125 2023-11-21 11:59:55,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224000 2023-11-21 12:00:10,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1493353.3333333333, ans=0.125 2023-11-21 12:00:14,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1493353.3333333333, ans=0.2 2023-11-21 12:00:29,970 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7600, loss[loss=0.07843, simple_loss=0.106, pruned_loss=0.01445, audio_tagging_loss=0.01097, over 13560.00 frames. ], tot_loss[loss=0.07367, simple_loss=0.09521, pruned_loss=0.01661, audio_tagging_loss=0.009451, over 3034816.35 frames. ], batch size: 53, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 12:00:43,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-21 12:00:47,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-21 12:00:57,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=8.0 2023-11-21 12:01:02,918 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224050 2023-11-21 12:01:05,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.694e+01 8.103e+01 8.665e+01 9.260e+01 1.158e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-21 12:01:11,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1493686.6666666667, ans=0.125 2023-11-21 12:01:21,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1493753.3333333333, ans=0.0 2023-11-21 12:01:27,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1493753.3333333333, ans=0.0 2023-11-21 12:01:28,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1493753.3333333333, ans=0.95 2023-11-21 12:01:28,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1493753.3333333333, ans=0.0 2023-11-21 12:01:30,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1493753.3333333333, ans=0.125 2023-11-21 12:01:32,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2023-11-21 12:01:34,037 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7650, loss[loss=0.05728, simple_loss=0.06981, pruned_loss=0.0131, audio_tagging_loss=0.009269, over 14544.00 frames. ], tot_loss[loss=0.07344, simple_loss=0.09485, pruned_loss=0.01648, audio_tagging_loss=0.009529, over 3039116.82 frames. 
], batch size: 57, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 12:01:41,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1493820.0, ans=0.125 2023-11-21 12:01:43,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1493820.0, ans=0.1 2023-11-21 12:02:00,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1493953.3333333333, ans=0.0 2023-11-21 12:02:06,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224100 2023-11-21 12:02:10,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1493953.3333333333, ans=0.125 2023-11-21 12:02:12,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1494020.0, ans=0.125 2023-11-21 12:02:32,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1494086.6666666667, ans=0.125 2023-11-21 12:02:38,302 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7700, loss[loss=0.06488, simple_loss=0.08606, pruned_loss=0.01264, audio_tagging_loss=0.009215, over 15528.00 frames. ], tot_loss[loss=0.07348, simple_loss=0.09487, pruned_loss=0.01651, audio_tagging_loss=0.009536, over 3036340.98 frames. ], batch size: 57, lr: 3.60e-03, grad_scale: 32.0 2023-11-21 12:03:09,887 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224150 2023-11-21 12:03:12,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.308e+01 8.854e+01 9.589e+01 1.258e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-21 12:03:12,589 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 12:03:28,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1494420.0, ans=0.0 2023-11-21 12:03:41,680 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7750, loss[loss=0.06673, simple_loss=0.08774, pruned_loss=0.01196, audio_tagging_loss=0.0109, over 15087.00 frames. ], tot_loss[loss=0.07401, simple_loss=0.09575, pruned_loss=0.01663, audio_tagging_loss=0.009501, over 3033782.69 frames. ], batch size: 57, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:03:43,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1494486.6666666667, ans=0.125 2023-11-21 12:04:14,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224200 2023-11-21 12:04:24,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1494686.6666666667, ans=0.125 2023-11-21 12:04:31,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1494753.3333333333, ans=0.125 2023-11-21 12:04:34,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=12.0 2023-11-21 12:04:42,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.59 vs. 
limit=10.0 2023-11-21 12:04:45,928 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7800, loss[loss=0.07674, simple_loss=0.09469, pruned_loss=0.01969, audio_tagging_loss=0.009703, over 15377.00 frames. ], tot_loss[loss=0.07434, simple_loss=0.09588, pruned_loss=0.01688, audio_tagging_loss=0.009524, over 3032454.82 frames. ], batch size: 58, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:04:48,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1494820.0, ans=0.125 2023-11-21 12:04:56,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=12.0 2023-11-21 12:05:04,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1494886.6666666667, ans=0.125 2023-11-21 12:05:17,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1494953.3333333333, ans=0.125 2023-11-21 12:05:18,303 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224250 2023-11-21 12:05:20,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2023-11-21 12:05:21,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.850e+01 8.303e+01 8.766e+01 9.637e+01 1.238e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-21 12:05:31,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1495020.0, ans=0.0 2023-11-21 12:05:50,430 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7850, loss[loss=0.07049, simple_loss=0.09901, pruned_loss=0.01284, audio_tagging_loss=0.008143, over 15801.00 frames. ], tot_loss[loss=0.07458, simple_loss=0.09633, pruned_loss=0.01681, audio_tagging_loss=0.009607, over 3040919.35 frames. ], batch size: 58, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:06:04,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1495220.0, ans=0.125 2023-11-21 12:06:11,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1495220.0, ans=0.0 2023-11-21 12:06:14,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1495286.6666666667, ans=0.2 2023-11-21 12:06:17,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0 2023-11-21 12:06:21,423 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224300 2023-11-21 12:06:40,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1495420.0, ans=0.0 2023-11-21 12:06:44,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0 2023-11-21 12:06:52,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1495486.6666666667, ans=0.125 2023-11-21 12:06:53,987 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7900, loss[loss=0.06171, simple_loss=0.07593, pruned_loss=0.009934, audio_tagging_loss=0.01382, over 15626.00 frames. 
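Annotation: the Whitening entries track, per module, how far the output covariance is from isotropic ("white"); the logged metric is compared against a limit, and a corrective term only engages when the limit is exceeded. A sketch of one way to compute such a whiteness statistic, offered as a reconstruction of the idea rather than the exact formula in scaling.py: the ratio mean(eig^2)/mean(eig)^2 over covariance eigenvalues is 1.0 for perfectly white features and grows as a few directions dominate.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Rough whiteness statistic for activations x of shape (N, num_channels).

    Splits channels into num_groups groups, computes each group's covariance,
    and averages mean(eig**2) / mean(eig)**2 over groups: 1.0 when the
    covariance is a multiple of the identity, larger otherwise. This mirrors
    the spirit, not necessarily the letter, of the logged metric.
    """
    n, c = x.shape
    gsize = c // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * gsize:(g + 1) * gsize]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.T @ xg) / n
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
    return sum(metrics) / len(metrics)

x = torch.randn(1000, 192) @ torch.randn(192, 192)  # deliberately non-white
print(whitening_metric(x))  # well above 1.0; ~1.0 for plain torch.randn(1000, 192)
```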
], tot_loss[loss=0.07514, simple_loss=0.09707, pruned_loss=0.01695, audio_tagging_loss=0.009657, over 3036062.32 frames. ], batch size: 60, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:06:55,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2023-11-21 12:07:01,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1495486.6666666667, ans=0.125 2023-11-21 12:07:08,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1495553.3333333333, ans=0.125 2023-11-21 12:07:12,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1495553.3333333333, ans=0.2 2023-11-21 12:07:26,396 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224350 2023-11-21 12:07:29,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.536e+01 8.237e+01 8.708e+01 9.758e+01 1.228e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-21 12:07:34,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.11 vs. limit=6.0 2023-11-21 12:07:57,008 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 7950, loss[loss=0.0774, simple_loss=0.1012, pruned_loss=0.01552, audio_tagging_loss=0.01131, over 15037.00 frames. ], tot_loss[loss=0.07534, simple_loss=0.09721, pruned_loss=0.017, audio_tagging_loss=0.009736, over 3041445.54 frames. ], batch size: 57, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:08:14,115 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 12:08:26,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. 
limit=15.0 2023-11-21 12:08:29,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224400 2023-11-21 12:08:30,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1495953.3333333333, ans=0.125 2023-11-21 12:08:34,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1496020.0, ans=0.2 2023-11-21 12:08:42,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1496020.0, ans=0.2 2023-11-21 12:08:45,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1496020.0, ans=0.035 2023-11-21 12:08:55,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1496086.6666666667, ans=0.0 2023-11-21 12:08:57,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1496086.6666666667, ans=0.0 2023-11-21 12:08:58,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=10.0 2023-11-21 12:09:01,119 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8000, loss[loss=0.06244, simple_loss=0.07506, pruned_loss=0.01357, audio_tagging_loss=0.01133, over 14534.00 frames. ], tot_loss[loss=0.07518, simple_loss=0.09657, pruned_loss=0.01708, audio_tagging_loss=0.009816, over 3045549.67 frames. ], batch size: 57, lr: 3.59e-03, grad_scale: 32.0 2023-11-21 12:09:02,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1496153.3333333333, ans=0.125 2023-11-21 12:09:21,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2023-11-21 12:09:32,114 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224450 2023-11-21 12:09:33,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-21 12:09:35,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.210e+01 8.871e+01 9.654e+01 1.306e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-21 12:09:46,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1496353.3333333333, ans=0.125 2023-11-21 12:09:47,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1496353.3333333333, ans=0.1 2023-11-21 12:09:49,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1496353.3333333333, ans=0.1 2023-11-21 12:10:04,406 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8050, loss[loss=0.06273, simple_loss=0.08188, pruned_loss=0.01149, audio_tagging_loss=0.0103, over 15897.00 frames. ], tot_loss[loss=0.0755, simple_loss=0.09678, pruned_loss=0.01711, audio_tagging_loss=0.01001, over 3052675.27 frames. ], batch size: 59, lr: 3.59e-03, grad_scale: 32.0 2023-11-21 12:10:29,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. 
limit=15.0 2023-11-21 12:10:32,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1496620.0, ans=0.0 2023-11-21 12:10:35,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224500 2023-11-21 12:11:07,052 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8100, loss[loss=0.08039, simple_loss=0.1057, pruned_loss=0.02068, audio_tagging_loss=0.006832, over 15741.00 frames. ], tot_loss[loss=0.07625, simple_loss=0.09811, pruned_loss=0.01745, audio_tagging_loss=0.009748, over 3053706.91 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 32.0 2023-11-21 12:11:07,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=12.0 2023-11-21 12:11:09,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1496820.0, ans=0.125 2023-11-21 12:11:28,449 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 12:11:37,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1496953.3333333333, ans=0.0 2023-11-21 12:11:38,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1496953.3333333333, ans=0.5 2023-11-21 12:11:39,682 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224550 2023-11-21 12:11:43,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.553e+01 8.024e+01 8.529e+01 9.304e+01 1.158e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-21 12:12:10,670 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8150, loss[loss=0.08164, simple_loss=0.1063, pruned_loss=0.01808, audio_tagging_loss=0.01041, over 15104.00 frames. ], tot_loss[loss=0.07638, simple_loss=0.09887, pruned_loss=0.01739, audio_tagging_loss=0.009559, over 3057870.32 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 32.0 2023-11-21 12:12:20,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1497153.3333333333, ans=0.0 2023-11-21 12:12:38,125 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 12:12:42,547 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224600 2023-11-21 12:13:07,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1497420.0, ans=0.1 2023-11-21 12:13:12,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1497420.0, ans=0.125 2023-11-21 12:13:15,662 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8200, loss[loss=0.06273, simple_loss=0.08001, pruned_loss=0.01348, audio_tagging_loss=0.009246, over 14717.00 frames. ], tot_loss[loss=0.07574, simple_loss=0.09838, pruned_loss=0.01715, audio_tagging_loss=0.009399, over 3051398.50 frames. ], batch size: 55, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:13:16,946 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 12:13:19,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1497486.6666666667, ans=0.125 2023-11-21 12:13:29,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-21 12:13:32,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1497553.3333333333, ans=0.2 2023-11-21 12:13:38,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1497553.3333333333, ans=0.125 2023-11-21 12:13:40,315 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 12:13:42,765 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.577e-03 2023-11-21 12:13:45,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1497620.0, ans=0.09899494936611666 2023-11-21 12:13:46,787 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224650 2023-11-21 12:13:52,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.414e+01 8.039e+01 8.805e+01 9.521e+01 1.121e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-21 12:13:52,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=22.5 2023-11-21 12:14:02,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1497686.6666666667, ans=0.125 2023-11-21 12:14:10,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1497753.3333333333, ans=0.0 2023-11-21 12:14:18,860 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8250, loss[loss=0.04883, simple_loss=0.05857, pruned_loss=0.007967, audio_tagging_loss=0.01158, over 14407.00 frames. ], tot_loss[loss=0.0744, simple_loss=0.09646, pruned_loss=0.01672, audio_tagging_loss=0.009445, over 3046207.97 frames. 
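Annotation: the WARNING above shows the data pipeline dropping an AudioSet cut whose transcript is only the dataset's dummy placeholder text: after the encoder's convolutional subsampling its 100 input frames shrink to 23, fewer than its 24 BPE tokens, and a transducer cannot align fewer frames than tokens. The 100 -> 23 reduction is consistent with T -> ((T - 7) // 2 + 1) // 2. A sketch of that guard, with hypothetical helper names and the exclusion rule inferred from the warnings rather than quoted from train_asr.py:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Matches the logged numbers: 100 input frames -> 23 output frames.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(cut, sp) -> bool:
    """Hypothetical predicate for lhotse's CutSet.filter().

    `cut` is a lhotse Cut with precomputed features; `sp` is a sentencepiece
    model. Returns False when the subsampled length cannot cover the tokens.
    """
    text = cut.supervisions[0].text
    tokens = sp.encode(text, out_type=str)
    t = frames_after_subsampling(cut.num_frames)
    if t < len(tokens):
        print(f"Exclude cut with ID {cut.id} from training. "
              f"Frames after subsampling: {t}. Number of tokens: {len(tokens)}")
        return False
    return True

# Typical use: cuts = cuts.filter(lambda c: keep_cut(c, sp))
```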
], batch size: 56, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:14:26,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1497820.0, ans=0.0 2023-11-21 12:14:27,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1497820.0, ans=0.125 2023-11-21 12:14:34,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1497886.6666666667, ans=0.125 2023-11-21 12:14:41,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1497886.6666666667, ans=0.2 2023-11-21 12:14:46,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1497953.3333333333, ans=0.125 2023-11-21 12:14:48,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1497953.3333333333, ans=15.0 2023-11-21 12:14:51,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224700 2023-11-21 12:14:51,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1497953.3333333333, ans=0.125 2023-11-21 12:15:22,099 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8300, loss[loss=0.08126, simple_loss=0.105, pruned_loss=0.01959, audio_tagging_loss=0.009177, over 15186.00 frames. ], tot_loss[loss=0.07414, simple_loss=0.09598, pruned_loss=0.01669, audio_tagging_loss=0.009455, over 3043599.52 frames. ], batch size: 54, lr: 3.59e-03, grad_scale: 8.0 2023-11-21 12:15:23,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1498153.3333333333, ans=0.0 2023-11-21 12:15:40,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2023-11-21 12:15:52,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1498286.6666666667, ans=0.2 2023-11-21 12:15:54,098 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224750 2023-11-21 12:15:57,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1498286.6666666667, ans=0.07 2023-11-21 12:16:00,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.024e+01 7.847e+01 8.529e+01 9.488e+01 1.210e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-21 12:16:12,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2023-11-21 12:16:27,135 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8350, loss[loss=0.079, simple_loss=0.1014, pruned_loss=0.01567, audio_tagging_loss=0.01263, over 16700.00 frames. ], tot_loss[loss=0.07416, simple_loss=0.0962, pruned_loss=0.01663, audio_tagging_loss=0.009426, over 3052179.28 frames. 
], batch size: 62, lr: 3.59e-03, grad_scale: 8.0 2023-11-21 12:16:58,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224800 2023-11-21 12:17:13,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1498686.6666666667, ans=0.07 2023-11-21 12:17:17,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1498753.3333333333, ans=0.0 2023-11-21 12:17:23,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1498753.3333333333, ans=0.125 2023-11-21 12:17:23,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1498753.3333333333, ans=0.125 2023-11-21 12:17:30,426 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8400, loss[loss=0.09413, simple_loss=0.1298, pruned_loss=0.02296, audio_tagging_loss=0.006276, over 15376.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.09599, pruned_loss=0.01655, audio_tagging_loss=0.009424, over 3055921.37 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:17:37,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=12.0 2023-11-21 12:17:53,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1498886.6666666667, ans=0.125 2023-11-21 12:18:03,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224850 2023-11-21 12:18:09,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.252e+01 9.012e+01 9.570e+01 1.298e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-21 12:18:18,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1499020.0, ans=0.125 2023-11-21 12:18:27,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1499086.6666666667, ans=0.125 2023-11-21 12:18:34,191 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8450, loss[loss=0.06429, simple_loss=0.07456, pruned_loss=0.01547, audio_tagging_loss=0.01155, over 14756.00 frames. ], tot_loss[loss=0.07371, simple_loss=0.09539, pruned_loss=0.01657, audio_tagging_loss=0.009448, over 3054346.97 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:18:34,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1499153.3333333333, ans=0.0 2023-11-21 12:18:37,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1499153.3333333333, ans=0.125 2023-11-21 12:18:41,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1499153.3333333333, ans=0.125 2023-11-21 12:18:45,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1499153.3333333333, ans=0.2 2023-11-21 12:18:58,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1499220.0, ans=0.0 2023-11-21 12:19:06,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. 
limit=6.0 2023-11-21 12:19:06,952 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224900 2023-11-21 12:19:18,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1499353.3333333333, ans=0.125 2023-11-21 12:19:21,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2023-11-21 12:19:39,611 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8500, loss[loss=0.06536, simple_loss=0.07501, pruned_loss=0.01364, audio_tagging_loss=0.01421, over 14314.00 frames. ], tot_loss[loss=0.074, simple_loss=0.09572, pruned_loss=0.01651, audio_tagging_loss=0.009629, over 3052377.62 frames. ], batch size: 54, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:19:52,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1499553.3333333333, ans=0.125 2023-11-21 12:19:56,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1499553.3333333333, ans=0.125 2023-11-21 12:19:56,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1499553.3333333333, ans=0.0 2023-11-21 12:19:58,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1499553.3333333333, ans=0.125 2023-11-21 12:20:06,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1499620.0, ans=0.0 2023-11-21 12:20:08,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-21 12:20:11,005 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 224950 2023-11-21 12:20:13,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1499620.0, ans=0.2 2023-11-21 12:20:17,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 8.392e+01 8.931e+01 9.765e+01 1.184e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-21 12:20:24,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1499686.6666666667, ans=0.1 2023-11-21 12:20:44,386 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8550, loss[loss=0.05367, simple_loss=0.06072, pruned_loss=0.01202, audio_tagging_loss=0.0113, over 14969.00 frames. ], tot_loss[loss=0.07374, simple_loss=0.09536, pruned_loss=0.01643, audio_tagging_loss=0.009626, over 3056049.13 frames. 
], batch size: 58, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:20:45,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1499820.0, ans=0.0 2023-11-21 12:20:50,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1499820.0, ans=0.0 2023-11-21 12:20:55,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1499886.6666666667, ans=0.125 2023-11-21 12:20:58,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1499886.6666666667, ans=0.0 2023-11-21 12:21:16,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225000 2023-11-21 12:21:19,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1499953.3333333333, ans=0.1 2023-11-21 12:21:28,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1500020.0, ans=0.05 2023-11-21 12:21:31,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1500020.0, ans=0.125 2023-11-21 12:21:33,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1500020.0, ans=0.125 2023-11-21 12:21:48,081 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8600, loss[loss=0.06119, simple_loss=0.07325, pruned_loss=0.01122, audio_tagging_loss=0.01335, over 15616.00 frames. ], tot_loss[loss=0.07355, simple_loss=0.09503, pruned_loss=0.01629, audio_tagging_loss=0.009748, over 3049052.70 frames. ], batch size: 60, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:22:21,266 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225050 2023-11-21 12:22:27,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.106e+01 8.713e+01 9.527e+01 1.348e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-21 12:22:33,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1500353.3333333333, ans=0.125 2023-11-21 12:22:35,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1500353.3333333333, ans=0.125 2023-11-21 12:22:37,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500353.3333333333, ans=0.1 2023-11-21 12:22:52,960 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8650, loss[loss=0.09193, simple_loss=0.1051, pruned_loss=0.02272, audio_tagging_loss=0.01665, over 13990.00 frames. ], tot_loss[loss=0.0744, simple_loss=0.09634, pruned_loss=0.01647, audio_tagging_loss=0.00976, over 3049251.69 frames. 
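Annotation: the Clipping_scale=2.0 entries summarize the recent distribution of global gradient norms; the five numbers read as min/25th/50th/75th/max percentiles, and in every such entry the threshold is exactly 2.0 times the logged median (in the entry just above, 2.0 x 8.713e+01 = 1.743e+02). A sketch of median-based adaptive clipping under that reading; the buffer length and update cadence are assumptions:

```python
from collections import deque
import torch

class AdaptiveGradClipper:
    """Clip gradients to clipping_scale * median of recently observed norms."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # Global l2 norm over all parameter gradients.
        total = torch.norm(torch.stack([p.grad.detach().norm() for p in params]))
        self.norms.append(total.item())
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if total > threshold:
            for p in params:
                p.grad.mul_(threshold / total)
        return threshold

# clipper = AdaptiveGradClipper(); threshold = clipper(model.parameters())
```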
], batch size: 55, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:22:57,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1500486.6666666667, ans=0.1 2023-11-21 12:22:57,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1500486.6666666667, ans=0.0 2023-11-21 12:23:00,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1500486.6666666667, ans=0.125 2023-11-21 12:23:25,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225100 2023-11-21 12:23:25,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1500620.0, ans=0.2 2023-11-21 12:23:27,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1500620.0, ans=0.1 2023-11-21 12:23:36,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1500686.6666666667, ans=0.125 2023-11-21 12:23:54,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1500753.3333333333, ans=0.125 2023-11-21 12:23:57,073 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8700, loss[loss=0.06572, simple_loss=0.07655, pruned_loss=0.01316, audio_tagging_loss=0.01428, over 14736.00 frames. ], tot_loss[loss=0.07423, simple_loss=0.09568, pruned_loss=0.01659, audio_tagging_loss=0.009801, over 3041878.95 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:23:57,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1500820.0, ans=0.0 2023-11-21 12:23:58,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-11-21 12:24:02,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2023-11-21 12:24:06,479 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 12:24:29,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225150 2023-11-21 12:24:33,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1500953.3333333333, ans=0.125 2023-11-21 12:24:35,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.442e+01 8.072e+01 8.843e+01 9.223e+01 1.204e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-21 12:24:56,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1501086.6666666667, ans=0.1 2023-11-21 12:25:00,573 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8750, loss[loss=0.07507, simple_loss=0.09678, pruned_loss=0.01416, audio_tagging_loss=0.01253, over 16245.00 frames. ], tot_loss[loss=0.07538, simple_loss=0.09718, pruned_loss=0.01693, audio_tagging_loss=0.009865, over 3043684.15 frames. 
], batch size: 60, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:25:23,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1501220.0, ans=0.125 2023-11-21 12:25:30,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1501286.6666666667, ans=0.125 2023-11-21 12:25:32,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225200 2023-11-21 12:25:51,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1501420.0, ans=0.2 2023-11-21 12:26:05,339 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8800, loss[loss=0.0587, simple_loss=0.06381, pruned_loss=0.01543, audio_tagging_loss=0.01137, over 13760.00 frames. ], tot_loss[loss=0.07577, simple_loss=0.09761, pruned_loss=0.01711, audio_tagging_loss=0.009855, over 3039359.80 frames. ], batch size: 54, lr: 3.59e-03, grad_scale: 32.0 2023-11-21 12:26:25,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1501553.3333333333, ans=0.1 2023-11-21 12:26:34,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=22.5 2023-11-21 12:26:37,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225250 2023-11-21 12:26:43,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.062e+01 8.940e+01 9.452e+01 1.337e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-21 12:27:09,846 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8850, loss[loss=0.06286, simple_loss=0.07758, pruned_loss=0.01107, audio_tagging_loss=0.013, over 15480.00 frames. ], tot_loss[loss=0.0756, simple_loss=0.09745, pruned_loss=0.01703, audio_tagging_loss=0.009849, over 3038904.80 frames. ], batch size: 60, lr: 3.59e-03, grad_scale: 32.0 2023-11-21 12:27:17,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1501820.0, ans=0.125 2023-11-21 12:27:22,943 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 12:27:25,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1501886.6666666667, ans=0.0 2023-11-21 12:27:42,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225300 2023-11-21 12:27:55,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1502020.0, ans=0.1 2023-11-21 12:28:03,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1502086.6666666667, ans=0.125 2023-11-21 12:28:12,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. 
limit=15.0 2023-11-21 12:28:15,039 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8900, loss[loss=0.06992, simple_loss=0.08976, pruned_loss=0.01592, audio_tagging_loss=0.009117, over 14411.00 frames. ], tot_loss[loss=0.07532, simple_loss=0.09732, pruned_loss=0.01691, audio_tagging_loss=0.009754, over 3043761.63 frames. ], batch size: 55, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:28:32,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1502220.0, ans=0.0 2023-11-21 12:28:47,803 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225350 2023-11-21 12:28:54,894 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.160e+01 8.777e+01 9.579e+01 1.339e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-21 12:29:10,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1502420.0, ans=10.0 2023-11-21 12:29:20,390 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 8950, loss[loss=0.07219, simple_loss=0.09489, pruned_loss=0.01765, audio_tagging_loss=0.007092, over 14564.00 frames. ], tot_loss[loss=0.07481, simple_loss=0.0969, pruned_loss=0.01683, audio_tagging_loss=0.009521, over 3044937.17 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:29:33,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1502553.3333333333, ans=0.1 2023-11-21 12:29:42,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-21 12:29:44,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=22.5 2023-11-21 12:29:51,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225400 2023-11-21 12:30:03,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1502686.6666666667, ans=0.125 2023-11-21 12:30:04,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1502686.6666666667, ans=0.0 2023-11-21 12:30:11,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1502753.3333333333, ans=0.125 2023-11-21 12:30:25,203 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9000, loss[loss=0.07768, simple_loss=0.09131, pruned_loss=0.02057, audio_tagging_loss=0.01146, over 15485.00 frames. ], tot_loss[loss=0.07463, simple_loss=0.09641, pruned_loss=0.01688, audio_tagging_loss=0.009538, over 3044467.27 frames. ], batch size: 59, lr: 3.59e-03, grad_scale: 16.0 2023-11-21 12:30:25,204 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 12:31:04,514 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9734, 4.9919, 5.1358, 4.9448], device='cuda:1') 2023-11-21 12:31:06,235 INFO [train_asr.py:1253] (1/4) Epoch 19, validation: loss=0.06044, simple_loss=0.05233, pruned_loss=0.005297, audio_tagging_loss=0.02898, over 4681554.00 frames. 
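Annotation: each loss[...] / tot_loss[...] record decomposes the objective into the same parts, and the numbers are consistent with a fixed linear combination loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. The validation entry just above checks out exactly: 0.5 * 0.05233 + 0.005297 + 0.02898 = 0.06044. A sketch of the combination, with the 0.5 and 1.0 weights read off the logged numbers rather than taken from the code:

```python
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    """Reproduce the logged total from its logged parts (scales inferred)."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Validation entry above: 0.5 * 0.05233 + 0.005297 + 0.02898 -> 0.06044
print(combine_losses(0.05233, 0.005297, 0.02898))
```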
2023-11-21 12:31:06,236 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 12:31:11,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2023-11-21 12:31:17,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2023-11-21 12:31:23,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1502886.6666666667, ans=0.1 2023-11-21 12:31:38,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225450 2023-11-21 12:31:45,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.028e+01 8.087e+01 8.929e+01 9.690e+01 1.260e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-21 12:31:59,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2023-11-21 12:32:07,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1503086.6666666667, ans=0.0 2023-11-21 12:32:09,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1503086.6666666667, ans=0.2 2023-11-21 12:32:11,483 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9050, loss[loss=0.09803, simple_loss=0.1289, pruned_loss=0.02533, audio_tagging_loss=0.008246, over 16337.00 frames. ], tot_loss[loss=0.07541, simple_loss=0.09769, pruned_loss=0.01708, audio_tagging_loss=0.009483, over 3044546.36 frames. ], batch size: 61, lr: 3.58e-03, grad_scale: 16.0 2023-11-21 12:32:25,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1503220.0, ans=0.0 2023-11-21 12:32:28,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1503220.0, ans=0.2 2023-11-21 12:32:33,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1503220.0, ans=0.125 2023-11-21 12:32:42,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225500 2023-11-21 12:33:03,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1503420.0, ans=0.125 2023-11-21 12:33:11,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1503420.0, ans=0.125 2023-11-21 12:33:12,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1503420.0, ans=0.125 2023-11-21 12:33:15,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2023-11-21 12:33:15,757 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9100, loss[loss=0.08472, simple_loss=0.1186, pruned_loss=0.01657, audio_tagging_loss=0.008862, over 15961.00 frames. ], tot_loss[loss=0.07498, simple_loss=0.09716, pruned_loss=0.01691, audio_tagging_loss=0.009482, over 3048497.83 frames. 
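Annotation: the "Maximum memory allocated so far is 25607MB" line above is the CUDA allocator's high-water mark, which PyTorch exposes directly; a one-line helper (the device index is assumed to match the log's cuda:1):

```python
import torch

def max_mem_mb(device: int = 1) -> int:
    # High-water mark of tensor memory allocated on cuda:<device>.
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)

# logging.info(f"Maximum memory allocated so far is {max_mem_mb()}MB")
```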
], batch size: 56, lr: 3.58e-03, grad_scale: 16.0 2023-11-21 12:33:20,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=12.0 2023-11-21 12:33:21,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0 2023-11-21 12:33:22,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1503486.6666666667, ans=0.04949747468305833 2023-11-21 12:33:29,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503553.3333333333, ans=0.1 2023-11-21 12:33:37,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1503553.3333333333, ans=0.125 2023-11-21 12:33:48,302 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225550 2023-11-21 12:33:50,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1503620.0, ans=0.125 2023-11-21 12:33:51,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1503620.0, ans=0.0 2023-11-21 12:33:56,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.533e+01 8.001e+01 8.590e+01 9.455e+01 1.329e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-21 12:33:56,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1503686.6666666667, ans=0.0 2023-11-21 12:34:07,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1503753.3333333333, ans=0.0 2023-11-21 12:34:19,568 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9150, loss[loss=0.07845, simple_loss=0.1025, pruned_loss=0.01805, audio_tagging_loss=0.009127, over 16319.00 frames. ], tot_loss[loss=0.07471, simple_loss=0.09673, pruned_loss=0.01689, audio_tagging_loss=0.00945, over 3051355.41 frames. ], batch size: 62, lr: 3.58e-03, grad_scale: 16.0 2023-11-21 12:34:52,114 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225600 2023-11-21 12:34:53,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503953.3333333333, ans=0.1 2023-11-21 12:34:57,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1504020.0, ans=0.2 2023-11-21 12:35:03,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1504020.0, ans=0.125 2023-11-21 12:35:10,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1504086.6666666667, ans=0.2 2023-11-21 12:35:24,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2023-11-21 12:35:24,757 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9200, loss[loss=0.07317, simple_loss=0.1034, pruned_loss=0.01469, audio_tagging_loss=0.006765, over 15140.00 frames. 
], tot_loss[loss=0.07479, simple_loss=0.09686, pruned_loss=0.01687, audio_tagging_loss=0.009491, over 3052504.27 frames. ], batch size: 54, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:35:29,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1504153.3333333333, ans=0.5 2023-11-21 12:35:52,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1504286.6666666667, ans=0.2 2023-11-21 12:35:55,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1504286.6666666667, ans=0.0 2023-11-21 12:35:55,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1504286.6666666667, ans=0.0 2023-11-21 12:35:56,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225650 2023-11-21 12:36:03,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.007e+01 8.520e+01 9.210e+01 1.380e+02, threshold=1.704e+02, percent-clipped=0.0 2023-11-21 12:36:14,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1504353.3333333333, ans=0.1 2023-11-21 12:36:29,512 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9250, loss[loss=0.06862, simple_loss=0.08179, pruned_loss=0.01449, audio_tagging_loss=0.01323, over 14490.00 frames. ], tot_loss[loss=0.07492, simple_loss=0.09703, pruned_loss=0.01694, audio_tagging_loss=0.009462, over 3047094.44 frames. ], batch size: 56, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:36:33,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1504486.6666666667, ans=0.0 2023-11-21 12:36:41,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1504553.3333333333, ans=0.125 2023-11-21 12:36:42,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=12.0 2023-11-21 12:36:50,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1504553.3333333333, ans=0.0 2023-11-21 12:37:01,625 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225700 2023-11-21 12:37:07,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-11-21 12:37:21,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1504753.3333333333, ans=0.125 2023-11-21 12:37:33,062 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9300, loss[loss=0.07563, simple_loss=0.102, pruned_loss=0.01626, audio_tagging_loss=0.008376, over 15615.00 frames. ], tot_loss[loss=0.07489, simple_loss=0.0971, pruned_loss=0.01684, audio_tagging_loss=0.009503, over 3049728.03 frames. 
], batch size: 59, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:37:35,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1504820.0, ans=0.125 2023-11-21 12:37:52,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1504886.6666666667, ans=0.0 2023-11-21 12:37:55,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-21 12:38:05,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225750 2023-11-21 12:38:12,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 7.914e+01 8.650e+01 9.354e+01 1.480e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-21 12:38:16,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1505020.0, ans=0.125 2023-11-21 12:38:18,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1505020.0, ans=0.125 2023-11-21 12:38:21,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1505020.0, ans=0.04949747468305833 2023-11-21 12:38:36,991 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9350, loss[loss=0.07778, simple_loss=0.1075, pruned_loss=0.01511, audio_tagging_loss=0.008942, over 15168.00 frames. ], tot_loss[loss=0.07408, simple_loss=0.09585, pruned_loss=0.01664, audio_tagging_loss=0.009513, over 3050766.17 frames. ], batch size: 56, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:38:40,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1505153.3333333333, ans=0.035 2023-11-21 12:38:45,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1505153.3333333333, ans=0.0 2023-11-21 12:38:53,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1505220.0, ans=0.0 2023-11-21 12:39:03,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1505286.6666666667, ans=0.1 2023-11-21 12:39:09,036 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225800 2023-11-21 12:39:42,121 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9400, loss[loss=0.0687, simple_loss=0.08229, pruned_loss=0.01549, audio_tagging_loss=0.01207, over 15507.00 frames. ], tot_loss[loss=0.07477, simple_loss=0.09661, pruned_loss=0.01697, audio_tagging_loss=0.009495, over 3055202.27 frames. 
], batch size: 61, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:39:49,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1505486.6666666667, ans=0.1 2023-11-21 12:40:12,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225850 2023-11-21 12:40:13,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1505620.0, ans=0.1 2023-11-21 12:40:22,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.778e+01 8.417e+01 9.121e+01 9.724e+01 1.188e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-21 12:40:26,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1505686.6666666667, ans=0.0 2023-11-21 12:40:35,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1505753.3333333333, ans=0.0 2023-11-21 12:40:35,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1505753.3333333333, ans=0.2 2023-11-21 12:40:39,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1505753.3333333333, ans=0.1 2023-11-21 12:40:40,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1505753.3333333333, ans=0.125 2023-11-21 12:40:40,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1505753.3333333333, ans=0.125 2023-11-21 12:40:42,763 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 12:40:45,231 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9450, loss[loss=0.05988, simple_loss=0.0661, pruned_loss=0.01068, audio_tagging_loss=0.01615, over 13934.00 frames. ], tot_loss[loss=0.07447, simple_loss=0.09615, pruned_loss=0.01677, audio_tagging_loss=0.009628, over 3060749.09 frames. ], batch size: 53, lr: 3.58e-03, grad_scale: 16.0 2023-11-21 12:40:51,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1505820.0, ans=0.125 2023-11-21 12:40:52,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2023-11-21 12:41:14,031 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 12:41:17,603 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225900 2023-11-21 12:41:48,201 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9500, loss[loss=0.07375, simple_loss=0.1002, pruned_loss=0.01447, audio_tagging_loss=0.009171, over 14700.00 frames. ], tot_loss[loss=0.07393, simple_loss=0.09486, pruned_loss=0.01667, audio_tagging_loss=0.009831, over 3053669.27 frames. 
], batch size: 56, lr: 3.58e-03, grad_scale: 16.0 2023-11-21 12:41:48,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1506153.3333333333, ans=0.0 2023-11-21 12:41:48,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1506153.3333333333, ans=0.0 2023-11-21 12:41:58,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1506153.3333333333, ans=0.125 2023-11-21 12:42:18,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1506286.6666666667, ans=0.2 2023-11-21 12:42:20,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 225950 2023-11-21 12:42:29,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 8.362e+01 9.191e+01 9.965e+01 1.276e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-21 12:42:52,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1506486.6666666667, ans=0.0 2023-11-21 12:42:53,259 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9550, loss[loss=0.08887, simple_loss=0.1164, pruned_loss=0.02072, audio_tagging_loss=0.009947, over 14974.00 frames. ], tot_loss[loss=0.07427, simple_loss=0.09516, pruned_loss=0.01676, audio_tagging_loss=0.009935, over 3056939.87 frames. ], batch size: 54, lr: 3.58e-03, grad_scale: 16.0 2023-11-21 12:42:56,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1506486.6666666667, ans=0.125 2023-11-21 12:42:58,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2023-11-21 12:43:02,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1506486.6666666667, ans=0.125 2023-11-21 12:43:05,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-11-21 12:43:10,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1506553.3333333333, ans=0.1 2023-11-21 12:43:24,238 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226000 2023-11-21 12:43:24,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1506620.0, ans=0.0 2023-11-21 12:43:37,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1506686.6666666667, ans=0.0 2023-11-21 12:43:52,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1506753.3333333333, ans=0.125 2023-11-21 12:43:57,190 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9600, loss[loss=0.06499, simple_loss=0.08536, pruned_loss=0.01248, audio_tagging_loss=0.009827, over 16452.00 frames. ], tot_loss[loss=0.07483, simple_loss=0.09584, pruned_loss=0.01698, audio_tagging_loss=0.009922, over 3049928.30 frames. 
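Each optim.py:476 line summarizes the recently observed gradient-norm distribution as min / 25% / median / 75% / max and sets the clipping threshold to Clipping_scale times the median: in the entry above, 2.0 x 9.191e+01 = 1.838e+02. percent-clipped is then the share of recent batches whose norm exceeded that threshold (0.0 here). A rough reconstruction of the bookkeeping (a sketch consistent with the logged numbers, not the actual optim.py implementation):

from collections import deque
import torch

class MedianClipper:
    def __init__(self, clipping_scale=2.0, window=200):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)     # recent global grad norms

    def __call__(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(p.grad) for p in params]))
        self.norms.append(norm.item())
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # Clipping_scale * median
        if norm.item() > threshold:           # rescale gradients in place
            for p in params:
                p.grad.mul_(threshold / (norm.item() + 1e-6))
        percent_clipped = 100.0 * (hist > threshold).float().mean().item()
        return threshold, percent_clipped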
], batch size: 64, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:44:11,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1506886.6666666667, ans=0.1 2023-11-21 12:44:29,251 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226050 2023-11-21 12:44:38,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.576e+01 7.983e+01 8.743e+01 9.613e+01 1.291e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-21 12:44:53,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1507086.6666666667, ans=0.02 2023-11-21 12:45:00,277 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9650, loss[loss=0.05133, simple_loss=0.06329, pruned_loss=0.00836, audio_tagging_loss=0.01133, over 15094.00 frames. ], tot_loss[loss=0.07472, simple_loss=0.09588, pruned_loss=0.01691, audio_tagging_loss=0.009877, over 3053185.32 frames. ], batch size: 57, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:45:00,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1507153.3333333333, ans=0.2 2023-11-21 12:45:01,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1507153.3333333333, ans=0.125 2023-11-21 12:45:12,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1507220.0, ans=0.125 2023-11-21 12:45:24,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-21 12:45:33,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226100 2023-11-21 12:45:35,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1507286.6666666667, ans=0.1 2023-11-21 12:45:53,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507420.0, ans=0.1 2023-11-21 12:46:05,003 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9700, loss[loss=0.076, simple_loss=0.1001, pruned_loss=0.0159, audio_tagging_loss=0.01004, over 15381.00 frames. ], tot_loss[loss=0.07433, simple_loss=0.09569, pruned_loss=0.01675, audio_tagging_loss=0.009731, over 3045978.63 frames. 
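grad_scale is the fp16 loss scale: above it backs off from 32.0 to 16.0 around batch 9450 and is back at 32.0 by batch 9600, the standard halve-on-overflow, double-after-enough-clean-steps behaviour of PyTorch's AMP scaler. A generic sketch of that loop (model, optimizer, loader and compute_loss are placeholders; the recipe's actual scaler settings are not visible in the log):

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # the grad_scale reported in the log lines
    backoff_factor=0.5,    # 32.0 -> 16.0 when a batch produces inf/nan grads
    growth_factor=2.0,     # 16.0 -> 32.0 after enough clean steps
    growth_interval=2000,  # PyTorch default; the recipe's value is assumed
)

for batch in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)   # hypothetical helper
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # silently skips the update on overflow
    scaler.update()          # this is where grad_scale changes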
], batch size: 59, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:46:16,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1507486.6666666667, ans=0.125 2023-11-21 12:46:17,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1507553.3333333333, ans=0.0 2023-11-21 12:46:18,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1507553.3333333333, ans=0.125 2023-11-21 12:46:23,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507553.3333333333, ans=0.1 2023-11-21 12:46:23,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507553.3333333333, ans=0.1 2023-11-21 12:46:35,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2023-11-21 12:46:36,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226150 2023-11-21 12:46:46,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 7.992e+01 8.607e+01 9.446e+01 1.380e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-21 12:46:46,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1507686.6666666667, ans=0.0 2023-11-21 12:46:56,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1507753.3333333333, ans=0.1 2023-11-21 12:47:06,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507753.3333333333, ans=0.1 2023-11-21 12:47:07,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1507753.3333333333, ans=0.125 2023-11-21 12:47:09,288 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9750, loss[loss=0.07624, simple_loss=0.09389, pruned_loss=0.01783, audio_tagging_loss=0.01147, over 14882.00 frames. ], tot_loss[loss=0.0739, simple_loss=0.09533, pruned_loss=0.01656, audio_tagging_loss=0.009679, over 3044395.03 frames. ], batch size: 55, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:47:09,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1507820.0, ans=0.09899494936611666 2023-11-21 12:47:36,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1507953.3333333333, ans=0.0 2023-11-21 12:47:41,406 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226200 2023-11-21 12:48:10,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2023-11-21 12:48:13,263 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9800, loss[loss=0.08765, simple_loss=0.112, pruned_loss=0.02317, audio_tagging_loss=0.00847, over 15128.00 frames. ], tot_loss[loss=0.07435, simple_loss=0.09606, pruned_loss=0.01675, audio_tagging_loss=0.00956, over 3040751.89 frames. 
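Most lines here come from scaling.py:213 and simply print the current value (ans=...) of a ScheduledFloat: a scalar hyperparameter such as a dropout p, a skip rate, a balancer prob, or a bypass scale_min that follows a piecewise-linear schedule in batch_count, a schedule clock that advances by several units per step here (hence the fractional counts like 1505153.3333333333). Deep into epoch 19 the schedules have flattened out, which is why the same ans values (0.125, 0.1, 0.2, 0.0, ...) repeat. A toy version of the idea, with schedule points invented for illustration:

class ScheduledFloat:
    # Piecewise-linear in batch_count between (count, value) points;
    # constant outside the first/last point.
    def __init__(self, *points):
        self.points = sorted(points)
        self.batch_count = 0.0

    def value(self):
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))  # invented points
dropout_p.batch_count = 1505153.3333333333
print(dropout_p.value())  # 0.1 -- the schedule flattened long ago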
], batch size: 56, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:48:15,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1508153.3333333333, ans=0.05 2023-11-21 12:48:39,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1508286.6666666667, ans=0.125 2023-11-21 12:48:45,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226250 2023-11-21 12:48:46,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1508286.6666666667, ans=0.125 2023-11-21 12:48:47,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1508286.6666666667, ans=0.0 2023-11-21 12:48:53,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.494e+01 7.972e+01 8.418e+01 9.313e+01 1.119e+02, threshold=1.684e+02, percent-clipped=0.0 2023-11-21 12:48:56,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1508353.3333333333, ans=0.1 2023-11-21 12:49:09,508 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 12:49:12,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1508420.0, ans=0.09899494936611666 2023-11-21 12:49:17,409 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9850, loss[loss=0.07239, simple_loss=0.09651, pruned_loss=0.01495, audio_tagging_loss=0.009185, over 15333.00 frames. ], tot_loss[loss=0.0744, simple_loss=0.09618, pruned_loss=0.01675, audio_tagging_loss=0.009565, over 3042889.39 frames. 
], batch size: 55, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:49:26,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1508486.6666666667, ans=0.025 2023-11-21 12:49:31,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1508553.3333333333, ans=0.09899494936611666 2023-11-21 12:49:45,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1508620.0, ans=0.125 2023-11-21 12:49:48,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226300 2023-11-21 12:49:57,939 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 12:49:59,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1508686.6666666667, ans=0.0 2023-11-21 12:50:16,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1508753.3333333333, ans=0.125 2023-11-21 12:50:17,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1508753.3333333333, ans=0.125 2023-11-21 12:50:20,969 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9900, loss[loss=0.09028, simple_loss=0.1156, pruned_loss=0.02375, audio_tagging_loss=0.008748, over 15202.00 frames. ], tot_loss[loss=0.0747, simple_loss=0.09672, pruned_loss=0.01686, audio_tagging_loss=0.009489, over 3043859.06 frames. ], batch size: 58, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:50:36,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1508886.6666666667, ans=0.125 2023-11-21 12:50:52,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1508953.3333333333, ans=0.125 2023-11-21 12:50:54,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226350 2023-11-21 12:51:02,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.161e+01 8.727e+01 9.211e+01 1.112e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-21 12:51:07,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2023-11-21 12:51:13,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2023-11-21 12:51:17,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1509086.6666666667, ans=0.0 2023-11-21 12:51:25,629 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 9950, loss[loss=0.06665, simple_loss=0.0884, pruned_loss=0.01405, audio_tagging_loss=0.008391, over 14657.00 frames. ], tot_loss[loss=0.07432, simple_loss=0.09593, pruned_loss=0.01676, audio_tagging_loss=0.009594, over 3041148.34 frames. 
], batch size: 53, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:51:45,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1509220.0, ans=0.125 2023-11-21 12:51:57,982 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226400 2023-11-21 12:52:05,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1509353.3333333333, ans=0.1 2023-11-21 12:52:06,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1509353.3333333333, ans=0.125 2023-11-21 12:52:14,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1509353.3333333333, ans=0.125 2023-11-21 12:52:21,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1509420.0, ans=0.0 2023-11-21 12:52:21,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1509420.0, ans=0.125 2023-11-21 12:52:29,829 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10000, loss[loss=0.03775, simple_loss=0.04739, pruned_loss=0.004251, audio_tagging_loss=0.009802, over 14086.00 frames. ], tot_loss[loss=0.07358, simple_loss=0.09517, pruned_loss=0.01644, audio_tagging_loss=0.009561, over 3039275.80 frames. ], batch size: 56, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:52:32,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1509486.6666666667, ans=0.125 2023-11-21 12:52:40,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1509486.6666666667, ans=0.0 2023-11-21 12:52:43,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1509553.3333333333, ans=0.1 2023-11-21 12:52:44,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1509553.3333333333, ans=0.125 2023-11-21 12:52:55,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1509620.0, ans=0.07 2023-11-21 12:53:01,801 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226450 2023-11-21 12:53:10,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 7.892e+01 8.555e+01 9.256e+01 1.223e+02, threshold=1.711e+02, percent-clipped=0.0 2023-11-21 12:53:14,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1509686.6666666667, ans=0.2 2023-11-21 12:53:32,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2023-11-21 12:53:33,286 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10050, loss[loss=0.06989, simple_loss=0.08903, pruned_loss=0.01598, audio_tagging_loss=0.009388, over 13827.00 frames. ], tot_loss[loss=0.07448, simple_loss=0.09658, pruned_loss=0.01685, audio_tagging_loss=0.009349, over 3040724.62 frames. 
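The scaling.py:1022 Whitening lines compare a decorrelation statistic of some activation against a (scheduled) limit, e.g. metric=3.55 vs. limit=6.0 above; when the metric exceeds its limit, the Whiten module injects a gradient penalty nudging the activations' covariance back toward a multiple of the identity. A plausible form for the metric, assumed here, is the eigenvalue-dispersion ratio mean(eig^2) / mean(eig)^2 of the per-group covariance, which equals 1.0 for perfectly white features and grows as energy concentrates in a few directions:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels). Computes mean(eig^2) / mean(eig)^2 of
    # the covariance eigenvalues per group via traces, so no explicit
    # eigendecomposition is needed. The exact formula is an assumption;
    # the log only shows the resulting value and its limit.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    cov = x.transpose(1, 2) @ x / n                        # (groups, d, d)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)    # trace(C) / d
    mean_eig_sq = (cov @ cov).diagonal(dim1=1, dim2=2).mean(dim=1)
    return (mean_eig_sq / mean_eig.pow(2).clamp(min=1e-20)).mean()

print(whitening_metric(torch.randn(10000, 64)))  # ~1.0 for near-white input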
], batch size: 55, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:53:40,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1509820.0, ans=0.0 2023-11-21 12:54:04,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1509953.3333333333, ans=10.0 2023-11-21 12:54:05,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226500 2023-11-21 12:54:18,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1510020.0, ans=0.0 2023-11-21 12:54:28,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-21 12:54:32,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2023-11-21 12:54:36,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-11-21 12:54:36,856 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10100, loss[loss=0.07661, simple_loss=0.1086, pruned_loss=0.01546, audio_tagging_loss=0.006835, over 15946.00 frames. ], tot_loss[loss=0.07438, simple_loss=0.09625, pruned_loss=0.01684, audio_tagging_loss=0.009408, over 3034407.84 frames. ], batch size: 59, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:54:38,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1510153.3333333333, ans=0.125 2023-11-21 12:54:40,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1510153.3333333333, ans=10.0 2023-11-21 12:54:40,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1510153.3333333333, ans=0.09899494936611666 2023-11-21 12:55:08,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226550 2023-11-21 12:55:12,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1510286.6666666667, ans=0.2 2023-11-21 12:55:16,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1510353.3333333333, ans=0.1 2023-11-21 12:55:16,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1510353.3333333333, ans=0.2 2023-11-21 12:55:17,041 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.133e+01 8.640e+01 9.422e+01 1.310e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-21 12:55:27,656 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 12:55:40,523 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10150, loss[loss=0.0733, simple_loss=0.09009, pruned_loss=0.01771, audio_tagging_loss=0.01054, over 16083.00 frames. ], tot_loss[loss=0.0742, simple_loss=0.0957, pruned_loss=0.01678, audio_tagging_loss=0.009565, over 3035425.47 frames. ], batch size: 61, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:56:05,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1510620.0, ans=0.125 2023-11-21 12:56:09,678 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 12:56:12,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226600 2023-11-21 12:56:25,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1510686.6666666667, ans=0.125 2023-11-21 12:56:29,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1510686.6666666667, ans=0.0 2023-11-21 12:56:32,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1510753.3333333333, ans=0.125 2023-11-21 12:56:44,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1510820.0, ans=0.125 2023-11-21 12:56:44,944 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10200, loss[loss=0.1049, simple_loss=0.1423, pruned_loss=0.02631, audio_tagging_loss=0.007393, over 16264.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09527, pruned_loss=0.01671, audio_tagging_loss=0.009638, over 3043741.21 frames. ], batch size: 60, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:56:48,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2023-11-21 12:57:07,530 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 12:57:15,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1510953.3333333333, ans=0.1 2023-11-21 12:57:16,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226650 2023-11-21 12:57:26,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 7.915e+01 8.561e+01 9.293e+01 1.189e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-21 12:57:45,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=15.0 2023-11-21 12:57:48,036 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10250, loss[loss=0.05485, simple_loss=0.06444, pruned_loss=0.01084, audio_tagging_loss=0.01179, over 14423.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.09527, pruned_loss=0.01674, audio_tagging_loss=0.009737, over 3043817.74 frames. ], batch size: 56, lr: 3.58e-03, grad_scale: 32.0 2023-11-21 12:58:02,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1511220.0, ans=0.125 2023-11-21 12:58:06,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1511220.0, ans=0.1 2023-11-21 12:58:16,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1511286.6666666667, ans=0.125 2023-11-21 12:58:18,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1511286.6666666667, ans=0.0 2023-11-21 12:58:20,810 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226700 2023-11-21 12:58:38,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1511420.0, ans=0.0 2023-11-21 12:58:50,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1511486.6666666667, ans=0.0 2023-11-21 12:58:51,832 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10300, loss[loss=0.07446, simple_loss=0.08476, pruned_loss=0.02034, audio_tagging_loss=0.01174, over 14867.00 frames. ], tot_loss[loss=0.07367, simple_loss=0.09448, pruned_loss=0.01658, audio_tagging_loss=0.009852, over 3049742.92 frames. ], batch size: 56, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 12:58:52,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1511486.6666666667, ans=0.0 2023-11-21 12:58:56,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2023-11-21 12:59:08,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1511553.3333333333, ans=0.0 2023-11-21 12:59:24,110 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226750 2023-11-21 12:59:32,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.325e+01 8.857e+01 9.782e+01 1.240e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-21 12:59:45,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1511753.3333333333, ans=0.125 2023-11-21 12:59:46,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1511753.3333333333, ans=0.125 2023-11-21 12:59:56,449 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10350, loss[loss=0.05675, simple_loss=0.0685, pruned_loss=0.01044, audio_tagging_loss=0.01206, over 15245.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.09506, pruned_loss=0.01659, audio_tagging_loss=0.009996, over 3049106.85 frames. 
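The model.py:792 heartbeat printed every 50 batches reports whether the encoder is currently frozen together with the global batch index; it stays Freeze_encoder: False throughout this stretch, so the encoder keeps receiving gradient updates during the joint ASR + audio-tagging training. The flag typically reflects a step-conditioned gate of roughly this shape (a hypothetical helper; the recipe's actual condition is not shown in the log):

def maybe_freeze_encoder(model, batch_idx: int, freeze_until: int = -1) -> bool:
    # Freeze encoder parameters for the first `freeze_until` batches;
    # freeze_until = -1 disables freezing entirely, matching the
    # constant "Freeze_encoder: False" seen above.
    freeze = 0 <= batch_idx < freeze_until
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
    return freeze

# maybe_freeze_encoder(model, batch_idx=226700)  # -> False with the default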
], batch size: 58, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:00:00,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-21 13:00:18,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2023-11-21 13:00:26,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1511953.3333333333, ans=0.125 2023-11-21 13:00:27,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226800 2023-11-21 13:00:28,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1511953.3333333333, ans=0.2 2023-11-21 13:00:36,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1512020.0, ans=0.125 2023-11-21 13:00:39,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1512020.0, ans=0.125 2023-11-21 13:00:40,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1512020.0, ans=0.2 2023-11-21 13:01:00,011 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10400, loss[loss=0.0821, simple_loss=0.09964, pruned_loss=0.02497, audio_tagging_loss=0.007305, over 14759.00 frames. ], tot_loss[loss=0.07423, simple_loss=0.09537, pruned_loss=0.01658, audio_tagging_loss=0.00996, over 3047592.61 frames. ], batch size: 56, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:01:09,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2023-11-21 13:01:18,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1512220.0, ans=0.125 2023-11-21 13:01:31,585 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 13:01:32,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226850 2023-11-21 13:01:42,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.081e+01 8.812e+01 9.594e+01 2.017e+02, threshold=1.762e+02, percent-clipped=1.0 2023-11-21 13:02:03,532 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10450, loss[loss=0.08398, simple_loss=0.1062, pruned_loss=0.02246, audio_tagging_loss=0.008438, over 14219.00 frames. ], tot_loss[loss=0.07381, simple_loss=0.0949, pruned_loss=0.0164, audio_tagging_loss=0.009963, over 3043424.09 frames. ], batch size: 56, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:02:16,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.08 vs. 
limit=22.5 2023-11-21 13:02:20,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1512553.3333333333, ans=0.125 2023-11-21 13:02:31,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1512620.0, ans=0.0 2023-11-21 13:02:36,214 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226900 2023-11-21 13:02:48,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1512686.6666666667, ans=0.2 2023-11-21 13:02:53,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1512753.3333333333, ans=0.0 2023-11-21 13:02:53,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1512753.3333333333, ans=0.125 2023-11-21 13:03:08,413 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10500, loss[loss=0.08915, simple_loss=0.1139, pruned_loss=0.02374, audio_tagging_loss=0.008468, over 14808.00 frames. ], tot_loss[loss=0.07411, simple_loss=0.0955, pruned_loss=0.01653, audio_tagging_loss=0.009828, over 3045873.27 frames. ], batch size: 55, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:03:35,565 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 13:03:35,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1512953.3333333333, ans=0.2 2023-11-21 13:03:37,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1512953.3333333333, ans=0.0 2023-11-21 13:03:38,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 226950 2023-11-21 13:03:47,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1513020.0, ans=0.1 2023-11-21 13:03:49,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 8.276e+01 8.777e+01 9.449e+01 1.286e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-21 13:03:50,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1513020.0, ans=0.125 2023-11-21 13:03:56,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1513020.0, ans=0.125 2023-11-21 13:04:05,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1513086.6666666667, ans=0.125 2023-11-21 13:04:08,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=15.0 2023-11-21 13:04:10,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1513153.3333333333, ans=0.0 2023-11-21 13:04:11,139 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10550, loss[loss=0.06157, simple_loss=0.08223, pruned_loss=0.01267, audio_tagging_loss=0.007782, over 15614.00 frames. ], tot_loss[loss=0.0739, simple_loss=0.09537, pruned_loss=0.01649, audio_tagging_loss=0.009725, over 3052459.02 frames. 
], batch size: 60, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:04:17,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1513153.3333333333, ans=0.025 2023-11-21 13:04:29,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1513220.0, ans=0.1 2023-11-21 13:04:33,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.19 vs. limit=10.0 2023-11-21 13:04:34,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1513220.0, ans=0.0 2023-11-21 13:04:43,702 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227000 2023-11-21 13:05:14,598 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10600, loss[loss=0.05566, simple_loss=0.06898, pruned_loss=0.01211, audio_tagging_loss=0.00906, over 14774.00 frames. ], tot_loss[loss=0.07421, simple_loss=0.09602, pruned_loss=0.01666, audio_tagging_loss=0.009541, over 3049685.57 frames. ], batch size: 55, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:05:33,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=1513553.3333333333, ans=0.1 2023-11-21 13:05:35,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1513553.3333333333, ans=0.125 2023-11-21 13:05:39,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1513553.3333333333, ans=0.2 2023-11-21 13:05:41,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2023-11-21 13:05:43,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0 2023-11-21 13:05:47,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227050 2023-11-21 13:05:52,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2023-11-21 13:05:57,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.727e+01 8.166e+01 8.856e+01 9.761e+01 1.795e+02, threshold=1.771e+02, percent-clipped=1.0 2023-11-21 13:06:17,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1513753.3333333333, ans=0.125 2023-11-21 13:06:19,701 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10650, loss[loss=0.06683, simple_loss=0.08808, pruned_loss=0.01116, audio_tagging_loss=0.01163, over 15384.00 frames. ], tot_loss[loss=0.07411, simple_loss=0.09588, pruned_loss=0.01667, audio_tagging_loss=0.009494, over 3049420.83 frames. 
], batch size: 56, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:06:19,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1513820.0, ans=0.125 2023-11-21 13:06:50,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227100 2023-11-21 13:07:13,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1514086.6666666667, ans=0.0 2023-11-21 13:07:22,142 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10700, loss[loss=0.07387, simple_loss=0.09995, pruned_loss=0.01166, audio_tagging_loss=0.01223, over 15135.00 frames. ], tot_loss[loss=0.07323, simple_loss=0.09464, pruned_loss=0.01632, audio_tagging_loss=0.00959, over 3038593.56 frames. ], batch size: 55, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:07:23,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1514153.3333333333, ans=0.1 2023-11-21 13:07:33,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1514220.0, ans=0.035 2023-11-21 13:07:33,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1514220.0, ans=0.125 2023-11-21 13:07:47,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1514286.6666666667, ans=0.125 2023-11-21 13:07:51,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1514286.6666666667, ans=0.125 2023-11-21 13:07:55,021 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227150 2023-11-21 13:07:56,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1514286.6666666667, ans=0.1 2023-11-21 13:08:03,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1514353.3333333333, ans=0.125 2023-11-21 13:08:04,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.011e+01 8.687e+01 9.462e+01 1.243e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 13:08:22,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1514420.0, ans=0.1 2023-11-21 13:08:25,497 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10750, loss[loss=0.0646, simple_loss=0.09125, pruned_loss=0.01174, audio_tagging_loss=0.007237, over 15565.00 frames. ], tot_loss[loss=0.07411, simple_loss=0.09615, pruned_loss=0.01655, audio_tagging_loss=0.009483, over 3048627.59 frames. 
], batch size: 57, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:08:29,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1514486.6666666667, ans=0.125 2023-11-21 13:08:38,735 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 13:08:48,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1514553.3333333333, ans=0.1 2023-11-21 13:08:57,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227200 2023-11-21 13:09:08,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1514686.6666666667, ans=0.125 2023-11-21 13:09:12,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1514686.6666666667, ans=0.125 2023-11-21 13:09:23,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1514753.3333333333, ans=0.2 2023-11-21 13:09:30,366 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10800, loss[loss=0.05539, simple_loss=0.06894, pruned_loss=0.01102, audio_tagging_loss=0.009904, over 15124.00 frames. ], tot_loss[loss=0.07376, simple_loss=0.09572, pruned_loss=0.01646, audio_tagging_loss=0.009436, over 3045970.81 frames. ], batch size: 58, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:09:38,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1514820.0, ans=0.125 2023-11-21 13:09:41,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1514820.0, ans=0.2 2023-11-21 13:10:01,804 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227250 2023-11-21 13:10:13,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.538e+01 7.864e+01 8.499e+01 9.519e+01 1.176e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-21 13:10:29,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1515086.6666666667, ans=0.0 2023-11-21 13:10:34,451 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10850, loss[loss=0.09162, simple_loss=0.1222, pruned_loss=0.02407, audio_tagging_loss=0.006435, over 14928.00 frames. ], tot_loss[loss=0.07357, simple_loss=0.09526, pruned_loss=0.01639, audio_tagging_loss=0.009546, over 3044128.33 frames. 
], batch size: 54, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:10:38,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1515153.3333333333, ans=0.125 2023-11-21 13:10:38,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1515153.3333333333, ans=0.0 2023-11-21 13:10:46,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1515220.0, ans=0.2 2023-11-21 13:11:05,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1515286.6666666667, ans=0.125 2023-11-21 13:11:06,257 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227300 2023-11-21 13:11:09,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1515286.6666666667, ans=0.1 2023-11-21 13:11:14,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1515353.3333333333, ans=0.125 2023-11-21 13:11:16,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1515353.3333333333, ans=0.1 2023-11-21 13:11:24,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-21 13:11:33,359 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 13:11:38,251 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10900, loss[loss=0.09077, simple_loss=0.1159, pruned_loss=0.02507, audio_tagging_loss=0.007774, over 15757.00 frames. ], tot_loss[loss=0.07381, simple_loss=0.09535, pruned_loss=0.01654, audio_tagging_loss=0.009591, over 3044619.59 frames. ], batch size: 59, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:11:44,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1515486.6666666667, ans=0.0 2023-11-21 13:12:06,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1515620.0, ans=0.125 2023-11-21 13:12:11,311 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227350 2023-11-21 13:12:21,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. 
limit=15.0 2023-11-21 13:12:22,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.236e+01 8.108e+01 8.684e+01 9.152e+01 1.151e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 13:12:24,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1515686.6666666667, ans=0.0 2023-11-21 13:12:42,775 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 10950, loss[loss=0.08836, simple_loss=0.1244, pruned_loss=0.01751, audio_tagging_loss=0.008664, over 15165.00 frames. ], tot_loss[loss=0.07346, simple_loss=0.09479, pruned_loss=0.01647, audio_tagging_loss=0.009585, over 3039498.81 frames. ], batch size: 57, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:12:49,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1515820.0, ans=0.2 2023-11-21 13:12:49,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1515820.0, ans=0.2 2023-11-21 13:13:02,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1515886.6666666667, ans=0.0 2023-11-21 13:13:12,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1515953.3333333333, ans=0.5 2023-11-21 13:13:14,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227400 2023-11-21 13:13:20,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1516020.0, ans=0.125 2023-11-21 13:13:22,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1516020.0, ans=0.0 2023-11-21 13:13:28,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1516020.0, ans=0.125 2023-11-21 13:13:35,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1516086.6666666667, ans=0.125 2023-11-21 13:13:48,211 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11000, loss[loss=0.07236, simple_loss=0.0923, pruned_loss=0.01432, audio_tagging_loss=0.01189, over 14838.00 frames. ], tot_loss[loss=0.07324, simple_loss=0.09465, pruned_loss=0.01627, audio_tagging_loss=0.009642, over 3030338.93 frames. ], batch size: 55, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:13:49,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1516153.3333333333, ans=0.2 2023-11-21 13:13:58,138 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 13:14:07,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. 
limit=10.0 2023-11-21 13:14:20,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227450 2023-11-21 13:14:28,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1516353.3333333333, ans=0.0 2023-11-21 13:14:32,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.066e+01 8.741e+01 9.512e+01 1.178e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-21 13:14:50,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1516420.0, ans=0.0 2023-11-21 13:14:52,767 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11050, loss[loss=0.06013, simple_loss=0.07878, pruned_loss=0.008274, audio_tagging_loss=0.01247, over 16611.00 frames. ], tot_loss[loss=0.0733, simple_loss=0.09487, pruned_loss=0.01619, audio_tagging_loss=0.009673, over 3034633.67 frames. ], batch size: 62, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:15:02,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1516486.6666666667, ans=0.2 2023-11-21 13:15:04,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1516553.3333333333, ans=0.125 2023-11-21 13:15:19,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1516620.0, ans=0.1 2023-11-21 13:15:25,279 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227500 2023-11-21 13:15:57,806 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11100, loss[loss=0.0703, simple_loss=0.09067, pruned_loss=0.01686, audio_tagging_loss=0.0081, over 16165.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.09555, pruned_loss=0.01632, audio_tagging_loss=0.009827, over 3040399.49 frames. ], batch size: 61, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:16:19,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1516886.6666666667, ans=10.0 2023-11-21 13:16:24,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-11-21 13:16:24,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1516953.3333333333, ans=0.0 2023-11-21 13:16:26,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1516953.3333333333, ans=0.125 2023-11-21 13:16:30,915 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227550 2023-11-21 13:16:32,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1516953.3333333333, ans=0.125 2023-11-21 13:16:41,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.109e+01 8.821e+01 9.588e+01 2.410e+02, threshold=1.764e+02, percent-clipped=1.0 2023-11-21 13:16:42,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-11-21 13:16:43,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.63 vs. 
limit=15.0 2023-11-21 13:16:50,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1517086.6666666667, ans=0.1 2023-11-21 13:17:03,331 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11150, loss[loss=0.05922, simple_loss=0.07019, pruned_loss=0.01138, audio_tagging_loss=0.01274, over 14147.00 frames. ], tot_loss[loss=0.0741, simple_loss=0.09538, pruned_loss=0.01652, audio_tagging_loss=0.009894, over 3040537.36 frames. ], batch size: 54, lr: 3.57e-03, grad_scale: 16.0 2023-11-21 13:17:35,382 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227600 2023-11-21 13:17:52,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1517353.3333333333, ans=0.125 2023-11-21 13:17:52,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1517353.3333333333, ans=0.1 2023-11-21 13:18:04,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2023-11-21 13:18:07,666 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11200, loss[loss=0.08134, simple_loss=0.1078, pruned_loss=0.01871, audio_tagging_loss=0.008711, over 16507.00 frames. ], tot_loss[loss=0.07426, simple_loss=0.09554, pruned_loss=0.01652, audio_tagging_loss=0.009967, over 3040535.21 frames. ], batch size: 63, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:18:34,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1517620.0, ans=0.1 2023-11-21 13:18:40,670 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227650 2023-11-21 13:18:42,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1517620.0, ans=0.125 2023-11-21 13:18:51,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.451e+01 8.180e+01 8.891e+01 9.538e+01 1.267e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-21 13:18:58,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1517753.3333333333, ans=0.0 2023-11-21 13:19:01,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2023-11-21 13:19:02,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1517753.3333333333, ans=0.125 2023-11-21 13:19:11,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1517820.0, ans=0.1 2023-11-21 13:19:12,935 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11250, loss[loss=0.05735, simple_loss=0.06866, pruned_loss=0.01411, audio_tagging_loss=0.008907, over 15264.00 frames. ], tot_loss[loss=0.07481, simple_loss=0.09658, pruned_loss=0.01676, audio_tagging_loss=0.009758, over 3046372.35 frames. 
], batch size: 58, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:19:27,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1517886.6666666667, ans=0.1 2023-11-21 13:19:34,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1517886.6666666667, ans=0.125 2023-11-21 13:19:37,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1517953.3333333333, ans=0.0 2023-11-21 13:19:45,513 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227700 2023-11-21 13:19:50,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1518020.0, ans=0.0 2023-11-21 13:19:50,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1518020.0, ans=0.125 2023-11-21 13:20:00,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1518020.0, ans=10.0 2023-11-21 13:20:18,085 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11300, loss[loss=0.0628, simple_loss=0.07744, pruned_loss=0.01541, audio_tagging_loss=0.008674, over 15395.00 frames. ], tot_loss[loss=0.07518, simple_loss=0.0974, pruned_loss=0.01702, audio_tagging_loss=0.00946, over 3049887.98 frames. ], batch size: 58, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:20:31,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1518220.0, ans=0.1 2023-11-21 13:20:48,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227750 2023-11-21 13:21:01,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.039e+01 8.792e+01 9.406e+01 1.983e+02, threshold=1.758e+02, percent-clipped=1.0 2023-11-21 13:21:14,060 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.946e-02 2023-11-21 13:21:21,391 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11350, loss[loss=0.0675, simple_loss=0.0779, pruned_loss=0.01772, audio_tagging_loss=0.01083, over 14740.00 frames. ], tot_loss[loss=0.07427, simple_loss=0.0963, pruned_loss=0.01668, audio_tagging_loss=0.009437, over 3043292.87 frames. ], batch size: 57, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:21:39,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1518553.3333333333, ans=0.125 2023-11-21 13:21:42,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1518553.3333333333, ans=0.0 2023-11-21 13:21:54,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227800 2023-11-21 13:22:16,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-11-21 13:22:25,996 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11400, loss[loss=0.09019, simple_loss=0.1148, pruned_loss=0.02551, audio_tagging_loss=0.007292, over 14674.00 frames. ], tot_loss[loss=0.07488, simple_loss=0.09705, pruned_loss=0.01701, audio_tagging_loss=0.009348, over 3044683.18 frames. 
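The scaling.py:1118 WithLoss lines report the accumulated value of an auxiliary penalty attached to the attention weights; it is 0.000e+00 almost everywhere here, with an occasional small positive value (4.946e-02 above) when the regularizer activates. One way to attach such a loss without altering the forward computation, assumed here, is an autograd function that returns x unchanged but feeds a gradient of one into the penalty term during backward:

import torch

class WithLoss(torch.autograd.Function):
    # Sketch of the mechanism: forward is the identity on x, but the scalar
    # penalty y is wired into the graph so it receives gradient 1 whenever
    # x is backpropagated. (Assumed implementation, inferred from the log.)
    @staticmethod
    def forward(ctx, x, y):
        ctx.y_shape = y.shape
        return x

    @staticmethod
    def backward(ctx, x_grad):
        return x_grad, torch.ones(ctx.y_shape,
                                  dtype=x_grad.dtype, device=x_grad.device)

attn = torch.rand(4, 8, requires_grad=True)
penalty = (attn ** 2).sum() * 0.01          # hypothetical regularizer
out = WithLoss.apply(attn, penalty)
out.sum().backward()                        # penalty now contributes to attn.grad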
], batch size: 55, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:22:27,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1518820.0, ans=0.1 2023-11-21 13:22:31,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1518820.0, ans=0.125 2023-11-21 13:22:54,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1518953.3333333333, ans=0.0 2023-11-21 13:22:58,235 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227850 2023-11-21 13:23:09,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.128e+01 8.790e+01 9.462e+01 1.169e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-21 13:23:09,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1519020.0, ans=0.0 2023-11-21 13:23:14,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1519020.0, ans=0.125 2023-11-21 13:23:17,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1519086.6666666667, ans=0.5 2023-11-21 13:23:20,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1519086.6666666667, ans=0.0 2023-11-21 13:23:30,972 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11450, loss[loss=0.06683, simple_loss=0.0836, pruned_loss=0.01613, audio_tagging_loss=0.008899, over 15108.00 frames. ], tot_loss[loss=0.07488, simple_loss=0.09694, pruned_loss=0.017, audio_tagging_loss=0.009403, over 3042478.47 frames. ], batch size: 57, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:23:39,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1519153.3333333333, ans=0.0 2023-11-21 13:24:02,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227900 2023-11-21 13:24:04,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1519286.6666666667, ans=0.125 2023-11-21 13:24:10,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1519353.3333333333, ans=0.0 2023-11-21 13:24:12,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1519353.3333333333, ans=0.125 2023-11-21 13:24:19,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1519353.3333333333, ans=0.125 2023-11-21 13:24:21,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1519420.0, ans=0.125 2023-11-21 13:24:34,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2023-11-21 13:24:34,781 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11500, loss[loss=0.07597, simple_loss=0.1089, pruned_loss=0.01545, audio_tagging_loss=0.00608, over 15090.00 frames. ], tot_loss[loss=0.07476, simple_loss=0.09688, pruned_loss=0.01697, audio_tagging_loss=0.00935, over 3038112.84 frames. 
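The scaling.py:213 records each report a ScheduledFloat: a hyperparameter (a dropout_p, a balancer prob, a skip rate, ...) whose current value "ans" is a function of batch_count. A minimal sketch of such a piecewise-linear schedule follows; the breakpoints are hypothetical, and the real implementation lives in icefall's scaling.py:

```python
# Minimal sketch of a ScheduledFloat-style hyperparameter: a value that is
# piecewise-linear in batch_count, as in the scaling.py records above.
# The breakpoints below are hypothetical, not the recipe's actual schedule.
import bisect


class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# E.g. a dropout that decays from 0.3 to a floor of 0.1 over the first
# 20k batches, then stays flat -- so late in training it logs ans=0.1.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(1517353.33) == 0.1
```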
], batch size: 55, lr: 3.57e-03, grad_scale: 32.0 2023-11-21 13:24:38,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1519486.6666666667, ans=0.125 2023-11-21 13:24:54,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1519553.3333333333, ans=0.0 2023-11-21 13:24:57,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1519553.3333333333, ans=0.125 2023-11-21 13:25:01,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=22.5 2023-11-21 13:25:04,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1519620.0, ans=0.95 2023-11-21 13:25:07,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 227950 2023-11-21 13:25:13,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0 2023-11-21 13:25:17,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-21 13:25:18,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.513e+01 8.149e+01 8.832e+01 9.671e+01 1.256e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-21 13:25:37,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2023-11-21 13:25:38,545 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11550, loss[loss=0.06199, simple_loss=0.08037, pruned_loss=0.0129, audio_tagging_loss=0.008905, over 14161.00 frames. ], tot_loss[loss=0.07494, simple_loss=0.09714, pruned_loss=0.01702, audio_tagging_loss=0.009346, over 3037285.33 frames. ], batch size: 53, lr: 3.56e-03, grad_scale: 16.0 2023-11-21 13:25:44,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1519820.0, ans=0.0 2023-11-21 13:26:10,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228000 2023-11-21 13:26:20,425 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 13:26:21,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1520020.0, ans=0.0 2023-11-21 13:26:23,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1520020.0, ans=0.125 2023-11-21 13:26:44,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1520086.6666666667, ans=0.035 2023-11-21 13:26:45,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-21 13:26:47,319 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11600, loss[loss=0.06149, simple_loss=0.08331, pruned_loss=0.01174, audio_tagging_loss=0.008097, over 15648.00 frames. ], tot_loss[loss=0.07435, simple_loss=0.09631, pruned_loss=0.0168, audio_tagging_loss=0.009397, over 3034589.17 frames. ], batch size: 59, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:27:14,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1520286.6666666667, ans=0.125 2023-11-21 13:27:18,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228050 2023-11-21 13:27:32,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.674e+01 8.168e+01 8.737e+01 9.633e+01 1.247e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 13:27:37,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1520353.3333333333, ans=0.0 2023-11-21 13:27:38,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1520420.0, ans=0.0 2023-11-21 13:27:42,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1520420.0, ans=0.1 2023-11-21 13:27:45,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1520420.0, ans=0.0 2023-11-21 13:27:51,775 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11650, loss[loss=0.09907, simple_loss=0.1301, pruned_loss=0.02589, audio_tagging_loss=0.008141, over 15265.00 frames. ], tot_loss[loss=0.07533, simple_loss=0.09789, pruned_loss=0.01707, audio_tagging_loss=0.009321, over 3039249.31 frames. ], batch size: 55, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:28:20,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1520620.0, ans=0.125 2023-11-21 13:28:24,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228100 2023-11-21 13:28:32,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1520686.6666666667, ans=0.125 2023-11-21 13:28:36,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1520686.6666666667, ans=0.0 2023-11-21 13:28:54,875 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11700, loss[loss=0.07936, simple_loss=0.09979, pruned_loss=0.02091, audio_tagging_loss=0.008557, over 15828.00 frames. ], tot_loss[loss=0.07538, simple_loss=0.0979, pruned_loss=0.01704, audio_tagging_loss=0.009388, over 3038464.36 frames. 
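The optim.py:476 records summarize gradient clipping, and in every entry the threshold equals Clipping_scale times the logged median grad-norm (here 2.0 × 8.737e+01 ≈ 1.747e+02): clipping is relative to a running estimate of typical gradient norms, not a fixed constant. A hedged sketch of that idea, assuming a simple history buffer; icefall's ScaledAdam implements it differently in detail:

```python
# Sketch of median-relative gradient clipping, consistent with the optim.py
# records above, where threshold == Clipping_scale * median grad-norm
# (e.g. 2.0 * 8.737e+01 = 1.747e+02). The buffering scheme is an assumption.
from collections import deque
import statistics

import torch


class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 512):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        # Overall L2 norm across all parameter gradients.
        norm = torch.linalg.vector_norm(
            torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold


# Usage: clipper.clip_(model.parameters()) between loss.backward() and
# optimizer.step(); "percent-clipped" is then the fraction of recent steps
# whose norm exceeded the threshold.
```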
], batch size: 58, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:29:08,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1520886.6666666667, ans=0.125 2023-11-21 13:29:13,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=22.5 2023-11-21 13:29:14,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1520886.6666666667, ans=0.1 2023-11-21 13:29:14,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1520886.6666666667, ans=0.2 2023-11-21 13:29:27,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228150 2023-11-21 13:29:39,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.045e+01 8.764e+01 9.289e+01 1.171e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-21 13:29:40,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2023-11-21 13:29:54,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1521086.6666666667, ans=0.2 2023-11-21 13:29:59,669 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11750, loss[loss=0.06627, simple_loss=0.07997, pruned_loss=0.01243, audio_tagging_loss=0.01385, over 15319.00 frames. ], tot_loss[loss=0.07503, simple_loss=0.09713, pruned_loss=0.01695, audio_tagging_loss=0.009517, over 3039082.67 frames. ], batch size: 56, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:30:15,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1521220.0, ans=0.0 2023-11-21 13:30:20,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1521220.0, ans=0.0 2023-11-21 13:30:30,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228200 2023-11-21 13:30:36,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. limit=10.0 2023-11-21 13:30:41,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1521353.3333333333, ans=0.1 2023-11-21 13:30:44,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1521353.3333333333, ans=0.125 2023-11-21 13:30:47,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1521353.3333333333, ans=0.2 2023-11-21 13:30:53,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1521420.0, ans=0.1 2023-11-21 13:30:56,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1521420.0, ans=0.0 2023-11-21 13:31:03,783 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11800, loss[loss=0.07235, simple_loss=0.0936, pruned_loss=0.01631, audio_tagging_loss=0.009239, over 15567.00 frames. ], tot_loss[loss=0.07457, simple_loss=0.0966, pruned_loss=0.01668, audio_tagging_loss=0.009587, over 3034307.03 frames. 
], batch size: 59, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:31:20,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1521553.3333333333, ans=0.0 2023-11-21 13:31:26,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1521553.3333333333, ans=0.125 2023-11-21 13:31:35,857 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228250 2023-11-21 13:31:48,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.292e+01 8.361e+01 9.006e+01 1.004e+02 1.264e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-21 13:31:50,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1521686.6666666667, ans=0.125 2023-11-21 13:32:02,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1521753.3333333333, ans=0.2 2023-11-21 13:32:03,892 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 13:32:07,049 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11850, loss[loss=0.09814, simple_loss=0.1298, pruned_loss=0.02567, audio_tagging_loss=0.007596, over 14711.00 frames. ], tot_loss[loss=0.0748, simple_loss=0.097, pruned_loss=0.01664, audio_tagging_loss=0.009667, over 3037600.11 frames. ], batch size: 55, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:32:08,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1521820.0, ans=0.125 2023-11-21 13:32:40,332 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228300 2023-11-21 13:32:50,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1522020.0, ans=0.0 2023-11-21 13:33:11,963 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11900, loss[loss=0.06984, simple_loss=0.07007, pruned_loss=0.0181, audio_tagging_loss=0.01671, over 14860.00 frames. ], tot_loss[loss=0.07414, simple_loss=0.09583, pruned_loss=0.01637, audio_tagging_loss=0.009856, over 3042764.21 frames. ], batch size: 57, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:33:14,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1522153.3333333333, ans=0.125 2023-11-21 13:33:30,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1522220.0, ans=0.0 2023-11-21 13:33:34,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1522220.0, ans=0.0 2023-11-21 13:33:43,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=15.0 2023-11-21 13:33:43,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228350 2023-11-21 13:33:44,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1522286.6666666667, ans=0.95 2023-11-21 13:33:56,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.221e+01 8.725e+01 9.501e+01 1.211e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-21 13:34:12,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-21 13:34:16,698 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 11950, loss[loss=0.07343, simple_loss=0.1027, pruned_loss=0.0161, audio_tagging_loss=0.005989, over 15897.00 frames. ], tot_loss[loss=0.07381, simple_loss=0.09527, pruned_loss=0.01622, audio_tagging_loss=0.009954, over 3043362.55 frames. ], batch size: 58, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:34:30,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1522553.3333333333, ans=0.0 2023-11-21 13:34:47,898 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228400 2023-11-21 13:35:00,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1522686.6666666667, ans=0.125 2023-11-21 13:35:14,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1522753.3333333333, ans=0.0 2023-11-21 13:35:18,697 INFO [train_asr.py:1221] (1/4) Epoch 19, batch 12000, loss[loss=0.08064, simple_loss=0.09018, pruned_loss=0.02223, audio_tagging_loss=0.01332, over 14911.00 frames. ], tot_loss[loss=0.07426, simple_loss=0.09573, pruned_loss=0.0164, audio_tagging_loss=0.009996, over 3045878.92 frames. ], batch size: 57, lr: 3.56e-03, grad_scale: 32.0 2023-11-21 13:35:18,697 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 13:36:01,999 INFO [train_asr.py:1253] (1/4) Epoch 19, validation: loss=0.06018, simple_loss=0.05227, pruned_loss=0.005307, audio_tagging_loss=0.02874, over 4681554.00 frames. 2023-11-21 13:36:02,000 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 13:36:05,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1522820.0, ans=0.0 2023-11-21 13:37:05,067 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 0, loss[loss=0.1004, simple_loss=0.1142, pruned_loss=0.02474, audio_tagging_loss=0.01861, over 15748.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1142, pruned_loss=0.02474, audio_tagging_loss=0.01861, over 15748.00 frames. ], batch size: 57, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:37:05,068 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 13:37:32,015 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7796, 5.8127, 5.8714, 5.8507], device='cuda:1') 2023-11-21 13:37:41,801 INFO [train_asr.py:1253] (1/4) Epoch 20, validation: loss=0.05938, simple_loss=0.0523, pruned_loss=0.005287, audio_tagging_loss=0.02794, over 4681554.00 frames. 
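At batch 12000 (the valid_interval) the loop pauses for validation: it averages the same loss components over the full dev set (4681554 frames) and then reports peak GPU memory. A sketch of that step, assuming a hypothetical model.compute_losses() helper; the memory line matches what torch.cuda.max_memory_allocated() reports:

```python
# Sketch of the periodic validation step visible above: average the loss
# components over the dev set, then report peak GPU memory the way
# train_asr.py:1254 does. Model/dataloader APIs here are placeholders.
import torch


@torch.no_grad()
def compute_validation_loss(model, valid_dl, device) -> dict:
    model.eval()
    totals, frames = {}, 0
    for batch in valid_dl:
        losses, num_frames = model.compute_losses(batch, device)  # hypothetical API
        frames += num_frames
        for name, value in losses.items():
            totals[name] = totals.get(name, 0.0) + value.item() * num_frames
    model.train()
    return {name: total / frames for name, total in totals.items()}


def log_peak_memory(device: torch.device) -> None:
    if device.type == "cuda":
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
```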
2023-11-21 13:37:41,802 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 13:37:43,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228450 2023-11-21 13:37:53,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0 2023-11-21 13:37:55,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.213e+01 9.034e+01 9.702e+01 1.198e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-21 13:38:06,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-21 13:38:24,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1523180.0, ans=0.05 2023-11-21 13:38:26,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1523180.0, ans=0.035 2023-11-21 13:38:40,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523246.6666666667, ans=0.1 2023-11-21 13:38:44,841 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 50, loss[loss=0.07403, simple_loss=0.09144, pruned_loss=0.01186, audio_tagging_loss=0.01645, over 15283.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1006, pruned_loss=0.01737, audio_tagging_loss=0.01776, over 686167.76 frames. ], batch size: 57, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:38:44,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1523313.3333333333, ans=0.125 2023-11-21 13:38:46,142 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228500 2023-11-21 13:38:48,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-21 13:39:05,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1523380.0, ans=0.125 2023-11-21 13:39:36,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1523580.0, ans=0.125 2023-11-21 13:39:49,070 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 100, loss[loss=0.0763, simple_loss=0.08592, pruned_loss=0.01676, audio_tagging_loss=0.01658, over 15103.00 frames. ], tot_loss[loss=0.08267, simple_loss=0.09642, pruned_loss=0.01684, audio_tagging_loss=0.01762, over 1208882.94 frames. 
], batch size: 59, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:39:50,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228550 2023-11-21 13:40:04,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 8.678e+01 9.460e+01 1.029e+02 1.398e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-21 13:40:06,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1523713.3333333333, ans=0.04949747468305833 2023-11-21 13:40:13,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1523713.3333333333, ans=0.2 2023-11-21 13:40:23,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1523780.0, ans=0.0 2023-11-21 13:40:43,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1523913.3333333333, ans=0.0 2023-11-21 13:40:50,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2023-11-21 13:40:54,011 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 150, loss[loss=0.06489, simple_loss=0.09085, pruned_loss=0.00858, audio_tagging_loss=0.01088, over 16297.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.09871, pruned_loss=0.01713, audio_tagging_loss=0.01558, over 1611201.56 frames. ], batch size: 58, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:40:55,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228600 2023-11-21 13:41:08,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1524046.6666666667, ans=0.0 2023-11-21 13:41:18,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1524113.3333333333, ans=0.95 2023-11-21 13:41:48,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5 2023-11-21 13:41:58,546 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 200, loss[loss=0.07418, simple_loss=0.09284, pruned_loss=0.01768, audio_tagging_loss=0.01008, over 15401.00 frames. ], tot_loss[loss=0.08018, simple_loss=0.09835, pruned_loss=0.01721, audio_tagging_loss=0.01379, over 1933458.71 frames. ], batch size: 57, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:41:59,821 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228650 2023-11-21 13:42:12,011 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.304e+01 8.770e+01 9.399e+01 1.399e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-21 13:42:24,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1524446.6666666667, ans=0.2 2023-11-21 13:42:27,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1524446.6666666667, ans=0.2 2023-11-21 13:42:31,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1524446.6666666667, ans=0.0 2023-11-21 13:42:38,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=12.0 2023-11-21 13:42:42,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1524513.3333333333, ans=0.125 2023-11-21 13:42:43,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=22.5 2023-11-21 13:43:02,047 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 250, loss[loss=0.09105, simple_loss=0.1218, pruned_loss=0.02392, audio_tagging_loss=0.006252, over 15590.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.09992, pruned_loss=0.01744, audio_tagging_loss=0.01235, over 2185283.38 frames. ], batch size: 57, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:43:02,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2023-11-21 13:43:03,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228700 2023-11-21 13:43:23,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1524713.3333333333, ans=0.125 2023-11-21 13:43:32,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1524780.0, ans=0.2 2023-11-21 13:43:54,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1524913.3333333333, ans=0.0 2023-11-21 13:44:07,306 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 300, loss[loss=0.06274, simple_loss=0.08442, pruned_loss=0.01135, audio_tagging_loss=0.009181, over 15631.00 frames. ], tot_loss[loss=0.07951, simple_loss=0.1004, pruned_loss=0.01772, audio_tagging_loss=0.01159, over 2377023.79 frames. ], batch size: 59, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:44:08,647 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228750 2023-11-21 13:44:16,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1524980.0, ans=0.125 2023-11-21 13:44:21,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.024e+01 8.166e+01 8.999e+01 9.797e+01 1.140e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-21 13:45:04,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1525246.6666666667, ans=0.125 2023-11-21 13:45:08,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1525246.6666666667, ans=0.0 2023-11-21 13:45:11,684 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 350, loss[loss=0.08997, simple_loss=0.1274, pruned_loss=0.02118, audio_tagging_loss=0.005088, over 15409.00 frames. ], tot_loss[loss=0.07789, simple_loss=0.09909, pruned_loss=0.01731, audio_tagging_loss=0.01104, over 2525178.04 frames. 
], batch size: 54, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:45:12,933 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228800 2023-11-21 13:45:26,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1525380.0, ans=0.125 2023-11-21 13:45:42,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1525446.6666666667, ans=0.125 2023-11-21 13:45:47,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2023-11-21 13:46:03,728 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 13:46:15,862 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 400, loss[loss=0.07479, simple_loss=0.09843, pruned_loss=0.01312, audio_tagging_loss=0.01246, over 14952.00 frames. ], tot_loss[loss=0.07746, simple_loss=0.09921, pruned_loss=0.01727, audio_tagging_loss=0.01059, over 2644409.90 frames. ], batch size: 54, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:46:17,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228850 2023-11-21 13:46:20,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1525646.6666666667, ans=0.125 2023-11-21 13:46:22,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1525646.6666666667, ans=0.07 2023-11-21 13:46:26,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1525646.6666666667, ans=0.0 2023-11-21 13:46:26,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1525646.6666666667, ans=0.0 2023-11-21 13:46:30,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.562e+01 8.083e+01 8.649e+01 9.202e+01 1.163e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-21 13:47:13,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1525913.3333333333, ans=0.125 2023-11-21 13:47:15,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1525913.3333333333, ans=0.125 2023-11-21 13:47:20,841 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 450, loss[loss=0.08636, simple_loss=0.119, pruned_loss=0.02017, audio_tagging_loss=0.006691, over 14330.00 frames. ], tot_loss[loss=0.07782, simple_loss=0.1, pruned_loss=0.01741, audio_tagging_loss=0.01039, over 2734168.48 frames. 
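The scaling.py:1022 Whitening records compare a per-module metric against a fixed limit. The metric appears to measure how far the activation covariance is from a multiple of the identity: it bottoms out at 1.0 for perfectly "white" channels and grows as the spectrum spreads, and the module only intervenes when it exceeds the limit. One plausible such metric, sketched under that assumption:

```python
# Sketch of a whitening metric like the one logged by scaling.py:1022.
# For covariance C with eigenvalues l_i, dim * sum(l_i^2) / (sum l_i)^2 >= 1,
# with equality iff C is a multiple of the identity; the assumption is that
# the logged "metric" behaves like this quantity (compared against "limit").
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels); channels are split into num_groups."""
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).transpose(0, 1)  # (groups, frames, c)
    covar = torch.matmul(x.transpose(1, 2), x) / num_frames   # (groups, c, c)
    trace = covar.diagonal(dim1=1, dim2=2).sum(-1)            # sum of eigenvalues
    sq_sum = (covar * covar).sum(dim=(1, 2))                  # sum of eigenvalues^2
    return (c * sq_sum / trace.clamp(min=1e-20) ** 2).mean()


x = torch.randn(1000, 384)
print(whitening_metric(x).item())   # white input: metric stays near its floor of 1
print(whitening_metric(x @ torch.randn(384, 384)).item())  # correlated: much larger
```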
], batch size: 55, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:47:22,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228900 2023-11-21 13:47:22,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1525980.0, ans=0.125 2023-11-21 13:47:28,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1525980.0, ans=0.1 2023-11-21 13:47:50,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1526113.3333333333, ans=0.0 2023-11-21 13:47:56,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1526113.3333333333, ans=0.035 2023-11-21 13:48:03,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2023-11-21 13:48:04,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1526180.0, ans=0.0 2023-11-21 13:48:20,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.54 vs. limit=15.0 2023-11-21 13:48:24,761 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 500, loss[loss=0.06882, simple_loss=0.08848, pruned_loss=0.01661, audio_tagging_loss=0.007969, over 16209.00 frames. ], tot_loss[loss=0.07644, simple_loss=0.09838, pruned_loss=0.01708, audio_tagging_loss=0.01017, over 2807388.13 frames. ], batch size: 61, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:48:26,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 228950 2023-11-21 13:48:36,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2023-11-21 13:48:38,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.882e+01 7.917e+01 8.636e+01 9.467e+01 1.294e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-21 13:49:28,863 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 550, loss[loss=0.07311, simple_loss=0.08882, pruned_loss=0.01692, audio_tagging_loss=0.01178, over 15200.00 frames. ], tot_loss[loss=0.07612, simple_loss=0.09813, pruned_loss=0.01701, audio_tagging_loss=0.01004, over 2860736.75 frames. ], batch size: 57, lr: 3.47e-03, grad_scale: 32.0 2023-11-21 13:49:30,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229000 2023-11-21 13:50:07,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1526846.6666666667, ans=0.125 2023-11-21 13:50:08,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-21 13:50:10,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1526846.6666666667, ans=0.1 2023-11-21 13:50:30,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1526913.3333333333, ans=0.125 2023-11-21 13:50:33,695 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 600, loss[loss=0.07411, simple_loss=0.09209, pruned_loss=0.01857, audio_tagging_loss=0.009493, over 14785.00 frames. 
], tot_loss[loss=0.07601, simple_loss=0.09797, pruned_loss=0.01696, audio_tagging_loss=0.01007, over 2899852.26 frames. ], batch size: 55, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 13:50:35,093 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229050 2023-11-21 13:50:43,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0 2023-11-21 13:50:47,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.121e+01 7.930e+01 8.613e+01 9.327e+01 1.199e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-21 13:51:07,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1527113.3333333333, ans=0.1 2023-11-21 13:51:19,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1527180.0, ans=0.2 2023-11-21 13:51:22,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1527180.0, ans=0.125 2023-11-21 13:51:38,007 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 650, loss[loss=0.07917, simple_loss=0.09877, pruned_loss=0.01898, audio_tagging_loss=0.0108, over 14478.00 frames. ], tot_loss[loss=0.076, simple_loss=0.09785, pruned_loss=0.01707, audio_tagging_loss=0.01, over 2926155.29 frames. ], batch size: 54, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 13:51:39,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229100 2023-11-21 13:51:50,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2023-11-21 13:52:05,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1527446.6666666667, ans=0.125 2023-11-21 13:52:09,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1527446.6666666667, ans=0.125 2023-11-21 13:52:11,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1527446.6666666667, ans=0.125 2023-11-21 13:52:14,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1527446.6666666667, ans=0.125 2023-11-21 13:52:39,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1527580.0, ans=0.2 2023-11-21 13:52:41,374 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 700, loss[loss=0.06033, simple_loss=0.07074, pruned_loss=0.01294, audio_tagging_loss=0.01202, over 15342.00 frames. ], tot_loss[loss=0.07578, simple_loss=0.09783, pruned_loss=0.01696, audio_tagging_loss=0.009902, over 2958844.76 frames. ], batch size: 60, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 13:52:43,374 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229150 2023-11-21 13:52:46,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.74 vs. 
limit=22.5 2023-11-21 13:52:54,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1527713.3333333333, ans=0.125 2023-11-21 13:52:55,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1527713.3333333333, ans=0.2 2023-11-21 13:52:56,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.031e+01 8.760e+01 9.624e+01 1.233e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-21 13:53:01,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1527713.3333333333, ans=0.2 2023-11-21 13:53:07,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1527780.0, ans=0.025 2023-11-21 13:53:17,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1527780.0, ans=0.0 2023-11-21 13:53:47,346 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 750, loss[loss=0.06974, simple_loss=0.082, pruned_loss=0.01711, audio_tagging_loss=0.01163, over 14360.00 frames. ], tot_loss[loss=0.07626, simple_loss=0.09855, pruned_loss=0.0171, audio_tagging_loss=0.009882, over 2984027.42 frames. ], batch size: 55, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 13:53:48,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229200 2023-11-21 13:53:48,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1527980.0, ans=0.125 2023-11-21 13:53:49,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1527980.0, ans=0.2 2023-11-21 13:53:53,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1527980.0, ans=0.0 2023-11-21 13:54:00,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1528046.6666666667, ans=0.125 2023-11-21 13:54:02,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.53 vs. limit=22.5 2023-11-21 13:54:17,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1528113.3333333333, ans=0.2 2023-11-21 13:54:24,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1528180.0, ans=0.0 2023-11-21 13:54:27,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1528180.0, ans=15.0 2023-11-21 13:54:48,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1528246.6666666667, ans=0.1 2023-11-21 13:54:48,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1528246.6666666667, ans=0.1 2023-11-21 13:54:51,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=15.0 2023-11-21 13:54:52,688 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 800, loss[loss=0.0962, simple_loss=0.13, pruned_loss=0.02287, audio_tagging_loss=0.008331, over 14836.00 frames. 
], tot_loss[loss=0.07696, simple_loss=0.09958, pruned_loss=0.01733, audio_tagging_loss=0.009841, over 2996541.48 frames. ], batch size: 55, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 13:54:54,010 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229250 2023-11-21 13:55:06,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.111e+01 8.814e+01 9.591e+01 1.302e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-21 13:55:22,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-21 13:55:33,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2023-11-21 13:55:34,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1528513.3333333333, ans=0.1 2023-11-21 13:55:45,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1528580.0, ans=0.0 2023-11-21 13:55:52,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1528580.0, ans=0.125 2023-11-21 13:55:57,294 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 850, loss[loss=0.07587, simple_loss=0.08703, pruned_loss=0.02025, audio_tagging_loss=0.01211, over 14820.00 frames. ], tot_loss[loss=0.07642, simple_loss=0.0985, pruned_loss=0.01729, audio_tagging_loss=0.009884, over 3005636.27 frames. ], batch size: 58, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 13:55:58,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229300 2023-11-21 13:56:01,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1528646.6666666667, ans=0.125 2023-11-21 13:56:07,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1528646.6666666667, ans=0.95 2023-11-21 13:56:09,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1528713.3333333333, ans=0.0 2023-11-21 13:56:25,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-11-21 13:57:02,857 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 900, loss[loss=0.05773, simple_loss=0.07183, pruned_loss=0.008163, audio_tagging_loss=0.01365, over 13903.00 frames. ], tot_loss[loss=0.07606, simple_loss=0.09809, pruned_loss=0.01705, audio_tagging_loss=0.009964, over 3022312.77 frames. ], batch size: 54, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 13:57:04,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229350 2023-11-21 13:57:07,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-21 13:57:12,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2023-11-21 13:57:18,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. 
limit=22.5 2023-11-21 13:57:19,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.022e+01 8.539e+01 9.514e+01 1.359e+02, threshold=1.708e+02, percent-clipped=0.0 2023-11-21 13:57:56,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0 2023-11-21 13:58:08,512 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 950, loss[loss=0.06491, simple_loss=0.08603, pruned_loss=0.01157, audio_tagging_loss=0.01033, over 15209.00 frames. ], tot_loss[loss=0.07597, simple_loss=0.09811, pruned_loss=0.01716, audio_tagging_loss=0.00976, over 3025671.09 frames. ], batch size: 60, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 13:58:09,950 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229400 2023-11-21 13:58:16,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1529313.3333333333, ans=0.125 2023-11-21 13:58:23,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0 2023-11-21 13:58:30,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1529380.0, ans=0.125 2023-11-21 13:58:32,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1529446.6666666667, ans=0.1 2023-11-21 13:58:34,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.47 vs. limit=15.0 2023-11-21 13:58:51,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1529513.3333333333, ans=0.125 2023-11-21 13:58:51,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1529513.3333333333, ans=0.0 2023-11-21 13:58:57,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1529513.3333333333, ans=0.2 2023-11-21 13:59:12,409 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1000, loss[loss=0.06009, simple_loss=0.07887, pruned_loss=0.01226, audio_tagging_loss=0.0084, over 14059.00 frames. ], tot_loss[loss=0.07523, simple_loss=0.09701, pruned_loss=0.01704, audio_tagging_loss=0.009687, over 3033236.51 frames. ], batch size: 55, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 13:59:13,741 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229450 2023-11-21 13:59:17,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1529646.6666666667, ans=0.125 2023-11-21 13:59:27,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.761e+01 8.099e+01 8.790e+01 9.899e+01 1.209e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-21 13:59:34,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1529713.3333333333, ans=0.0 2023-11-21 13:59:41,905 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:00:11,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1529913.3333333333, ans=0.07 2023-11-21 14:00:16,820 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1050, loss[loss=0.06616, simple_loss=0.08771, pruned_loss=0.01453, audio_tagging_loss=0.007771, over 15258.00 frames. ], tot_loss[loss=0.07466, simple_loss=0.09652, pruned_loss=0.0168, audio_tagging_loss=0.009599, over 3031354.38 frames. ], batch size: 57, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 14:00:18,134 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229500 2023-11-21 14:00:32,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1530046.6666666667, ans=0.125 2023-11-21 14:00:36,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0 2023-11-21 14:00:43,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1530113.3333333333, ans=0.0 2023-11-21 14:00:44,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1530113.3333333333, ans=0.1 2023-11-21 14:01:23,288 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1100, loss[loss=0.0719, simple_loss=0.09502, pruned_loss=0.01551, audio_tagging_loss=0.008873, over 16117.00 frames. ], tot_loss[loss=0.07402, simple_loss=0.09558, pruned_loss=0.0166, audio_tagging_loss=0.009639, over 3039551.65 frames. ], batch size: 60, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 14:01:24,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229550 2023-11-21 14:01:27,042 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:01:38,190 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.405e+01 8.211e+01 9.088e+01 9.888e+01 2.167e+02, threshold=1.818e+02, percent-clipped=1.0 2023-11-21 14:01:59,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5 2023-11-21 14:02:22,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2023-11-21 14:02:28,044 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1150, loss[loss=0.06782, simple_loss=0.09276, pruned_loss=0.0136, audio_tagging_loss=0.007846, over 15189.00 frames. ], tot_loss[loss=0.07379, simple_loss=0.0953, pruned_loss=0.01655, audio_tagging_loss=0.009597, over 3040271.28 frames. 
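The WARNING records above are self-explanatory once decoded: AudioSet cuts carry a dummy transcript, and a 1-second cut yields only 23 frames after subsampling, fewer than its 24 BPE tokens, so pruned-transducer training (which appears to require at least as many frames as tokens) drops the cut. A sketch of that filter; the (n - 7) // 4 frame arithmetic is an assumption that reproduces the logged 100 → 23:

```python
# Sketch of the cut filter behind the WARNING lines above: a 1 s AudioSet
# cut gives 23 subsampled frames vs. 24 tokens, so it is excluded. The
# (n - 7) // 4 formula is an assumption matching the logged 100 -> 23
# under the subsampling factor of 4.

def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 4


def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Pruned-transducer training needs at least as many frames as tokens."""
    return frames_after_subsampling(num_frames) >= num_tokens


print(frames_after_subsampling(100))   # 23, as in the warning
print(keep_cut(100, 24))               # False -> "Exclude cut ..."
```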
], batch size: 56, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 14:02:29,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229600 2023-11-21 14:02:51,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1530713.3333333333, ans=0.0 2023-11-21 14:03:08,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1530846.6666666667, ans=0.125 2023-11-21 14:03:31,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1530980.0, ans=0.0 2023-11-21 14:03:32,053 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1200, loss[loss=0.07014, simple_loss=0.08967, pruned_loss=0.01723, audio_tagging_loss=0.008068, over 15473.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.09451, pruned_loss=0.01628, audio_tagging_loss=0.009685, over 3045456.78 frames. ], batch size: 56, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:03:33,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229650 2023-11-21 14:03:36,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1530980.0, ans=0.0 2023-11-21 14:03:43,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1530980.0, ans=0.05 2023-11-21 14:03:48,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.148e+01 8.693e+01 9.555e+01 1.704e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 14:04:15,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1531180.0, ans=0.2 2023-11-21 14:04:32,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1531246.6666666667, ans=0.1 2023-11-21 14:04:34,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2023-11-21 14:04:38,686 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1250, loss[loss=0.07599, simple_loss=0.09851, pruned_loss=0.01669, audio_tagging_loss=0.01004, over 15542.00 frames. ], tot_loss[loss=0.07345, simple_loss=0.09482, pruned_loss=0.01642, audio_tagging_loss=0.009623, over 3049122.50 frames. ], batch size: 58, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:04:40,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229700 2023-11-21 14:05:08,536 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 14:05:09,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1531446.6666666667, ans=0.2 2023-11-21 14:05:43,351 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1300, loss[loss=0.06655, simple_loss=0.08424, pruned_loss=0.0131, audio_tagging_loss=0.01133, over 15101.00 frames. ], tot_loss[loss=0.07269, simple_loss=0.09375, pruned_loss=0.01611, audio_tagging_loss=0.009704, over 3043464.21 frames. 
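The grad_scale field oscillates across records (16.0 around batches 900–1150, back to 32.0 at batch 1200): with fp16 training this is dynamic loss scaling, halved when a step overflows and doubled again after a stretch of stable steps. A minimal sketch mirroring torch.cuda.amp.GradScaler semantics; the growth interval below is an assumption:

```python
# Sketch of the dynamic fp16 loss scaling behind the "grad_scale" field:
# shrink on overflow, grow back after a run of stable steps, hence the
# oscillation between 16.0 and 32.0 in the records above. The growth
# interval is a placeholder, not the recipe's actual setting.

class LossScale:
    def __init__(self, init_scale: float = 32.0, growth_interval: int = 1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._stable_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:                # overflow: halve the scale, skip the step
            self.scale *= 0.5
            self._stable_steps = 0
        else:
            self._stable_steps += 1
            if self._stable_steps >= self.growth_interval:
                self.scale *= 2.0    # grow back after a stable stretch
                self._stable_steps = 0
        return self.scale
```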
], batch size: 58, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:05:44,642 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229750 2023-11-21 14:05:58,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.180e+01 8.668e+01 9.567e+01 1.145e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-21 14:06:17,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.88 vs. limit=22.5 2023-11-21 14:06:19,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1531780.0, ans=0.1 2023-11-21 14:06:22,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0 2023-11-21 14:06:24,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2023-11-21 14:06:24,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1531846.6666666667, ans=0.1 2023-11-21 14:06:48,137 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1350, loss[loss=0.08014, simple_loss=0.1055, pruned_loss=0.01986, audio_tagging_loss=0.007503, over 15300.00 frames. ], tot_loss[loss=0.07258, simple_loss=0.09373, pruned_loss=0.01611, audio_tagging_loss=0.009609, over 3045340.11 frames. ], batch size: 55, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:06:49,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229800 2023-11-21 14:07:05,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1532046.6666666667, ans=0.0 2023-11-21 14:07:05,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1532046.6666666667, ans=0.2 2023-11-21 14:07:06,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1532046.6666666667, ans=0.2 2023-11-21 14:07:09,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1532046.6666666667, ans=0.125 2023-11-21 14:07:34,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1532180.0, ans=0.0 2023-11-21 14:07:35,717 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:07:39,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=12.0 2023-11-21 14:07:40,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1532246.6666666667, ans=0.125 2023-11-21 14:07:52,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2023-11-21 14:07:52,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.83 vs. limit=10.0 2023-11-21 14:07:55,121 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1400, loss[loss=0.07051, simple_loss=0.0848, pruned_loss=0.02061, audio_tagging_loss=0.007508, over 14140.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09477, pruned_loss=0.01637, audio_tagging_loss=0.00954, over 3046919.07 frames. ], batch size: 54, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:07:56,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229850 2023-11-21 14:07:59,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1532313.3333333333, ans=0.125 2023-11-21 14:08:10,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.288e+01 8.939e+01 9.468e+01 2.052e+02, threshold=1.788e+02, percent-clipped=1.0 2023-11-21 14:08:10,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1532380.0, ans=0.0 2023-11-21 14:08:11,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1532380.0, ans=0.125 2023-11-21 14:08:23,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=1532446.6666666667, ans=12.0 2023-11-21 14:08:25,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-21 14:08:27,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2023-11-21 14:08:59,454 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1450, loss[loss=0.07954, simple_loss=0.09288, pruned_loss=0.02171, audio_tagging_loss=0.01139, over 14137.00 frames. ], tot_loss[loss=0.07394, simple_loss=0.0956, pruned_loss=0.01652, audio_tagging_loss=0.009616, over 3046971.91 frames. 
], batch size: 57, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:09:00,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229900 2023-11-21 14:09:26,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1532780.0, ans=0.2 2023-11-21 14:09:33,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1532780.0, ans=0.125 2023-11-21 14:09:40,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1532846.6666666667, ans=0.0 2023-11-21 14:09:50,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1532913.3333333333, ans=0.0 2023-11-21 14:09:51,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1532913.3333333333, ans=0.125 2023-11-21 14:09:56,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1532913.3333333333, ans=0.125 2023-11-21 14:10:00,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1532913.3333333333, ans=0.125 2023-11-21 14:10:02,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1532980.0, ans=0.125 2023-11-21 14:10:03,591 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1500, loss[loss=0.06515, simple_loss=0.09, pruned_loss=0.01237, audio_tagging_loss=0.007785, over 14475.00 frames. ], tot_loss[loss=0.07349, simple_loss=0.09488, pruned_loss=0.01644, audio_tagging_loss=0.009616, over 3042319.62 frames. ], batch size: 53, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:10:04,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1532980.0, ans=0.0 2023-11-21 14:10:04,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 229950 2023-11-21 14:10:16,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2023-11-21 14:10:18,876 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.143e+01 8.696e+01 9.361e+01 1.676e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 14:10:22,870 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 14:10:22,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1533046.6666666667, ans=0.125 2023-11-21 14:10:39,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1533113.3333333333, ans=0.125 2023-11-21 14:10:55,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1533246.6666666667, ans=0.05 2023-11-21 14:10:55,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. 
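limit=15.0

The [optim.py:476] entries in this log report the quartiles (min, 25%, median, 75%, max) of recent gradient norms together with the clipping threshold. Throughout this section the threshold is consistently 2.0 times the logged median (e.g. 8.696e+01 -> threshold=1.739e+02 in the entry just above), matching Clipping_scale=2.0. Below is a minimal sketch of that bookkeeping, assuming a simple rolling window of per-batch gradient norms; GradNormTracker, window and record are illustrative names, not the optimizer's actual API:

```python
import torch
from collections import deque

class GradNormTracker:
    """Illustrative only: keep a window of recent gradient norms, derive the
    clip threshold as clipping_scale * median, and count clipping events."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_seen = 0
        self.num_clipped = 0

    def record(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        t = torch.tensor(list(self.norms))
        # min / 25% / median / 75% / max, as printed in the log.
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()
        self.num_seen += 1
        if grad_norm > threshold:
            self.num_clipped += 1
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, "
              + f"percent-clipped={100.0 * self.num_clipped / self.num_seen:.1f}")
        return threshold
```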
2023-11-21 14:11:08,124 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1550, loss[loss=0.0685, simple_loss=0.0834, pruned_loss=0.01526, audio_tagging_loss=0.01153, over 15712.00 frames. ], tot_loss[loss=0.07353, simple_loss=0.09483, pruned_loss=0.01633, audio_tagging_loss=0.009783, over 3038306.40 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:11:09,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1533313.3333333333, ans=0.125 2023-11-21 14:11:10,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230000 2023-11-21 14:11:36,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2023-11-21 14:11:37,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1533446.6666666667, ans=0.2 2023-11-21 14:11:41,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1533446.6666666667, ans=0.0 2023-11-21 14:12:14,366 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1600, loss[loss=0.05253, simple_loss=0.06624, pruned_loss=0.01004, audio_tagging_loss=0.009365, over 15588.00 frames. ], tot_loss[loss=0.07394, simple_loss=0.09559, pruned_loss=0.01642, audio_tagging_loss=0.009721, over 3041948.89 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:12:16,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230050 2023-11-21 14:12:24,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1533646.6666666667, ans=0.2 2023-11-21 14:12:30,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.321e+01 8.893e+01 9.688e+01 1.461e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-21 14:12:30,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1533713.3333333333, ans=0.0 2023-11-21 14:12:34,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0 2023-11-21 14:12:35,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1533713.3333333333, ans=0.2 2023-11-21 14:12:42,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1533780.0, ans=0.1 2023-11-21 14:12:45,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1533780.0, ans=0.0 2023-11-21 14:12:50,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1533780.0, ans=0.1 2023-11-21 14:13:19,707 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1650, loss[loss=0.06874, simple_loss=0.09019, pruned_loss=0.01413, audio_tagging_loss=0.009516, over 14561.00 frames. ], tot_loss[loss=0.07456, simple_loss=0.09656, pruned_loss=0.01657, audio_tagging_loss=0.009712, over 3050471.63 frames.
], batch size: 55, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:13:21,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230100 2023-11-21 14:13:24,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-21 14:13:26,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1533980.0, ans=0.0 2023-11-21 14:13:31,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.49 vs. limit=22.5 2023-11-21 14:13:35,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1534046.6666666667, ans=0.0 2023-11-21 14:13:36,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1534046.6666666667, ans=0.0 2023-11-21 14:14:11,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-21 14:14:24,713 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1700, loss[loss=0.0832, simple_loss=0.1018, pruned_loss=0.02406, audio_tagging_loss=0.008227, over 17766.00 frames. ], tot_loss[loss=0.07467, simple_loss=0.09679, pruned_loss=0.01663, audio_tagging_loss=0.009652, over 3051989.43 frames. ], batch size: 66, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:14:26,028 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230150 2023-11-21 14:14:31,207 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 14:14:40,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.297e+01 8.260e+01 8.837e+01 9.595e+01 1.453e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-21 14:14:43,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1534380.0, ans=0.125 2023-11-21 14:14:44,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1534380.0, ans=0.0 2023-11-21 14:14:44,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1534380.0, ans=0.0 2023-11-21 14:15:01,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1534446.6666666667, ans=0.0 2023-11-21 14:15:05,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1534513.3333333333, ans=0.125 2023-11-21 14:15:06,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1534513.3333333333, ans=0.0 2023-11-21 14:15:14,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. 
limit=12.0 2023-11-21 14:15:17,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1534580.0, ans=0.125 2023-11-21 14:15:28,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1534580.0, ans=0.125 2023-11-21 14:15:30,391 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1750, loss[loss=0.09748, simple_loss=0.135, pruned_loss=0.02209, audio_tagging_loss=0.007903, over 16067.00 frames. ], tot_loss[loss=0.07483, simple_loss=0.09734, pruned_loss=0.01658, audio_tagging_loss=0.00958, over 3053785.31 frames. ], batch size: 58, lr: 3.46e-03, grad_scale: 32.0 2023-11-21 14:15:31,724 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230200 2023-11-21 14:15:38,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1534646.6666666667, ans=0.0 2023-11-21 14:15:42,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1534713.3333333333, ans=0.125 2023-11-21 14:15:45,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1534713.3333333333, ans=0.0 2023-11-21 14:16:08,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1534846.6666666667, ans=0.0 2023-11-21 14:16:28,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1534913.3333333333, ans=0.125 2023-11-21 14:16:28,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1534913.3333333333, ans=0.125 2023-11-21 14:16:34,748 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1800, loss[loss=0.06925, simple_loss=0.08188, pruned_loss=0.01752, audio_tagging_loss=0.01078, over 14551.00 frames. ], tot_loss[loss=0.07428, simple_loss=0.09661, pruned_loss=0.01639, audio_tagging_loss=0.009585, over 3060749.82 frames. ], batch size: 55, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 14:16:36,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230250 2023-11-21 14:16:42,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1534980.0, ans=0.125 2023-11-21 14:16:50,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2023-11-21 14:16:51,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.667e+01 8.097e+01 8.882e+01 9.608e+01 1.325e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-21 14:16:59,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1535046.6666666667, ans=0.1 2023-11-21 14:16:59,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1535046.6666666667, ans=0.1 2023-11-21 14:17:08,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. 
limit=15.0 2023-11-21 14:17:12,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1535180.0, ans=0.125 2023-11-21 14:17:39,530 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1850, loss[loss=0.07325, simple_loss=0.08781, pruned_loss=0.01689, audio_tagging_loss=0.01245, over 15690.00 frames. ], tot_loss[loss=0.07419, simple_loss=0.09614, pruned_loss=0.01653, audio_tagging_loss=0.009589, over 3050351.22 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 14:17:40,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230300 2023-11-21 14:17:43,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-21 14:17:46,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1535313.3333333333, ans=0.0 2023-11-21 14:17:51,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-21 14:18:07,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1535446.6666666667, ans=0.035 2023-11-21 14:18:45,275 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1900, loss[loss=0.08425, simple_loss=0.1063, pruned_loss=0.02068, audio_tagging_loss=0.01041, over 14980.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09623, pruned_loss=0.01635, audio_tagging_loss=0.009511, over 3053797.03 frames. ], batch size: 54, lr: 3.46e-03, grad_scale: 16.0 2023-11-21 14:18:46,572 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230350 2023-11-21 14:19:01,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.472e+01 7.941e+01 8.654e+01 9.429e+01 1.278e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-21 14:19:01,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1535713.3333333333, ans=10.0 2023-11-21 14:19:13,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1535780.0, ans=0.5 2023-11-21 14:19:17,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1535780.0, ans=0.0 2023-11-21 14:19:20,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1535780.0, ans=0.1 2023-11-21 14:19:28,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1535846.6666666667, ans=0.2 2023-11-21 14:19:30,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-21 14:19:34,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1535846.6666666667, ans=0.125 2023-11-21 14:19:49,114 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 1950, loss[loss=0.08898, simple_loss=0.1251, pruned_loss=0.02057, audio_tagging_loss=0.005846, over 15633.00 frames. ], tot_loss[loss=0.07367, simple_loss=0.09577, pruned_loss=0.01628, audio_tagging_loss=0.009512, over 3052187.46 frames. 
], batch size: 55, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:19:50,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230400 2023-11-21 14:20:01,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1536046.6666666667, ans=0.1 2023-11-21 14:20:06,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1536046.6666666667, ans=0.05 2023-11-21 14:20:21,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1536113.3333333333, ans=0.0 2023-11-21 14:20:43,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1536246.6666666667, ans=0.125 2023-11-21 14:20:54,215 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2000, loss[loss=0.06524, simple_loss=0.08232, pruned_loss=0.01425, audio_tagging_loss=0.009828, over 13416.00 frames. ], tot_loss[loss=0.07368, simple_loss=0.09571, pruned_loss=0.01627, audio_tagging_loss=0.009555, over 3044954.56 frames. ], batch size: 50, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:20:54,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1536313.3333333333, ans=0.2 2023-11-21 14:20:55,555 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230450 2023-11-21 14:21:00,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1536313.3333333333, ans=0.125 2023-11-21 14:21:02,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1536313.3333333333, ans=0.09899494936611666 2023-11-21 14:21:09,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2023-11-21 14:21:10,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 7.989e+01 8.739e+01 9.353e+01 1.086e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-21 14:21:31,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1536513.3333333333, ans=10.0 2023-11-21 14:21:58,520 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2050, loss[loss=0.08785, simple_loss=0.1236, pruned_loss=0.01937, audio_tagging_loss=0.006678, over 16383.00 frames. ], tot_loss[loss=0.07413, simple_loss=0.0962, pruned_loss=0.01648, audio_tagging_loss=0.009546, over 3047651.76 frames. ], batch size: 56, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:21:59,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. 
limit=15.0 2023-11-21 14:21:59,771 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230500 2023-11-21 14:22:05,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1536646.6666666667, ans=0.0 2023-11-21 14:22:08,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1536646.6666666667, ans=0.015 2023-11-21 14:22:41,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-11-21 14:23:01,830 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2100, loss[loss=0.08127, simple_loss=0.1041, pruned_loss=0.0208, audio_tagging_loss=0.008442, over 14622.00 frames. ], tot_loss[loss=0.07375, simple_loss=0.09567, pruned_loss=0.01635, audio_tagging_loss=0.009561, over 3041288.92 frames. ], batch size: 56, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:23:03,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230550 2023-11-21 14:23:14,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1537046.6666666667, ans=0.125 2023-11-21 14:23:20,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.887e+01 8.079e+01 8.846e+01 9.402e+01 1.190e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-21 14:23:28,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1537113.3333333333, ans=0.125 2023-11-21 14:23:58,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1537246.6666666667, ans=0.0 2023-11-21 14:24:05,857 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2150, loss[loss=0.07062, simple_loss=0.09621, pruned_loss=0.01389, audio_tagging_loss=0.008626, over 15074.00 frames. ], tot_loss[loss=0.07426, simple_loss=0.09603, pruned_loss=0.01669, audio_tagging_loss=0.009556, over 3050135.09 frames. ], batch size: 58, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:24:07,682 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230600 2023-11-21 14:24:08,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=8.0 2023-11-21 14:24:13,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1537313.3333333333, ans=0.125 2023-11-21 14:24:26,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1537380.0, ans=0.125 2023-11-21 14:24:40,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1537446.6666666667, ans=0.0 2023-11-21 14:24:45,365 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
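Number of tokens: 24

These WARNING entries come from a sanity filter rather than a data error: after the roughly 4x subsampling of the frontend, an apparently one-second AudioSet clip (100 feature frames) yields only 23 encoder frames, while its placeholder transcript has 24 BPE tokens, and the transducer loss used here cannot align more output symbols than it has frames, so the cut is dropped. Below is a minimal sketch of such a filter, with a rough model of the subsampling and a hypothetical keep_cut helper (not the recipe's exact code):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Rough model of a ~4x convolutional frontend; for the 100-frame cut in
    # the warning above it gives 23, matching the logged value.
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, tokens: list) -> bool:
    """Drop cuts whose encoder output would be shorter than the token
    sequence, since the transducer cannot emit more symbols than frames."""
    t = frames_after_subsampling(num_frames)
    if t < len(tokens):
        print(f"Exclude cut from training. "
              f"Number of frames (before subsampling): {num_frames}. "
              f"Number of frames (after subsampling): {t}. "
              f"Number of tokens: {len(tokens)}")
        return False
    return True

# The warning above: 100 frames -> 23 frames < 24 tokens -> excluded.
assert frames_after_subsampling(100) == 23
assert keep_cut(100, ["tok"] * 24) is False
```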
2023-11-21 14:24:50,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1537513.3333333333, ans=0.05 2023-11-21 14:24:59,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2023-11-21 14:25:11,559 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2200, loss[loss=0.07278, simple_loss=0.09284, pruned_loss=0.0148, audio_tagging_loss=0.01156, over 16097.00 frames. ], tot_loss[loss=0.07399, simple_loss=0.09549, pruned_loss=0.01671, audio_tagging_loss=0.009533, over 3045827.51 frames. ], batch size: 61, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:25:12,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230650 2023-11-21 14:25:19,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1537646.6666666667, ans=0.2 2023-11-21 14:25:20,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2023-11-21 14:25:21,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5 2023-11-21 14:25:28,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.037e+01 8.654e+01 9.441e+01 1.350e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-21 14:25:33,848 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 14:25:38,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1537780.0, ans=0.125 2023-11-21 14:25:41,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1537780.0, ans=0.04949747468305833 2023-11-21 14:25:42,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1537780.0, ans=0.125 2023-11-21 14:25:43,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1537780.0, ans=0.125 2023-11-21 14:25:47,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1537846.6666666667, ans=0.09899494936611666 2023-11-21 14:26:01,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=22.5 2023-11-21 14:26:07,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1537913.3333333333, ans=0.0 2023-11-21 14:26:14,307 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2250, loss[loss=0.09287, simple_loss=0.126, pruned_loss=0.02098, audio_tagging_loss=0.008892, over 15459.00 frames. ], tot_loss[loss=0.07413, simple_loss=0.09562, pruned_loss=0.01669, audio_tagging_loss=0.009632, over 3045416.28 frames. ], batch size: 58, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:26:15,601 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230700 2023-11-21 14:26:30,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs.
limit=15.0 2023-11-21 14:26:41,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2023-11-21 14:26:43,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1538113.3333333333, ans=0.0 2023-11-21 14:27:17,387 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2300, loss[loss=0.05757, simple_loss=0.06787, pruned_loss=0.01331, audio_tagging_loss=0.01033, over 15547.00 frames. ], tot_loss[loss=0.07386, simple_loss=0.09538, pruned_loss=0.01644, audio_tagging_loss=0.00973, over 3050370.86 frames. ], batch size: 61, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:27:18,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230750 2023-11-21 14:27:20,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1538313.3333333333, ans=0.2 2023-11-21 14:27:26,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1538313.3333333333, ans=0.1 2023-11-21 14:27:28,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1538313.3333333333, ans=0.125 2023-11-21 14:27:36,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 7.885e+01 8.573e+01 9.171e+01 1.137e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-21 14:27:38,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1538380.0, ans=0.125 2023-11-21 14:27:41,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1538380.0, ans=0.125 2023-11-21 14:27:42,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1538446.6666666667, ans=0.09899494936611666 2023-11-21 14:27:52,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1538446.6666666667, ans=0.125 2023-11-21 14:28:03,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1538513.3333333333, ans=0.125 2023-11-21 14:28:13,478 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:28:13,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1538580.0, ans=0.125 2023-11-21 14:28:22,098 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2350, loss[loss=0.07344, simple_loss=0.09605, pruned_loss=0.01505, audio_tagging_loss=0.01037, over 14071.00 frames. ], tot_loss[loss=0.07413, simple_loss=0.09591, pruned_loss=0.01641, audio_tagging_loss=0.009764, over 3057914.48 frames. 
], batch size: 54, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:28:23,423 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230800 2023-11-21 14:28:27,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1538646.6666666667, ans=0.125 2023-11-21 14:28:28,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1538646.6666666667, ans=0.2 2023-11-21 14:28:31,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1538646.6666666667, ans=0.2 2023-11-21 14:29:26,657 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2400, loss[loss=0.05882, simple_loss=0.06552, pruned_loss=0.01328, audio_tagging_loss=0.01278, over 14754.00 frames. ], tot_loss[loss=0.07471, simple_loss=0.0966, pruned_loss=0.01661, audio_tagging_loss=0.009795, over 3057161.61 frames. ], batch size: 57, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:29:28,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230850 2023-11-21 14:29:30,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1538980.0, ans=0.5 2023-11-21 14:29:43,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.345e+01 8.985e+01 9.638e+01 1.227e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-21 14:30:00,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1539113.3333333333, ans=0.125 2023-11-21 14:30:08,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-21 14:30:14,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-11-21 14:30:15,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1539180.0, ans=0.125 2023-11-21 14:30:15,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1539180.0, ans=0.1 2023-11-21 14:30:20,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1539246.6666666667, ans=0.125 2023-11-21 14:30:30,111 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2450, loss[loss=0.05417, simple_loss=0.06138, pruned_loss=0.01147, audio_tagging_loss=0.01201, over 14617.00 frames. ], tot_loss[loss=0.07492, simple_loss=0.09681, pruned_loss=0.01669, audio_tagging_loss=0.009824, over 3054669.70 frames. ], batch size: 56, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:30:31,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230900 2023-11-21 14:30:42,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. 
limit=15.0 2023-11-21 14:31:29,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1539580.0, ans=0.0 2023-11-21 14:31:31,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1539580.0, ans=0.0 2023-11-21 14:31:35,090 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2500, loss[loss=0.07769, simple_loss=0.1014, pruned_loss=0.01481, audio_tagging_loss=0.01218, over 15004.00 frames. ], tot_loss[loss=0.07556, simple_loss=0.09775, pruned_loss=0.01691, audio_tagging_loss=0.009782, over 3055034.52 frames. ], batch size: 55, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:31:37,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 230950 2023-11-21 14:31:53,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.534e+01 7.942e+01 8.529e+01 9.362e+01 1.301e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-21 14:32:04,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1539780.0, ans=0.125 2023-11-21 14:32:40,128 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2550, loss[loss=0.07924, simple_loss=0.09888, pruned_loss=0.01907, audio_tagging_loss=0.01072, over 15376.00 frames. ], tot_loss[loss=0.07488, simple_loss=0.09685, pruned_loss=0.0167, audio_tagging_loss=0.009755, over 3053170.36 frames. ], batch size: 56, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:32:41,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231000 2023-11-21 14:32:45,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1539980.0, ans=0.125 2023-11-21 14:32:49,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1539980.0, ans=0.125 2023-11-21 14:32:52,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1540046.6666666667, ans=0.125 2023-11-21 14:32:55,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1540046.6666666667, ans=0.125 2023-11-21 14:33:03,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-21 14:33:06,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1540113.3333333333, ans=0.0 2023-11-21 14:33:09,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1540113.3333333333, ans=0.1 2023-11-21 14:33:18,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1540180.0, ans=0.125 2023-11-21 14:33:20,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. 
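limit=10.0

The [scaling.py:213] lines that dominate this log each print the current value (ans) of a ScheduledFloat: a scalar hyper-parameter (a dropout p, skip rate, balancer probability, and so on) whose value is interpolated piecewise-linearly in batch_count. The following is a minimal reimplementation of that idea for illustration only; the schedule points below are invented, not taken from the recipe:

```python
class ScheduledFloat:
    """Sketch of a float following a piecewise-linear schedule over
    batch_count, logged as "ScheduledFloat: name=..., batch_count=..., ans=...".
    """

    def __init__(self, *points: tuple, name: str = "unnamed"):
        # points are (batch_count, value) pairs.
        self.points = sorted(points)
        self.name = name

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Invented schedule: a skip rate decaying from 0.2 to 0.0 over the first
# 20k batches, then pinned at 0.0 -- consistent with the many ans=0.0
# entries this late in the run (batch_count > 1.5M).
skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0), name="attention_skip_rate")
print(f"ScheduledFloat: name={skip_rate.name}, "
      f"batch_count=1540180.0, ans={skip_rate.value(1540180.0)}")
```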
2023-11-21 14:33:25,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1540180.0, ans=0.0 2023-11-21 14:33:25,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1540180.0, ans=0.125 2023-11-21 14:33:25,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1540180.0, ans=0.0 2023-11-21 14:33:44,018 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2600, loss[loss=0.07327, simple_loss=0.09635, pruned_loss=0.01704, audio_tagging_loss=0.008052, over 14963.00 frames. ], tot_loss[loss=0.07378, simple_loss=0.0958, pruned_loss=0.01629, audio_tagging_loss=0.00959, over 3047171.21 frames. ], batch size: 55, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:33:45,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231050 2023-11-21 14:34:02,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1540380.0, ans=0.0 2023-11-21 14:34:03,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 7.933e+01 8.665e+01 9.563e+01 1.438e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-21 14:34:06,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1540380.0, ans=0.125 2023-11-21 14:34:27,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1540513.3333333333, ans=0.125 2023-11-21 14:34:38,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1540580.0, ans=0.125 2023-11-21 14:34:47,256 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2650, loss[loss=0.06155, simple_loss=0.08456, pruned_loss=0.007557, audio_tagging_loss=0.01171, over 15732.00 frames. ], tot_loss[loss=0.07373, simple_loss=0.09606, pruned_loss=0.01621, audio_tagging_loss=0.00949, over 3042209.33 frames. ], batch size: 58, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:34:49,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231100 2023-11-21 14:34:50,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-21 14:34:53,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1540646.6666666667, ans=0.125 2023-11-21 14:34:59,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1540713.3333333333, ans=0.125 2023-11-21 14:35:07,639 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 14:35:14,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1540780.0, ans=0.05 2023-11-21 14:35:27,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.83 vs.
limit=12.0 2023-11-21 14:35:28,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1540846.6666666667, ans=0.1 2023-11-21 14:35:28,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=1540846.6666666667, ans=0.02 2023-11-21 14:35:42,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1540913.3333333333, ans=0.2 2023-11-21 14:35:51,737 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2700, loss[loss=0.0581, simple_loss=0.06482, pruned_loss=0.01318, audio_tagging_loss=0.01251, over 14500.00 frames. ], tot_loss[loss=0.07378, simple_loss=0.09608, pruned_loss=0.01627, audio_tagging_loss=0.009466, over 3043467.01 frames. ], batch size: 54, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:35:53,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231150 2023-11-21 14:36:10,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.073e+01 8.815e+01 9.662e+01 1.358e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-21 14:36:14,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1541046.6666666667, ans=0.2 2023-11-21 14:36:19,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1541113.3333333333, ans=0.125 2023-11-21 14:36:33,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1541180.0, ans=0.04949747468305833 2023-11-21 14:36:55,377 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2750, loss[loss=0.05591, simple_loss=0.06801, pruned_loss=0.01217, audio_tagging_loss=0.00973, over 14518.00 frames. ], tot_loss[loss=0.07356, simple_loss=0.09557, pruned_loss=0.01627, audio_tagging_loss=0.0095, over 3045814.60 frames. ], batch size: 56, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:36:55,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2023-11-21 14:36:56,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231200 2023-11-21 14:36:59,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1541313.3333333333, ans=0.125 2023-11-21 14:36:59,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1541313.3333333333, ans=0.125 2023-11-21 14:37:00,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1541313.3333333333, ans=0.1 2023-11-21 14:37:12,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1541380.0, ans=0.125 2023-11-21 14:37:30,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1541446.6666666667, ans=0.125 2023-11-21 14:37:50,355 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:37:58,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1541646.6666666667, ans=0.0 2023-11-21 14:37:59,538 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2800, loss[loss=0.05693, simple_loss=0.07867, pruned_loss=0.009131, audio_tagging_loss=0.008462, over 14256.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.0953, pruned_loss=0.01619, audio_tagging_loss=0.009523, over 3050131.42 frames. ], batch size: 53, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:37:59,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1541646.6666666667, ans=0.1 2023-11-21 14:38:00,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231250 2023-11-21 14:38:05,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1541646.6666666667, ans=0.125 2023-11-21 14:38:09,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1541646.6666666667, ans=0.125 2023-11-21 14:38:19,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.694e+01 8.133e+01 8.565e+01 9.239e+01 1.235e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-21 14:38:35,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1541780.0, ans=0.05 2023-11-21 14:38:49,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1541846.6666666667, ans=0.1 2023-11-21 14:38:58,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1541913.3333333333, ans=0.0 2023-11-21 14:39:04,750 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2850, loss[loss=0.0769, simple_loss=0.106, pruned_loss=0.01468, audio_tagging_loss=0.009248, over 15005.00 frames. ], tot_loss[loss=0.0729, simple_loss=0.09453, pruned_loss=0.01612, audio_tagging_loss=0.009518, over 3035874.63 frames. ], batch size: 54, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:39:06,095 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231300 2023-11-21 14:39:29,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1542113.3333333333, ans=0.125 2023-11-21 14:39:38,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-21 14:39:50,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1542180.0, ans=0.2 2023-11-21 14:39:53,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.23 vs. 
limit=22.5 2023-11-21 14:39:57,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1542246.6666666667, ans=0.1 2023-11-21 14:40:02,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1542246.6666666667, ans=0.0 2023-11-21 14:40:05,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1542246.6666666667, ans=0.2 2023-11-21 14:40:09,146 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2900, loss[loss=0.06916, simple_loss=0.0907, pruned_loss=0.01352, audio_tagging_loss=0.01029, over 15446.00 frames. ], tot_loss[loss=0.07274, simple_loss=0.0942, pruned_loss=0.01616, audio_tagging_loss=0.00948, over 3036046.23 frames. ], batch size: 57, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:40:10,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231350 2023-11-21 14:40:29,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 8.134e+01 8.745e+01 9.473e+01 1.198e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-21 14:40:32,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1542380.0, ans=0.125 2023-11-21 14:40:32,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1542380.0, ans=0.0 2023-11-21 14:40:44,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1542446.6666666667, ans=0.0 2023-11-21 14:41:00,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1542580.0, ans=0.0 2023-11-21 14:41:12,927 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 2950, loss[loss=0.1102, simple_loss=0.1508, pruned_loss=0.02682, audio_tagging_loss=0.007964, over 15248.00 frames. ], tot_loss[loss=0.07399, simple_loss=0.09589, pruned_loss=0.01654, audio_tagging_loss=0.009509, over 3042783.70 frames. ], batch size: 54, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:41:14,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231400 2023-11-21 14:41:49,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2023-11-21 14:42:06,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0 2023-11-21 14:42:18,039 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3000, loss[loss=0.07781, simple_loss=0.1031, pruned_loss=0.01756, audio_tagging_loss=0.008678, over 15066.00 frames. ], tot_loss[loss=0.07442, simple_loss=0.09626, pruned_loss=0.01678, audio_tagging_loss=0.00951, over 3044577.25 frames. ], batch size: 57, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:42:18,040 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 14:42:56,761 INFO [train_asr.py:1253] (1/4) Epoch 20, validation: loss=0.05942, simple_loss=0.05225, pruned_loss=0.00524, audio_tagging_loss=0.02805, over 4681554.00 frames. 
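The validation entry just above, like the per-batch loss[...] / tot_loss[...] entries, breaks the objective into simple_loss, pruned_loss and audio_tagging_loss. Every logged total in this section is consistent with the fixed linear combination loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. The sketch below merely re-derives the logged loss= field from its components under that inferred weighting; it is not the training script itself:

```python
def combined_loss(simple_loss: float,
                  pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    """Recompute the logged `loss=` field from its components, using scales
    inferred from the numbers in this log (an assumption, not quoted code)."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Validation entry above: loss=0.05942, simple_loss=0.05225,
# pruned_loss=0.00524, audio_tagging_loss=0.02805.
assert abs(combined_loss(0.05225, 0.00524, 0.02805) - 0.05942) < 1e-4
```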
2023-11-21 14:42:56,762 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 14:42:58,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231450 2023-11-21 14:42:58,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1542980.0, ans=0.0 2023-11-21 14:43:07,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1542980.0, ans=0.125 2023-11-21 14:43:17,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.247e+01 8.896e+01 9.640e+01 1.197e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-21 14:43:23,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1543113.3333333333, ans=0.5 2023-11-21 14:43:23,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1543113.3333333333, ans=0.125 2023-11-21 14:43:34,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1543180.0, ans=0.125 2023-11-21 14:43:42,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1543180.0, ans=0.1 2023-11-21 14:43:57,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1543246.6666666667, ans=0.1 2023-11-21 14:43:58,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1543246.6666666667, ans=0.2 2023-11-21 14:44:00,893 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3050, loss[loss=0.07099, simple_loss=0.09271, pruned_loss=0.015, audio_tagging_loss=0.009635, over 14156.00 frames. ], tot_loss[loss=0.07456, simple_loss=0.09654, pruned_loss=0.01673, audio_tagging_loss=0.009562, over 3048252.85 frames. ], batch size: 55, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:44:02,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231500 2023-11-21 14:44:08,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1543313.3333333333, ans=0.09899494936611666 2023-11-21 14:44:15,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1543380.0, ans=0.125 2023-11-21 14:44:23,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1543380.0, ans=0.125 2023-11-21 14:44:37,610 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:44:50,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. 
limit=15.0 2023-11-21 14:45:04,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1543646.6666666667, ans=0.125 2023-11-21 14:45:05,759 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3100, loss[loss=0.0685, simple_loss=0.08007, pruned_loss=0.01726, audio_tagging_loss=0.01121, over 15554.00 frames. ], tot_loss[loss=0.07508, simple_loss=0.09715, pruned_loss=0.01681, audio_tagging_loss=0.009687, over 3046392.02 frames. ], batch size: 58, lr: 3.45e-03, grad_scale: 16.0 2023-11-21 14:45:07,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231550 2023-11-21 14:45:08,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1543646.6666666667, ans=0.0 2023-11-21 14:45:22,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1543713.3333333333, ans=0.0 2023-11-21 14:45:24,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1543713.3333333333, ans=0.125 2023-11-21 14:45:25,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.795e+01 8.074e+01 8.626e+01 9.315e+01 1.250e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-21 14:45:29,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1543780.0, ans=0.125 2023-11-21 14:45:31,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1543780.0, ans=0.125 2023-11-21 14:45:47,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1543846.6666666667, ans=0.1 2023-11-21 14:46:07,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1543980.0, ans=0.1 2023-11-21 14:46:08,544 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3150, loss[loss=0.07027, simple_loss=0.08415, pruned_loss=0.0157, audio_tagging_loss=0.01249, over 14751.00 frames. ], tot_loss[loss=0.07496, simple_loss=0.09694, pruned_loss=0.01673, audio_tagging_loss=0.00976, over 3045613.35 frames. 
2023-11-21 14:46:09,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231600 2023-11-21 14:46:14,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1543980.0, ans=0.2 2023-11-21 14:46:15,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1543980.0, ans=0.035 2023-11-21 14:46:15,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1543980.0, ans=0.125 2023-11-21 14:46:59,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1544246.6666666667, ans=0.1 2023-11-21 14:47:05,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1544246.6666666667, ans=0.5 2023-11-21 14:47:06,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1544246.6666666667, ans=0.0 2023-11-21 14:47:13,968 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3200, loss[loss=0.08263, simple_loss=0.1117, pruned_loss=0.01514, audio_tagging_loss=0.01165, over 15507.00 frames. ], tot_loss[loss=0.07429, simple_loss=0.09568, pruned_loss=0.01652, audio_tagging_loss=0.009926, over 3043660.60 frames. ], batch size: 56, lr: 3.45e-03, grad_scale: 32.0 2023-11-21 14:47:15,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231650 2023-11-21 14:47:21,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1544313.3333333333, ans=0.125 2023-11-21 14:47:24,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1544313.3333333333, ans=0.0 2023-11-21 14:47:24,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-21 14:47:34,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.766e+01 7.989e+01 8.860e+01 9.564e+01 1.510e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-21 14:47:44,984 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 14:48:13,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1544580.0, ans=15.0 2023-11-21 14:48:19,241 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3250, loss[loss=0.07982, simple_loss=0.1039, pruned_loss=0.01821, audio_tagging_loss=0.009682, over 16566.00 frames. ], tot_loss[loss=0.07428, simple_loss=0.09599, pruned_loss=0.01644, audio_tagging_loss=0.009844, over 3044831.17 frames. ], batch size: 61, lr: 3.45e-03, grad_scale: 32.0
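The optim.py entries report a five-number summary of recent gradient norms (min, lower quartile, median, upper quartile, max), and in every entry the clipping threshold equals Clipping_scale times the middle value, e.g. 2.0 x 8.860e+01 = 1.772e+02 in the line above. A sketch of that adaptive rule follows; the window bookkeeping is an assumption, and only the scale-times-median relation is taken from the log:

```python
# Adaptive clipping threshold as implied by the optim.py lines:
# threshold = clipping_scale * median(recent gradient norms).
# Keeping a fixed-size window of norms is an assumed detail.
from collections import deque
from statistics import median

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def threshold(self, grad_norm: float) -> float:
        # Record the new norm, then clip against a multiple of the
        # median norm seen over the recent window.
        self.norms.append(grad_norm)
        return self.clipping_scale * median(self.norms)

# Reproduces the entry above: median 8.860e+01 -> threshold 1.772e+02.
clipper = GradNormClipper()
clipper.norms.extend([67.66, 79.89, 88.60, 95.64, 151.0])
assert abs(clipper.threshold(88.60) - 177.2) < 0.1
```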
2023-11-21 14:48:20,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231700 2023-11-21 14:48:20,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1544646.6666666667, ans=0.125 2023-11-21 14:48:23,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1544646.6666666667, ans=0.015 2023-11-21 14:49:07,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1544846.6666666667, ans=0.1 2023-11-21 14:49:18,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1544913.3333333333, ans=0.1 2023-11-21 14:49:22,562 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3300, loss[loss=0.07502, simple_loss=0.1086, pruned_loss=0.01465, audio_tagging_loss=0.006067, over 15265.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09491, pruned_loss=0.01633, audio_tagging_loss=0.00992, over 3043846.81 frames. ], batch size: 55, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:49:23,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231750 2023-11-21 14:49:27,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1544980.0, ans=0.125 2023-11-21 14:49:32,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1544980.0, ans=0.125 2023-11-21 14:49:43,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.162e+01 8.724e+01 9.354e+01 1.405e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-21 14:50:01,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1545180.0, ans=0.1 2023-11-21 14:50:06,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1545180.0, ans=0.125 2023-11-21 14:50:08,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1545180.0, ans=0.1 2023-11-21 14:50:25,994 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3350, loss[loss=0.09096, simple_loss=0.129, pruned_loss=0.01933, audio_tagging_loss=0.007121, over 15701.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.09584, pruned_loss=0.01646, audio_tagging_loss=0.00978, over 3047325.41 frames.
], batch size: 56, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:50:27,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231800 2023-11-21 14:50:30,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1545313.3333333333, ans=0.125 2023-11-21 14:50:51,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1545446.6666666667, ans=0.1 2023-11-21 14:50:56,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1545446.6666666667, ans=0.125 2023-11-21 14:51:17,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1545580.0, ans=0.1 2023-11-21 14:51:28,344 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 14:51:30,452 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3400, loss[loss=0.07086, simple_loss=0.09543, pruned_loss=0.01475, audio_tagging_loss=0.0084, over 15369.00 frames. ], tot_loss[loss=0.07424, simple_loss=0.09628, pruned_loss=0.01644, audio_tagging_loss=0.009666, over 3044562.48 frames. ], batch size: 56, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:51:31,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231850 2023-11-21 14:51:34,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-21 14:51:40,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2023-11-21 14:51:44,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.06 vs. limit=15.0 2023-11-21 14:51:49,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.184e+01 8.749e+01 9.223e+01 1.235e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-21 14:51:57,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1545780.0, ans=0.0 2023-11-21 14:52:02,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1545780.0, ans=0.2 2023-11-21 14:52:06,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1545846.6666666667, ans=0.125 2023-11-21 14:52:18,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1545846.6666666667, ans=0.125 2023-11-21 14:52:33,413 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3450, loss[loss=0.09574, simple_loss=0.1339, pruned_loss=0.0225, audio_tagging_loss=0.006306, over 16172.00 frames. ], tot_loss[loss=0.07449, simple_loss=0.09668, pruned_loss=0.0166, audio_tagging_loss=0.009557, over 3041481.04 frames. ], batch size: 59, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:52:34,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.67 vs. 
limit=15.0 2023-11-21 14:52:34,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231900 2023-11-21 14:52:37,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-21 14:52:47,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1546046.6666666667, ans=0.0 2023-11-21 14:52:51,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1546046.6666666667, ans=0.1 2023-11-21 14:53:28,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1546246.6666666667, ans=0.0 2023-11-21 14:53:28,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1546246.6666666667, ans=0.125 2023-11-21 14:53:36,599 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3500, loss[loss=0.08628, simple_loss=0.1112, pruned_loss=0.02131, audio_tagging_loss=0.009379, over 15172.00 frames. ], tot_loss[loss=0.07439, simple_loss=0.09662, pruned_loss=0.01667, audio_tagging_loss=0.009406, over 3047465.16 frames. ], batch size: 54, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 14:53:38,527 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 231950 2023-11-21 14:53:42,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1546313.3333333333, ans=0.125 2023-11-21 14:53:57,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1546380.0, ans=0.2 2023-11-21 14:53:59,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.435e+01 8.228e+01 8.776e+01 9.674e+01 1.230e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-21 14:54:10,767 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:54:11,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1546446.6666666667, ans=0.125 2023-11-21 14:54:12,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1546446.6666666667, ans=0.125 2023-11-21 14:54:14,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1546513.3333333333, ans=0.0 2023-11-21 14:54:24,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1546513.3333333333, ans=0.0 2023-11-21 14:54:41,973 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3550, loss[loss=0.06306, simple_loss=0.0769, pruned_loss=0.01406, audio_tagging_loss=0.01055, over 15030.00 frames. ], tot_loss[loss=0.07433, simple_loss=0.09649, pruned_loss=0.01664, audio_tagging_loss=0.009437, over 3053238.11 frames. 
], batch size: 56, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 14:54:43,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232000 2023-11-21 14:54:51,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.45 vs. limit=10.0 2023-11-21 14:54:59,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1546713.3333333333, ans=0.0 2023-11-21 14:55:13,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1546780.0, ans=0.125 2023-11-21 14:55:15,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2023-11-21 14:55:20,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1546780.0, ans=0.125 2023-11-21 14:55:31,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1546846.6666666667, ans=0.1 2023-11-21 14:55:41,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1546913.3333333333, ans=0.0 2023-11-21 14:55:43,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1546913.3333333333, ans=0.125 2023-11-21 14:55:49,453 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3600, loss[loss=0.08025, simple_loss=0.1069, pruned_loss=0.01595, audio_tagging_loss=0.01083, over 15881.00 frames. ], tot_loss[loss=0.07396, simple_loss=0.09588, pruned_loss=0.01658, audio_tagging_loss=0.009443, over 3050293.62 frames. ], batch size: 59, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:55:50,752 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232050 2023-11-21 14:56:10,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 7.980e+01 8.692e+01 9.366e+01 1.171e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-21 14:56:39,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1547180.0, ans=0.125 2023-11-21 14:56:53,203 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3650, loss[loss=0.06241, simple_loss=0.08413, pruned_loss=0.01126, audio_tagging_loss=0.009081, over 16330.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.09624, pruned_loss=0.01663, audio_tagging_loss=0.009395, over 3048237.05 frames. 
], batch size: 63, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:56:54,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232100 2023-11-21 14:57:19,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1547446.6666666667, ans=0.125 2023-11-21 14:57:23,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1547446.6666666667, ans=0.0 2023-11-21 14:57:24,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1547446.6666666667, ans=0.125 2023-11-21 14:57:50,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1547580.0, ans=0.0 2023-11-21 14:57:58,552 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3700, loss[loss=0.08892, simple_loss=0.1163, pruned_loss=0.02, audio_tagging_loss=0.01079, over 15607.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.09508, pruned_loss=0.01641, audio_tagging_loss=0.009468, over 3049601.14 frames. ], batch size: 55, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:57:59,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232150 2023-11-21 14:58:06,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1547646.6666666667, ans=0.125 2023-11-21 14:58:20,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.102e+01 8.765e+01 9.489e+01 1.317e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-21 14:58:25,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1547780.0, ans=0.05 2023-11-21 14:58:25,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1547780.0, ans=0.0 2023-11-21 14:58:30,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1547780.0, ans=0.125 2023-11-21 14:58:31,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2023-11-21 14:58:33,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2023-11-21 14:58:53,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1547913.3333333333, ans=0.2 2023-11-21 14:58:55,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1547913.3333333333, ans=0.1 2023-11-21 14:59:00,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1547913.3333333333, ans=0.0 2023-11-21 14:59:02,963 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3750, loss[loss=0.06409, simple_loss=0.08274, pruned_loss=0.01282, audio_tagging_loss=0.0099, over 15886.00 frames. ], tot_loss[loss=0.07446, simple_loss=0.09645, pruned_loss=0.01672, audio_tagging_loss=0.009516, over 3054450.91 frames. 
], batch size: 58, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 14:59:04,280 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232200 2023-11-21 14:59:14,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1548046.6666666667, ans=0.125 2023-11-21 14:59:27,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1548113.3333333333, ans=0.125 2023-11-21 14:59:40,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1548180.0, ans=0.125 2023-11-21 14:59:40,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1548180.0, ans=0.0 2023-11-21 14:59:46,983 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 14:59:47,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1548180.0, ans=0.025 2023-11-21 14:59:53,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2023-11-21 15:00:04,707 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:00:06,894 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3800, loss[loss=0.05583, simple_loss=0.07181, pruned_loss=0.00952, audio_tagging_loss=0.01041, over 14732.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09745, pruned_loss=0.01683, audio_tagging_loss=0.0096, over 3049950.60 frames. ], batch size: 60, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:00:08,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232250 2023-11-21 15:00:29,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.346e+01 8.333e+01 9.130e+01 9.759e+01 1.207e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-21 15:00:34,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=12.0 2023-11-21 15:00:37,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1548446.6666666667, ans=0.2 2023-11-21 15:01:11,468 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3850, loss[loss=0.07827, simple_loss=0.1054, pruned_loss=0.01816, audio_tagging_loss=0.007421, over 15372.00 frames. ], tot_loss[loss=0.07478, simple_loss=0.09662, pruned_loss=0.01676, audio_tagging_loss=0.009707, over 3054461.40 frames. ], batch size: 59, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:01:12,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232300 2023-11-21 15:01:18,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. 
limit=22.5 2023-11-21 15:01:57,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1548846.6666666667, ans=15.0 2023-11-21 15:02:15,395 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3900, loss[loss=0.05258, simple_loss=0.07207, pruned_loss=0.00687, audio_tagging_loss=0.009676, over 15709.00 frames. ], tot_loss[loss=0.07454, simple_loss=0.09626, pruned_loss=0.01667, audio_tagging_loss=0.009742, over 3054027.81 frames. ], batch size: 63, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:02:16,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232350 2023-11-21 15:02:18,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1548980.0, ans=0.125 2023-11-21 15:02:25,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1548980.0, ans=0.0 2023-11-21 15:02:36,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.138e+01 8.613e+01 9.443e+01 1.181e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-21 15:02:44,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1549113.3333333333, ans=0.125 2023-11-21 15:03:08,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1549246.6666666667, ans=0.125 2023-11-21 15:03:15,758 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:03:16,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1549246.6666666667, ans=0.125 2023-11-21 15:03:19,034 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 3950, loss[loss=0.07937, simple_loss=0.0985, pruned_loss=0.02002, audio_tagging_loss=0.01009, over 15770.00 frames. ], tot_loss[loss=0.07419, simple_loss=0.0956, pruned_loss=0.01653, audio_tagging_loss=0.009856, over 3048823.97 frames. ], batch size: 58, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:03:20,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232400 2023-11-21 15:03:54,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1549446.6666666667, ans=0.125 2023-11-21 15:03:54,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1549446.6666666667, ans=0.1 2023-11-21 15:03:58,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1549513.3333333333, ans=0.0 2023-11-21 15:04:23,700 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4000, loss[loss=0.06383, simple_loss=0.08301, pruned_loss=0.009821, audio_tagging_loss=0.0125, over 14868.00 frames. ], tot_loss[loss=0.07443, simple_loss=0.09588, pruned_loss=0.01661, audio_tagging_loss=0.009876, over 3039405.28 frames. 
], batch size: 57, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:04:25,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232450 2023-11-21 15:04:28,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1549646.6666666667, ans=0.125 2023-11-21 15:04:29,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1549646.6666666667, ans=0.0 2023-11-21 15:04:45,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.188e+01 8.912e+01 9.535e+01 1.152e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-21 15:05:23,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1549913.3333333333, ans=0.125 2023-11-21 15:05:28,170 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4050, loss[loss=0.08229, simple_loss=0.111, pruned_loss=0.01642, audio_tagging_loss=0.01037, over 14975.00 frames. ], tot_loss[loss=0.07452, simple_loss=0.09588, pruned_loss=0.0166, audio_tagging_loss=0.009988, over 3044202.08 frames. ], batch size: 54, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:05:29,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232500 2023-11-21 15:05:30,632 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 15:05:34,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1549980.0, ans=0.125 2023-11-21 15:06:26,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1550246.6666666667, ans=0.125 2023-11-21 15:06:29,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1550246.6666666667, ans=0.2 2023-11-21 15:06:32,490 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4100, loss[loss=0.0739, simple_loss=0.09498, pruned_loss=0.0169, audio_tagging_loss=0.009516, over 14282.00 frames. ], tot_loss[loss=0.07465, simple_loss=0.09615, pruned_loss=0.0167, audio_tagging_loss=0.009878, over 3043686.09 frames. 
], batch size: 54, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:06:33,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232550 2023-11-21 15:06:35,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1550313.3333333333, ans=0.0 2023-11-21 15:06:40,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1550313.3333333333, ans=0.125 2023-11-21 15:06:46,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1550380.0, ans=0.1 2023-11-21 15:06:55,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.215e+01 8.893e+01 9.542e+01 1.222e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-21 15:07:03,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1550446.6666666667, ans=0.125 2023-11-21 15:07:07,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-11-21 15:07:21,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1550513.3333333333, ans=0.0 2023-11-21 15:07:30,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1550580.0, ans=0.125 2023-11-21 15:07:36,351 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4150, loss[loss=0.0806, simple_loss=0.1038, pruned_loss=0.02077, audio_tagging_loss=0.007941, over 14527.00 frames. ], tot_loss[loss=0.07485, simple_loss=0.09675, pruned_loss=0.01682, audio_tagging_loss=0.009653, over 3046911.82 frames. ], batch size: 58, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:07:37,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232600 2023-11-21 15:07:38,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.20 vs. limit=15.0 2023-11-21 15:07:47,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1550646.6666666667, ans=0.125 2023-11-21 15:07:50,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=22.5 2023-11-21 15:08:15,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1550846.6666666667, ans=0.0 2023-11-21 15:08:20,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1550846.6666666667, ans=0.0 2023-11-21 15:08:22,771 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 15:08:25,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1550846.6666666667, ans=0.0 2023-11-21 15:08:35,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1550913.3333333333, ans=0.125 2023-11-21 15:08:41,872 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4200, loss[loss=0.0633, simple_loss=0.08852, pruned_loss=0.01178, audio_tagging_loss=0.007258, over 14677.00 frames. ], tot_loss[loss=0.07553, simple_loss=0.09792, pruned_loss=0.01704, audio_tagging_loss=0.009532, over 3046988.98 frames. ], batch size: 55, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:08:43,222 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232650 2023-11-21 15:08:49,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1550980.0, ans=0.125 2023-11-21 15:08:51,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1550980.0, ans=0.125 2023-11-21 15:08:59,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1551046.6666666667, ans=0.2 2023-11-21 15:09:03,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.544e+01 8.218e+01 8.944e+01 9.900e+01 1.172e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-21 15:09:04,037 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:09:12,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1551113.3333333333, ans=0.2 2023-11-21 15:09:16,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1551113.3333333333, ans=0.125 2023-11-21 15:09:24,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1551180.0, ans=0.1 2023-11-21 15:09:27,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1551180.0, ans=0.1 2023-11-21 15:09:45,360 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4250, loss[loss=0.07973, simple_loss=0.1027, pruned_loss=0.01676, audio_tagging_loss=0.0116, over 16438.00 frames. ], tot_loss[loss=0.07557, simple_loss=0.09785, pruned_loss=0.01716, audio_tagging_loss=0.009489, over 3047537.53 frames. ], batch size: 60, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:09:46,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232700 2023-11-21 15:09:56,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. 
limit=6.0 2023-11-21 15:10:07,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1551380.0, ans=0.125 2023-11-21 15:10:07,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1551380.0, ans=0.1 2023-11-21 15:10:08,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1551380.0, ans=15.0 2023-11-21 15:10:10,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1551380.0, ans=0.2 2023-11-21 15:10:19,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1551446.6666666667, ans=0.125 2023-11-21 15:10:39,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1551580.0, ans=0.125 2023-11-21 15:10:49,916 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4300, loss[loss=0.07714, simple_loss=0.09492, pruned_loss=0.01909, audio_tagging_loss=0.01059, over 14778.00 frames. ], tot_loss[loss=0.07585, simple_loss=0.09823, pruned_loss=0.0173, audio_tagging_loss=0.009433, over 3045000.89 frames. ], batch size: 57, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:10:51,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232750 2023-11-21 15:11:10,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1551713.3333333333, ans=0.0 2023-11-21 15:11:13,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.861e+01 8.351e+01 9.107e+01 1.007e+02 1.335e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-21 15:11:35,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1551846.6666666667, ans=0.125 2023-11-21 15:11:54,984 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4350, loss[loss=0.06031, simple_loss=0.08028, pruned_loss=0.01001, audio_tagging_loss=0.01016, over 16233.00 frames. ], tot_loss[loss=0.07568, simple_loss=0.09813, pruned_loss=0.01721, audio_tagging_loss=0.009403, over 3049775.06 frames. 
], batch size: 61, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:11:56,300 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232800 2023-11-21 15:11:56,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1551980.0, ans=0.0 2023-11-21 15:12:11,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1552046.6666666667, ans=0.07 2023-11-21 15:12:19,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1552113.3333333333, ans=0.2 2023-11-21 15:12:39,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1552180.0, ans=0.125 2023-11-21 15:12:44,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1552180.0, ans=15.0 2023-11-21 15:12:53,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1552246.6666666667, ans=0.0 2023-11-21 15:12:53,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1552246.6666666667, ans=0.125 2023-11-21 15:12:55,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1552246.6666666667, ans=0.0 2023-11-21 15:12:58,332 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4400, loss[loss=0.09093, simple_loss=0.1252, pruned_loss=0.02231, audio_tagging_loss=0.006045, over 14389.00 frames. ], tot_loss[loss=0.07575, simple_loss=0.09833, pruned_loss=0.0172, audio_tagging_loss=0.009389, over 3049064.80 frames. ], batch size: 56, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:12:59,771 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232850 2023-11-21 15:12:59,992 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:13:02,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1552313.3333333333, ans=0.07 2023-11-21 15:13:20,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.302e+01 8.149e+01 8.654e+01 9.425e+01 1.207e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-21 15:13:44,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1552513.3333333333, ans=0.1 2023-11-21 15:14:01,243 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4450, loss[loss=0.06032, simple_loss=0.0728, pruned_loss=0.01305, audio_tagging_loss=0.01086, over 15043.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09772, pruned_loss=0.01695, audio_tagging_loss=0.00934, over 3050579.05 frames. ], batch size: 58, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:14:02,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232900 2023-11-21 15:14:46,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1552846.6666666667, ans=0.0 2023-11-21 15:14:55,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.88 vs. 
limit=15.0 2023-11-21 15:15:03,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2023-11-21 15:15:06,397 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4500, loss[loss=0.07787, simple_loss=0.1044, pruned_loss=0.01732, audio_tagging_loss=0.008333, over 14977.00 frames. ], tot_loss[loss=0.07518, simple_loss=0.09783, pruned_loss=0.01692, audio_tagging_loss=0.009349, over 3052385.52 frames. ], batch size: 56, lr: 3.44e-03, grad_scale: 32.0 2023-11-21 15:15:06,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-21 15:15:07,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 232950 2023-11-21 15:15:08,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0 2023-11-21 15:15:11,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2023-11-21 15:15:12,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1552980.0, ans=0.125 2023-11-21 15:15:13,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1552980.0, ans=0.1 2023-11-21 15:15:17,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1552980.0, ans=0.2 2023-11-21 15:15:19,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1553046.6666666667, ans=0.125 2023-11-21 15:15:26,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1553046.6666666667, ans=0.5 2023-11-21 15:15:29,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1553046.6666666667, ans=0.125 2023-11-21 15:15:30,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.406e+01 8.048e+01 8.795e+01 9.628e+01 1.621e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-21 15:15:37,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1553113.3333333333, ans=0.1 2023-11-21 15:15:46,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1553180.0, ans=0.125 2023-11-21 15:15:46,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1553180.0, ans=0.0 2023-11-21 15:15:46,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1553180.0, ans=0.125 2023-11-21 15:15:53,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1553180.0, ans=0.0 2023-11-21 15:16:01,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1553246.6666666667, ans=0.125 2023-11-21 15:16:10,803 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4550, loss[loss=0.07715, 
simple_loss=0.1016, pruned_loss=0.01872, audio_tagging_loss=0.007622, over 15720.00 frames. ], tot_loss[loss=0.07473, simple_loss=0.09714, pruned_loss=0.01675, audio_tagging_loss=0.009414, over 3048344.91 frames. ], batch size: 59, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:16:12,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233000 2023-11-21 15:16:19,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1553313.3333333333, ans=0.1 2023-11-21 15:16:33,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1553380.0, ans=0.0 2023-11-21 15:16:44,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1553446.6666666667, ans=0.1 2023-11-21 15:17:00,273 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 15:17:07,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1553580.0, ans=0.1 2023-11-21 15:17:12,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.51 vs. limit=10.0 2023-11-21 15:17:15,283 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4600, loss[loss=0.05869, simple_loss=0.07529, pruned_loss=0.01118, audio_tagging_loss=0.009869, over 14996.00 frames. ], tot_loss[loss=0.07403, simple_loss=0.09625, pruned_loss=0.01639, audio_tagging_loss=0.009509, over 3056844.42 frames. ], batch size: 58, lr: 3.44e-03, grad_scale: 16.0 2023-11-21 15:17:16,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233050 2023-11-21 15:17:40,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.564e+01 8.142e+01 8.759e+01 9.508e+01 1.262e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-21 15:17:53,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1553846.6666666667, ans=0.125 2023-11-21 15:18:06,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1553913.3333333333, ans=0.0 2023-11-21 15:18:21,098 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4650, loss[loss=0.07058, simple_loss=0.09053, pruned_loss=0.01572, audio_tagging_loss=0.009592, over 14630.00 frames. ], tot_loss[loss=0.07395, simple_loss=0.09571, pruned_loss=0.01653, audio_tagging_loss=0.009559, over 3055569.68 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 16.0 2023-11-21 15:18:23,019 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233100 2023-11-21 15:18:30,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. 
limit=15.0 2023-11-21 15:18:38,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1554046.6666666667, ans=0.0 2023-11-21 15:19:26,199 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4700, loss[loss=0.0665, simple_loss=0.08445, pruned_loss=0.01247, audio_tagging_loss=0.0118, over 14627.00 frames. ], tot_loss[loss=0.07408, simple_loss=0.09596, pruned_loss=0.0165, audio_tagging_loss=0.009595, over 3052211.70 frames. ], batch size: 56, lr: 3.43e-03, grad_scale: 16.0 2023-11-21 15:19:27,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233150 2023-11-21 15:19:42,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1554380.0, ans=0.2 2023-11-21 15:19:43,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1554380.0, ans=0.02 2023-11-21 15:19:46,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1554380.0, ans=0.0 2023-11-21 15:19:49,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.239e+01 8.978e+01 9.572e+01 1.151e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-21 15:20:12,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1554513.3333333333, ans=0.0 2023-11-21 15:20:15,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1554513.3333333333, ans=0.1 2023-11-21 15:20:29,863 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4750, loss[loss=0.07715, simple_loss=0.09732, pruned_loss=0.01564, audio_tagging_loss=0.01285, over 15976.00 frames. ], tot_loss[loss=0.0738, simple_loss=0.09512, pruned_loss=0.01649, audio_tagging_loss=0.009747, over 3049383.05 frames. ], batch size: 58, lr: 3.43e-03, grad_scale: 16.0 2023-11-21 15:20:31,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233200 2023-11-21 15:20:32,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1554646.6666666667, ans=0.0 2023-11-21 15:21:34,026 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4800, loss[loss=0.07078, simple_loss=0.09598, pruned_loss=0.01448, audio_tagging_loss=0.008306, over 15154.00 frames. ], tot_loss[loss=0.07295, simple_loss=0.09397, pruned_loss=0.01621, audio_tagging_loss=0.009758, over 3052452.82 frames. 
], batch size: 57, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:21:36,020 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233250 2023-11-21 15:21:39,801 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:21:59,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.234e+01 8.749e+01 9.624e+01 1.145e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-21 15:22:04,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1555113.3333333333, ans=0.0 2023-11-21 15:22:06,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1555113.3333333333, ans=0.1 2023-11-21 15:22:08,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1555113.3333333333, ans=0.1 2023-11-21 15:22:40,375 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4850, loss[loss=0.07278, simple_loss=0.09682, pruned_loss=0.01553, audio_tagging_loss=0.008837, over 13734.00 frames. ], tot_loss[loss=0.07328, simple_loss=0.09407, pruned_loss=0.01631, audio_tagging_loss=0.009927, over 3049114.64 frames. ], batch size: 52, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:22:41,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233300 2023-11-21 15:22:55,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-21 15:23:29,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1555513.3333333333, ans=0.125 2023-11-21 15:23:35,316 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:23:40,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1555580.0, ans=0.125 2023-11-21 15:23:43,895 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4900, loss[loss=0.06701, simple_loss=0.08813, pruned_loss=0.0135, audio_tagging_loss=0.009455, over 16158.00 frames. ], tot_loss[loss=0.07314, simple_loss=0.09396, pruned_loss=0.01626, audio_tagging_loss=0.009904, over 3049053.08 frames. 
], batch size: 62, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:23:45,297 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233350 2023-11-21 15:23:47,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1555646.6666666667, ans=0.125 2023-11-21 15:23:59,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1555713.3333333333, ans=0.125 2023-11-21 15:24:07,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1555713.3333333333, ans=0.04949747468305833 2023-11-21 15:24:08,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.805e+01 8.037e+01 8.614e+01 9.424e+01 1.262e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-21 15:24:15,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1555780.0, ans=0.0 2023-11-21 15:24:15,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1555780.0, ans=0.04949747468305833 2023-11-21 15:24:16,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1555780.0, ans=0.0 2023-11-21 15:24:20,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-21 15:24:27,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-11-21 15:24:34,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1555913.3333333333, ans=0.04949747468305833 2023-11-21 15:24:36,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1555913.3333333333, ans=0.1 2023-11-21 15:24:45,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2023-11-21 15:24:48,002 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 4950, loss[loss=0.07127, simple_loss=0.09129, pruned_loss=0.01513, audio_tagging_loss=0.01049, over 15106.00 frames. ], tot_loss[loss=0.07349, simple_loss=0.0949, pruned_loss=0.01634, audio_tagging_loss=0.009707, over 3042282.00 frames. ], batch size: 56, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:24:48,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1555980.0, ans=0.2 2023-11-21 15:24:49,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233400 2023-11-21 15:24:52,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.52 vs. 
limit=15.0 2023-11-21 15:24:55,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1555980.0, ans=0.0 2023-11-21 15:25:02,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1556046.6666666667, ans=0.125 2023-11-21 15:25:06,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1556046.6666666667, ans=0.0 2023-11-21 15:25:44,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1556246.6666666667, ans=0.0 2023-11-21 15:25:53,387 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5000, loss[loss=0.0672, simple_loss=0.08878, pruned_loss=0.01427, audio_tagging_loss=0.008539, over 15867.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.09581, pruned_loss=0.01646, audio_tagging_loss=0.009554, over 3046120.00 frames. ], batch size: 61, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:25:54,627 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233450 2023-11-21 15:26:04,018 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:26:17,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.018e+01 8.729e+01 9.400e+01 1.250e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-21 15:26:29,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1556446.6666666667, ans=0.0 2023-11-21 15:26:38,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1556513.3333333333, ans=0.125 2023-11-21 15:26:55,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1556580.0, ans=10.0 2023-11-21 15:26:57,626 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5050, loss[loss=0.07578, simple_loss=0.098, pruned_loss=0.01939, audio_tagging_loss=0.007391, over 14607.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.0951, pruned_loss=0.01626, audio_tagging_loss=0.009476, over 3041344.38 frames. ], batch size: 53, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:26:58,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233500 2023-11-21 15:27:07,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1556646.6666666667, ans=0.125 2023-11-21 15:27:18,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1556713.3333333333, ans=0.0 2023-11-21 15:27:26,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1556780.0, ans=0.0 2023-11-21 15:27:44,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1556846.6666666667, ans=0.125 2023-11-21 15:27:50,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1556913.3333333333, ans=0.125 2023-11-21 15:27:52,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.00 vs. 
limit=15.0 2023-11-21 15:27:57,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1556913.3333333333, ans=0.125 2023-11-21 15:28:01,673 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5100, loss[loss=0.07837, simple_loss=0.1054, pruned_loss=0.01576, audio_tagging_loss=0.009897, over 15341.00 frames. ], tot_loss[loss=0.07306, simple_loss=0.09499, pruned_loss=0.0161, audio_tagging_loss=0.009467, over 3042539.64 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:28:02,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233550 2023-11-21 15:28:13,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2023-11-21 15:28:15,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1557046.6666666667, ans=0.0 2023-11-21 15:28:22,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1557046.6666666667, ans=0.0 2023-11-21 15:28:26,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0 2023-11-21 15:28:26,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 7.860e+01 8.610e+01 9.200e+01 1.229e+02, threshold=1.722e+02, percent-clipped=0.0 2023-11-21 15:28:29,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1557113.3333333333, ans=0.1 2023-11-21 15:28:33,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1557113.3333333333, ans=0.0 2023-11-21 15:28:36,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1557113.3333333333, ans=0.0 2023-11-21 15:28:45,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1557180.0, ans=0.1 2023-11-21 15:29:06,986 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5150, loss[loss=0.07788, simple_loss=0.1135, pruned_loss=0.01466, audio_tagging_loss=0.006485, over 16157.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.09511, pruned_loss=0.01612, audio_tagging_loss=0.009547, over 3041152.02 frames. ], batch size: 61, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:29:08,286 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233600 2023-11-21 15:29:18,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1557380.0, ans=0.125 2023-11-21 15:29:34,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1557446.6666666667, ans=0.125 2023-11-21 15:30:11,330 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5200, loss[loss=0.07891, simple_loss=0.1115, pruned_loss=0.01453, audio_tagging_loss=0.008648, over 16796.00 frames. ], tot_loss[loss=0.0736, simple_loss=0.0956, pruned_loss=0.01631, audio_tagging_loss=0.009486, over 3044489.82 frames. 
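[editor's note] Reading the loss records: the batch 5200 summary just above reports tot_loss[loss=0.0736, simple_loss=0.0956, pruned_loss=0.01631, audio_tagging_loss=0.009486]. These components are consistent with the displayed loss being 0.5 * simple_loss + pruned_loss + audio_tagging_loss; a minimal arithmetic check under that assumption (plain Python, not the training code):

# Worked check, assuming loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss.
# Values are copied from the batch 5200 tot_loss record above.
simple_loss = 0.0956
pruned_loss = 0.01631
audio_tagging_loss = 0.009486
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 4))  # 0.0736, matching the logged loss=0.0736

The same decomposition reproduces the other per-batch and running averages in these records to display precision.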
], batch size: 60, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:30:12,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233650 2023-11-21 15:30:12,864 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:30:20,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1557646.6666666667, ans=0.0 2023-11-21 15:30:35,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.165e+01 8.844e+01 9.636e+01 1.346e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-21 15:30:39,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1557780.0, ans=0.1 2023-11-21 15:30:41,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1557780.0, ans=0.2 2023-11-21 15:30:52,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1557846.6666666667, ans=0.125 2023-11-21 15:30:55,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1557846.6666666667, ans=0.125 2023-11-21 15:31:15,906 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5250, loss[loss=0.08355, simple_loss=0.1054, pruned_loss=0.021, audio_tagging_loss=0.009827, over 15626.00 frames. ], tot_loss[loss=0.07422, simple_loss=0.09637, pruned_loss=0.0166, audio_tagging_loss=0.009436, over 3038345.47 frames. ], batch size: 58, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:31:17,264 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233700 2023-11-21 15:32:02,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1558180.0, ans=0.035 2023-11-21 15:32:21,274 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5300, loss[loss=0.06509, simple_loss=0.08459, pruned_loss=0.01292, audio_tagging_loss=0.009878, over 14340.00 frames. ], tot_loss[loss=0.07449, simple_loss=0.09706, pruned_loss=0.01658, audio_tagging_loss=0.00938, over 3038103.21 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:32:22,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233750 2023-11-21 15:32:44,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1558380.0, ans=0.125 2023-11-21 15:32:45,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.789e+01 8.165e+01 8.716e+01 9.457e+01 1.108e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-21 15:33:25,891 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5350, loss[loss=0.06472, simple_loss=0.08725, pruned_loss=0.01347, audio_tagging_loss=0.007634, over 14539.00 frames. ], tot_loss[loss=0.07403, simple_loss=0.09597, pruned_loss=0.01662, audio_tagging_loss=0.009433, over 3031892.09 frames. 
], batch size: 55, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:33:27,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233800 2023-11-21 15:33:45,296 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 15:34:22,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1558913.3333333333, ans=0.125 2023-11-21 15:34:30,906 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5400, loss[loss=0.07884, simple_loss=0.1007, pruned_loss=0.01957, audio_tagging_loss=0.008947, over 15544.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.09626, pruned_loss=0.01657, audio_tagging_loss=0.009456, over 3028644.12 frames. ], batch size: 59, lr: 3.43e-03, grad_scale: 16.0 2023-11-21 15:34:32,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233850 2023-11-21 15:34:39,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1558980.0, ans=0.125 2023-11-21 15:34:44,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1559046.6666666667, ans=0.125 2023-11-21 15:34:57,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.221e+01 8.112e+01 8.686e+01 9.399e+01 1.125e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 15:35:24,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1559246.6666666667, ans=0.0 2023-11-21 15:35:31,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1559246.6666666667, ans=0.1 2023-11-21 15:35:31,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.42 vs. limit=15.0 2023-11-21 15:35:36,469 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5450, loss[loss=0.06617, simple_loss=0.08506, pruned_loss=0.01487, audio_tagging_loss=0.008772, over 13951.00 frames. ], tot_loss[loss=0.07424, simple_loss=0.09629, pruned_loss=0.01663, audio_tagging_loss=0.009462, over 3030923.57 frames. ], batch size: 53, lr: 3.43e-03, grad_scale: 16.0 2023-11-21 15:35:37,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233900 2023-11-21 15:35:48,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1559380.0, ans=0.0 2023-11-21 15:36:22,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2023-11-21 15:36:40,859 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5500, loss[loss=0.06029, simple_loss=0.07567, pruned_loss=0.00869, audio_tagging_loss=0.01376, over 14664.00 frames. ], tot_loss[loss=0.074, simple_loss=0.09577, pruned_loss=0.01655, audio_tagging_loss=0.009561, over 3029981.92 frames. 
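[editor's note] Note that grad_scale drops from 32.0 to 16.0 at batch 5400 and is back at 32.0 by batch 5600. This pattern is what a dynamic fp16 loss scaler produces: halve the scale when a step overflows, grow it back after a run of clean steps. A minimal sketch of that mechanism using PyTorch's GradScaler (the scaler configuration shown is an assumption; only the logged scale values come from this run):

import torch

# Dynamic loss-scaling sketch, assuming torch.cuda.amp.GradScaler semantics:
# the scale is halved when a step produces inf/nan gradients and doubled
# again after `growth_interval` clean steps, which would explain the
# 32.0 -> 16.0 -> 32.0 grad_scale values in the records above.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()  # backprop with the scaled loss
    scaler.step(optimizer)         # skips the update if grads overflowed
    scaler.update()                # adjusts the scale for the next step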
], batch size: 55, lr: 3.43e-03, grad_scale: 16.0 2023-11-21 15:36:42,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 233950 2023-11-21 15:37:05,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.405e+01 9.190e+01 9.926e+01 1.565e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-21 15:37:16,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1559780.0, ans=0.04949747468305833 2023-11-21 15:37:39,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1559913.3333333333, ans=0.025 2023-11-21 15:37:41,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1559913.3333333333, ans=0.125 2023-11-21 15:37:44,552 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5550, loss[loss=0.06713, simple_loss=0.08184, pruned_loss=0.01274, audio_tagging_loss=0.01347, over 15122.00 frames. ], tot_loss[loss=0.07448, simple_loss=0.09625, pruned_loss=0.01669, audio_tagging_loss=0.009663, over 3038248.42 frames. ], batch size: 57, lr: 3.43e-03, grad_scale: 16.0 2023-11-21 15:37:45,932 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234000 2023-11-21 15:37:49,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1559980.0, ans=0.125 2023-11-21 15:37:51,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-21 15:37:53,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1559980.0, ans=0.125 2023-11-21 15:37:57,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1560046.6666666667, ans=0.125 2023-11-21 15:37:57,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1560046.6666666667, ans=0.025 2023-11-21 15:38:20,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1560113.3333333333, ans=0.0 2023-11-21 15:38:30,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1560180.0, ans=0.1 2023-11-21 15:38:33,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1560180.0, ans=0.125 2023-11-21 15:38:51,283 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5600, loss[loss=0.06119, simple_loss=0.07452, pruned_loss=0.008065, audio_tagging_loss=0.01587, over 15330.00 frames. ], tot_loss[loss=0.07437, simple_loss=0.0959, pruned_loss=0.01662, audio_tagging_loss=0.009798, over 3036912.84 frames. 
], batch size: 57, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:38:52,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234050 2023-11-21 15:39:08,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1560380.0, ans=0.2 2023-11-21 15:39:14,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1560380.0, ans=0.125 2023-11-21 15:39:16,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.332e+01 7.943e+01 8.570e+01 9.599e+01 1.261e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-21 15:39:30,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1560513.3333333333, ans=0.95 2023-11-21 15:39:37,676 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 15:39:56,472 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5650, loss[loss=0.0711, simple_loss=0.0843, pruned_loss=0.01925, audio_tagging_loss=0.009694, over 15299.00 frames. ], tot_loss[loss=0.07428, simple_loss=0.09581, pruned_loss=0.01658, audio_tagging_loss=0.009799, over 3038240.71 frames. ], batch size: 59, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:39:57,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234100 2023-11-21 15:40:04,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1560646.6666666667, ans=0.1 2023-11-21 15:40:10,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1560713.3333333333, ans=0.0 2023-11-21 15:40:37,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1560846.6666666667, ans=0.2 2023-11-21 15:40:43,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1560846.6666666667, ans=0.0 2023-11-21 15:40:47,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1560913.3333333333, ans=0.0 2023-11-21 15:40:55,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-21 15:41:00,852 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5700, loss[loss=0.09425, simple_loss=0.1326, pruned_loss=0.02015, audio_tagging_loss=0.007799, over 15186.00 frames. ], tot_loss[loss=0.07427, simple_loss=0.09576, pruned_loss=0.01654, audio_tagging_loss=0.009853, over 3041949.33 frames. 
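[editor's note] The WARNING above shows why placeholder AudioSet cuts are dropped: a 1-second cut has 100 feature frames, only 23 survive 4x subsampling, and the dummy transcript tokenizes to 24 BPE tokens, so a transducer alignment is impossible (it needs at least one output frame per token). A hypothetical filter that reproduces those numbers (the exact frame arithmetic of the convolutional frontend is an assumption):

def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # Hypothetical filter mirroring the WARNING above: a pruned-transducer
    # loss needs at least one subsampled frame per output token, so cuts
    # whose subsampled length falls below the token count are excluded.
    frames_after = (num_frames - 7) // subsampling_factor  # assumption: the conv frontend trims ~7 frames
    return frames_after >= num_tokens

# The excluded cut: 100 frames -> 23 after subsampling, but 24 tokens.
print(keep_cut(100, 24))  # False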
], batch size: 57, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:41:02,210 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234150 2023-11-21 15:41:04,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1560980.0, ans=0.1 2023-11-21 15:41:06,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1560980.0, ans=0.125 2023-11-21 15:41:06,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1560980.0, ans=0.125 2023-11-21 15:41:08,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1560980.0, ans=0.125 2023-11-21 15:41:13,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1561046.6666666667, ans=0.1 2023-11-21 15:41:16,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1561046.6666666667, ans=0.05 2023-11-21 15:41:27,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.266e+01 8.051e+01 8.513e+01 9.127e+01 1.242e+02, threshold=1.703e+02, percent-clipped=0.0 2023-11-21 15:41:29,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1561113.3333333333, ans=0.1 2023-11-21 15:41:48,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1561180.0, ans=0.2 2023-11-21 15:41:48,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1561180.0, ans=0.125 2023-11-21 15:41:57,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2023-11-21 15:42:03,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-11-21 15:42:05,389 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5750, loss[loss=0.05891, simple_loss=0.07642, pruned_loss=0.01106, audio_tagging_loss=0.00964, over 17525.00 frames. ], tot_loss[loss=0.07353, simple_loss=0.09481, pruned_loss=0.0164, audio_tagging_loss=0.009721, over 3048482.64 frames. ], batch size: 66, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:42:06,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234200 2023-11-21 15:42:17,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1561313.3333333333, ans=0.0 2023-11-21 15:42:20,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1561380.0, ans=0.07 2023-11-21 15:42:22,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-21 15:42:47,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1561513.3333333333, ans=0.125 2023-11-21 15:43:11,813 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5800, loss[loss=0.09071, simple_loss=0.1242, pruned_loss=0.02228, audio_tagging_loss=0.006341, over 15559.00 frames. 
], tot_loss[loss=0.07291, simple_loss=0.09395, pruned_loss=0.0163, audio_tagging_loss=0.009634, over 3054719.06 frames. ], batch size: 56, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:43:13,159 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234250 2023-11-21 15:43:18,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1561646.6666666667, ans=0.125 2023-11-21 15:43:24,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1561713.3333333333, ans=0.2 2023-11-21 15:43:28,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1561713.3333333333, ans=0.125 2023-11-21 15:43:29,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1561713.3333333333, ans=0.2 2023-11-21 15:43:36,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.121e+01 8.819e+01 9.541e+01 1.299e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-21 15:43:48,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1561780.0, ans=0.95 2023-11-21 15:43:52,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1561846.6666666667, ans=0.2 2023-11-21 15:43:54,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1561846.6666666667, ans=0.125 2023-11-21 15:43:59,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1561846.6666666667, ans=0.125 2023-11-21 15:44:05,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0 2023-11-21 15:44:16,116 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5850, loss[loss=0.08527, simple_loss=0.109, pruned_loss=0.02111, audio_tagging_loss=0.009665, over 15099.00 frames. ], tot_loss[loss=0.07289, simple_loss=0.09391, pruned_loss=0.01632, audio_tagging_loss=0.009624, over 3047691.06 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:44:17,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234300 2023-11-21 15:44:23,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1561980.0, ans=0.125 2023-11-21 15:44:31,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1562046.6666666667, ans=0.0 2023-11-21 15:44:44,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2023-11-21 15:44:53,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2023-11-21 15:44:55,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=15.0 2023-11-21 15:44:58,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1562180.0, ans=0.125 2023-11-21 15:45:02,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1562180.0, ans=10.0 2023-11-21 15:45:20,798 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5900, loss[loss=0.08939, simple_loss=0.113, pruned_loss=0.02328, audio_tagging_loss=0.009597, over 14149.00 frames. ], tot_loss[loss=0.07326, simple_loss=0.09462, pruned_loss=0.01638, audio_tagging_loss=0.009573, over 3047588.31 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:45:22,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234350 2023-11-21 15:45:30,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1562313.3333333333, ans=0.125 2023-11-21 15:45:45,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-21 15:45:47,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.953e+01 8.033e+01 8.577e+01 9.444e+01 1.216e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-21 15:45:52,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1562446.6666666667, ans=0.125 2023-11-21 15:46:12,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1562580.0, ans=0.1 2023-11-21 15:46:17,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1562580.0, ans=0.07 2023-11-21 15:46:26,925 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 5950, loss[loss=0.05725, simple_loss=0.06512, pruned_loss=0.01172, audio_tagging_loss=0.01297, over 14637.00 frames. ], tot_loss[loss=0.0736, simple_loss=0.09506, pruned_loss=0.01643, audio_tagging_loss=0.009644, over 3046799.96 frames. ], batch size: 56, lr: 3.43e-03, grad_scale: 32.0 2023-11-21 15:46:28,286 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234400 2023-11-21 15:46:32,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1562646.6666666667, ans=0.125 2023-11-21 15:46:36,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-11-21 15:46:47,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1562713.3333333333, ans=0.125 2023-11-21 15:47:01,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1562780.0, ans=0.125 2023-11-21 15:47:31,485 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6000, loss[loss=0.07607, simple_loss=0.09263, pruned_loss=0.01748, audio_tagging_loss=0.01228, over 15411.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09578, pruned_loss=0.01651, audio_tagging_loss=0.009576, over 3049366.24 frames. 
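[editor's note] The optim.py records each list five grad-norm quantiles (min, 25%, median, 75%, max) plus a threshold, and the threshold consistently tracks twice the median, matching Clipping_scale=2.0 (e.g. 2.0 * 8.577e+01 = 1.715e+02 in the record above, and 2.0 * 8.614e+01 ~= 1.723e+02 earlier). This suggests clipping against a scaled running median rather than a fixed norm; a sketch under that assumption:

import numpy as np

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    # Sketch, assuming the threshold is clipping_scale times the median of
    # recently observed gradient norms, which matches the logged values.
    q = np.quantile(recent_grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    return clipping_scale * q[2], q  # threshold, plus the five quantiles as logged

# percent-clipped=0.0 in these records: no recent batch exceeded the threshold.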
], batch size: 60, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:47:31,485 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 15:47:51,722 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1965, 3.8710, 3.2776, 3.8438], device='cuda:1') 2023-11-21 15:47:51,854 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6502, 2.2030, 3.2737, 2.4901], device='cuda:1') 2023-11-21 15:48:12,904 INFO [train_asr.py:1253] (1/4) Epoch 20, validation: loss=0.06068, simple_loss=0.0522, pruned_loss=0.005214, audio_tagging_loss=0.02937, over 4681554.00 frames. 2023-11-21 15:48:12,904 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 15:48:14,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234450 2023-11-21 15:48:18,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1562980.0, ans=0.125 2023-11-21 15:48:22,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1562980.0, ans=0.1 2023-11-21 15:48:24,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1563046.6666666667, ans=0.0 2023-11-21 15:48:38,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 7.928e+01 8.696e+01 9.354e+01 1.094e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 15:48:47,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1563113.3333333333, ans=0.035 2023-11-21 15:48:58,821 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 15:49:17,921 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6050, loss[loss=0.04843, simple_loss=0.0605, pruned_loss=0.009257, audio_tagging_loss=0.008928, over 16220.00 frames. ], tot_loss[loss=0.07404, simple_loss=0.09592, pruned_loss=0.01659, audio_tagging_loss=0.00949, over 3043029.50 frames. 
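[editor's note] The batch 6000 block above interleaves a full validation pass with training: validation loss is accumulated over 4,681,554 frames, attention-weight entropies are dumped for two self-attention layers, and peak memory (25607MB) is reported before training resumes. A minimal sketch of interval-triggered, frame-weighted validation (function and variable names are hypothetical):

import torch

def maybe_validate(batch_idx, valid_interval, model, valid_dl, compute_loss):
    # Hypothetical sketch of the mid-epoch validation seen at batch 6000:
    # every `valid_interval` training batches, loss is accumulated over the
    # whole validation set with gradients disabled, then training resumes.
    if batch_idx % valid_interval != 0:
        return None
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames  # frame-weighted, like the logged averages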
], batch size: 63, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:49:19,220 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234500 2023-11-21 15:49:23,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1563313.3333333333, ans=0.125 2023-11-21 15:49:26,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1563313.3333333333, ans=0.125 2023-11-21 15:49:32,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1563380.0, ans=0.0 2023-11-21 15:50:12,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1563580.0, ans=0.125 2023-11-21 15:50:21,456 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6100, loss[loss=0.05746, simple_loss=0.0716, pruned_loss=0.01123, audio_tagging_loss=0.01044, over 14741.00 frames. ], tot_loss[loss=0.07401, simple_loss=0.09595, pruned_loss=0.01663, audio_tagging_loss=0.009414, over 3042027.45 frames. ], batch size: 57, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:50:22,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234550 2023-11-21 15:50:24,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=15.0 2023-11-21 15:50:38,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1563713.3333333333, ans=0.125 2023-11-21 15:50:39,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-21 15:50:47,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.060e+01 8.562e+01 9.373e+01 1.321e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-21 15:51:03,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1563846.6666666667, ans=0.1 2023-11-21 15:51:13,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1563913.3333333333, ans=0.09899494936611666 2023-11-21 15:51:25,627 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6150, loss[loss=0.05675, simple_loss=0.07346, pruned_loss=0.01036, audio_tagging_loss=0.009666, over 15926.00 frames. ], tot_loss[loss=0.07403, simple_loss=0.09604, pruned_loss=0.01659, audio_tagging_loss=0.009411, over 3050314.20 frames. ], batch size: 58, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:51:27,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=1563980.0, ans=12.0 2023-11-21 15:51:27,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234600 2023-11-21 15:51:32,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1563980.0, ans=0.125 2023-11-21 15:51:45,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=12.0 2023-11-21 15:51:58,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.68 vs. 
limit=22.5 2023-11-21 15:52:10,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1564180.0, ans=0.0 2023-11-21 15:52:31,531 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6200, loss[loss=0.05789, simple_loss=0.07016, pruned_loss=0.01214, audio_tagging_loss=0.01067, over 16284.00 frames. ], tot_loss[loss=0.07433, simple_loss=0.09651, pruned_loss=0.01656, audio_tagging_loss=0.009524, over 3052286.70 frames. ], batch size: 63, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:52:32,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234650 2023-11-21 15:52:38,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-21 15:52:46,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-11-21 15:52:47,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1564380.0, ans=0.0 2023-11-21 15:52:56,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 7.947e+01 8.423e+01 9.173e+01 1.216e+02, threshold=1.685e+02, percent-clipped=0.0 2023-11-21 15:52:59,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2023-11-21 15:53:07,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1564446.6666666667, ans=0.125 2023-11-21 15:53:08,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.13 vs. limit=10.0 2023-11-21 15:53:13,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2023-11-21 15:53:35,400 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6250, loss[loss=0.08855, simple_loss=0.1171, pruned_loss=0.02194, audio_tagging_loss=0.00807, over 14565.00 frames. ], tot_loss[loss=0.07366, simple_loss=0.09575, pruned_loss=0.01623, audio_tagging_loss=0.009563, over 3052207.51 frames. ], batch size: 55, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:53:36,782 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234700 2023-11-21 15:53:36,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1564646.6666666667, ans=0.125 2023-11-21 15:54:10,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1564780.0, ans=0.125 2023-11-21 15:54:15,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2023-11-21 15:54:20,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1564846.6666666667, ans=0.1 2023-11-21 15:54:39,039 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6300, loss[loss=0.08022, simple_loss=0.1018, pruned_loss=0.02063, audio_tagging_loss=0.008713, over 15091.00 frames. 
], tot_loss[loss=0.07366, simple_loss=0.0959, pruned_loss=0.01618, audio_tagging_loss=0.009527, over 3050139.07 frames. ], batch size: 56, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:54:40,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234750 2023-11-21 15:54:44,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.03 vs. limit=15.0 2023-11-21 15:54:45,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1564980.0, ans=0.1 2023-11-21 15:55:05,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 8.110e+01 8.746e+01 9.579e+01 1.135e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-21 15:55:09,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1565113.3333333333, ans=0.09899494936611666 2023-11-21 15:55:26,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1565180.0, ans=0.05 2023-11-21 15:55:39,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1565246.6666666667, ans=0.125 2023-11-21 15:55:42,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1565246.6666666667, ans=0.0 2023-11-21 15:55:45,084 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6350, loss[loss=0.07252, simple_loss=0.09529, pruned_loss=0.01662, audio_tagging_loss=0.008263, over 14718.00 frames. ], tot_loss[loss=0.0735, simple_loss=0.09546, pruned_loss=0.01609, audio_tagging_loss=0.009679, over 3056970.15 frames. ], batch size: 58, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:55:46,414 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234800 2023-11-21 15:55:58,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1565380.0, ans=0.125 2023-11-21 15:56:40,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1565580.0, ans=0.1 2023-11-21 15:56:41,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1565580.0, ans=0.0 2023-11-21 15:56:41,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1565580.0, ans=0.125 2023-11-21 15:56:49,991 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6400, loss[loss=0.1006, simple_loss=0.1342, pruned_loss=0.02667, audio_tagging_loss=0.00682, over 14917.00 frames. ], tot_loss[loss=0.07429, simple_loss=0.09629, pruned_loss=0.01642, audio_tagging_loss=0.009722, over 3059384.04 frames. 
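[editor's note] Most scaling.py lines trace ScheduledFloat values: dropout probabilities, skip rates, and balancer limits that are functions of batch_count rather than constants. A hedged sketch of one way such a schedule can be realized, as a piecewise-linear function of batch_count (the breakpoints below are illustrative, not taken from this run):

def scheduled_float(batch_count, points):
    # Piecewise-linear schedule sketch over batch_count, e.g.
    # points = [(0.0, 0.3), (20000.0, 0.1)] ramps a dropout_p from 0.3
    # down to 0.1 and then holds it constant; by batch_count ~1.56e6
    # (as logged above) such schedules sit at their final value.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

print(scheduled_float(1564980.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1: long past the ramp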
], batch size: 55, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:56:51,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234850 2023-11-21 15:56:57,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1565646.6666666667, ans=0.0 2023-11-21 15:57:16,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.183e+01 8.791e+01 9.502e+01 1.187e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-21 15:57:17,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1565780.0, ans=0.0 2023-11-21 15:57:19,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-21 15:57:21,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1565780.0, ans=0.125 2023-11-21 15:57:34,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2023-11-21 15:57:36,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-21 15:57:46,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1565913.3333333333, ans=0.2 2023-11-21 15:57:49,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1565913.3333333333, ans=0.2 2023-11-21 15:57:50,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1565913.3333333333, ans=0.04949747468305833 2023-11-21 15:57:55,260 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6450, loss[loss=0.06964, simple_loss=0.09182, pruned_loss=0.01406, audio_tagging_loss=0.009667, over 14813.00 frames. ], tot_loss[loss=0.0738, simple_loss=0.09542, pruned_loss=0.01631, audio_tagging_loss=0.009784, over 3059985.50 frames. ], batch size: 55, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:57:56,587 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234900 2023-11-21 15:58:28,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1566113.3333333333, ans=0.0 2023-11-21 15:58:55,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1566246.6666666667, ans=0.2 2023-11-21 15:59:01,503 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6500, loss[loss=0.08991, simple_loss=0.1193, pruned_loss=0.0212, audio_tagging_loss=0.009054, over 16128.00 frames. ], tot_loss[loss=0.07454, simple_loss=0.09644, pruned_loss=0.01653, audio_tagging_loss=0.009784, over 3059713.17 frames. 
], batch size: 57, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 15:59:01,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1566313.3333333333, ans=0.125 2023-11-21 15:59:02,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 234950 2023-11-21 15:59:26,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.005e+01 8.613e+01 9.254e+01 1.234e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-21 15:59:27,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1566446.6666666667, ans=22.5 2023-11-21 15:59:46,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-21 15:59:59,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=22.5 2023-11-21 16:00:00,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1566580.0, ans=0.125 2023-11-21 16:00:06,215 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6550, loss[loss=0.0571, simple_loss=0.07171, pruned_loss=0.01217, audio_tagging_loss=0.009075, over 14993.00 frames. ], tot_loss[loss=0.07389, simple_loss=0.09576, pruned_loss=0.01635, audio_tagging_loss=0.009656, over 3056834.12 frames. ], batch size: 56, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 16:00:07,641 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235000 2023-11-21 16:00:12,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1566646.6666666667, ans=0.015 2023-11-21 16:00:31,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1566713.3333333333, ans=0.1 2023-11-21 16:00:31,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1566713.3333333333, ans=0.125 2023-11-21 16:00:35,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1566780.0, ans=0.5 2023-11-21 16:00:40,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-11-21 16:00:57,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1566913.3333333333, ans=0.125 2023-11-21 16:01:03,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2023-11-21 16:01:05,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1566913.3333333333, ans=0.125 2023-11-21 16:01:06,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1566913.3333333333, ans=0.95 2023-11-21 16:01:11,152 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6600, loss[loss=0.0891, simple_loss=0.1197, pruned_loss=0.02303, audio_tagging_loss=0.006225, over 15329.00 frames. 
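[editor's note] The Whitening records compare a measured statistic of a layer's activations against a scheduled limit; lines like "metric=7.87 vs. limit=15.0" with the metric below the limit indicate no correction was triggered. A heavily hedged sketch of one such anisotropy metric (icefall defines its own whitening metric, which may differ from this):

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # Sketch: quantify how far activations are from "white" (identity
    # covariance) via the ratio between the mean of squared covariance
    # eigenvalues and the squared mean eigenvalue. This is 1.0 for
    # perfectly white features and grows as the spectrum becomes less
    # uniform; the actual logged metric is icefall's own definition.
    x = x.reshape(-1, x.shape[-1])
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

# A corrective penalty would be applied only when the metric exceeds the
# scheduled limit, consistent with the "metric=... vs. limit=..." records.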
], tot_loss[loss=0.07349, simple_loss=0.09504, pruned_loss=0.01639, audio_tagging_loss=0.009571, over 3059726.17 frames. ], batch size: 54, lr: 3.42e-03, grad_scale: 16.0 2023-11-21 16:01:13,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235050 2023-11-21 16:01:16,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2023-11-21 16:01:22,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-21 16:01:28,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1567046.6666666667, ans=0.125 2023-11-21 16:01:39,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.039e+01 8.543e+01 9.389e+01 1.376e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-21 16:01:41,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1567113.3333333333, ans=15.0 2023-11-21 16:01:52,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1567180.0, ans=0.0 2023-11-21 16:02:00,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=12.0 2023-11-21 16:02:01,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1567180.0, ans=0.125 2023-11-21 16:02:12,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1567246.6666666667, ans=0.1 2023-11-21 16:02:15,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1567246.6666666667, ans=0.0 2023-11-21 16:02:17,520 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6650, loss[loss=0.0637, simple_loss=0.08909, pruned_loss=0.01407, audio_tagging_loss=0.005083, over 15331.00 frames. ], tot_loss[loss=0.07341, simple_loss=0.095, pruned_loss=0.01641, audio_tagging_loss=0.009497, over 3057041.93 frames. ], batch size: 59, lr: 3.42e-03, grad_scale: 16.0 2023-11-21 16:02:18,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235100 2023-11-21 16:02:59,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1567513.3333333333, ans=0.2 2023-11-21 16:03:05,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1567513.3333333333, ans=0.125 2023-11-21 16:03:22,499 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6700, loss[loss=0.07013, simple_loss=0.09348, pruned_loss=0.01507, audio_tagging_loss=0.008316, over 15246.00 frames. ], tot_loss[loss=0.07315, simple_loss=0.09486, pruned_loss=0.01626, audio_tagging_loss=0.00945, over 3049257.66 frames. 
], batch size: 55, lr: 3.42e-03, grad_scale: 16.0 2023-11-21 16:03:23,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235150 2023-11-21 16:03:29,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1567646.6666666667, ans=0.125 2023-11-21 16:03:38,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2023-11-21 16:03:49,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.083e+01 8.674e+01 9.269e+01 1.242e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-21 16:04:17,192 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 16:04:22,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1567913.3333333333, ans=0.0 2023-11-21 16:04:26,635 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6750, loss[loss=0.06199, simple_loss=0.07721, pruned_loss=0.01454, audio_tagging_loss=0.008843, over 15140.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.09387, pruned_loss=0.01604, audio_tagging_loss=0.009427, over 3045874.69 frames. ], batch size: 56, lr: 3.42e-03, grad_scale: 16.0 2023-11-21 16:04:27,968 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235200 2023-11-21 16:04:35,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1567980.0, ans=0.1 2023-11-21 16:05:07,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1568180.0, ans=0.125 2023-11-21 16:05:17,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=1568246.6666666667, ans=15.0 2023-11-21 16:05:28,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1568246.6666666667, ans=0.0 2023-11-21 16:05:30,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1568313.3333333333, ans=0.1 2023-11-21 16:05:31,733 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6800, loss[loss=0.08035, simple_loss=0.09676, pruned_loss=0.02184, audio_tagging_loss=0.01013, over 14444.00 frames. ], tot_loss[loss=0.07283, simple_loss=0.09441, pruned_loss=0.01618, audio_tagging_loss=0.009448, over 3048822.63 frames. ], batch size: 55, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 16:05:32,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1568313.3333333333, ans=0.2 2023-11-21 16:05:33,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235250 2023-11-21 16:05:33,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1568313.3333333333, ans=0.125 2023-11-21 16:05:57,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.694e+01 8.054e+01 8.686e+01 9.528e+01 1.643e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 16:05:58,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.55 vs. 
limit=15.0 2023-11-21 16:06:10,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1568513.3333333333, ans=0.125 2023-11-21 16:06:24,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1568580.0, ans=0.0 2023-11-21 16:06:35,964 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6850, loss[loss=0.07149, simple_loss=0.09611, pruned_loss=0.01604, audio_tagging_loss=0.007399, over 15792.00 frames. ], tot_loss[loss=0.07328, simple_loss=0.09491, pruned_loss=0.01638, audio_tagging_loss=0.00945, over 3043513.89 frames. ], batch size: 58, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 16:06:37,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235300 2023-11-21 16:06:58,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1568713.3333333333, ans=0.0 2023-11-21 16:07:02,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1568780.0, ans=0.0 2023-11-21 16:07:22,851 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 16:07:28,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1568913.3333333333, ans=0.125 2023-11-21 16:07:39,659 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6900, loss[loss=0.07285, simple_loss=0.09939, pruned_loss=0.014, audio_tagging_loss=0.00916, over 14741.00 frames. ], tot_loss[loss=0.07333, simple_loss=0.09504, pruned_loss=0.01638, audio_tagging_loss=0.00943, over 3035277.61 frames. ], batch size: 56, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 16:07:41,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235350 2023-11-21 16:07:53,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1569046.6666666667, ans=0.0 2023-11-21 16:07:57,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1569046.6666666667, ans=0.0 2023-11-21 16:08:02,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1569046.6666666667, ans=0.125 2023-11-21 16:08:06,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 7.983e+01 8.656e+01 9.439e+01 1.131e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-21 16:08:11,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1569113.3333333333, ans=0.035 2023-11-21 16:08:29,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1569180.0, ans=0.0 2023-11-21 16:08:30,058 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 16:08:32,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1569246.6666666667, ans=0.5 2023-11-21 16:08:34,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1569246.6666666667, ans=0.125 2023-11-21 16:08:35,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=22.5 2023-11-21 16:08:44,340 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 6950, loss[loss=0.1023, simple_loss=0.1321, pruned_loss=0.03007, audio_tagging_loss=0.006163, over 15533.00 frames. ], tot_loss[loss=0.07343, simple_loss=0.09519, pruned_loss=0.01637, audio_tagging_loss=0.009459, over 3041225.95 frames. ], batch size: 54, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 16:08:45,666 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235400 2023-11-21 16:08:59,715 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 16:09:39,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1569580.0, ans=0.1 2023-11-21 16:09:45,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-21 16:09:50,354 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7000, loss[loss=0.04959, simple_loss=0.05272, pruned_loss=0.01265, audio_tagging_loss=0.01058, over 16213.00 frames. ], tot_loss[loss=0.07389, simple_loss=0.09574, pruned_loss=0.01652, audio_tagging_loss=0.009493, over 3045572.06 frames. ], batch size: 64, lr: 3.42e-03, grad_scale: 32.0 2023-11-21 16:09:51,702 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235450 2023-11-21 16:09:59,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1569646.6666666667, ans=0.125 2023-11-21 16:10:07,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1569713.3333333333, ans=0.1 2023-11-21 16:10:15,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.127e+01 8.792e+01 9.421e+01 1.149e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-21 16:10:34,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1569846.6666666667, ans=0.0 2023-11-21 16:10:43,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1569913.3333333333, ans=0.0 2023-11-21 16:10:43,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=12.0 2023-11-21 16:10:53,802 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7050, loss[loss=0.07767, simple_loss=0.09924, pruned_loss=0.01741, audio_tagging_loss=0.01064, over 15321.00 frames. ], tot_loss[loss=0.07348, simple_loss=0.09526, pruned_loss=0.01636, audio_tagging_loss=0.009497, over 3044482.68 frames. 
2023-11-21 16:10:55,105 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235500
2023-11-21 16:10:59,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1569980.0, ans=0.1
2023-11-21 16:11:03,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1569980.0, ans=0.05
2023-11-21 16:11:05,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1570046.6666666667, ans=0.125
2023-11-21 16:11:26,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1570113.3333333333, ans=0.0
2023-11-21 16:11:45,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1570246.6666666667, ans=0.2
2023-11-21 16:11:47,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1570246.6666666667, ans=0.0
2023-11-21 16:11:57,012 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7100, loss[loss=0.07222, simple_loss=0.1011, pruned_loss=0.01388, audio_tagging_loss=0.007779, over 14969.00 frames. ], tot_loss[loss=0.07386, simple_loss=0.09553, pruned_loss=0.01646, audio_tagging_loss=0.009635, over 3046028.35 frames. ], batch size: 55, lr: 3.42e-03, grad_scale: 16.0
2023-11-21 16:11:58,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235550
2023-11-21 16:12:12,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0
2023-11-21 16:12:12,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0
2023-11-21 16:12:16,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1570380.0, ans=0.125
2023-11-21 16:12:18,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1570380.0, ans=0.125
2023-11-21 16:12:20,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1570380.0, ans=0.2
2023-11-21 16:12:25,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.128e+01 8.670e+01 9.348e+01 1.083e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-21 16:12:39,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1570513.3333333333, ans=0.0
2023-11-21 16:12:42,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1570513.3333333333, ans=0.07
2023-11-21 16:12:51,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1570580.0, ans=0.0
2023-11-21 16:13:01,306 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7150, loss[loss=0.0796, simple_loss=0.1019, pruned_loss=0.02012, audio_tagging_loss=0.008534, over 15311.00 frames. ], tot_loss[loss=0.07344, simple_loss=0.09494, pruned_loss=0.01638, audio_tagging_loss=0.009596, over 3046258.70 frames. ], batch size: 60, lr: 3.42e-03, grad_scale: 16.0
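The optim.py entries can be read directly from the numbers: each reports five percentiles (min, 25%, median, 75%, max) of recently observed gradient norms, and the printed threshold consistently equals Clipping_scale times the median (e.g. 2.0 x 8.670e+01 = 1.734e+02 in the entry above). A sketch of that median-relative clipping scheme follows; the class name, window size, and return convention are assumptions, not the real optim.py code.

```python
# Hypothetical sketch of median-based gradient clipping, inferred from the
# "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." log lines.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        self.norms.append(norm.item())
        q = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.scale * q[2].item()  # 2.0 * median, as in the log
        clipped = norm.item() > threshold
        if clipped:
            for p in params:
                p.grad.mul_(threshold / norm)
        return q, threshold, clipped  # quartiles and threshold for logging
```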
2023-11-21 16:13:03,314 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235600
2023-11-21 16:13:14,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1570713.3333333333, ans=0.1
2023-11-21 16:13:24,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1570713.3333333333, ans=0.0
2023-11-21 16:13:38,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1570780.0, ans=0.2
2023-11-21 16:13:40,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1570846.6666666667, ans=0.125
2023-11-21 16:13:54,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5
2023-11-21 16:14:04,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1570980.0, ans=0.1
2023-11-21 16:14:05,656 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7200, loss[loss=0.0736, simple_loss=0.1001, pruned_loss=0.01548, audio_tagging_loss=0.008063, over 14962.00 frames. ], tot_loss[loss=0.07307, simple_loss=0.09429, pruned_loss=0.01624, audio_tagging_loss=0.009685, over 3044421.08 frames. ], batch size: 57, lr: 3.42e-03, grad_scale: 32.0
2023-11-21 16:14:07,010 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235650
2023-11-21 16:14:27,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571046.6666666667, ans=0.1
2023-11-21 16:14:33,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.075e+01 8.939e+01 9.765e+01 1.346e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-21 16:14:33,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1571113.3333333333, ans=0.07
2023-11-21 16:14:37,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1571113.3333333333, ans=0.125
2023-11-21 16:14:41,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1571113.3333333333, ans=0.2
2023-11-21 16:14:47,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1571180.0, ans=0.04949747468305833
2023-11-21 16:14:55,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571246.6666666667, ans=0.1
2023-11-21 16:15:08,520 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7250, loss[loss=0.06968, simple_loss=0.09464, pruned_loss=0.01293, audio_tagging_loss=0.009425, over 15292.00 frames. ], tot_loss[loss=0.07316, simple_loss=0.09445, pruned_loss=0.01617, audio_tagging_loss=0.009765, over 3047517.32 frames. ], batch size: 57, lr: 3.42e-03, grad_scale: 32.0
2023-11-21 16:15:09,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235700
2023-11-21 16:15:16,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1571313.3333333333, ans=0.0
2023-11-21 16:15:39,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1571446.6666666667, ans=0.125
2023-11-21 16:15:55,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1571513.3333333333, ans=0.125
2023-11-21 16:15:55,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1571513.3333333333, ans=0.125
2023-11-21 16:16:09,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1571580.0, ans=0.125
2023-11-21 16:16:13,772 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7300, loss[loss=0.05127, simple_loss=0.0583, pruned_loss=0.0132, audio_tagging_loss=0.008916, over 13345.00 frames. ], tot_loss[loss=0.07313, simple_loss=0.09439, pruned_loss=0.0162, audio_tagging_loss=0.009735, over 3042534.96 frames. ], batch size: 53, lr: 3.42e-03, grad_scale: 32.0
2023-11-21 16:16:15,139 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235750
2023-11-21 16:16:42,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.221e+01 8.789e+01 9.369e+01 1.391e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-21 16:16:57,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1571846.6666666667, ans=0.0
2023-11-21 16:17:15,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1571913.3333333333, ans=0.125
2023-11-21 16:17:19,077 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7350, loss[loss=0.07079, simple_loss=0.08986, pruned_loss=0.0154, audio_tagging_loss=0.01046, over 14023.00 frames. ], tot_loss[loss=0.07269, simple_loss=0.09402, pruned_loss=0.01604, audio_tagging_loss=0.009639, over 3043289.28 frames. ], batch size: 54, lr: 3.42e-03, grad_scale: 32.0
2023-11-21 16:17:20,428 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235800
2023-11-21 16:17:27,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1571980.0, ans=0.125
2023-11-21 16:17:43,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1572113.3333333333, ans=0.0
2023-11-21 16:18:06,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1572180.0, ans=0.0
2023-11-21 16:18:22,171 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7400, loss[loss=0.06263, simple_loss=0.08424, pruned_loss=0.0111, audio_tagging_loss=0.009419, over 15078.00 frames. ], tot_loss[loss=0.07261, simple_loss=0.09434, pruned_loss=0.01598, audio_tagging_loss=0.009455, over 3050537.55 frames. ], batch size: 58, lr: 3.41e-03, grad_scale: 32.0
2023-11-21 16:18:23,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235850
2023-11-21 16:18:50,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.093e+01 8.770e+01 9.313e+01 1.126e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-21 16:18:56,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1572446.6666666667, ans=0.0
2023-11-21 16:19:02,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1572513.3333333333, ans=0.125
2023-11-21 16:19:20,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1572580.0, ans=0.1
2023-11-21 16:19:26,609 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7450, loss[loss=0.07898, simple_loss=0.1039, pruned_loss=0.019, audio_tagging_loss=0.008009, over 16915.00 frames. ], tot_loss[loss=0.07266, simple_loss=0.09402, pruned_loss=0.0161, audio_tagging_loss=0.009551, over 3041604.13 frames. ], batch size: 62, lr: 3.41e-03, grad_scale: 32.0
2023-11-21 16:19:27,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235900
2023-11-21 16:19:39,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1572713.3333333333, ans=0.0
2023-11-21 16:19:58,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1572780.0, ans=0.125
2023-11-21 16:20:11,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0
2023-11-21 16:20:12,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1572846.6666666667, ans=0.1
2023-11-21 16:20:22,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1572913.3333333333, ans=0.0
2023-11-21 16:20:30,981 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7500, loss[loss=0.06199, simple_loss=0.08044, pruned_loss=0.01, audio_tagging_loss=0.01177, over 15268.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.0939, pruned_loss=0.01602, audio_tagging_loss=0.009487, over 3042461.68 frames. ], batch size: 56, lr: 3.41e-03, grad_scale: 32.0
2023-11-21 16:20:32,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 235950
2023-11-21 16:20:45,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1573046.6666666667, ans=0.1
2023-11-21 16:20:51,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0
2023-11-21 16:20:59,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.902e+01 8.351e+01 8.857e+01 9.417e+01 1.207e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-21 16:21:34,628 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7550, loss[loss=0.09256, simple_loss=0.1241, pruned_loss=0.02191, audio_tagging_loss=0.008621, over 14678.00 frames. ], tot_loss[loss=0.07206, simple_loss=0.09339, pruned_loss=0.01583, audio_tagging_loss=0.009538, over 3046017.90 frames. ], batch size: 53, lr: 3.41e-03, grad_scale: 32.0
2023-11-21 16:21:35,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236000
2023-11-21 16:21:41,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1573313.3333333333, ans=0.125
2023-11-21 16:21:48,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1573313.3333333333, ans=0.1
2023-11-21 16:21:55,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1573380.0, ans=0.125
2023-11-21 16:22:41,876 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7600, loss[loss=0.07114, simple_loss=0.09766, pruned_loss=0.01374, audio_tagging_loss=0.008579, over 15258.00 frames. ], tot_loss[loss=0.07152, simple_loss=0.09277, pruned_loss=0.01561, audio_tagging_loss=0.009514, over 3046357.03 frames. ], batch size: 56, lr: 3.41e-03, grad_scale: 32.0
2023-11-21 16:22:43,230 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236050
2023-11-21 16:22:52,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1573646.6666666667, ans=0.2
2023-11-21 16:23:04,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1573713.3333333333, ans=0.125
2023-11-21 16:23:09,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.352e+01 8.154e+01 8.768e+01 9.478e+01 1.233e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-21 16:23:10,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1573780.0, ans=0.125
2023-11-21 16:23:14,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0
2023-11-21 16:23:27,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0
2023-11-21 16:23:34,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0
2023-11-21 16:23:46,826 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7650, loss[loss=0.07163, simple_loss=0.09447, pruned_loss=0.01457, audio_tagging_loss=0.009829, over 15256.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.09347, pruned_loss=0.01584, audio_tagging_loss=0.009508, over 3048279.70 frames. ], batch size: 57, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:23:48,147 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236100
2023-11-21 16:24:01,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1574046.6666666667, ans=0.125
2023-11-21 16:24:13,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1574113.3333333333, ans=0.1
2023-11-21 16:24:28,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5
2023-11-21 16:24:29,567 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 16:24:44,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1574246.6666666667, ans=10.0
2023-11-21 16:24:46,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2023-11-21 16:24:51,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1574313.3333333333, ans=0.125
2023-11-21 16:24:52,307 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7700, loss[loss=0.1001, simple_loss=0.1396, pruned_loss=0.02126, audio_tagging_loss=0.00902, over 16053.00 frames. ], tot_loss[loss=0.07308, simple_loss=0.09502, pruned_loss=0.01613, audio_tagging_loss=0.009439, over 3052217.63 frames. ], batch size: 57, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:24:53,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236150
2023-11-21 16:25:00,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1574313.3333333333, ans=0.125
2023-11-21 16:25:22,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 7.998e+01 8.716e+01 9.349e+01 1.459e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-21 16:25:32,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1574513.3333333333, ans=0.0
2023-11-21 16:25:32,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1574513.3333333333, ans=0.125
2023-11-21 16:25:38,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=12.0
2023-11-21 16:25:39,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1574513.3333333333, ans=0.0
2023-11-21 16:25:43,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1574580.0, ans=0.2
2023-11-21 16:25:43,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1574580.0, ans=15.0
2023-11-21 16:25:44,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1574580.0, ans=0.2
2023-11-21 16:25:45,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1574580.0, ans=0.1
2023-11-21 16:25:57,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=12.0
2023-11-21 16:25:57,890 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7750, loss[loss=0.06642, simple_loss=0.0848, pruned_loss=0.01171, audio_tagging_loss=0.01231, over 13848.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.0961, pruned_loss=0.01634, audio_tagging_loss=0.009579, over 3040082.50 frames. ], batch size: 54, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:25:59,195 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236200
2023-11-21 16:26:49,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1574913.3333333333, ans=0.1
2023-11-21 16:26:49,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1574913.3333333333, ans=0.0
2023-11-21 16:26:55,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1574913.3333333333, ans=0.125
2023-11-21 16:27:02,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1574980.0, ans=0.0
2023-11-21 16:27:03,063 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7800, loss[loss=0.07942, simple_loss=0.1083, pruned_loss=0.01835, audio_tagging_loss=0.006898, over 14566.00 frames. ], tot_loss[loss=0.07438, simple_loss=0.09655, pruned_loss=0.0165, audio_tagging_loss=0.009609, over 3033045.93 frames. ], batch size: 52, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:27:04,444 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236250
2023-11-21 16:27:04,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1574980.0, ans=0.0
2023-11-21 16:27:09,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1574980.0, ans=0.0
2023-11-21 16:27:18,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1575046.6666666667, ans=0.125
2023-11-21 16:27:19,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1575046.6666666667, ans=15.0
2023-11-21 16:27:20,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1575046.6666666667, ans=0.0
2023-11-21 16:27:27,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1575113.3333333333, ans=0.125
2023-11-21 16:27:27,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1575113.3333333333, ans=0.125
2023-11-21 16:27:29,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1575113.3333333333, ans=0.125
2023-11-21 16:27:32,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.260e+01 8.924e+01 9.487e+01 1.142e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-21 16:28:03,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1575246.6666666667, ans=0.2
2023-11-21 16:28:06,788 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7850, loss[loss=0.06453, simple_loss=0.08687, pruned_loss=0.01239, audio_tagging_loss=0.008704, over 15090.00 frames. ], tot_loss[loss=0.07467, simple_loss=0.09699, pruned_loss=0.01664, audio_tagging_loss=0.009538, over 3037865.92 frames. ], batch size: 55, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:28:08,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236300
2023-11-21 16:28:13,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1575313.3333333333, ans=0.5
2023-11-21 16:29:08,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0
2023-11-21 16:29:12,456 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7900, loss[loss=0.07267, simple_loss=0.09297, pruned_loss=0.0127, audio_tagging_loss=0.01349, over 14424.00 frames. ], tot_loss[loss=0.07418, simple_loss=0.09601, pruned_loss=0.01651, audio_tagging_loss=0.009663, over 3040545.83 frames. ], batch size: 55, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:29:12,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1575646.6666666667, ans=0.125
2023-11-21 16:29:13,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236350
2023-11-21 16:29:21,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1575646.6666666667, ans=0.0
2023-11-21 16:29:38,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1575780.0, ans=0.125
2023-11-21 16:29:38,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1575780.0, ans=0.04949747468305833
2023-11-21 16:29:42,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.695e+01 8.223e+01 8.741e+01 9.737e+01 2.619e+02, threshold=1.748e+02, percent-clipped=1.0
2023-11-21 16:29:52,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1575846.6666666667, ans=0.1
2023-11-21 16:29:55,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1575846.6666666667, ans=0.125
2023-11-21 16:30:09,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1575913.3333333333, ans=0.1
2023-11-21 16:30:16,761 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 7950, loss[loss=0.08156, simple_loss=0.1116, pruned_loss=0.01895, audio_tagging_loss=0.006826, over 15525.00 frames. ], tot_loss[loss=0.07457, simple_loss=0.09647, pruned_loss=0.01672, audio_tagging_loss=0.009611, over 3039350.21 frames. ], batch size: 58, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:30:18,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236400
2023-11-21 16:30:19,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1575980.0, ans=0.2
2023-11-21 16:30:31,801 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 16:30:43,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1576113.3333333333, ans=0.125
2023-11-21 16:31:16,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5
2023-11-21 16:31:20,501 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8000, loss[loss=0.09964, simple_loss=0.1302, pruned_loss=0.02398, audio_tagging_loss=0.01055, over 15826.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09539, pruned_loss=0.01631, audio_tagging_loss=0.009698, over 3035955.42 frames. ], batch size: 57, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:31:21,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236450
2023-11-21 16:31:52,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.291e+01 7.990e+01 8.622e+01 9.652e+01 1.226e+02, threshold=1.724e+02, percent-clipped=0.0
2023-11-21 16:32:23,865 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8050, loss[loss=0.06806, simple_loss=0.09378, pruned_loss=0.0136, audio_tagging_loss=0.00758, over 15470.00 frames. ], tot_loss[loss=0.07409, simple_loss=0.09548, pruned_loss=0.01657, audio_tagging_loss=0.009781, over 3032672.53 frames. ], batch size: 58, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:32:25,831 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236500
2023-11-21 16:32:46,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1576713.3333333333, ans=0.1
2023-11-21 16:32:47,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=15.0
2023-11-21 16:32:51,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1576780.0, ans=0.125
2023-11-21 16:32:54,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1576780.0, ans=0.125
2023-11-21 16:32:55,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1576780.0, ans=0.1
2023-11-21 16:33:28,740 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8100, loss[loss=0.06793, simple_loss=0.09129, pruned_loss=0.01151, audio_tagging_loss=0.01078, over 14729.00 frames. ], tot_loss[loss=0.07443, simple_loss=0.09591, pruned_loss=0.01675, audio_tagging_loss=0.009718, over 3034288.08 frames. ], batch size: 55, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:33:30,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236550
2023-11-21 16:33:47,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1577046.6666666667, ans=0.2
2023-11-21 16:33:55,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1577113.3333333333, ans=0.125
2023-11-21 16:33:58,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.80 vs. limit=10.0
2023-11-21 16:33:59,141 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.498e+01 8.221e+01 8.771e+01 9.447e+01 1.298e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-21 16:34:04,447 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 16:34:07,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1577180.0, ans=0.125
2023-11-21 16:34:15,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1577180.0, ans=0.0
2023-11-21 16:34:31,757 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8150, loss[loss=0.09136, simple_loss=0.1209, pruned_loss=0.02148, audio_tagging_loss=0.009436, over 15612.00 frames. ], tot_loss[loss=0.0742, simple_loss=0.09599, pruned_loss=0.01662, audio_tagging_loss=0.009585, over 3039535.58 frames. ], batch size: 58, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:34:33,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236600
2023-11-21 16:34:45,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1577380.0, ans=0.125
2023-11-21 16:35:00,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0
2023-11-21 16:35:03,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1577446.6666666667, ans=0.125
2023-11-21 16:35:03,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1577446.6666666667, ans=0.125
2023-11-21 16:35:06,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5
2023-11-21 16:35:10,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=12.0
2023-11-21 16:35:12,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1577513.3333333333, ans=0.125
2023-11-21 16:35:19,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1577513.3333333333, ans=0.125
2023-11-21 16:35:34,910 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8200, loss[loss=0.08473, simple_loss=0.1139, pruned_loss=0.02172, audio_tagging_loss=0.006071, over 14071.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.0961, pruned_loss=0.01656, audio_tagging_loss=0.00951, over 3038234.21 frames. ], batch size: 54, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:35:34,976 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 16:35:36,159 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236650
2023-11-21 16:35:54,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1577713.3333333333, ans=0.125
2023-11-21 16:36:04,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0
2023-11-21 16:36:07,542 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 8.090e+01 8.732e+01 9.271e+01 1.131e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-21 16:36:31,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1577913.3333333333, ans=0.125
2023-11-21 16:36:40,257 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8250, loss[loss=0.08011, simple_loss=0.1023, pruned_loss=0.01435, audio_tagging_loss=0.01459, over 15488.00 frames. ], tot_loss[loss=0.07411, simple_loss=0.09626, pruned_loss=0.01653, audio_tagging_loss=0.009449, over 3045274.36 frames. ], batch size: 54, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:36:41,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236700
2023-11-21 16:37:06,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1578113.3333333333, ans=0.125
2023-11-21 16:37:11,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1578113.3333333333, ans=0.125
2023-11-21 16:37:20,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1578180.0, ans=0.125
2023-11-21 16:37:43,356 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8300, loss[loss=0.06951, simple_loss=0.08501, pruned_loss=0.01757, audio_tagging_loss=0.009428, over 15653.00 frames. ], tot_loss[loss=0.07388, simple_loss=0.09592, pruned_loss=0.01642, audio_tagging_loss=0.009495, over 3044523.83 frames. ], batch size: 61, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:37:43,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5
2023-11-21 16:37:44,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236750
2023-11-21 16:37:48,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1578313.3333333333, ans=0.2
2023-11-21 16:37:48,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0
2023-11-21 16:38:01,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1578380.0, ans=0.1
2023-11-21 16:38:04,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0
2023-11-21 16:38:14,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.835e+01 8.193e+01 8.671e+01 9.431e+01 1.215e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-21 16:38:21,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1578513.3333333333, ans=0.1
2023-11-21 16:38:31,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1578513.3333333333, ans=0.0
2023-11-21 16:38:45,468 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8350, loss[loss=0.07625, simple_loss=0.1042, pruned_loss=0.01535, audio_tagging_loss=0.008782, over 15006.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.09532, pruned_loss=0.01613, audio_tagging_loss=0.009482, over 3041747.92 frames. ], batch size: 54, lr: 3.41e-03, grad_scale: 8.0
2023-11-21 16:38:46,757 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236800
2023-11-21 16:38:48,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5
2023-11-21 16:38:58,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1578713.3333333333, ans=0.1
2023-11-21 16:39:05,611 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 16:39:35,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1578913.3333333333, ans=0.95
2023-11-21 16:39:49,512 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8400, loss[loss=0.03781, simple_loss=0.03781, pruned_loss=0.006619, audio_tagging_loss=0.01228, over 14000.00 frames. ], tot_loss[loss=0.07416, simple_loss=0.0969, pruned_loss=0.01636, audio_tagging_loss=0.009352, over 3041368.40 frames. ], batch size: 54, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:39:50,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236850
2023-11-21 16:40:00,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1578980.0, ans=0.0
2023-11-21 16:40:12,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1579046.6666666667, ans=0.125
2023-11-21 16:40:15,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0
2023-11-21 16:40:21,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 7.886e+01 8.656e+01 9.499e+01 1.187e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-21 16:40:26,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0
2023-11-21 16:40:45,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1579246.6666666667, ans=0.125
2023-11-21 16:40:53,310 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8450, loss[loss=0.1033, simple_loss=0.142, pruned_loss=0.02615, audio_tagging_loss=0.006104, over 16213.00 frames. ], tot_loss[loss=0.07432, simple_loss=0.09713, pruned_loss=0.01639, audio_tagging_loss=0.009361, over 3038589.13 frames. ], batch size: 57, lr: 3.41e-03, grad_scale: 16.0
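In the train_asr.py lines, `loss[...]` is the current batch (with its own frame count) while `tot_loss[...]` is a slowly moving frame-weighted aggregate over roughly three million frames. The printed totals are also consistent with the total being composed as loss = 0.5*simple_loss + pruned_loss + audio_tagging_loss (e.g. 0.5*0.09713 + 0.01639 + 0.009361 = 0.07432 in the batch 8450 entry just above). Below is a small sketch of such a frame-weighted running aggregate; the exponential decay and its constant are assumptions, and the real icefall tracker may aggregate differently.

```python
# Hypothetical sketch of the tot_loss[...] aggregate: an exponentially
# decayed, frame-weighted average of per-frame metrics.
class RunningLoss:
    def __init__(self, decay=0.999):
        self.decay = decay   # assumed decay constant
        self.frames = 0.0    # decayed total frame count
        self.sums = {}       # metric name -> decayed weighted sum

    def update(self, batch_frames, **per_frame_metrics):
        self.frames = self.frames * self.decay + batch_frames
        for name, value in per_frame_metrics.items():
            self.sums[name] = (
                self.sums.get(name, 0.0) * self.decay + value * batch_frames
            )

    def averages(self):
        return {name: s / self.frames for name, s in self.sums.items()}

tot = RunningLoss()
tot.update(16213, loss=0.1033, simple_loss=0.142, pruned_loss=0.02615)
print(tot.averages())  # after one batch, equals the per-batch values
```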
2023-11-21 16:40:54,659 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236900
2023-11-21 16:41:03,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1579313.3333333333, ans=0.125
2023-11-21 16:41:10,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1579380.0, ans=0.0
2023-11-21 16:41:16,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1579446.6666666667, ans=0.1
2023-11-21 16:41:36,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1579513.3333333333, ans=0.0
2023-11-21 16:41:44,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1579580.0, ans=0.0
2023-11-21 16:41:54,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1579580.0, ans=0.125
2023-11-21 16:41:56,704 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8500, loss[loss=0.08149, simple_loss=0.1092, pruned_loss=0.01833, audio_tagging_loss=0.008566, over 15550.00 frames. ], tot_loss[loss=0.07381, simple_loss=0.0962, pruned_loss=0.01625, audio_tagging_loss=0.00946, over 3041813.76 frames. ], batch size: 57, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:41:58,196 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 236950
2023-11-21 16:42:17,858 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 16:42:19,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0
2023-11-21 16:42:27,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1579780.0, ans=0.125
2023-11-21 16:42:29,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 7.919e+01 8.659e+01 9.250e+01 1.200e+02, threshold=1.732e+02, percent-clipped=0.0
2023-11-21 16:42:40,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1579846.6666666667, ans=0.125
2023-11-21 16:42:44,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0
2023-11-21 16:43:01,482 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8550, loss[loss=0.07902, simple_loss=0.1078, pruned_loss=0.01508, audio_tagging_loss=0.01003, over 15681.00 frames. ], tot_loss[loss=0.07395, simple_loss=0.09646, pruned_loss=0.01626, audio_tagging_loss=0.00946, over 3044097.35 frames. ], batch size: 58, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:43:02,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237000
2023-11-21 16:43:05,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1579980.0, ans=0.125
2023-11-21 16:43:15,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1580046.6666666667, ans=0.0
2023-11-21 16:43:18,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1580046.6666666667, ans=0.07
2023-11-21 16:43:22,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0
2023-11-21 16:43:44,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1580180.0, ans=0.125
2023-11-21 16:43:56,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1580246.6666666667, ans=0.0
2023-11-21 16:44:06,842 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8600, loss[loss=0.06413, simple_loss=0.08481, pruned_loss=0.01269, audio_tagging_loss=0.009041, over 15168.00 frames. ], tot_loss[loss=0.07384, simple_loss=0.09626, pruned_loss=0.01623, audio_tagging_loss=0.009483, over 3046440.09 frames. ], batch size: 57, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:44:08,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237050
2023-11-21 16:44:38,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.198e+01 8.655e+01 9.570e+01 1.230e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-21 16:44:49,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1580513.3333333333, ans=0.125
2023-11-21 16:44:54,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1580513.3333333333, ans=0.1
2023-11-21 16:45:06,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1580580.0, ans=0.1
2023-11-21 16:45:09,723 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8650, loss[loss=0.08648, simple_loss=0.1025, pruned_loss=0.02269, audio_tagging_loss=0.01251, over 14912.00 frames. ], tot_loss[loss=0.07446, simple_loss=0.097, pruned_loss=0.01649, audio_tagging_loss=0.009467, over 3047573.45 frames. ], batch size: 56, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:45:11,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237100
2023-11-21 16:45:39,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1580780.0, ans=0.1
2023-11-21 16:45:44,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1580780.0, ans=0.0
2023-11-21 16:45:59,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1580913.3333333333, ans=0.1
2023-11-21 16:46:01,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1580913.3333333333, ans=0.2
2023-11-21 16:46:07,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2023-11-21 16:46:14,118 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8700, loss[loss=0.08615, simple_loss=0.1232, pruned_loss=0.01841, audio_tagging_loss=0.006135, over 16150.00 frames. ], tot_loss[loss=0.07451, simple_loss=0.09695, pruned_loss=0.0164, audio_tagging_loss=0.009631, over 3043538.81 frames. ], batch size: 57, lr: 3.41e-03, grad_scale: 16.0
2023-11-21 16:46:15,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237150
2023-11-21 16:46:45,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.271e+01 8.853e+01 9.798e+01 1.350e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-21 16:47:17,481 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8750, loss[loss=0.05783, simple_loss=0.0683, pruned_loss=0.01165, audio_tagging_loss=0.01204, over 14515.00 frames. ], tot_loss[loss=0.07469, simple_loss=0.09729, pruned_loss=0.01644, audio_tagging_loss=0.009605, over 3043518.99 frames. ], batch size: 56, lr: 3.40e-03, grad_scale: 16.0
2023-11-21 16:47:18,766 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237200
2023-11-21 16:47:38,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1581380.0, ans=0.125
2023-11-21 16:47:41,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1581446.6666666667, ans=0.125
2023-11-21 16:48:01,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1581513.3333333333, ans=0.0
2023-11-21 16:48:01,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1581513.3333333333, ans=0.025
2023-11-21 16:48:15,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1581580.0, ans=0.125
2023-11-21 16:48:20,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1581646.6666666667, ans=0.125
2023-11-21 16:48:21,635 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8800, loss[loss=0.09998, simple_loss=0.1404, pruned_loss=0.02473, audio_tagging_loss=0.005056, over 14347.00 frames. ], tot_loss[loss=0.07538, simple_loss=0.09814, pruned_loss=0.01662, audio_tagging_loss=0.009686, over 3049253.00 frames. ], batch size: 54, lr: 3.40e-03, grad_scale: 32.0
2023-11-21 16:48:22,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237250
2023-11-21 16:48:23,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5
2023-11-21 16:48:48,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1581780.0, ans=0.05
2023-11-21 16:48:53,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.270e+01 9.064e+01 1.008e+02 1.291e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-21 16:49:08,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=12.0
2023-11-21 16:49:17,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-11-21 16:49:25,657 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8850, loss[loss=0.06893, simple_loss=0.08265, pruned_loss=0.01649, audio_tagging_loss=0.01112, over 15956.00 frames. ], tot_loss[loss=0.07464, simple_loss=0.09708, pruned_loss=0.01645, audio_tagging_loss=0.009646, over 3051031.12 frames. ], batch size: 60, lr: 3.40e-03, grad_scale: 32.0
2023-11-21 16:49:26,932 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237300
2023-11-21 16:49:33,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1581980.0, ans=0.125
2023-11-21 16:49:38,442 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 16:49:41,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1582046.6666666667, ans=0.0
2023-11-21 16:49:58,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0
2023-11-21 16:50:00,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5
2023-11-21 16:50:07,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=22.5
2023-11-21 16:50:11,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1582180.0, ans=10.0
2023-11-21 16:50:17,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1582246.6666666667, ans=0.125
2023-11-21 16:50:30,400 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8900, loss[loss=0.0728, simple_loss=0.09869, pruned_loss=0.01423, audio_tagging_loss=0.009231, over 14772.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.09617, pruned_loss=0.0162, audio_tagging_loss=0.009686, over 3047625.41 frames. ], batch size: 54, lr: 3.40e-03, grad_scale: 32.0
2023-11-21 16:50:31,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237350
2023-11-21 16:50:45,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1582380.0, ans=0.2
2023-11-21 16:51:01,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 7.931e+01 8.617e+01 9.547e+01 1.329e+02, threshold=1.723e+02, percent-clipped=0.0
2023-11-21 16:51:20,251 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 16:51:31,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0
2023-11-21 16:51:33,471 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 8950, loss[loss=0.07798, simple_loss=0.1106, pruned_loss=0.01559, audio_tagging_loss=0.0071, over 14990.00 frames. ], tot_loss[loss=0.07386, simple_loss=0.09613, pruned_loss=0.01624, audio_tagging_loss=0.009553, over 3041386.32 frames. ], batch size: 56, lr: 3.40e-03, grad_scale: 32.0
2023-11-21 16:51:34,825 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237400
2023-11-21 16:51:38,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0
2023-11-21 16:51:46,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1582713.3333333333, ans=0.125
2023-11-21 16:51:46,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1582713.3333333333, ans=0.04949747468305833
2023-11-21 16:51:55,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1582713.3333333333, ans=0.0
2023-11-21 16:51:56,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1582713.3333333333, ans=0.2
2023-11-21 16:51:56,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1582713.3333333333, ans=0.125
2023-11-21 16:52:23,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1582846.6666666667, ans=0.125
2023-11-21 16:52:25,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.29 vs. limit=10.0
2023-11-21 16:52:38,619 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9000, loss[loss=0.06355, simple_loss=0.0884, pruned_loss=0.0106, audio_tagging_loss=0.008749, over 15611.00 frames. ], tot_loss[loss=0.07405, simple_loss=0.09655, pruned_loss=0.01632, audio_tagging_loss=0.009465, over 3046476.02 frames. ], batch size: 60, lr: 3.40e-03, grad_scale: 32.0
2023-11-21 16:52:38,620 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 16:53:19,760 INFO [train_asr.py:1253] (1/4) Epoch 20, validation: loss=0.06046, simple_loss=0.05217, pruned_loss=0.005278, audio_tagging_loss=0.0291, over 4681554.00 frames.
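At batch 9000 the loop pauses, runs the full validation set (4,681,554 frames in one report), logs the frame-weighted validation loss, and then resumes training. A sketch of such a pass is below; it assumes a `compute_loss` helper returning per-frame metrics plus the batch frame count, and all names are hypothetical rather than the actual train_asr.py code.

```python
# Hypothetical sketch of the periodic validation pass logged above.
import torch

def compute_validation_loss(model, valid_dl, compute_loss):
    """Run the whole validation set once; return frame-weighted averages."""
    was_training = model.training
    model.eval()
    sums, frames = {}, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            per_frame_metrics, batch_frames = compute_loss(model, batch)
            frames += batch_frames
            for name, value in per_frame_metrics.items():
                sums[name] = sums.get(name, 0.0) + value * batch_frames
    if was_training:
        model.train()  # restore training mode before resuming
    return {name: s / frames for name, s in sums.items()}
```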
2023-11-21 16:53:19,761 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 16:53:21,039 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237450 2023-11-21 16:53:23,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1582980.0, ans=0.2 2023-11-21 16:53:34,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1583046.6666666667, ans=0.125 2023-11-21 16:53:52,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.325e+01 8.299e+01 8.982e+01 9.463e+01 1.306e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-21 16:53:58,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1583180.0, ans=0.0 2023-11-21 16:53:59,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1583180.0, ans=0.2 2023-11-21 16:54:02,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1583180.0, ans=0.2 2023-11-21 16:54:22,591 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9050, loss[loss=0.06323, simple_loss=0.07743, pruned_loss=0.01743, audio_tagging_loss=0.007083, over 14817.00 frames. ], tot_loss[loss=0.07405, simple_loss=0.09647, pruned_loss=0.01638, audio_tagging_loss=0.009432, over 3051733.39 frames. ], batch size: 59, lr: 3.40e-03, grad_scale: 16.0 2023-11-21 16:54:23,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237500 2023-11-21 16:54:24,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-21 16:54:26,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1583313.3333333333, ans=0.0 2023-11-21 16:54:31,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1583313.3333333333, ans=0.125 2023-11-21 16:54:33,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1583313.3333333333, ans=0.2 2023-11-21 16:54:34,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.90 vs. limit=22.5 2023-11-21 16:55:10,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1583513.3333333333, ans=0.0 2023-11-21 16:55:15,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1583580.0, ans=0.1 2023-11-21 16:55:27,158 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9100, loss[loss=0.05456, simple_loss=0.07238, pruned_loss=0.01072, audio_tagging_loss=0.00765, over 14778.00 frames. ], tot_loss[loss=0.07385, simple_loss=0.09648, pruned_loss=0.01627, audio_tagging_loss=0.00934, over 3046438.33 frames. 
], batch size: 58, lr: 3.40e-03, grad_scale: 16.0 2023-11-21 16:55:28,498 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237550 2023-11-21 16:55:35,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1583646.6666666667, ans=0.2 2023-11-21 16:55:48,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1583713.3333333333, ans=0.125 2023-11-21 16:56:00,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.100e+01 8.666e+01 9.594e+01 1.245e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-21 16:56:00,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1583780.0, ans=0.0 2023-11-21 16:56:01,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1583780.0, ans=0.2 2023-11-21 16:56:13,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1583846.6666666667, ans=0.125 2023-11-21 16:56:23,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1583913.3333333333, ans=0.125 2023-11-21 16:56:32,102 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9150, loss[loss=0.06689, simple_loss=0.09441, pruned_loss=0.01201, audio_tagging_loss=0.007683, over 14265.00 frames. ], tot_loss[loss=0.07389, simple_loss=0.09664, pruned_loss=0.01627, audio_tagging_loss=0.009303, over 3053020.39 frames. ], batch size: 53, lr: 3.40e-03, grad_scale: 16.0 2023-11-21 16:56:33,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237600 2023-11-21 16:56:41,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1583980.0, ans=0.125 2023-11-21 16:56:45,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0 2023-11-21 16:56:49,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-21 16:56:56,761 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 16:56:58,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1584113.3333333333, ans=0.125 2023-11-21 16:57:06,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1584113.3333333333, ans=0.125 2023-11-21 16:57:25,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.0 2023-11-21 16:57:31,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=15.0 2023-11-21 16:57:35,472 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9200, loss[loss=0.0927, simple_loss=0.1203, pruned_loss=0.02489, audio_tagging_loss=0.007632, over 15396.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.09655, pruned_loss=0.0163, audio_tagging_loss=0.009341, over 3055161.90 frames. 
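Each scaling.py:213 line prints a named hyperparameter (a dropout_p, skip_rate, prob, or scale_min somewhere in the encoder) together with batch_count and the value `ans` currently in effect; these are scheduled quantities, not constants. A sketch of one plausible schedule shape, piecewise-linear in batch_count; the interpolation rule and the example breakpoints are assumptions:

```python
# Illustrative schedule behind the scaling.py:213 lines: each named value is
# a function of batch_count, and `ans=` is the value in effect this batch.
def scheduled_float(batch_count: float, *points: tuple[float, float]) -> float:
    """Interpolate (batch_count, value) breakpoints, clamping at the ends."""
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# A skip rate decaying from 0.1 to 0.0 over the first 20k batches would have
# reached its floor long before the batch_count ~ 1.58e6 seen above:
assert scheduled_float(1582713.33, (0.0, 0.1), (20000.0, 0.0)) == 0.0
```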
], batch size: 54, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 16:57:36,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237650 2023-11-21 16:57:38,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1584313.3333333333, ans=0.07 2023-11-21 16:57:49,144 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 16:57:50,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1584380.0, ans=0.0 2023-11-21 16:58:08,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.012e+01 8.576e+01 9.231e+01 1.366e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-21 16:58:18,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1584513.3333333333, ans=0.2 2023-11-21 16:58:29,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1584580.0, ans=0.0 2023-11-21 16:58:37,805 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9250, loss[loss=0.04439, simple_loss=0.06144, pruned_loss=0.005075, audio_tagging_loss=0.008599, over 15078.00 frames. ], tot_loss[loss=0.07336, simple_loss=0.0957, pruned_loss=0.01614, audio_tagging_loss=0.009368, over 3062102.49 frames. ], batch size: 61, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 16:58:39,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237700 2023-11-21 16:58:56,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-21 16:59:09,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1584780.0, ans=0.0 2023-11-21 16:59:16,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1584846.6666666667, ans=0.125 2023-11-21 16:59:20,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1584846.6666666667, ans=0.125 2023-11-21 16:59:22,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1584846.6666666667, ans=0.125 2023-11-21 16:59:42,838 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9300, loss[loss=0.06125, simple_loss=0.08077, pruned_loss=0.0122, audio_tagging_loss=0.008664, over 13731.00 frames. ], tot_loss[loss=0.07366, simple_loss=0.09628, pruned_loss=0.0162, audio_tagging_loss=0.009315, over 3065063.43 frames. 
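The scaling.py:1022 lines compare a per-module whitening metric against a limit (e.g. metric=1.87 vs. limit=6.0 for whiten_keys above); the whitening constraint only intervenes when the metric exceeds its limit. The exact statistic lives in scaling.py; the sketch below uses one reasonable choice that equals 1.0 for perfectly white features:

```python
import torch

# Illustrative version of the "metric=X vs. limit=Y" whitening check. The
# statistic here is an assumption: it is 1.0 when the feature covariance is
# a multiple of the identity and grows as the eigenvalues spread out.
def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one whitening group."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]           # (C, C) feature covariance
    mean_diag = cov.diagonal().mean()
    # Equals 1.0 for cov = c * I; larger values mean less-white features.
    return ((cov ** 2).mean() * cov.shape[0] / (mean_diag ** 2)).item()

x = torch.randn(1000, 384)                 # near-white activations
print(f"metric={whitening_metric(x):.2f} vs. limit=15.0")  # near 1: no penalty
```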
], batch size: 56, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 16:59:44,230 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237750 2023-11-21 16:59:44,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1584980.0, ans=0.125 2023-11-21 17:00:14,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.432e+01 8.079e+01 8.667e+01 9.296e+01 1.099e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-21 17:00:43,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1585246.6666666667, ans=0.125 2023-11-21 17:00:45,967 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9350, loss[loss=0.06956, simple_loss=0.08175, pruned_loss=0.01777, audio_tagging_loss=0.01091, over 16492.00 frames. ], tot_loss[loss=0.07308, simple_loss=0.09543, pruned_loss=0.01596, audio_tagging_loss=0.009407, over 3065106.02 frames. ], batch size: 62, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:00:46,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=22.5 2023-11-21 17:00:47,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237800 2023-11-21 17:00:57,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2023-11-21 17:01:03,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1585380.0, ans=0.0 2023-11-21 17:01:48,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1585646.6666666667, ans=0.2 2023-11-21 17:01:49,446 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9400, loss[loss=0.09262, simple_loss=0.1189, pruned_loss=0.02321, audio_tagging_loss=0.00998, over 15386.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09552, pruned_loss=0.01602, audio_tagging_loss=0.00951, over 3060815.70 frames. ], batch size: 57, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:01:50,855 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237850 2023-11-21 17:01:52,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1585646.6666666667, ans=0.0 2023-11-21 17:01:55,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1585646.6666666667, ans=0.125 2023-11-21 17:01:58,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1585646.6666666667, ans=0.05 2023-11-21 17:01:59,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1585646.6666666667, ans=0.0 2023-11-21 17:02:00,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. 
limit=22.5 2023-11-21 17:02:24,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.317e+01 9.011e+01 9.971e+01 1.419e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-21 17:02:26,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1585780.0, ans=0.0 2023-11-21 17:02:29,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2023-11-21 17:02:34,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2023-11-21 17:02:42,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1585913.3333333333, ans=0.125 2023-11-21 17:02:44,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1585913.3333333333, ans=0.125 2023-11-21 17:02:53,504 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:02:55,317 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9450, loss[loss=0.06761, simple_loss=0.07813, pruned_loss=0.01434, audio_tagging_loss=0.0142, over 15224.00 frames. ], tot_loss[loss=0.07319, simple_loss=0.09536, pruned_loss=0.0159, audio_tagging_loss=0.009612, over 3047177.02 frames. ], batch size: 58, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:02:56,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237900 2023-11-21 17:02:58,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1585980.0, ans=0.0 2023-11-21 17:03:09,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1586046.6666666667, ans=0.125 2023-11-21 17:03:59,159 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9500, loss[loss=0.1003, simple_loss=0.1222, pruned_loss=0.02986, audio_tagging_loss=0.009307, over 15216.00 frames. ], tot_loss[loss=0.07358, simple_loss=0.09574, pruned_loss=0.01599, audio_tagging_loss=0.009726, over 3052205.07 frames. ], batch size: 56, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:04:00,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 237950 2023-11-21 17:04:07,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1586313.3333333333, ans=0.125 2023-11-21 17:04:19,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.15 vs. 
limit=15.0 2023-11-21 17:04:28,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1586446.6666666667, ans=0.0 2023-11-21 17:04:31,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.163e+01 8.915e+01 9.675e+01 1.127e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-21 17:04:35,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1586446.6666666667, ans=0.0 2023-11-21 17:04:43,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1586513.3333333333, ans=0.0 2023-11-21 17:04:45,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1586513.3333333333, ans=0.125 2023-11-21 17:04:52,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1586580.0, ans=0.125 2023-11-21 17:04:54,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1586580.0, ans=0.0 2023-11-21 17:05:01,656 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9550, loss[loss=0.09679, simple_loss=0.1261, pruned_loss=0.02568, audio_tagging_loss=0.008071, over 14782.00 frames. ], tot_loss[loss=0.07335, simple_loss=0.09525, pruned_loss=0.01587, audio_tagging_loss=0.009849, over 3052220.47 frames. ], batch size: 54, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:05:03,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238000 2023-11-21 17:05:04,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1586646.6666666667, ans=0.125 2023-11-21 17:05:35,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1586780.0, ans=0.125 2023-11-21 17:05:37,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1586780.0, ans=0.04949747468305833 2023-11-21 17:05:43,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1586846.6666666667, ans=0.0 2023-11-21 17:05:45,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-11-21 17:05:48,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1586846.6666666667, ans=0.0 2023-11-21 17:05:51,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2023-11-21 17:05:56,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1586913.3333333333, ans=0.1 2023-11-21 17:06:06,220 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9600, loss[loss=0.07531, simple_loss=0.09549, pruned_loss=0.01811, audio_tagging_loss=0.009457, over 16374.00 frames. ], tot_loss[loss=0.07357, simple_loss=0.09558, pruned_loss=0.01594, audio_tagging_loss=0.009842, over 3054785.26 frames. 
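The two bracketed figures in each loss line average over different scopes: `loss[...]` covers the current batch (~15k frames) while `tot_loss[...]` covers ~3.0e6 frames, i.e. roughly 200 batches. That is consistent with a frame-weighted running average whose decay per batch is assumed here to be 1 - 1/200, which reproduces the steady-state frame count:

```python
# Sketch of the relation between loss[... over ~15k frames] and
# tot_loss[... over ~3.0e6 frames]. The decay constant is an assumption
# chosen so the decayed frame count settles near the logged ~3.0e6.
DECAY = 1.0 - 1.0 / 200.0

class RunningLoss:
    def __init__(self):
        self.loss_sum = 0.0   # decayed sum of (loss * frames)
        self.frames = 0.0     # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * DECAY + batch_loss * batch_frames
        self.frames = self.frames * DECAY + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for _ in range(2000):
    tracker.update(batch_loss=0.074, batch_frames=15200.0)
print(tracker.frames)  # ~3.04e6, matching the ~3.0e6 frame counts above
```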
], batch size: 60, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:06:07,555 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238050 2023-11-21 17:06:38,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.410e+01 9.133e+01 9.513e+01 1.325e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-21 17:07:00,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2023-11-21 17:07:07,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1587246.6666666667, ans=0.0 2023-11-21 17:07:10,201 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9650, loss[loss=0.05026, simple_loss=0.05854, pruned_loss=0.008865, audio_tagging_loss=0.01212, over 14107.00 frames. ], tot_loss[loss=0.07389, simple_loss=0.09627, pruned_loss=0.01604, audio_tagging_loss=0.009713, over 3056994.28 frames. ], batch size: 55, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:07:11,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238100 2023-11-21 17:07:14,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1587313.3333333333, ans=0.125 2023-11-21 17:07:21,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1587380.0, ans=0.5 2023-11-21 17:07:25,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1587380.0, ans=0.0 2023-11-21 17:07:30,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1587380.0, ans=0.125 2023-11-21 17:07:48,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1587513.3333333333, ans=0.05 2023-11-21 17:07:52,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1587513.3333333333, ans=0.125 2023-11-21 17:07:52,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.40 vs. limit=15.0 2023-11-21 17:08:10,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1587580.0, ans=0.09899494936611666 2023-11-21 17:08:13,224 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9700, loss[loss=0.08717, simple_loss=0.1197, pruned_loss=0.01987, audio_tagging_loss=0.007458, over 16194.00 frames. ], tot_loss[loss=0.07419, simple_loss=0.09654, pruned_loss=0.01633, audio_tagging_loss=0.009598, over 3048095.82 frames. ], batch size: 59, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:08:14,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238150 2023-11-21 17:08:14,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1587646.6666666667, ans=0.2 2023-11-21 17:08:15,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. 
limit=15.0 2023-11-21 17:08:21,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1587646.6666666667, ans=0.1 2023-11-21 17:08:46,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.435e+01 8.277e+01 8.905e+01 9.605e+01 1.183e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-21 17:09:04,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1587913.3333333333, ans=0.125 2023-11-21 17:09:17,303 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9750, loss[loss=0.08187, simple_loss=0.1202, pruned_loss=0.01462, audio_tagging_loss=0.00716, over 14919.00 frames. ], tot_loss[loss=0.07388, simple_loss=0.09632, pruned_loss=0.01622, audio_tagging_loss=0.009509, over 3042350.38 frames. ], batch size: 55, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:09:18,552 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238200 2023-11-21 17:09:25,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1587980.0, ans=0.1 2023-11-21 17:09:45,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1588113.3333333333, ans=0.2 2023-11-21 17:10:03,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=8.0 2023-11-21 17:10:03,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1588180.0, ans=0.125 2023-11-21 17:10:11,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1588246.6666666667, ans=0.125 2023-11-21 17:10:19,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2023-11-21 17:10:21,268 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9800, loss[loss=0.0525, simple_loss=0.07227, pruned_loss=0.009178, audio_tagging_loss=0.007189, over 14522.00 frames. ], tot_loss[loss=0.07371, simple_loss=0.09631, pruned_loss=0.01619, audio_tagging_loss=0.009362, over 3037505.33 frames. ], batch size: 56, lr: 3.40e-03, grad_scale: 32.0 2023-11-21 17:10:22,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238250 2023-11-21 17:10:24,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=12.0 2023-11-21 17:10:29,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1588313.3333333333, ans=0.125 2023-11-21 17:10:35,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1588380.0, ans=0.125 2023-11-21 17:10:35,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. 
limit=15.0 2023-11-21 17:10:54,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1588446.6666666667, ans=0.125 2023-11-21 17:10:55,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.038e+01 8.658e+01 9.399e+01 1.112e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-21 17:10:59,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1588513.3333333333, ans=0.2 2023-11-21 17:11:05,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1588513.3333333333, ans=0.125 2023-11-21 17:11:17,618 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:11:22,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1588580.0, ans=0.0 2023-11-21 17:11:24,834 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9850, loss[loss=0.0802, simple_loss=0.105, pruned_loss=0.0168, audio_tagging_loss=0.01088, over 13931.00 frames. ], tot_loss[loss=0.07408, simple_loss=0.09693, pruned_loss=0.01639, audio_tagging_loss=0.009223, over 3045550.19 frames. ], batch size: 52, lr: 3.40e-03, grad_scale: 16.0 2023-11-21 17:11:26,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238300 2023-11-21 17:11:43,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1588713.3333333333, ans=0.125 2023-11-21 17:11:51,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1588780.0, ans=0.125 2023-11-21 17:12:29,621 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9900, loss[loss=0.07502, simple_loss=0.09672, pruned_loss=0.01665, audio_tagging_loss=0.01001, over 16350.00 frames. ], tot_loss[loss=0.07433, simple_loss=0.09699, pruned_loss=0.01651, audio_tagging_loss=0.009328, over 3048912.86 frames. ], batch size: 60, lr: 3.40e-03, grad_scale: 8.0 2023-11-21 17:12:30,898 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238350 2023-11-21 17:12:36,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.57 vs. 
limit=22.5 2023-11-21 17:13:04,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 7.926e+01 8.564e+01 9.409e+01 1.276e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-21 17:13:05,372 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 17:13:13,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1589180.0, ans=0.1 2023-11-21 17:13:31,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1589246.6666666667, ans=0.0 2023-11-21 17:13:33,214 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 9950, loss[loss=0.07369, simple_loss=0.1055, pruned_loss=0.01253, audio_tagging_loss=0.008416, over 15448.00 frames. ], tot_loss[loss=0.07396, simple_loss=0.09655, pruned_loss=0.01636, audio_tagging_loss=0.009321, over 3048760.83 frames. ], batch size: 55, lr: 3.40e-03, grad_scale: 8.0 2023-11-21 17:13:34,513 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238400 2023-11-21 17:13:42,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1589313.3333333333, ans=0.125 2023-11-21 17:13:50,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-21 17:14:04,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1589446.6666666667, ans=0.125 2023-11-21 17:14:22,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1589513.3333333333, ans=0.1 2023-11-21 17:14:24,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1589580.0, ans=0.125 2023-11-21 17:14:28,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1589580.0, ans=0.125 2023-11-21 17:14:36,962 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10000, loss[loss=0.09849, simple_loss=0.1291, pruned_loss=0.02196, audio_tagging_loss=0.01198, over 16269.00 frames. ], tot_loss[loss=0.07382, simple_loss=0.09611, pruned_loss=0.0164, audio_tagging_loss=0.009376, over 3050800.40 frames. ], batch size: 56, lr: 3.40e-03, grad_scale: 16.0 2023-11-21 17:14:38,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238450 2023-11-21 17:14:42,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.80 vs. 
limit=15.0 2023-11-21 17:14:49,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1589713.3333333333, ans=0.1 2023-11-21 17:14:49,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1589713.3333333333, ans=0.0 2023-11-21 17:15:12,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1589780.0, ans=0.125 2023-11-21 17:15:13,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.031e+01 7.913e+01 8.716e+01 9.510e+01 1.205e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-21 17:15:28,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1589913.3333333333, ans=0.035 2023-11-21 17:15:28,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1589913.3333333333, ans=0.0 2023-11-21 17:15:41,446 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10050, loss[loss=0.06287, simple_loss=0.08662, pruned_loss=0.01237, audio_tagging_loss=0.007185, over 15020.00 frames. ], tot_loss[loss=0.07309, simple_loss=0.09512, pruned_loss=0.01612, audio_tagging_loss=0.009413, over 3041954.96 frames. ], batch size: 61, lr: 3.40e-03, grad_scale: 16.0 2023-11-21 17:15:42,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238500 2023-11-21 17:15:44,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1589980.0, ans=0.07 2023-11-21 17:15:54,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1590046.6666666667, ans=0.1 2023-11-21 17:16:09,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-21 17:16:11,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1590113.3333333333, ans=0.125 2023-11-21 17:16:23,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1590180.0, ans=0.125 2023-11-21 17:16:34,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0 2023-11-21 17:16:40,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0 2023-11-21 17:16:46,159 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10100, loss[loss=0.0737, simple_loss=0.0982, pruned_loss=0.01642, audio_tagging_loss=0.008182, over 15633.00 frames. ], tot_loss[loss=0.07331, simple_loss=0.0953, pruned_loss=0.01616, audio_tagging_loss=0.009507, over 3042349.17 frames. 
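The `grad_scale` in these loss lines is not constant: it halves to 8.0 near batch 9900, recovers to 16.0 by batch 10000, and reaches 32.0 further on. That is the signature of dynamic fp16 loss scaling, sketched here with PyTorch's stock GradScaler; the training script may wrap this differently, and the intervals below are assumptions:

```python
import torch

# Dynamic loss scaling: halve the scale when a step overflows, grow it back
# after a run of clean steps, matching the 32 -> 16 -> 8 -> 16 -> 32 pattern
# in the surrounding log lines.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # a scale actually seen in the log
    backoff_factor=0.5,   # halve the scale when gradients hit inf/nan
    growth_factor=2.0,    # double it after growth_interval clean steps
    growth_interval=2000,
)

def fp16_step(model, optimizer, compute_loss, batch) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # backward through the scaled loss
    scaler.step(optimizer)          # silently skips the step on inf/nan grads
    scaler.update()                 # adjusts the scale, as the log shows
    return scaler.get_scale()       # the value printed as `grad_scale:`
```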
], batch size: 58, lr: 3.40e-03, grad_scale: 16.0 2023-11-21 17:16:47,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238550 2023-11-21 17:17:01,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1590380.0, ans=0.125 2023-11-21 17:17:01,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.78 vs. limit=10.0 2023-11-21 17:17:03,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1590380.0, ans=0.125 2023-11-21 17:17:21,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0 2023-11-21 17:17:22,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.174e+01 9.110e+01 9.882e+01 1.190e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-21 17:17:23,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1590446.6666666667, ans=0.125 2023-11-21 17:17:25,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-21 17:17:27,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1590513.3333333333, ans=0.125 2023-11-21 17:17:37,894 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:17:41,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1590580.0, ans=0.125 2023-11-21 17:17:49,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2023-11-21 17:17:49,986 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10150, loss[loss=0.07499, simple_loss=0.1001, pruned_loss=0.01697, audio_tagging_loss=0.007964, over 15079.00 frames. ], tot_loss[loss=0.0733, simple_loss=0.09513, pruned_loss=0.01609, audio_tagging_loss=0.00964, over 3044674.32 frames. ], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:17:51,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238600 2023-11-21 17:17:58,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0 2023-11-21 17:18:20,039 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:18:32,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1590846.6666666667, ans=0.125 2023-11-21 17:18:35,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1590846.6666666667, ans=0.125 2023-11-21 17:18:39,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1590846.6666666667, ans=0.125 2023-11-21 17:18:54,010 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10200, loss[loss=0.05852, simple_loss=0.07885, pruned_loss=0.01028, audio_tagging_loss=0.008816, over 15853.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.0953, pruned_loss=0.01618, audio_tagging_loss=0.009594, over 3048727.96 frames. ], batch size: 61, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:18:55,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238650 2023-11-21 17:18:58,605 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 17:19:17,850 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:19:22,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1591113.3333333333, ans=0.125 2023-11-21 17:19:27,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1591113.3333333333, ans=0.0 2023-11-21 17:19:29,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.636e+01 9.230e+01 9.917e+01 1.306e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-21 17:19:35,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1591180.0, ans=0.2 2023-11-21 17:19:38,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2023-11-21 17:19:58,392 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10250, loss[loss=0.05233, simple_loss=0.06358, pruned_loss=0.01017, audio_tagging_loss=0.01037, over 14466.00 frames. ], tot_loss[loss=0.07323, simple_loss=0.09492, pruned_loss=0.0161, audio_tagging_loss=0.009676, over 3049198.15 frames. 
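The recurring WARNING entries drop 1-second AudioSet clips that carry the 24-token dummy transcript: after subsampling, 100 input frames leave only 23 encoder frames, one fewer than the token count, and a transducer alignment needs at least one frame per token. A sketch of such a filter; the frame arithmetic is an assumption chosen to reproduce the logged 100 -> 23 mapping:

```python
# Sketch of the filter behind the "Exclude cut ..." warnings: reject cuts
# whose post-subsampling frame count is smaller than their token count.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after = ((num_frames - 7) // 2 + 1) // 2   # 100 -> 23, as logged
    if frames_after < num_tokens:
        print(f"WARNING Exclude cut from training. Frames (after "
              f"subsampling): {frames_after}. Tokens: {num_tokens}")
        return False
    return True

assert keep_cut(num_frames=1500, num_tokens=60)      # a normal ~15 s utterance
assert not keep_cut(num_frames=100, num_tokens=24)   # the dummy clips above
```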
], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:19:59,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238700 2023-11-21 17:19:59,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1591313.3333333333, ans=0.125 2023-11-21 17:20:01,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1591313.3333333333, ans=0.0 2023-11-21 17:20:11,219 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 17:20:23,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1591446.6666666667, ans=0.0 2023-11-21 17:20:34,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1591446.6666666667, ans=0.1 2023-11-21 17:21:01,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2023-11-21 17:21:01,972 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10300, loss[loss=0.07914, simple_loss=0.1066, pruned_loss=0.01616, audio_tagging_loss=0.009686, over 14712.00 frames. ], tot_loss[loss=0.07343, simple_loss=0.09501, pruned_loss=0.01621, audio_tagging_loss=0.009718, over 3048715.95 frames. ], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:21:03,227 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238750 2023-11-21 17:21:06,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-21 17:21:38,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.809e+01 7.986e+01 8.562e+01 9.250e+01 1.387e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-21 17:21:40,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-11-21 17:21:44,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1591846.6666666667, ans=0.0 2023-11-21 17:21:55,871 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.489e-02 2023-11-21 17:22:05,304 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10350, loss[loss=0.07196, simple_loss=0.08621, pruned_loss=0.0148, audio_tagging_loss=0.01405, over 16077.00 frames. ], tot_loss[loss=0.07403, simple_loss=0.0959, pruned_loss=0.01633, audio_tagging_loss=0.009749, over 3047719.04 frames. ], batch size: 61, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:22:06,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238800 2023-11-21 17:22:15,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1591980.0, ans=0.125 2023-11-21 17:22:46,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1592180.0, ans=0.0 2023-11-21 17:23:10,375 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10400, loss[loss=0.05484, simple_loss=0.07153, pruned_loss=0.009549, audio_tagging_loss=0.009529, over 14269.00 frames. ], tot_loss[loss=0.07372, simple_loss=0.09549, pruned_loss=0.01617, audio_tagging_loss=0.009804, over 3042678.19 frames. 
], batch size: 55, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:23:10,651 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 17:23:11,691 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238850 2023-11-21 17:23:46,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 7.924e+01 8.815e+01 9.464e+01 2.008e+02, threshold=1.763e+02, percent-clipped=1.0 2023-11-21 17:23:54,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1592513.3333333333, ans=0.2 2023-11-21 17:24:03,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1592580.0, ans=0.125 2023-11-21 17:24:14,062 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10450, loss[loss=0.09619, simple_loss=0.1195, pruned_loss=0.02955, audio_tagging_loss=0.006912, over 14847.00 frames. ], tot_loss[loss=0.07394, simple_loss=0.0961, pruned_loss=0.01624, audio_tagging_loss=0.009656, over 3044912.07 frames. ], batch size: 57, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:24:15,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238900 2023-11-21 17:24:34,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.58 vs. limit=10.0 2023-11-21 17:24:34,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1592713.3333333333, ans=0.0 2023-11-21 17:24:56,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1592846.6666666667, ans=0.125 2023-11-21 17:25:03,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.96 vs. limit=10.0 2023-11-21 17:25:12,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1592913.3333333333, ans=0.2 2023-11-21 17:25:16,678 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10500, loss[loss=0.07309, simple_loss=0.09434, pruned_loss=0.01374, audio_tagging_loss=0.01218, over 15104.00 frames. ], tot_loss[loss=0.07303, simple_loss=0.09497, pruned_loss=0.01599, audio_tagging_loss=0.009556, over 3044131.01 frames. ], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:25:17,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 238950 2023-11-21 17:25:38,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1593046.6666666667, ans=0.125 2023-11-21 17:25:54,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 7.981e+01 8.666e+01 9.446e+01 1.198e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-21 17:26:09,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1593246.6666666667, ans=0.0 2023-11-21 17:26:22,630 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10550, loss[loss=0.05634, simple_loss=0.06455, pruned_loss=0.01186, audio_tagging_loss=0.01221, over 14480.00 frames. ], tot_loss[loss=0.07302, simple_loss=0.09512, pruned_loss=0.01594, audio_tagging_loss=0.009519, over 3048674.71 frames. 
], batch size: 55, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:26:22,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1593313.3333333333, ans=0.0 2023-11-21 17:26:23,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239000 2023-11-21 17:26:33,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=22.5 2023-11-21 17:26:46,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1593446.6666666667, ans=0.0 2023-11-21 17:27:06,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1593513.3333333333, ans=0.07 2023-11-21 17:27:08,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. limit=10.0 2023-11-21 17:27:10,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-11-21 17:27:23,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1593580.0, ans=0.125 2023-11-21 17:27:26,927 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10600, loss[loss=0.06911, simple_loss=0.08434, pruned_loss=0.01553, audio_tagging_loss=0.01141, over 15640.00 frames. ], tot_loss[loss=0.07346, simple_loss=0.09576, pruned_loss=0.01617, audio_tagging_loss=0.009412, over 3047623.66 frames. ], batch size: 59, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:27:28,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239050 2023-11-21 17:27:48,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1593713.3333333333, ans=0.0 2023-11-21 17:28:04,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.229e+01 8.607e+01 9.453e+01 1.327e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-21 17:28:13,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1593846.6666666667, ans=0.0 2023-11-21 17:28:30,400 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10650, loss[loss=0.05363, simple_loss=0.06655, pruned_loss=0.009002, audio_tagging_loss=0.01135, over 16335.00 frames. ], tot_loss[loss=0.07347, simple_loss=0.09593, pruned_loss=0.01614, audio_tagging_loss=0.009366, over 3045036.58 frames. ], batch size: 62, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:28:31,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239100 2023-11-21 17:28:41,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2023-11-21 17:28:42,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1594046.6666666667, ans=0.125 2023-11-21 17:28:58,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.09 vs. 
limit=12.0 2023-11-21 17:29:08,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1594180.0, ans=0.0 2023-11-21 17:29:16,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1594180.0, ans=0.1 2023-11-21 17:29:19,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1594180.0, ans=0.0 2023-11-21 17:29:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1594246.6666666667, ans=0.1 2023-11-21 17:29:34,668 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10700, loss[loss=0.05517, simple_loss=0.06178, pruned_loss=0.00934, audio_tagging_loss=0.01493, over 15214.00 frames. ], tot_loss[loss=0.07366, simple_loss=0.09626, pruned_loss=0.01619, audio_tagging_loss=0.009344, over 3042817.05 frames. ], batch size: 57, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:29:36,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239150 2023-11-21 17:29:56,869 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 17:30:11,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.191e+01 8.750e+01 9.714e+01 1.316e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-21 17:30:18,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1594513.3333333333, ans=0.0 2023-11-21 17:30:18,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0 2023-11-21 17:30:39,177 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10750, loss[loss=0.07949, simple_loss=0.1158, pruned_loss=0.01385, audio_tagging_loss=0.007721, over 15750.00 frames. ], tot_loss[loss=0.07352, simple_loss=0.0963, pruned_loss=0.01613, audio_tagging_loss=0.009234, over 3048213.93 frames. ], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:30:40,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239200 2023-11-21 17:30:55,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-11-21 17:31:05,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1594780.0, ans=0.0 2023-11-21 17:31:05,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2023-11-21 17:31:07,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1594780.0, ans=0.0 2023-11-21 17:31:35,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.64 vs. limit=8.0 2023-11-21 17:31:43,372 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10800, loss[loss=0.0688, simple_loss=0.09475, pruned_loss=0.01387, audio_tagging_loss=0.007556, over 15126.00 frames. ], tot_loss[loss=0.07274, simple_loss=0.09517, pruned_loss=0.01582, audio_tagging_loss=0.009334, over 3047870.89 frames. 
], batch size: 55, lr: 3.39e-03, grad_scale: 32.0 2023-11-21 17:31:44,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239250 2023-11-21 17:32:16,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1595113.3333333333, ans=0.0 2023-11-21 17:32:21,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.126e+01 8.721e+01 9.296e+01 1.155e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-21 17:32:30,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1595180.0, ans=0.1 2023-11-21 17:32:32,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1595180.0, ans=0.0 2023-11-21 17:32:48,041 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10850, loss[loss=0.06178, simple_loss=0.07288, pruned_loss=0.01383, audio_tagging_loss=0.01151, over 14040.00 frames. ], tot_loss[loss=0.07276, simple_loss=0.0951, pruned_loss=0.0159, audio_tagging_loss=0.009314, over 3043445.00 frames. ], batch size: 54, lr: 3.39e-03, grad_scale: 32.0 2023-11-21 17:32:49,344 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239300 2023-11-21 17:33:10,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1595380.0, ans=0.0 2023-11-21 17:33:23,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1595446.6666666667, ans=0.125 2023-11-21 17:33:42,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1595580.0, ans=0.035 2023-11-21 17:33:46,327 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:33:51,746 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10900, loss[loss=0.07526, simple_loss=0.1028, pruned_loss=0.01492, audio_tagging_loss=0.008946, over 15870.00 frames. ], tot_loss[loss=0.07301, simple_loss=0.09522, pruned_loss=0.01597, audio_tagging_loss=0.009438, over 3044973.70 frames. ], batch size: 58, lr: 3.39e-03, grad_scale: 32.0 2023-11-21 17:33:53,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239350 2023-11-21 17:33:57,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1595646.6666666667, ans=0.125 2023-11-21 17:34:08,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1595713.3333333333, ans=0.125 2023-11-21 17:34:28,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. 
limit=15.0 2023-11-21 17:34:28,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.501e+01 8.126e+01 8.588e+01 9.275e+01 1.184e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-21 17:34:33,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-21 17:34:35,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1595846.6666666667, ans=0.125 2023-11-21 17:34:38,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1595846.6666666667, ans=0.0 2023-11-21 17:34:39,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1595846.6666666667, ans=0.0 2023-11-21 17:34:50,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1595913.3333333333, ans=0.07 2023-11-21 17:34:52,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1595913.3333333333, ans=0.0 2023-11-21 17:34:53,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1595913.3333333333, ans=0.1 2023-11-21 17:34:56,179 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 10950, loss[loss=0.06048, simple_loss=0.07749, pruned_loss=0.01206, audio_tagging_loss=0.009676, over 15389.00 frames. ], tot_loss[loss=0.07265, simple_loss=0.09467, pruned_loss=0.01585, audio_tagging_loss=0.009459, over 3042304.02 frames. ], batch size: 57, lr: 3.39e-03, grad_scale: 32.0 2023-11-21 17:34:57,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239400 2023-11-21 17:35:05,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1595980.0, ans=0.125 2023-11-21 17:35:06,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1595980.0, ans=0.0 2023-11-21 17:35:06,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1595980.0, ans=0.2 2023-11-21 17:35:21,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1596113.3333333333, ans=0.1 2023-11-21 17:35:22,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1596113.3333333333, ans=0.125 2023-11-21 17:35:23,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.81 vs. 
limit=15.0 2023-11-21 17:35:37,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1596180.0, ans=0.125 2023-11-21 17:35:40,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1596180.0, ans=0.04949747468305833 2023-11-21 17:35:45,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1596180.0, ans=0.1 2023-11-21 17:36:00,727 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11000, loss[loss=0.0485, simple_loss=0.06086, pruned_loss=0.009017, audio_tagging_loss=0.009055, over 15971.00 frames. ], tot_loss[loss=0.07347, simple_loss=0.09585, pruned_loss=0.01602, audio_tagging_loss=0.009516, over 3049766.94 frames. ], batch size: 61, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:36:02,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239450 2023-11-21 17:36:04,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1596313.3333333333, ans=0.05 2023-11-21 17:36:11,852 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:36:15,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1596380.0, ans=0.1 2023-11-21 17:36:39,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1596513.3333333333, ans=0.125 2023-11-21 17:36:39,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1596513.3333333333, ans=0.125 2023-11-21 17:36:40,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.795e+01 8.031e+01 8.533e+01 9.347e+01 2.206e+02, threshold=1.707e+02, percent-clipped=1.0 2023-11-21 17:36:49,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1596513.3333333333, ans=0.1 2023-11-21 17:37:06,728 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11050, loss[loss=0.05875, simple_loss=0.07264, pruned_loss=0.01246, audio_tagging_loss=0.009964, over 14690.00 frames. ], tot_loss[loss=0.07292, simple_loss=0.09484, pruned_loss=0.01582, audio_tagging_loss=0.009682, over 3045065.13 frames. 
], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:37:08,077 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239500 2023-11-21 17:37:36,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1596780.0, ans=0.2 2023-11-21 17:37:37,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1596780.0, ans=0.0 2023-11-21 17:37:38,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1596780.0, ans=0.0 2023-11-21 17:37:40,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1596780.0, ans=0.1 2023-11-21 17:37:47,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-11-21 17:38:07,858 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 17:38:11,234 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11100, loss[loss=0.07358, simple_loss=0.09205, pruned_loss=0.01746, audio_tagging_loss=0.0101, over 14915.00 frames. ], tot_loss[loss=0.0731, simple_loss=0.09482, pruned_loss=0.01595, audio_tagging_loss=0.009734, over 3050409.04 frames. ], batch size: 60, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:38:12,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239550 2023-11-21 17:38:27,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1597046.6666666667, ans=0.0 2023-11-21 17:38:34,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1597046.6666666667, ans=0.07 2023-11-21 17:38:49,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.177e+01 8.722e+01 9.312e+01 1.121e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-21 17:39:14,671 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11150, loss[loss=0.06143, simple_loss=0.08566, pruned_loss=0.01002, audio_tagging_loss=0.008581, over 15927.00 frames. ], tot_loss[loss=0.07361, simple_loss=0.09508, pruned_loss=0.01627, audio_tagging_loss=0.009799, over 3046550.52 frames. ], batch size: 61, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:39:15,944 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239600 2023-11-21 17:39:16,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1597313.3333333333, ans=0.2 2023-11-21 17:39:16,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2023-11-21 17:39:20,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1597313.3333333333, ans=0.125 2023-11-21 17:39:53,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-11-21 17:40:19,216 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11200, loss[loss=0.06481, simple_loss=0.07869, pruned_loss=0.01462, audio_tagging_loss=0.01084, over 15257.00 frames. 
], tot_loss[loss=0.07365, simple_loss=0.09516, pruned_loss=0.01622, audio_tagging_loss=0.009855, over 3048964.58 frames. ], batch size: 60, lr: 3.39e-03, grad_scale: 32.0 2023-11-21 17:40:19,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1597646.6666666667, ans=0.1 2023-11-21 17:40:20,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239650 2023-11-21 17:40:27,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1597646.6666666667, ans=0.02 2023-11-21 17:40:41,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1597713.3333333333, ans=0.125 2023-11-21 17:40:45,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1597780.0, ans=0.125 2023-11-21 17:40:53,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1597780.0, ans=0.0 2023-11-21 17:40:57,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-21 17:40:58,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.528e+01 8.115e+01 8.678e+01 9.817e+01 1.232e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-21 17:41:23,464 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11250, loss[loss=0.08198, simple_loss=0.09838, pruned_loss=0.02063, audio_tagging_loss=0.01215, over 14870.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09534, pruned_loss=0.0164, audio_tagging_loss=0.009907, over 3046377.86 frames. ], batch size: 55, lr: 3.39e-03, grad_scale: 32.0 2023-11-21 17:41:23,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1597980.0, ans=0.125 2023-11-21 17:41:24,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239700 2023-11-21 17:41:33,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1597980.0, ans=0.1 2023-11-21 17:42:04,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1598180.0, ans=0.025 2023-11-21 17:42:17,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1598246.6666666667, ans=0.125 2023-11-21 17:42:19,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2023-11-21 17:42:27,186 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11300, loss[loss=0.09084, simple_loss=0.1126, pruned_loss=0.02418, audio_tagging_loss=0.01034, over 15575.00 frames. ], tot_loss[loss=0.07432, simple_loss=0.09632, pruned_loss=0.01648, audio_tagging_loss=0.009676, over 3047768.99 frames. 
], batch size: 57, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:42:28,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239750 2023-11-21 17:42:38,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1598380.0, ans=0.125 2023-11-21 17:43:06,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.360e+01 8.263e+01 8.887e+01 9.511e+01 1.252e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-21 17:43:29,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1598646.6666666667, ans=0.0 2023-11-21 17:43:30,681 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11350, loss[loss=0.06403, simple_loss=0.07944, pruned_loss=0.0131, audio_tagging_loss=0.01121, over 14836.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09588, pruned_loss=0.01646, audio_tagging_loss=0.009586, over 3044050.49 frames. ], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:43:31,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239800 2023-11-21 17:43:33,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1598646.6666666667, ans=0.125 2023-11-21 17:43:33,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1598646.6666666667, ans=0.125 2023-11-21 17:43:33,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1598646.6666666667, ans=0.125 2023-11-21 17:43:39,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1598646.6666666667, ans=0.1 2023-11-21 17:44:00,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1598780.0, ans=0.0 2023-11-21 17:44:33,789 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11400, loss[loss=0.05523, simple_loss=0.06181, pruned_loss=0.01095, audio_tagging_loss=0.01338, over 14694.00 frames. ], tot_loss[loss=0.07452, simple_loss=0.09651, pruned_loss=0.01686, audio_tagging_loss=0.00941, over 3049574.93 frames. ], batch size: 58, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:44:35,074 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239850 2023-11-21 17:45:04,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1599113.3333333333, ans=0.125 2023-11-21 17:45:12,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=22.5 2023-11-21 17:45:12,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.120e+01 8.500e+01 9.528e+01 1.320e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-21 17:45:15,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-11-21 17:45:25,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-21 17:45:29,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.78 vs. 
limit=12.0 2023-11-21 17:45:31,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0 2023-11-21 17:45:37,191 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11450, loss[loss=0.06857, simple_loss=0.0889, pruned_loss=0.01569, audio_tagging_loss=0.008426, over 15335.00 frames. ], tot_loss[loss=0.07446, simple_loss=0.09652, pruned_loss=0.0168, audio_tagging_loss=0.009396, over 3049688.32 frames. ], batch size: 58, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:45:38,531 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239900 2023-11-21 17:46:21,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-11-21 17:46:23,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1599513.3333333333, ans=0.2 2023-11-21 17:46:40,478 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11500, loss[loss=0.07704, simple_loss=0.1054, pruned_loss=0.01881, audio_tagging_loss=0.00551, over 15262.00 frames. ], tot_loss[loss=0.0742, simple_loss=0.0961, pruned_loss=0.01676, audio_tagging_loss=0.009386, over 3055672.16 frames. ], batch size: 56, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:46:41,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 239950 2023-11-21 17:46:42,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1599646.6666666667, ans=0.0 2023-11-21 17:46:48,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1599646.6666666667, ans=0.125 2023-11-21 17:47:08,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2023-11-21 17:47:16,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=15.0 2023-11-21 17:47:19,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 7.879e+01 8.671e+01 9.205e+01 1.101e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-21 17:47:19,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1599846.6666666667, ans=0.0 2023-11-21 17:47:26,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.22 vs. limit=22.5 2023-11-21 17:47:42,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1599980.0, ans=0.0 2023-11-21 17:47:43,799 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11550, loss[loss=0.05883, simple_loss=0.08139, pruned_loss=0.009508, audio_tagging_loss=0.008625, over 15599.00 frames. ], tot_loss[loss=0.07473, simple_loss=0.09708, pruned_loss=0.01691, audio_tagging_loss=0.009279, over 3054321.12 frames. 
], batch size: 60, lr: 3.39e-03, grad_scale: 16.0 2023-11-21 17:47:45,084 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240000 2023-11-21 17:47:51,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1599980.0, ans=0.125 2023-11-21 17:47:53,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1599980.0, ans=0.125 2023-11-21 17:48:11,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1600113.3333333333, ans=0.09899494936611666 2023-11-21 17:48:13,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-21 17:48:24,242 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 17:48:38,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2023-11-21 17:48:50,296 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11600, loss[loss=0.08147, simple_loss=0.114, pruned_loss=0.01772, audio_tagging_loss=0.006755, over 14945.00 frames. ], tot_loss[loss=0.07484, simple_loss=0.09713, pruned_loss=0.01691, audio_tagging_loss=0.009373, over 3060404.76 frames. ], batch size: 54, lr: 3.38e-03, grad_scale: 32.0 2023-11-21 17:48:51,586 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240050 2023-11-21 17:48:54,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1600313.3333333333, ans=0.0 2023-11-21 17:49:11,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1600380.0, ans=0.1 2023-11-21 17:49:19,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-21 17:49:23,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1600446.6666666667, ans=0.125 2023-11-21 17:49:28,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1600513.3333333333, ans=0.125 2023-11-21 17:49:29,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.655e+01 8.173e+01 8.910e+01 9.625e+01 1.220e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-21 17:49:54,064 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11650, loss[loss=0.09406, simple_loss=0.1266, pruned_loss=0.02075, audio_tagging_loss=0.01, over 15872.00 frames. ], tot_loss[loss=0.07443, simple_loss=0.09667, pruned_loss=0.01667, audio_tagging_loss=0.009422, over 3050713.22 frames. 
], batch size: 57, lr: 3.38e-03, grad_scale: 32.0 2023-11-21 17:49:55,481 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240100 2023-11-21 17:50:00,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1600646.6666666667, ans=0.125 2023-11-21 17:50:26,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1600780.0, ans=0.0 2023-11-21 17:50:28,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1600780.0, ans=0.125 2023-11-21 17:50:36,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1600846.6666666667, ans=0.0 2023-11-21 17:50:58,005 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11700, loss[loss=0.05322, simple_loss=0.07357, pruned_loss=0.005916, audio_tagging_loss=0.01052, over 15103.00 frames. ], tot_loss[loss=0.07468, simple_loss=0.09729, pruned_loss=0.01669, audio_tagging_loss=0.009346, over 3053083.81 frames. ], batch size: 57, lr: 3.38e-03, grad_scale: 16.0 2023-11-21 17:50:59,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240150 2023-11-21 17:51:00,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1600980.0, ans=0.125 2023-11-21 17:51:13,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1601046.6666666667, ans=0.125 2023-11-21 17:51:16,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1601046.6666666667, ans=0.95 2023-11-21 17:51:37,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2023-11-21 17:51:38,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.589e+01 8.116e+01 8.836e+01 9.594e+01 1.189e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-21 17:51:38,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1601180.0, ans=0.0 2023-11-21 17:51:44,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1601180.0, ans=0.0 2023-11-21 17:51:50,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1601246.6666666667, ans=0.0 2023-11-21 17:51:53,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1601246.6666666667, ans=0.125 2023-11-21 17:52:00,595 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11750, loss[loss=0.08272, simple_loss=0.1094, pruned_loss=0.01939, audio_tagging_loss=0.008613, over 15486.00 frames. ], tot_loss[loss=0.07402, simple_loss=0.0958, pruned_loss=0.01659, audio_tagging_loss=0.009526, over 3043773.10 frames. 
], batch size: 59, lr: 3.38e-03, grad_scale: 16.0 2023-11-21 17:52:01,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240200 2023-11-21 17:52:06,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1601313.3333333333, ans=0.09899494936611666 2023-11-21 17:52:10,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1601313.3333333333, ans=0.125 2023-11-21 17:52:19,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0 2023-11-21 17:52:23,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1601380.0, ans=0.125 2023-11-21 17:52:37,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1601446.6666666667, ans=0.05 2023-11-21 17:52:40,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1601513.3333333333, ans=0.05 2023-11-21 17:52:40,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1601513.3333333333, ans=0.125 2023-11-21 17:52:45,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1601513.3333333333, ans=0.125 2023-11-21 17:52:55,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1601580.0, ans=0.125 2023-11-21 17:53:03,893 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11800, loss[loss=0.06742, simple_loss=0.08582, pruned_loss=0.01508, audio_tagging_loss=0.009425, over 14938.00 frames. ], tot_loss[loss=0.07452, simple_loss=0.09652, pruned_loss=0.01682, audio_tagging_loss=0.009438, over 3047783.87 frames. 
], batch size: 57, lr: 3.38e-03, grad_scale: 16.0 2023-11-21 17:53:05,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240250 2023-11-21 17:53:18,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1601713.3333333333, ans=0.0 2023-11-21 17:53:18,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1601713.3333333333, ans=0.125 2023-11-21 17:53:20,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1601713.3333333333, ans=0.0 2023-11-21 17:53:21,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1601713.3333333333, ans=0.1 2023-11-21 17:53:44,067 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.430e+01 8.055e+01 8.902e+01 9.602e+01 1.321e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-21 17:53:48,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1601846.6666666667, ans=0.0 2023-11-21 17:53:54,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1601913.3333333333, ans=0.125 2023-11-21 17:54:07,648 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11850, loss[loss=0.05764, simple_loss=0.06112, pruned_loss=0.01353, audio_tagging_loss=0.01355, over 13396.00 frames. ], tot_loss[loss=0.07474, simple_loss=0.0967, pruned_loss=0.01683, audio_tagging_loss=0.009558, over 3047276.01 frames. ], batch size: 53, lr: 3.38e-03, grad_scale: 16.0 2023-11-21 17:54:08,970 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240300 2023-11-21 17:54:11,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1601980.0, ans=0.0 2023-11-21 17:54:18,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1602046.6666666667, ans=0.0 2023-11-21 17:54:37,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1602113.3333333333, ans=0.0 2023-11-21 17:54:51,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1602180.0, ans=0.0 2023-11-21 17:55:01,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1602246.6666666667, ans=0.1 2023-11-21 17:55:11,000 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11900, loss[loss=0.09183, simple_loss=0.1172, pruned_loss=0.02364, audio_tagging_loss=0.009598, over 15298.00 frames. ], tot_loss[loss=0.0746, simple_loss=0.09637, pruned_loss=0.01667, audio_tagging_loss=0.009743, over 3041663.94 frames. 
], batch size: 58, lr: 3.38e-03, grad_scale: 16.0 2023-11-21 17:55:11,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1602313.3333333333, ans=0.0 2023-11-21 17:55:12,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240350 2023-11-21 17:55:15,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1602313.3333333333, ans=0.125 2023-11-21 17:55:23,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1602380.0, ans=0.125 2023-11-21 17:55:49,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1602513.3333333333, ans=0.125 2023-11-21 17:55:52,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 7.826e+01 8.536e+01 9.227e+01 1.162e+02, threshold=1.707e+02, percent-clipped=0.0 2023-11-21 17:56:06,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-11-21 17:56:14,807 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 11950, loss[loss=0.077, simple_loss=0.09271, pruned_loss=0.02043, audio_tagging_loss=0.01021, over 15327.00 frames. ], tot_loss[loss=0.0745, simple_loss=0.09609, pruned_loss=0.01667, audio_tagging_loss=0.00979, over 3039441.71 frames. ], batch size: 56, lr: 3.38e-03, grad_scale: 16.0 2023-11-21 17:56:16,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240400 2023-11-21 17:56:33,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1602713.3333333333, ans=0.04949747468305833 2023-11-21 17:56:35,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1602713.3333333333, ans=0.0 2023-11-21 17:56:35,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1602713.3333333333, ans=0.0 2023-11-21 17:56:38,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-11-21 17:56:44,197 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 17:56:51,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1602780.0, ans=0.2 2023-11-21 17:57:02,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1602846.6666666667, ans=0.125 2023-11-21 17:57:16,243 INFO [train_asr.py:1221] (1/4) Epoch 20, batch 12000, loss[loss=0.09569, simple_loss=0.1256, pruned_loss=0.02543, audio_tagging_loss=0.007467, over 15105.00 frames. ], tot_loss[loss=0.07511, simple_loss=0.09673, pruned_loss=0.01692, audio_tagging_loss=0.009828, over 3042988.20 frames. 
], batch size: 56, lr: 3.38e-03, grad_scale: 32.0 2023-11-21 17:57:16,243 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 17:57:37,296 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4329, 3.7073, 3.7178, 3.6192], device='cuda:1') 2023-11-21 17:57:57,980 INFO [train_asr.py:1253] (1/4) Epoch 20, validation: loss=0.06015, simple_loss=0.05214, pruned_loss=0.005246, audio_tagging_loss=0.02884, over 4681554.00 frames. 2023-11-21 17:57:57,981 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 17:57:59,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240450 2023-11-21 17:58:06,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1602980.0, ans=0.1 2023-11-21 17:58:07,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1602980.0, ans=0.0 2023-11-21 17:58:58,664 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 0, loss[loss=0.08815, simple_loss=0.1092, pruned_loss=0.01514, audio_tagging_loss=0.01839, over 17009.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1092, pruned_loss=0.01514, audio_tagging_loss=0.01839, over 17009.00 frames. ], batch size: 62, lr: 3.30e-03, grad_scale: 32.0 2023-11-21 17:58:58,665 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 17:59:34,267 INFO [train_asr.py:1253] (1/4) Epoch 21, validation: loss=0.05942, simple_loss=0.05208, pruned_loss=0.00519, audio_tagging_loss=0.02819, over 4681554.00 frames. 2023-11-21 17:59:34,268 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 17:59:45,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1603133.3333333333, ans=0.125 2023-11-21 17:59:46,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.163e+01 8.736e+01 1.018e+02 1.443e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 18:00:04,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1603266.6666666667, ans=0.125 2023-11-21 18:00:10,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240500 2023-11-21 18:00:29,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1603400.0, ans=0.07 2023-11-21 18:00:29,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1603400.0, ans=0.1 2023-11-21 18:00:39,143 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 50, loss[loss=0.06618, simple_loss=0.07247, pruned_loss=0.01003, audio_tagging_loss=0.01991, over 15359.00 frames. ], tot_loss[loss=0.08214, simple_loss=0.09577, pruned_loss=0.01589, audio_tagging_loss=0.01836, over 689540.74 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 32.0 2023-11-21 18:00:58,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. 
limit=15.0 2023-11-21 18:01:15,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240550 2023-11-21 18:01:16,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1603666.6666666667, ans=0.0 2023-11-21 18:01:43,928 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 100, loss[loss=0.07634, simple_loss=0.09033, pruned_loss=0.0183, audio_tagging_loss=0.01287, over 13995.00 frames. ], tot_loss[loss=0.08093, simple_loss=0.09556, pruned_loss=0.01559, audio_tagging_loss=0.01756, over 1204248.00 frames. ], batch size: 53, lr: 3.30e-03, grad_scale: 32.0 2023-11-21 18:01:54,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.709e+01 8.663e+01 9.262e+01 1.008e+02 1.245e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-21 18:02:08,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.19 vs. limit=10.0 2023-11-21 18:02:19,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240600 2023-11-21 18:02:27,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1604000.0, ans=0.0 2023-11-21 18:02:34,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1604066.6666666667, ans=0.1 2023-11-21 18:02:47,863 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 150, loss[loss=0.07885, simple_loss=0.1094, pruned_loss=0.01302, audio_tagging_loss=0.01113, over 15187.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.09549, pruned_loss=0.01552, audio_tagging_loss=0.01578, over 1618719.43 frames. ], batch size: 57, lr: 3.30e-03, grad_scale: 16.0 2023-11-21 18:02:55,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1604133.3333333333, ans=0.125 2023-11-21 18:02:55,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1604133.3333333333, ans=0.0 2023-11-21 18:02:59,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-21 18:03:05,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1604200.0, ans=0.125 2023-11-21 18:03:20,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1604266.6666666667, ans=0.125 2023-11-21 18:03:21,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1604266.6666666667, ans=0.05 2023-11-21 18:03:24,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240650 2023-11-21 18:03:37,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1604400.0, ans=0.0 2023-11-21 18:03:44,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1604400.0, ans=0.1 2023-11-21 18:03:51,998 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 200, loss[loss=0.07231, simple_loss=0.09322, pruned_loss=0.01698, audio_tagging_loss=0.008713, over 14980.00 frames. 
], tot_loss[loss=0.07793, simple_loss=0.0962, pruned_loss=0.01588, audio_tagging_loss=0.01395, over 1937032.70 frames. ], batch size: 56, lr: 3.30e-03, grad_scale: 16.0 2023-11-21 18:03:53,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.36 vs. limit=10.0 2023-11-21 18:03:54,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1604466.6666666667, ans=0.125 2023-11-21 18:04:01,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.52 vs. limit=10.0 2023-11-21 18:04:03,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.168e+01 8.831e+01 9.644e+01 1.295e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-21 18:04:21,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-21 18:04:27,179 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240700 2023-11-21 18:04:32,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1604666.6666666667, ans=0.2 2023-11-21 18:04:43,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1604733.3333333333, ans=0.1 2023-11-21 18:04:55,607 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 250, loss[loss=0.05064, simple_loss=0.05676, pruned_loss=0.01186, audio_tagging_loss=0.0104, over 14925.00 frames. ], tot_loss[loss=0.07672, simple_loss=0.09653, pruned_loss=0.01596, audio_tagging_loss=0.01249, over 2185478.19 frames. ], batch size: 60, lr: 3.30e-03, grad_scale: 8.0 2023-11-21 18:05:04,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=22.5 2023-11-21 18:05:27,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-21 18:05:31,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240750 2023-11-21 18:05:53,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5 2023-11-21 18:05:59,420 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 300, loss[loss=0.07271, simple_loss=0.08866, pruned_loss=0.01797, audio_tagging_loss=0.0104, over 14075.00 frames. ], tot_loss[loss=0.07646, simple_loss=0.09706, pruned_loss=0.0163, audio_tagging_loss=0.01164, over 2377445.30 frames. 
], batch size: 52, lr: 3.30e-03, grad_scale: 8.0 2023-11-21 18:06:03,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1605133.3333333333, ans=0.0 2023-11-21 18:06:10,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1605133.3333333333, ans=15.0 2023-11-21 18:06:14,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.194e+01 8.148e+01 8.758e+01 9.398e+01 1.354e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-21 18:06:14,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1605200.0, ans=0.0 2023-11-21 18:06:35,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240800 2023-11-21 18:06:39,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1605333.3333333333, ans=0.125 2023-11-21 18:07:04,348 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 350, loss[loss=0.06804, simple_loss=0.09164, pruned_loss=0.0136, audio_tagging_loss=0.008618, over 16190.00 frames. ], tot_loss[loss=0.0753, simple_loss=0.09613, pruned_loss=0.01614, audio_tagging_loss=0.01109, over 2523990.63 frames. ], batch size: 61, lr: 3.30e-03, grad_scale: 8.0 2023-11-21 18:07:11,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2023-11-21 18:07:14,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2023-11-21 18:07:29,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=12.0 2023-11-21 18:07:29,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1605600.0, ans=0.125 2023-11-21 18:07:37,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-21 18:07:39,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240850 2023-11-21 18:07:46,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1605666.6666666667, ans=0.0 2023-11-21 18:08:08,090 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 400, loss[loss=0.09792, simple_loss=0.1314, pruned_loss=0.02366, audio_tagging_loss=0.008557, over 14923.00 frames. ], tot_loss[loss=0.07471, simple_loss=0.09586, pruned_loss=0.01607, audio_tagging_loss=0.0107, over 2636712.20 frames. ], batch size: 55, lr: 3.30e-03, grad_scale: 16.0 2023-11-21 18:08:19,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.08 vs. 
limit=15.0 2023-11-21 18:08:22,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.012e+01 8.744e+01 9.405e+01 1.302e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-21 18:08:44,750 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240900 2023-11-21 18:08:45,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1605933.3333333333, ans=0.125 2023-11-21 18:08:49,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1606000.0, ans=0.0 2023-11-21 18:08:52,486 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 18:09:07,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1606066.6666666667, ans=0.2 2023-11-21 18:09:09,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1606066.6666666667, ans=0.0 2023-11-21 18:09:11,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=15.0 2023-11-21 18:09:12,836 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 450, loss[loss=0.08783, simple_loss=0.1147, pruned_loss=0.01839, audio_tagging_loss=0.01208, over 14651.00 frames. ], tot_loss[loss=0.07459, simple_loss=0.09609, pruned_loss=0.01612, audio_tagging_loss=0.01042, over 2736218.82 frames. ], batch size: 54, lr: 3.30e-03, grad_scale: 16.0 2023-11-21 18:09:40,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1606266.6666666667, ans=0.2 2023-11-21 18:09:49,220 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 240950 2023-11-21 18:09:51,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1606333.3333333333, ans=0.125 2023-11-21 18:10:09,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1606400.0, ans=0.125 2023-11-21 18:10:17,050 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 500, loss[loss=0.07673, simple_loss=0.1082, pruned_loss=0.01409, audio_tagging_loss=0.008554, over 15417.00 frames. ], tot_loss[loss=0.07383, simple_loss=0.09524, pruned_loss=0.01589, audio_tagging_loss=0.01031, over 2803716.39 frames. ], batch size: 55, lr: 3.30e-03, grad_scale: 16.0 2023-11-21 18:10:19,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1606466.6666666667, ans=0.1 2023-11-21 18:10:23,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.31 vs. 
limit=22.5 2023-11-21 18:10:31,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.140e+01 8.109e+01 8.628e+01 9.380e+01 1.335e+02, threshold=1.726e+02, percent-clipped=0.0 2023-11-21 18:10:34,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1606533.3333333333, ans=0.1 2023-11-21 18:10:53,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241000 2023-11-21 18:11:06,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1606666.6666666667, ans=0.125 2023-11-21 18:11:13,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1606733.3333333333, ans=0.125 2023-11-21 18:11:13,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1606733.3333333333, ans=0.2 2023-11-21 18:11:15,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2023-11-21 18:11:21,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1606800.0, ans=0.1 2023-11-21 18:11:22,063 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 550, loss[loss=0.07131, simple_loss=0.09784, pruned_loss=0.01595, audio_tagging_loss=0.006448, over 15387.00 frames. ], tot_loss[loss=0.07353, simple_loss=0.09506, pruned_loss=0.01587, audio_tagging_loss=0.01013, over 2854376.20 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 16.0 2023-11-21 18:11:46,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1606933.3333333333, ans=0.125 2023-11-21 18:11:58,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241050 2023-11-21 18:12:16,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-21 18:12:19,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5 2023-11-21 18:12:21,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1607066.6666666667, ans=0.0 2023-11-21 18:12:22,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1607066.6666666667, ans=0.1 2023-11-21 18:12:22,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=1607066.6666666667, ans=15.0 2023-11-21 18:12:25,547 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 600, loss[loss=0.06761, simple_loss=0.08278, pruned_loss=0.01744, audio_tagging_loss=0.008781, over 14477.00 frames. ], tot_loss[loss=0.0728, simple_loss=0.09393, pruned_loss=0.0158, audio_tagging_loss=0.01004, over 2894095.12 frames. 
], batch size: 55, lr: 3.29e-03, grad_scale: 16.0 2023-11-21 18:12:39,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.804e+01 8.139e+01 8.699e+01 9.445e+01 1.159e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 18:13:01,870 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241100 2023-11-21 18:13:05,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1607333.3333333333, ans=0.09899494936611666 2023-11-21 18:13:13,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1607333.3333333333, ans=0.125 2023-11-21 18:13:25,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1607400.0, ans=0.125 2023-11-21 18:13:30,508 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 650, loss[loss=0.08133, simple_loss=0.1025, pruned_loss=0.01906, audio_tagging_loss=0.01104, over 14814.00 frames. ], tot_loss[loss=0.07328, simple_loss=0.09439, pruned_loss=0.01606, audio_tagging_loss=0.01003, over 2929704.34 frames. ], batch size: 54, lr: 3.29e-03, grad_scale: 16.0 2023-11-21 18:13:48,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1607533.3333333333, ans=0.125 2023-11-21 18:13:52,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1607533.3333333333, ans=0.0 2023-11-21 18:14:05,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241150 2023-11-21 18:14:07,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1607666.6666666667, ans=0.0 2023-11-21 18:14:10,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1607666.6666666667, ans=0.2 2023-11-21 18:14:21,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1607733.3333333333, ans=0.0 2023-11-21 18:14:23,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1607733.3333333333, ans=0.2 2023-11-21 18:14:26,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1607733.3333333333, ans=0.125 2023-11-21 18:14:34,267 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 700, loss[loss=0.0573, simple_loss=0.0626, pruned_loss=0.008403, audio_tagging_loss=0.01759, over 14927.00 frames. ], tot_loss[loss=0.07285, simple_loss=0.09404, pruned_loss=0.01584, audio_tagging_loss=0.00999, over 2957923.74 frames. ], batch size: 58, lr: 3.29e-03, grad_scale: 16.0 2023-11-21 18:14:48,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.012e+01 8.519e+01 9.265e+01 1.175e+02, threshold=1.704e+02, percent-clipped=0.0 2023-11-21 18:15:10,800 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241200 2023-11-21 18:15:20,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1608000.0, ans=0.1 2023-11-21 18:15:33,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.16 vs. 
limit=12.0 2023-11-21 18:15:39,223 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 750, loss[loss=0.07801, simple_loss=0.1094, pruned_loss=0.01322, audio_tagging_loss=0.01007, over 15427.00 frames. ], tot_loss[loss=0.07307, simple_loss=0.09449, pruned_loss=0.01593, audio_tagging_loss=0.00989, over 2983529.26 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 16.0 2023-11-21 18:16:07,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1608266.6666666667, ans=0.0 2023-11-21 18:16:14,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241250 2023-11-21 18:16:23,985 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 18:16:42,241 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 800, loss[loss=0.06586, simple_loss=0.07783, pruned_loss=0.01682, audio_tagging_loss=0.01012, over 16185.00 frames. ], tot_loss[loss=0.07339, simple_loss=0.0947, pruned_loss=0.01618, audio_tagging_loss=0.009862, over 2995787.30 frames. ], batch size: 62, lr: 3.29e-03, grad_scale: 32.0 2023-11-21 18:16:57,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.795e+01 8.159e+01 8.981e+01 9.751e+01 1.353e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-21 18:17:00,164 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 18:17:15,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.22 vs. limit=22.5 2023-11-21 18:17:18,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241300 2023-11-21 18:17:22,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1608666.6666666667, ans=0.125 2023-11-21 18:17:29,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1608666.6666666667, ans=0.1 2023-11-21 18:17:36,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-11-21 18:17:40,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2023-11-21 18:17:47,489 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 850, loss[loss=0.05444, simple_loss=0.06404, pruned_loss=0.009426, audio_tagging_loss=0.01299, over 14024.00 frames. ], tot_loss[loss=0.07264, simple_loss=0.09363, pruned_loss=0.01585, audio_tagging_loss=0.009976, over 3002785.78 frames. 
], batch size: 55, lr: 3.29e-03, grad_scale: 32.0 2023-11-21 18:17:58,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1608800.0, ans=0.125 2023-11-21 18:18:06,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1608866.6666666667, ans=10.0 2023-11-21 18:18:08,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1608866.6666666667, ans=0.125 2023-11-21 18:18:12,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5 2023-11-21 18:18:19,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1608933.3333333333, ans=0.07 2023-11-21 18:18:19,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1608933.3333333333, ans=0.09899494936611666 2023-11-21 18:18:23,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241350 2023-11-21 18:18:26,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1609000.0, ans=0.1 2023-11-21 18:18:47,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1609066.6666666667, ans=10.0 2023-11-21 18:18:52,215 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 900, loss[loss=0.04777, simple_loss=0.0522, pruned_loss=0.008752, audio_tagging_loss=0.01291, over 14197.00 frames. ], tot_loss[loss=0.07265, simple_loss=0.09342, pruned_loss=0.01589, audio_tagging_loss=0.01006, over 3010415.76 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 32.0 2023-11-21 18:18:52,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1609133.3333333333, ans=0.125 2023-11-21 18:19:03,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1609200.0, ans=0.0 2023-11-21 18:19:05,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.217e+01 8.787e+01 9.484e+01 1.241e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-21 18:19:27,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241400 2023-11-21 18:19:38,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1609333.3333333333, ans=0.2 2023-11-21 18:19:39,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-21 18:19:39,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1609333.3333333333, ans=0.2 2023-11-21 18:19:49,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1609400.0, ans=0.125 2023-11-21 18:19:51,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. 
2023-11-21 18:19:53,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1609400.0, ans=0.0
2023-11-21 18:19:55,353 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 950, loss[loss=0.05374, simple_loss=0.06519, pruned_loss=0.01169, audio_tagging_loss=0.009453, over 14430.00 frames. ], tot_loss[loss=0.07297, simple_loss=0.09413, pruned_loss=0.01596, audio_tagging_loss=0.009943, over 3022673.45 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 32.0
2023-11-21 18:20:01,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1609466.6666666667, ans=0.1
2023-11-21 18:20:31,552 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241450
2023-11-21 18:20:41,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1609666.6666666667, ans=0.125
2023-11-21 18:20:49,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1609733.3333333333, ans=0.0
2023-11-21 18:20:55,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1609733.3333333333, ans=0.1
2023-11-21 18:21:00,404 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1000, loss[loss=0.07223, simple_loss=0.08848, pruned_loss=0.01826, audio_tagging_loss=0.009729, over 14534.00 frames. ], tot_loss[loss=0.07309, simple_loss=0.09441, pruned_loss=0.01614, audio_tagging_loss=0.009738, over 3024223.64 frames. ], batch size: 55, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:21:15,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.205e+01 8.818e+01 9.430e+01 1.136e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-21 18:21:26,807 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 18:21:35,620 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241500
2023-11-21 18:22:03,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1610133.3333333333, ans=0.0
2023-11-21 18:22:04,690 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1050, loss[loss=0.07952, simple_loss=0.1024, pruned_loss=0.01754, audio_tagging_loss=0.01078, over 15804.00 frames. ], tot_loss[loss=0.07275, simple_loss=0.09419, pruned_loss=0.016, audio_tagging_loss=0.009658, over 3024065.08 frames. ], batch size: 59, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:22:07,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1610133.3333333333, ans=0.125
2023-11-21 18:22:11,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0
2023-11-21 18:22:28,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1610266.6666666667, ans=0.125
2023-11-21 18:22:33,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0
2023-11-21 18:22:40,857 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241550
2023-11-21 18:23:08,012 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1100, loss[loss=0.0557, simple_loss=0.06487, pruned_loss=0.01244, audio_tagging_loss=0.01083, over 15370.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09307, pruned_loss=0.01575, audio_tagging_loss=0.009645, over 3029894.12 frames. ], batch size: 63, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:23:10,548 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 18:23:17,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0
2023-11-21 18:23:23,984 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.562e+01 7.887e+01 8.672e+01 9.418e+01 1.186e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-21 18:23:35,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1610600.0, ans=0.1
2023-11-21 18:23:41,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1610600.0, ans=0.125
2023-11-21 18:23:44,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241600
2023-11-21 18:23:49,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0
2023-11-21 18:23:49,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1610666.6666666667, ans=0.125
2023-11-21 18:23:50,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1610666.6666666667, ans=0.125
2023-11-21 18:23:51,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. limit=10.0
2023-11-21 18:23:57,175 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 18:24:12,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5
2023-11-21 18:24:12,790 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1150, loss[loss=0.06702, simple_loss=0.0811, pruned_loss=0.01442, audio_tagging_loss=0.01205, over 15105.00 frames. ], tot_loss[loss=0.07234, simple_loss=0.09352, pruned_loss=0.01593, audio_tagging_loss=0.009648, over 3029791.65 frames. ], batch size: 61, lr: 3.29e-03, grad_scale: 16.0
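The WARNING lines that exclude AudioSet cuts are internally consistent: a 1-second cut yields 100 feature frames, a 4x-subsampling front-end leaves 23 of them, and 23 encoder frames cannot cover the dummy transcript of 24 tokens, so the cut is unusable for the transducer loss. A sketch of that check; the exact length formula and exclusion rule in train_asr.py are assumptions here, chosen because they reproduce the logged 100 -> 23:

    def frames_after_subsampling(t: int) -> int:
        # One plausible 4x convolutional subsampling length formula;
        # it maps the logged 100 input frames to 23.
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # matches the excluded dummy-text cuts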
2023-11-21 18:24:19,120 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 18:24:32,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1610866.6666666667, ans=0.09899494936611666
2023-11-21 18:24:48,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241650
2023-11-21 18:25:04,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1611066.6666666667, ans=0.125
2023-11-21 18:25:05,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1611066.6666666667, ans=0.125
2023-11-21 18:25:07,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1611066.6666666667, ans=0.1
2023-11-21 18:25:17,319 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1200, loss[loss=0.06115, simple_loss=0.07453, pruned_loss=0.01313, audio_tagging_loss=0.01076, over 15847.00 frames. ], tot_loss[loss=0.07271, simple_loss=0.09416, pruned_loss=0.01601, audio_tagging_loss=0.009622, over 3027670.53 frames. ], batch size: 59, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:25:18,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1611133.3333333333, ans=0.125
2023-11-21 18:25:33,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.606e+01 8.048e+01 8.609e+01 9.236e+01 1.274e+02, threshold=1.722e+02, percent-clipped=0.0
2023-11-21 18:25:46,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1611266.6666666667, ans=0.0
2023-11-21 18:25:52,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241700
2023-11-21 18:26:09,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0
2023-11-21 18:26:10,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1611400.0, ans=0.1
2023-11-21 18:26:18,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=12.0
2023-11-21 18:26:20,650 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1250, loss[loss=0.06106, simple_loss=0.07777, pruned_loss=0.01405, audio_tagging_loss=0.008136, over 14034.00 frames. ], tot_loss[loss=0.07302, simple_loss=0.09467, pruned_loss=0.01613, audio_tagging_loss=0.009555, over 3026836.95 frames. ], batch size: 53, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:26:23,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1611466.6666666667, ans=10.0
2023-11-21 18:26:24,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1611466.6666666667, ans=0.0
2023-11-21 18:26:51,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1611600.0, ans=0.125
2023-11-21 18:26:57,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241750
2023-11-21 18:27:25,080 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1300, loss[loss=0.05184, simple_loss=0.05916, pruned_loss=0.01036, audio_tagging_loss=0.0119, over 15384.00 frames. ], tot_loss[loss=0.07299, simple_loss=0.09466, pruned_loss=0.01613, audio_tagging_loss=0.009522, over 3032982.95 frames. ], batch size: 61, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:27:41,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.105e+01 8.749e+01 9.378e+01 1.268e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-21 18:27:58,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1611933.3333333333, ans=0.125
2023-11-21 18:28:00,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241800
2023-11-21 18:28:16,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1612066.6666666667, ans=0.1
2023-11-21 18:28:28,411 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1350, loss[loss=0.08842, simple_loss=0.1177, pruned_loss=0.0206, audio_tagging_loss=0.008962, over 16593.00 frames. ], tot_loss[loss=0.07275, simple_loss=0.09445, pruned_loss=0.01603, audio_tagging_loss=0.009494, over 3033956.05 frames. ], batch size: 59, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:28:43,366 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.681e-03
2023-11-21 18:28:50,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1612200.0, ans=0.125
2023-11-21 18:29:00,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0
2023-11-21 18:29:05,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241850
2023-11-21 18:29:14,758 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 18:29:29,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0
2023-11-21 18:29:33,006 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1400, loss[loss=0.07216, simple_loss=0.09047, pruned_loss=0.01802, audio_tagging_loss=0.008902, over 15448.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.09505, pruned_loss=0.01632, audio_tagging_loss=0.009527, over 3031418.49 frames. ], batch size: 59, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:29:34,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1612466.6666666667, ans=0.125
2023-11-21 18:29:50,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.505e+01 9.180e+01 9.942e+01 1.333e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-21 18:30:00,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1612600.0, ans=0.07
2023-11-21 18:30:09,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241900
2023-11-21 18:30:16,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1612666.6666666667, ans=0.125
2023-11-21 18:30:19,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0
2023-11-21 18:30:37,670 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1450, loss[loss=0.06005, simple_loss=0.07517, pruned_loss=0.01259, audio_tagging_loss=0.009873, over 14302.00 frames. ], tot_loss[loss=0.07333, simple_loss=0.09515, pruned_loss=0.0162, audio_tagging_loss=0.009556, over 3033979.61 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:30:58,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1612866.6666666667, ans=0.0
2023-11-21 18:31:03,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1612933.3333333333, ans=0.125
2023-11-21 18:31:07,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1612933.3333333333, ans=0.125
2023-11-21 18:31:13,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 241950
2023-11-21 18:31:16,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1613000.0, ans=0.125
2023-11-21 18:31:40,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1613133.3333333333, ans=0.0
2023-11-21 18:31:41,550 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1500, loss[loss=0.08719, simple_loss=0.1094, pruned_loss=0.02382, audio_tagging_loss=0.008681, over 15693.00 frames. ], tot_loss[loss=0.07344, simple_loss=0.09507, pruned_loss=0.01623, audio_tagging_loss=0.009682, over 3032792.94 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 8.0
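grad_scale in the per-batch lines moves between 8.0, 16.0 and 32.0, the signature of dynamic loss scaling in fp16 training: the scale is halved when a step overflows and periodically doubled while steps stay finite. A sketch of the usual policy (the growth interval and factors below are the common torch.cuda.amp.GradScaler defaults, assumed here rather than read from this run):

    def update_grad_scale(scale: float, found_inf: bool, good_steps: int,
                          growth_interval: int = 2000) -> tuple[float, int]:
        # Halve on overflow, double after growth_interval clean steps --
        # the standard dynamic loss-scaling behaviour.
        if found_inf:
            return scale * 0.5, 0
        good_steps += 1
        if good_steps >= growth_interval:
            return scale * 2.0, 0
        return scale, good_steps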
2023-11-21 18:31:44,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1613133.3333333333, ans=0.125
2023-11-21 18:31:45,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1613133.3333333333, ans=0.125
2023-11-21 18:31:59,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.238e+01 7.980e+01 8.647e+01 9.591e+01 1.272e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-21 18:32:17,997 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242000
2023-11-21 18:32:18,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0
2023-11-21 18:32:34,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1613400.0, ans=0.125
2023-11-21 18:32:46,303 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1550, loss[loss=0.06323, simple_loss=0.07874, pruned_loss=0.0119, audio_tagging_loss=0.01196, over 14703.00 frames. ], tot_loss[loss=0.07356, simple_loss=0.09484, pruned_loss=0.01639, audio_tagging_loss=0.009758, over 3036571.62 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 8.0
2023-11-21 18:32:50,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1613466.6666666667, ans=10.0
2023-11-21 18:32:56,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1613466.6666666667, ans=0.0
2023-11-21 18:33:01,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1613533.3333333333, ans=0.0
2023-11-21 18:33:22,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242050
2023-11-21 18:33:31,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.85 vs. limit=10.0
2023-11-21 18:33:39,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.37 vs. limit=10.0
2023-11-21 18:33:50,189 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1600, loss[loss=0.05738, simple_loss=0.07223, pruned_loss=0.01186, audio_tagging_loss=0.009408, over 14354.00 frames. ], tot_loss[loss=0.07331, simple_loss=0.09463, pruned_loss=0.01613, audio_tagging_loss=0.00987, over 3043678.54 frames. ], batch size: 55, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:34:07,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.432e+01 8.028e+01 8.717e+01 9.586e+01 1.995e+02, threshold=1.743e+02, percent-clipped=1.0
2023-11-21 18:34:26,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242100
2023-11-21 18:34:54,118 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1650, loss[loss=0.07236, simple_loss=0.08959, pruned_loss=0.01643, audio_tagging_loss=0.01114, over 13928.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.09614, pruned_loss=0.01636, audio_tagging_loss=0.009724, over 3041704.85 frames. ], batch size: 54, lr: 3.29e-03, grad_scale: 16.0
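In the optim.py lines, the reported threshold tracks the quartiles: it is consistently Clipping_scale times the logged median grad norm (e.g. 2.0 * 8.981e+01 = 1.796e+02 at 18:16:57, and 2.0 * 8.787e+01 = 1.757e+02 at 18:19:05). A sketch of producing such a diagnostic from a window of recent gradient norms; the window size and exact bookkeeping in optim.py are assumptions:

    import torch

    def grad_norm_diagnostics(recent_norms: torch.Tensor,
                              clipping_scale: float = 2.0):
        # Five-point summary (min, 25%, 50%, 75%, max) plus a clipping
        # threshold of clipping_scale * median, matching the logged numbers.
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped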
2023-11-21 18:35:17,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1614200.0, ans=0.2
2023-11-21 18:35:22,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1614266.6666666667, ans=0.125
2023-11-21 18:35:31,030 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242150
2023-11-21 18:35:47,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1614400.0, ans=0.1
2023-11-21 18:35:58,534 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1700, loss[loss=0.07676, simple_loss=0.09657, pruned_loss=0.01769, audio_tagging_loss=0.01078, over 14679.00 frames. ], tot_loss[loss=0.07426, simple_loss=0.09636, pruned_loss=0.01635, audio_tagging_loss=0.009726, over 3040796.26 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:36:16,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.867e+01 8.384e+01 8.955e+01 9.684e+01 1.332e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-21 18:36:20,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1614533.3333333333, ans=0.1
2023-11-21 18:36:26,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0
2023-11-21 18:36:34,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242200
2023-11-21 18:36:39,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0
2023-11-21 18:36:46,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1614666.6666666667, ans=0.0
2023-11-21 18:36:48,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1614666.6666666667, ans=0.125
2023-11-21 18:37:02,896 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1750, loss[loss=0.08218, simple_loss=0.1052, pruned_loss=0.01905, audio_tagging_loss=0.01056, over 14995.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09597, pruned_loss=0.01635, audio_tagging_loss=0.009646, over 3041943.69 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:37:09,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1614800.0, ans=0.125
2023-11-21 18:37:24,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1614866.6666666667, ans=0.125
2023-11-21 18:37:25,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1614866.6666666667, ans=0.0
2023-11-21 18:37:29,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1614933.3333333333, ans=0.125
2023-11-21 18:37:38,629 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242250
2023-11-21 18:37:43,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1615000.0, ans=0.125
2023-11-21 18:37:50,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1615000.0, ans=0.125
2023-11-21 18:38:02,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1615066.6666666667, ans=0.2
2023-11-21 18:38:07,262 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1800, loss[loss=0.06965, simple_loss=0.101, pruned_loss=0.01064, audio_tagging_loss=0.008503, over 15621.00 frames. ], tot_loss[loss=0.07408, simple_loss=0.09633, pruned_loss=0.01636, audio_tagging_loss=0.00956, over 3044497.23 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:38:21,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1615200.0, ans=0.125
2023-11-21 18:38:25,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.166e+01 8.773e+01 9.532e+01 1.071e+02, threshold=1.755e+02, percent-clipped=0.0
2023-11-21 18:38:39,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1615266.6666666667, ans=0.125
2023-11-21 18:38:42,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1615266.6666666667, ans=0.125
2023-11-21 18:38:43,750 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242300
2023-11-21 18:38:48,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1615333.3333333333, ans=0.125
2023-11-21 18:38:50,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1615333.3333333333, ans=0.125
2023-11-21 18:38:50,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1615333.3333333333, ans=0.0
2023-11-21 18:38:51,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=1615333.3333333333, ans=0.1
2023-11-21 18:38:56,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1615333.3333333333, ans=0.125
2023-11-21 18:38:58,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1615400.0, ans=0.2
2023-11-21 18:39:11,201 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1850, loss[loss=0.07776, simple_loss=0.1033, pruned_loss=0.0181, audio_tagging_loss=0.008019, over 17572.00 frames. ], tot_loss[loss=0.07443, simple_loss=0.09687, pruned_loss=0.01648, audio_tagging_loss=0.009519, over 3048794.59 frames. ], batch size: 66, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:39:22,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1615533.3333333333, ans=0.125
2023-11-21 18:39:40,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0
2023-11-21 18:39:47,084 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242350
2023-11-21 18:40:02,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2023-11-21 18:40:07,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1615733.3333333333, ans=0.125
2023-11-21 18:40:09,695 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 18:40:12,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0
2023-11-21 18:40:14,376 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1900, loss[loss=0.08012, simple_loss=0.09654, pruned_loss=0.02137, audio_tagging_loss=0.01048, over 16527.00 frames. ], tot_loss[loss=0.0744, simple_loss=0.09693, pruned_loss=0.01652, audio_tagging_loss=0.00941, over 3050594.70 frames. ], batch size: 62, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:40:26,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1615800.0, ans=0.125
2023-11-21 18:40:33,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.159e+01 8.678e+01 9.357e+01 1.521e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-21 18:40:34,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1615866.6666666667, ans=0.0
2023-11-21 18:40:50,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242400
2023-11-21 18:41:19,195 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 1950, loss[loss=0.08398, simple_loss=0.1087, pruned_loss=0.01785, audio_tagging_loss=0.01178, over 15617.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09596, pruned_loss=0.01636, audio_tagging_loss=0.009361, over 3049755.70 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 16.0
2023-11-21 18:41:38,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1616200.0, ans=0.0
2023-11-21 18:41:50,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1616266.6666666667, ans=0.125
2023-11-21 18:41:54,109 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242450
2023-11-21 18:42:00,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1616333.3333333333, ans=0.1
2023-11-21 18:42:21,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1616400.0, ans=0.04949747468305833
2023-11-21 18:42:23,263 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2000, loss[loss=0.06455, simple_loss=0.0848, pruned_loss=0.009805, audio_tagging_loss=0.01235, over 15349.00 frames. ], tot_loss[loss=0.07349, simple_loss=0.0954, pruned_loss=0.01631, audio_tagging_loss=0.009487, over 3045705.92 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 32.0
2023-11-21 18:42:24,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1616466.6666666667, ans=0.1
2023-11-21 18:42:30,648 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 18:42:34,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2023-11-21 18:42:38,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1616533.3333333333, ans=0.09899494936611666
2023-11-21 18:42:40,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.542e+01 7.872e+01 8.647e+01 9.317e+01 2.008e+02, threshold=1.729e+02, percent-clipped=1.0
2023-11-21 18:42:58,696 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242500
2023-11-21 18:43:01,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1616666.6666666667, ans=0.125
2023-11-21 18:43:19,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0
2023-11-21 18:43:23,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0
2023-11-21 18:43:26,309 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2050, loss[loss=0.06958, simple_loss=0.08115, pruned_loss=0.01466, audio_tagging_loss=0.01435, over 14904.00 frames. ], tot_loss[loss=0.07363, simple_loss=0.09574, pruned_loss=0.01633, audio_tagging_loss=0.009437, over 3039897.83 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 32.0
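The ScheduledFloat lines record hyper-parameters (dropout probabilities, skip rates, balancer probabilities, whitening limits) whose current value ans is a function of batch_count, i.e. they are annealed over training rather than fixed. A minimal sketch of a piecewise-linear schedule of that kind; the breakpoints below are invented for illustration, the real schedules live in scaling.py:

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        # schedule: sorted (batch_count, value) breakpoints; linear
        # interpolation between them, clamped at both ends.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return schedule[-1][1]

    # E.g. a skip rate decaying from 0.1 to 0.0 over the first 20k batches
    # (hypothetical breakpoints):
    rate = scheduled_float(1_616_466.0, [(0.0, 0.1), (20_000.0, 0.0)])  # 0.0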
2023-11-21 18:43:48,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1616866.6666666667, ans=0.125
2023-11-21 18:43:49,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1616866.6666666667, ans=0.2
2023-11-21 18:43:50,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1616866.6666666667, ans=0.0
2023-11-21 18:44:03,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242550
2023-11-21 18:44:13,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1617000.0, ans=0.2
2023-11-21 18:44:14,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1617000.0, ans=0.125
2023-11-21 18:44:25,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1617066.6666666667, ans=0.0
2023-11-21 18:44:26,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1617066.6666666667, ans=0.125
2023-11-21 18:44:29,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1617066.6666666667, ans=0.0
2023-11-21 18:44:31,773 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2100, loss[loss=0.0832, simple_loss=0.1117, pruned_loss=0.01969, audio_tagging_loss=0.007662, over 15539.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09584, pruned_loss=0.01643, audio_tagging_loss=0.009346, over 3042355.14 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:44:46,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1617200.0, ans=0.0
2023-11-21 18:44:50,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.884e+01 8.232e+01 8.682e+01 9.296e+01 1.860e+02, threshold=1.736e+02, percent-clipped=1.0
2023-11-21 18:44:53,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1617200.0, ans=0.125
2023-11-21 18:45:06,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242600
2023-11-21 18:45:08,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1617333.3333333333, ans=0.1
2023-11-21 18:45:17,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1617333.3333333333, ans=0.125
2023-11-21 18:45:22,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1617400.0, ans=0.0
2023-11-21 18:45:26,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0
2023-11-21 18:45:36,204 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2150, loss[loss=0.07832, simple_loss=0.103, pruned_loss=0.01723, audio_tagging_loss=0.009608, over 13705.00 frames. ], tot_loss[loss=0.0738, simple_loss=0.09572, pruned_loss=0.01647, audio_tagging_loss=0.009474, over 3046099.55 frames. ], batch size: 52, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:45:46,434 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 18:45:48,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1617533.3333333333, ans=0.1
2023-11-21 18:45:57,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1617533.3333333333, ans=0.0
2023-11-21 18:45:58,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1617533.3333333333, ans=0.1
2023-11-21 18:46:12,461 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242650
2023-11-21 18:46:14,887 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 18:46:16,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1617666.6666666667, ans=0.2
2023-11-21 18:46:36,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=22.5
2023-11-21 18:46:39,530 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2200, loss[loss=0.08242, simple_loss=0.1119, pruned_loss=0.01948, audio_tagging_loss=0.006996, over 15710.00 frames. ], tot_loss[loss=0.07334, simple_loss=0.09516, pruned_loss=0.01626, audio_tagging_loss=0.009499, over 3049342.61 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:46:47,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1617800.0, ans=0.125
2023-11-21 18:46:58,820 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.003e+01 8.660e+01 9.579e+01 1.643e+02, threshold=1.732e+02, percent-clipped=0.0
2023-11-21 18:47:02,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1617866.6666666667, ans=0.125
2023-11-21 18:47:07,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1617933.3333333333, ans=0.0
2023-11-21 18:47:07,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0
2023-11-21 18:47:12,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1617933.3333333333, ans=0.125
2023-11-21 18:47:13,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1617933.3333333333, ans=0.125
2023-11-21 18:47:15,849 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242700
2023-11-21 18:47:17,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1618000.0, ans=0.0
2023-11-21 18:47:43,662 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2250, loss[loss=0.06993, simple_loss=0.1007, pruned_loss=0.009934, audio_tagging_loss=0.009669, over 14371.00 frames. ], tot_loss[loss=0.0734, simple_loss=0.09558, pruned_loss=0.01617, audio_tagging_loss=0.009438, over 3048850.68 frames. ], batch size: 53, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:47:51,640 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 18:48:18,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242750
2023-11-21 18:48:34,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1618400.0, ans=0.0
2023-11-21 18:48:38,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1618400.0, ans=0.035
2023-11-21 18:48:47,272 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2300, loss[loss=0.09026, simple_loss=0.123, pruned_loss=0.01876, audio_tagging_loss=0.009975, over 16193.00 frames. ], tot_loss[loss=0.07334, simple_loss=0.09534, pruned_loss=0.01611, audio_tagging_loss=0.009556, over 3046427.83 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:48:47,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1618466.6666666667, ans=15.0
2023-11-21 18:48:51,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1618466.6666666667, ans=0.2
2023-11-21 18:48:59,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1618533.3333333333, ans=0.125
2023-11-21 18:49:00,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1618533.3333333333, ans=0.125
2023-11-21 18:49:05,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1618533.3333333333, ans=0.1
2023-11-21 18:49:06,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 7.925e+01 8.512e+01 9.086e+01 1.194e+02, threshold=1.702e+02, percent-clipped=0.0
2023-11-21 18:49:23,572 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242800
2023-11-21 18:49:36,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1618666.6666666667, ans=0.0
2023-11-21 18:49:44,488 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 18:49:46,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1618733.3333333333, ans=0.1
2023-11-21 18:49:48,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1618733.3333333333, ans=0.0
2023-11-21 18:49:51,782 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2350, loss[loss=0.08088, simple_loss=0.1102, pruned_loss=0.01732, audio_tagging_loss=0.008482, over 15751.00 frames. ], tot_loss[loss=0.07351, simple_loss=0.09528, pruned_loss=0.01619, audio_tagging_loss=0.009676, over 3044487.86 frames. ], batch size: 57, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:49:51,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1618800.0, ans=0.1
2023-11-21 18:50:01,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1618800.0, ans=0.0
2023-11-21 18:50:16,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1618933.3333333333, ans=0.125
2023-11-21 18:50:19,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0
2023-11-21 18:50:28,254 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242850
2023-11-21 18:50:28,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1618933.3333333333, ans=0.125
2023-11-21 18:50:41,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1619066.6666666667, ans=0.125
2023-11-21 18:50:50,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1619066.6666666667, ans=0.125
2023-11-21 18:50:56,093 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2400, loss[loss=0.06631, simple_loss=0.07574, pruned_loss=0.01675, audio_tagging_loss=0.01169, over 15033.00 frames. ], tot_loss[loss=0.07466, simple_loss=0.09677, pruned_loss=0.01661, audio_tagging_loss=0.00966, over 3049107.40 frames. ], batch size: 59, lr: 3.28e-03, grad_scale: 32.0
2023-11-21 18:51:03,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1619133.3333333333, ans=0.125
2023-11-21 18:51:06,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0
2023-11-21 18:51:15,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.343e+01 8.783e+01 9.778e+01 1.689e+02, threshold=1.757e+02, percent-clipped=0.0
2023-11-21 18:51:25,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1619266.6666666667, ans=0.1
2023-11-21 18:51:27,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5
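tot_loss[...] is reported over roughly 3.0e6 frames that grow only slowly from one log line to the next, so it is evidently a frame-weighted aggregate over many recent batches rather than a single-batch value. One plausible way to keep such an aggregate is sketched below, assuming an exponential per-batch decay; the actual bookkeeping in train_asr.py may differ:

    class RunningLoss:
        # Frame-weighted running sums with exponential decay, so the
        # reported average reflects the last few thousand batches.
        def __init__(self, decay: float = 0.999) -> None:
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)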
2023-11-21 18:51:31,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242900
2023-11-21 18:51:31,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1619266.6666666667, ans=0.125
2023-11-21 18:51:41,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1619333.3333333333, ans=0.125
2023-11-21 18:51:44,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1619333.3333333333, ans=0.125
2023-11-21 18:51:52,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1619400.0, ans=0.0
2023-11-21 18:51:53,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0
2023-11-21 18:51:59,512 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2450, loss[loss=0.06989, simple_loss=0.09847, pruned_loss=0.01411, audio_tagging_loss=0.006544, over 15397.00 frames. ], tot_loss[loss=0.07439, simple_loss=0.09652, pruned_loss=0.01643, audio_tagging_loss=0.009697, over 3041118.41 frames. ], batch size: 57, lr: 3.28e-03, grad_scale: 32.0
2023-11-21 18:52:36,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 242950
2023-11-21 18:52:51,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1619733.3333333333, ans=0.125
2023-11-21 18:52:58,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1619733.3333333333, ans=0.0
2023-11-21 18:53:01,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1619733.3333333333, ans=0.025
2023-11-21 18:53:01,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1619733.3333333333, ans=0.125
2023-11-21 18:53:01,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1619733.3333333333, ans=0.125
2023-11-21 18:53:03,992 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2500, loss[loss=0.06919, simple_loss=0.09146, pruned_loss=0.01463, audio_tagging_loss=0.008831, over 16333.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.09607, pruned_loss=0.01627, audio_tagging_loss=0.009675, over 3047028.23 frames. ], batch size: 60, lr: 3.28e-03, grad_scale: 32.0
2023-11-21 18:53:22,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1619866.6666666667, ans=0.07
2023-11-21 18:53:23,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 8.197e+01 8.814e+01 9.365e+01 1.246e+02, threshold=1.763e+02, percent-clipped=0.0
2023-11-21 18:53:39,841 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243000
2023-11-21 18:54:08,142 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2550, loss[loss=0.06029, simple_loss=0.07514, pruned_loss=0.009287, audio_tagging_loss=0.01343, over 14611.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09482, pruned_loss=0.01613, audio_tagging_loss=0.009744, over 3043751.22 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 32.0
2023-11-21 18:54:12,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1620133.3333333333, ans=0.0
2023-11-21 18:54:29,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1620200.0, ans=0.09899494936611666
2023-11-21 18:54:44,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243050
2023-11-21 18:55:02,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1620400.0, ans=0.125
2023-11-21 18:55:11,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=12.0
2023-11-21 18:55:12,125 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2600, loss[loss=0.09802, simple_loss=0.1349, pruned_loss=0.0247, audio_tagging_loss=0.005868, over 15343.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09424, pruned_loss=0.01602, audio_tagging_loss=0.009557, over 3042967.83 frames. ], batch size: 54, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:55:15,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1620466.6666666667, ans=0.125
2023-11-21 18:55:32,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.728e+01 8.396e+01 8.905e+01 9.438e+01 1.443e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-21 18:55:34,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1620533.3333333333, ans=0.125
2023-11-21 18:55:35,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1620533.3333333333, ans=0.0
2023-11-21 18:55:48,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243100
2023-11-21 18:56:10,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1620733.3333333333, ans=0.125
2023-11-21 18:56:15,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1620800.0, ans=0.05
2023-11-21 18:56:16,573 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2650, loss[loss=0.06642, simple_loss=0.0815, pruned_loss=0.01709, audio_tagging_loss=0.008583, over 16987.00 frames. ], tot_loss[loss=0.07338, simple_loss=0.09532, pruned_loss=0.01624, audio_tagging_loss=0.009469, over 3040676.39 frames. ], batch size: 62, lr: 3.28e-03, grad_scale: 8.0
2023-11-21 18:56:23,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1620800.0, ans=0.0
2023-11-21 18:56:29,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1620866.6666666667, ans=0.2
2023-11-21 18:56:38,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0
2023-11-21 18:56:38,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1620866.6666666667, ans=0.125
2023-11-21 18:56:52,002 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243150
2023-11-21 18:57:05,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1621000.0, ans=0.125
2023-11-21 18:57:10,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1621066.6666666667, ans=0.125
2023-11-21 18:57:10,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1621066.6666666667, ans=0.2
2023-11-21 18:57:11,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1621066.6666666667, ans=0.125
2023-11-21 18:57:14,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1621066.6666666667, ans=0.04949747468305833
2023-11-21 18:57:19,988 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2700, loss[loss=0.06584, simple_loss=0.07782, pruned_loss=0.01606, audio_tagging_loss=0.01087, over 16493.00 frames. ], tot_loss[loss=0.07356, simple_loss=0.09551, pruned_loss=0.01642, audio_tagging_loss=0.009391, over 3048025.76 frames. ], batch size: 63, lr: 3.28e-03, grad_scale: 8.0
2023-11-21 18:57:34,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1621200.0, ans=0.125
2023-11-21 18:57:35,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0
2023-11-21 18:57:41,531 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 7.974e+01 8.822e+01 9.240e+01 1.303e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-21 18:57:56,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243200
2023-11-21 18:58:07,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1621333.3333333333, ans=0.125
2023-11-21 18:58:11,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1621400.0, ans=0.125
2023-11-21 18:58:17,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1621400.0, ans=0.125
2023-11-21 18:58:20,818 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 18:58:22,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1621400.0, ans=0.125
2023-11-21 18:58:24,256 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2750, loss[loss=0.06671, simple_loss=0.09655, pruned_loss=0.01159, audio_tagging_loss=0.006855, over 15941.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09388, pruned_loss=0.01597, audio_tagging_loss=0.009453, over 3045122.36 frames. ], batch size: 59, lr: 3.28e-03, grad_scale: 8.0
2023-11-21 18:58:37,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1621533.3333333333, ans=0.125
2023-11-21 18:58:39,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1621533.3333333333, ans=0.0
2023-11-21 18:59:00,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243250
2023-11-21 18:59:05,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=22.5
2023-11-21 18:59:17,904 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 18:59:28,123 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2800, loss[loss=0.06896, simple_loss=0.09496, pruned_loss=0.01208, audio_tagging_loss=0.009402, over 15379.00 frames. ], tot_loss[loss=0.07219, simple_loss=0.09381, pruned_loss=0.01592, audio_tagging_loss=0.009373, over 3043742.50 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 16.0
2023-11-21 18:59:36,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1621800.0, ans=0.2
2023-11-21 18:59:40,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0
2023-11-21 18:59:50,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.468e+01 7.876e+01 8.639e+01 9.327e+01 1.140e+02, threshold=1.728e+02, percent-clipped=0.0
2023-11-21 18:59:52,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=15.0
2023-11-21 19:00:03,993 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243300
2023-11-21 19:00:10,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1622000.0, ans=0.0
2023-11-21 19:00:22,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1622066.6666666667, ans=0.125
2023-11-21 19:00:31,565 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2850, loss[loss=0.08012, simple_loss=0.1057, pruned_loss=0.01918, audio_tagging_loss=0.00807, over 14745.00 frames. ], tot_loss[loss=0.07242, simple_loss=0.09414, pruned_loss=0.01595, audio_tagging_loss=0.009399, over 3038182.41 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 16.0
], batch size: 56, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:00:40,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1622133.3333333333, ans=0.2 2023-11-21 19:00:51,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1622200.0, ans=0.015 2023-11-21 19:01:06,483 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243350 2023-11-21 19:01:16,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1622333.3333333333, ans=0.1 2023-11-21 19:01:30,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1622400.0, ans=0.2 2023-11-21 19:01:34,265 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2900, loss[loss=0.07741, simple_loss=0.09979, pruned_loss=0.0179, audio_tagging_loss=0.009613, over 15886.00 frames. ], tot_loss[loss=0.07314, simple_loss=0.09529, pruned_loss=0.01621, audio_tagging_loss=0.009277, over 3037434.11 frames. ], batch size: 60, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:01:55,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.105e+01 8.663e+01 9.336e+01 1.251e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-21 19:02:00,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1622600.0, ans=0.05 2023-11-21 19:02:09,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243400 2023-11-21 19:02:24,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1622733.3333333333, ans=0.2 2023-11-21 19:02:29,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1622733.3333333333, ans=0.2 2023-11-21 19:02:38,008 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 2950, loss[loss=0.09404, simple_loss=0.1264, pruned_loss=0.02309, audio_tagging_loss=0.007753, over 13985.00 frames. ], tot_loss[loss=0.07403, simple_loss=0.09639, pruned_loss=0.01653, audio_tagging_loss=0.009298, over 3036058.45 frames. ], batch size: 52, lr: 3.28e-03, grad_scale: 8.0 2023-11-21 19:02:50,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1622866.6666666667, ans=0.5 2023-11-21 19:03:04,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1622933.3333333333, ans=0.2 2023-11-21 19:03:13,847 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243450 2023-11-21 19:03:30,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=22.5 2023-11-21 19:03:35,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1623066.6666666667, ans=0.0 2023-11-21 19:03:41,269 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3000, loss[loss=0.0553, simple_loss=0.06623, pruned_loss=0.008922, audio_tagging_loss=0.01326, over 14980.00 frames. ], tot_loss[loss=0.07391, simple_loss=0.09612, pruned_loss=0.01645, audio_tagging_loss=0.009394, over 3041009.35 frames. 
], batch size: 57, lr: 3.28e-03, grad_scale: 8.0 2023-11-21 19:03:41,270 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 19:04:16,147 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4889, 3.0472, 3.7546, 3.4344], device='cuda:1') 2023-11-21 19:04:24,759 INFO [train_asr.py:1253] (1/4) Epoch 21, validation: loss=0.0594, simple_loss=0.05205, pruned_loss=0.005197, audio_tagging_loss=0.02817, over 4681554.00 frames. 2023-11-21 19:04:24,760 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 19:04:38,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=15.0 2023-11-21 19:04:47,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.788e+01 8.111e+01 8.701e+01 9.542e+01 1.276e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 19:05:00,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243500 2023-11-21 19:05:00,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-21 19:05:23,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1623400.0, ans=0.125 2023-11-21 19:05:28,952 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3050, loss[loss=0.09212, simple_loss=0.1228, pruned_loss=0.02228, audio_tagging_loss=0.008458, over 15025.00 frames. ], tot_loss[loss=0.07441, simple_loss=0.09668, pruned_loss=0.01672, audio_tagging_loss=0.009351, over 3037109.38 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 8.0 2023-11-21 19:06:03,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5 2023-11-21 19:06:05,515 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 19:06:05,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243550 2023-11-21 19:06:17,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1623666.6666666667, ans=0.125 2023-11-21 19:06:17,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-21 19:06:32,837 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3100, loss[loss=0.07768, simple_loss=0.1044, pruned_loss=0.01669, audio_tagging_loss=0.008766, over 15276.00 frames. ], tot_loss[loss=0.07517, simple_loss=0.09791, pruned_loss=0.01689, audio_tagging_loss=0.009327, over 3039771.62 frames. 
], batch size: 58, lr: 3.28e-03, grad_scale: 8.0 2023-11-21 19:06:56,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.034e+01 8.558e+01 9.488e+01 1.458e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-21 19:07:09,253 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243600 2023-11-21 19:07:23,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1624066.6666666667, ans=0.1 2023-11-21 19:07:25,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1624066.6666666667, ans=0.1 2023-11-21 19:07:25,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1624066.6666666667, ans=0.0 2023-11-21 19:07:38,436 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3150, loss[loss=0.07092, simple_loss=0.0907, pruned_loss=0.01538, audio_tagging_loss=0.01019, over 15525.00 frames. ], tot_loss[loss=0.07514, simple_loss=0.09784, pruned_loss=0.01684, audio_tagging_loss=0.009378, over 3037737.52 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 8.0 2023-11-21 19:08:01,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1624200.0, ans=0.1 2023-11-21 19:08:13,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243650 2023-11-21 19:08:14,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-21 19:08:35,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5 2023-11-21 19:08:42,949 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3200, loss[loss=0.07313, simple_loss=0.0973, pruned_loss=0.01698, audio_tagging_loss=0.007498, over 14530.00 frames. ], tot_loss[loss=0.07503, simple_loss=0.09776, pruned_loss=0.01673, audio_tagging_loss=0.009411, over 3041507.90 frames. ], batch size: 57, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:08:51,572 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:09:01,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1624533.3333333333, ans=0.2 2023-11-21 19:09:04,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.386e+01 8.002e+01 8.673e+01 9.467e+01 1.702e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-21 19:09:18,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243700 2023-11-21 19:09:18,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1624600.0, ans=0.125 2023-11-21 19:09:19,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1624666.6666666667, ans=0.1 2023-11-21 19:09:45,242 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3250, loss[loss=0.08409, simple_loss=0.1086, pruned_loss=0.0187, audio_tagging_loss=0.01107, over 15489.00 frames. ], tot_loss[loss=0.0746, simple_loss=0.09724, pruned_loss=0.01648, audio_tagging_loss=0.009507, over 3043990.29 frames. 
], batch size: 59, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:10:20,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243750 2023-11-21 19:10:24,706 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:10:26,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1625000.0, ans=0.125 2023-11-21 19:10:39,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1625066.6666666667, ans=0.125 2023-11-21 19:10:48,929 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3300, loss[loss=0.07763, simple_loss=0.1011, pruned_loss=0.01694, audio_tagging_loss=0.01014, over 16255.00 frames. ], tot_loss[loss=0.07481, simple_loss=0.09753, pruned_loss=0.0164, audio_tagging_loss=0.009646, over 3046541.00 frames. ], batch size: 62, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:10:53,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1625133.3333333333, ans=0.0 2023-11-21 19:11:03,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1625200.0, ans=0.2 2023-11-21 19:11:11,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.784e+01 8.264e+01 8.839e+01 9.527e+01 1.267e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-21 19:11:23,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1625266.6666666667, ans=0.1 2023-11-21 19:11:24,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243800 2023-11-21 19:11:27,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2023-11-21 19:11:53,900 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3350, loss[loss=0.05062, simple_loss=0.06108, pruned_loss=0.009297, audio_tagging_loss=0.01078, over 15103.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.09657, pruned_loss=0.01621, audio_tagging_loss=0.009617, over 3047974.88 frames. ], batch size: 57, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:12:00,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0 2023-11-21 19:12:20,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=12.0 2023-11-21 19:12:27,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1625600.0, ans=0.125 2023-11-21 19:12:29,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243850 2023-11-21 19:12:42,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1625666.6666666667, ans=0.0 2023-11-21 19:12:57,942 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3400, loss[loss=0.05252, simple_loss=0.05982, pruned_loss=0.01148, audio_tagging_loss=0.01112, over 15370.00 frames. ], tot_loss[loss=0.07424, simple_loss=0.09703, pruned_loss=0.01632, audio_tagging_loss=0.009402, over 3051867.49 frames. 
], batch size: 59, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:12:59,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1625800.0, ans=0.0 2023-11-21 19:13:03,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1625800.0, ans=0.125 2023-11-21 19:13:20,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.826e+01 8.123e+01 8.713e+01 9.440e+01 3.289e+02, threshold=1.743e+02, percent-clipped=1.0 2023-11-21 19:13:27,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1625933.3333333333, ans=0.0 2023-11-21 19:13:34,195 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243900 2023-11-21 19:13:39,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1626000.0, ans=0.125 2023-11-21 19:13:41,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1626000.0, ans=0.125 2023-11-21 19:13:50,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2023-11-21 19:14:01,813 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3450, loss[loss=0.08246, simple_loss=0.1076, pruned_loss=0.0207, audio_tagging_loss=0.007945, over 14274.00 frames. ], tot_loss[loss=0.07371, simple_loss=0.09615, pruned_loss=0.01626, audio_tagging_loss=0.009374, over 3049929.60 frames. ], batch size: 53, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:14:34,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1626266.6666666667, ans=0.125 2023-11-21 19:14:37,939 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 243950 2023-11-21 19:14:52,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1626400.0, ans=0.125 2023-11-21 19:15:01,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1626400.0, ans=0.1 2023-11-21 19:15:03,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0 2023-11-21 19:15:06,972 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3500, loss[loss=0.05151, simple_loss=0.06217, pruned_loss=0.01031, audio_tagging_loss=0.01011, over 15574.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09617, pruned_loss=0.01632, audio_tagging_loss=0.009298, over 3057458.62 frames. ], batch size: 60, lr: 3.28e-03, grad_scale: 16.0 2023-11-21 19:15:08,569 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:15:12,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1626466.6666666667, ans=0.0 2023-11-21 19:15:26,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. 
limit=6.0 2023-11-21 19:15:29,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.309e+01 8.089e+01 8.726e+01 9.622e+01 1.634e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-21 19:15:35,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2023-11-21 19:15:37,843 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 19:15:42,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244000 2023-11-21 19:15:44,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1626666.6666666667, ans=0.0 2023-11-21 19:16:02,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1626733.3333333333, ans=0.125 2023-11-21 19:16:13,950 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3550, loss[loss=0.07073, simple_loss=0.0857, pruned_loss=0.01803, audio_tagging_loss=0.009848, over 14644.00 frames. ], tot_loss[loss=0.07311, simple_loss=0.09526, pruned_loss=0.01612, audio_tagging_loss=0.009353, over 3055747.90 frames. ], batch size: 56, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:16:40,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1626933.3333333333, ans=0.125 2023-11-21 19:16:42,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1626933.3333333333, ans=0.1 2023-11-21 19:16:42,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2023-11-21 19:16:50,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1626933.3333333333, ans=0.125 2023-11-21 19:16:51,065 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244050 2023-11-21 19:16:57,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1627000.0, ans=0.0 2023-11-21 19:17:05,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2023-11-21 19:17:13,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1627066.6666666667, ans=0.1 2023-11-21 19:17:18,062 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3600, loss[loss=0.0623, simple_loss=0.07064, pruned_loss=0.01566, audio_tagging_loss=0.01132, over 15661.00 frames. ], tot_loss[loss=0.07352, simple_loss=0.09602, pruned_loss=0.01625, audio_tagging_loss=0.009258, over 3050400.97 frames. 
], batch size: 62, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:17:29,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1627133.3333333333, ans=0.125 2023-11-21 19:17:31,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1627200.0, ans=0.1 2023-11-21 19:17:42,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.176e+01 8.799e+01 9.565e+01 1.406e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-21 19:17:53,935 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244100 2023-11-21 19:18:14,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1627400.0, ans=0.125 2023-11-21 19:18:21,753 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3650, loss[loss=0.08074, simple_loss=0.09977, pruned_loss=0.02112, audio_tagging_loss=0.009726, over 15736.00 frames. ], tot_loss[loss=0.07362, simple_loss=0.0958, pruned_loss=0.01642, audio_tagging_loss=0.009306, over 3048777.75 frames. ], batch size: 60, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:18:52,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1627600.0, ans=0.125 2023-11-21 19:18:57,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244150 2023-11-21 19:19:15,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1627733.3333333333, ans=0.125 2023-11-21 19:19:16,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1627733.3333333333, ans=0.125 2023-11-21 19:19:26,326 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3700, loss[loss=0.07619, simple_loss=0.09085, pruned_loss=0.02092, audio_tagging_loss=0.00984, over 14940.00 frames. ], tot_loss[loss=0.07471, simple_loss=0.09741, pruned_loss=0.01679, audio_tagging_loss=0.009214, over 3050645.09 frames. 
], batch size: 57, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:19:50,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.276e+01 8.879e+01 9.943e+01 1.926e+02, threshold=1.776e+02, percent-clipped=1.0 2023-11-21 19:19:53,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1627933.3333333333, ans=0.125 2023-11-21 19:19:58,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1627933.3333333333, ans=0.125 2023-11-21 19:20:02,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244200 2023-11-21 19:20:03,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1628000.0, ans=0.125 2023-11-21 19:20:08,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1628000.0, ans=0.125 2023-11-21 19:20:11,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1628000.0, ans=0.0 2023-11-21 19:20:17,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1628066.6666666667, ans=0.1 2023-11-21 19:20:29,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2023-11-21 19:20:30,373 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3750, loss[loss=0.08077, simple_loss=0.1009, pruned_loss=0.01801, audio_tagging_loss=0.01229, over 15325.00 frames. ], tot_loss[loss=0.07458, simple_loss=0.0971, pruned_loss=0.01669, audio_tagging_loss=0.009348, over 3047182.39 frames. ], batch size: 57, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:20:30,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1628133.3333333333, ans=0.125 2023-11-21 19:21:01,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2023-11-21 19:21:07,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244250 2023-11-21 19:21:14,273 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 19:21:16,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1628333.3333333333, ans=0.125 2023-11-21 19:21:34,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.47 vs. limit=10.0 2023-11-21 19:21:35,250 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3800, loss[loss=0.07585, simple_loss=0.09986, pruned_loss=0.01777, audio_tagging_loss=0.00815, over 15675.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.09634, pruned_loss=0.01648, audio_tagging_loss=0.00947, over 3049460.37 frames. 
], batch size: 58, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:21:37,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1628466.6666666667, ans=0.0 2023-11-21 19:21:59,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.245e+01 8.842e+01 9.701e+01 1.541e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-21 19:22:09,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2023-11-21 19:22:10,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244300 2023-11-21 19:22:16,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1628666.6666666667, ans=0.125 2023-11-21 19:22:19,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1628666.6666666667, ans=0.0 2023-11-21 19:22:27,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=15.0 2023-11-21 19:22:36,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=12.0 2023-11-21 19:22:38,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1628800.0, ans=0.0 2023-11-21 19:22:39,718 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3850, loss[loss=0.1056, simple_loss=0.1546, pruned_loss=0.02271, audio_tagging_loss=0.00558, over 16143.00 frames. ], tot_loss[loss=0.07406, simple_loss=0.0964, pruned_loss=0.01637, audio_tagging_loss=0.009495, over 3050461.02 frames. ], batch size: 55, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:22:58,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1628866.6666666667, ans=0.125 2023-11-21 19:23:05,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1628933.3333333333, ans=0.125 2023-11-21 19:23:15,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244350 2023-11-21 19:23:30,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-11-21 19:23:39,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1629066.6666666667, ans=0.07 2023-11-21 19:23:43,358 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3900, loss[loss=0.09062, simple_loss=0.1185, pruned_loss=0.023, audio_tagging_loss=0.008348, over 14910.00 frames. ], tot_loss[loss=0.07362, simple_loss=0.09548, pruned_loss=0.01625, audio_tagging_loss=0.009628, over 3047573.69 frames. 
], batch size: 54, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:23:48,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1629133.3333333333, ans=0.0 2023-11-21 19:23:49,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1629133.3333333333, ans=0.05 2023-11-21 19:24:04,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1629200.0, ans=0.1 2023-11-21 19:24:08,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.034e+01 8.736e+01 9.358e+01 1.610e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 19:24:20,171 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244400 2023-11-21 19:24:29,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1629333.3333333333, ans=0.125 2023-11-21 19:24:32,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1629333.3333333333, ans=0.1 2023-11-21 19:24:36,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1629400.0, ans=0.0 2023-11-21 19:24:41,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1629400.0, ans=0.125 2023-11-21 19:24:48,551 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 3950, loss[loss=0.08719, simple_loss=0.1201, pruned_loss=0.01718, audio_tagging_loss=0.009938, over 15293.00 frames. ], tot_loss[loss=0.07394, simple_loss=0.09589, pruned_loss=0.01636, audio_tagging_loss=0.009632, over 3045234.58 frames. ], batch size: 56, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:24:50,141 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:24:53,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1629466.6666666667, ans=0.125 2023-11-21 19:25:20,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5 2023-11-21 19:25:23,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244450 2023-11-21 19:25:37,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1629666.6666666667, ans=0.0 2023-11-21 19:25:42,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1629733.3333333333, ans=0.95 2023-11-21 19:25:44,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1629733.3333333333, ans=0.125 2023-11-21 19:25:45,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1629733.3333333333, ans=0.125 2023-11-21 19:25:52,434 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4000, loss[loss=0.06204, simple_loss=0.08645, pruned_loss=0.01081, audio_tagging_loss=0.008012, over 15069.00 frames. ], tot_loss[loss=0.07429, simple_loss=0.09631, pruned_loss=0.01645, audio_tagging_loss=0.009685, over 3047011.52 frames. 
], batch size: 56, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:25:53,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1629800.0, ans=0.0 2023-11-21 19:26:05,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1629866.6666666667, ans=0.0 2023-11-21 19:26:16,053 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.271e+01 8.886e+01 9.884e+01 1.294e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-21 19:26:28,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244500 2023-11-21 19:26:32,090 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:26:37,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1630000.0, ans=0.125 2023-11-21 19:26:56,281 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4050, loss[loss=0.08225, simple_loss=0.1066, pruned_loss=0.02154, audio_tagging_loss=0.007392, over 14876.00 frames. ], tot_loss[loss=0.07431, simple_loss=0.09651, pruned_loss=0.01637, audio_tagging_loss=0.009686, over 3042674.29 frames. ], batch size: 54, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:26:57,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1630133.3333333333, ans=0.1 2023-11-21 19:26:58,660 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 19:27:12,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1630200.0, ans=0.07 2023-11-21 19:27:19,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1630200.0, ans=0.125 2023-11-21 19:27:32,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244550 2023-11-21 19:27:37,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1630333.3333333333, ans=0.035 2023-11-21 19:27:47,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1630400.0, ans=0.0 2023-11-21 19:27:51,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1630400.0, ans=0.05 2023-11-21 19:27:56,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1630400.0, ans=0.0 2023-11-21 19:27:58,690 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:28:00,855 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4100, loss[loss=0.05889, simple_loss=0.08048, pruned_loss=0.009426, audio_tagging_loss=0.009223, over 15736.00 frames. ], tot_loss[loss=0.07424, simple_loss=0.09654, pruned_loss=0.01627, audio_tagging_loss=0.009699, over 3044956.10 frames. 
], batch size: 61, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:28:25,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.257e+01 8.384e+01 8.914e+01 9.573e+01 1.296e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-21 19:28:26,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.02 vs. limit=10.0 2023-11-21 19:28:32,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-11-21 19:28:36,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244600 2023-11-21 19:28:36,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1630600.0, ans=0.0 2023-11-21 19:28:48,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2023-11-21 19:28:49,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1630666.6666666667, ans=0.125 2023-11-21 19:28:58,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-21 19:29:01,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1630733.3333333333, ans=0.125 2023-11-21 19:29:06,008 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4150, loss[loss=0.04911, simple_loss=0.05482, pruned_loss=0.007661, audio_tagging_loss=0.01404, over 15363.00 frames. ], tot_loss[loss=0.07366, simple_loss=0.096, pruned_loss=0.01609, audio_tagging_loss=0.009572, over 3044194.56 frames. ], batch size: 60, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:29:26,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1630866.6666666667, ans=0.1 2023-11-21 19:29:26,886 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.330e-02 2023-11-21 19:29:42,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244650 2023-11-21 19:29:53,029 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 19:29:54,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1631000.0, ans=0.125 2023-11-21 19:30:10,083 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4200, loss[loss=0.06611, simple_loss=0.08489, pruned_loss=0.01377, audio_tagging_loss=0.009892, over 15343.00 frames. ], tot_loss[loss=0.07344, simple_loss=0.09581, pruned_loss=0.0161, audio_tagging_loss=0.009439, over 3041449.94 frames. 
], batch size: 58, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:30:10,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1631133.3333333333, ans=0.125 2023-11-21 19:30:25,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1631200.0, ans=0.125 2023-11-21 19:30:36,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 7.875e+01 8.455e+01 9.386e+01 1.251e+02, threshold=1.691e+02, percent-clipped=0.0 2023-11-21 19:30:36,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1631266.6666666667, ans=0.125 2023-11-21 19:30:39,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2023-11-21 19:30:46,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244700 2023-11-21 19:31:12,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1631400.0, ans=0.1 2023-11-21 19:31:15,173 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4250, loss[loss=0.06029, simple_loss=0.08513, pruned_loss=0.00872, audio_tagging_loss=0.009007, over 14913.00 frames. ], tot_loss[loss=0.07381, simple_loss=0.09613, pruned_loss=0.01629, audio_tagging_loss=0.009458, over 3035933.06 frames. ], batch size: 54, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:31:18,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1631466.6666666667, ans=0.125 2023-11-21 19:31:23,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1631466.6666666667, ans=0.0 2023-11-21 19:31:29,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1631533.3333333333, ans=0.125 2023-11-21 19:31:43,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-21 19:31:50,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244750 2023-11-21 19:31:52,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.01 vs. limit=10.0 2023-11-21 19:31:58,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1631666.6666666667, ans=0.125 2023-11-21 19:32:18,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1631800.0, ans=0.0 2023-11-21 19:32:19,210 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4300, loss[loss=0.08121, simple_loss=0.1045, pruned_loss=0.0203, audio_tagging_loss=0.008659, over 14962.00 frames. ], tot_loss[loss=0.07368, simple_loss=0.09603, pruned_loss=0.01625, audio_tagging_loss=0.009416, over 3049428.95 frames. 
], batch size: 54, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:32:26,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1631800.0, ans=0.125 2023-11-21 19:32:30,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1631866.6666666667, ans=0.125 2023-11-21 19:32:41,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1631866.6666666667, ans=0.125 2023-11-21 19:32:43,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.395e+01 8.866e+01 9.565e+01 1.376e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-21 19:32:43,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1631933.3333333333, ans=0.0 2023-11-21 19:32:51,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1631933.3333333333, ans=0.125 2023-11-21 19:32:55,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244800 2023-11-21 19:33:13,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1632066.6666666667, ans=0.0 2023-11-21 19:33:17,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1632066.6666666667, ans=0.125 2023-11-21 19:33:21,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1632133.3333333333, ans=0.0 2023-11-21 19:33:22,530 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4350, loss[loss=0.09342, simple_loss=0.1276, pruned_loss=0.02206, audio_tagging_loss=0.007556, over 15435.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.09663, pruned_loss=0.01643, audio_tagging_loss=0.009415, over 3049160.39 frames. ], batch size: 57, lr: 3.27e-03, grad_scale: 16.0 2023-11-21 19:33:23,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1632133.3333333333, ans=0.0 2023-11-21 19:33:29,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1632133.3333333333, ans=0.0 2023-11-21 19:33:36,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.66 vs. 
limit=15.0 2023-11-21 19:33:45,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1632200.0, ans=0.0 2023-11-21 19:33:48,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1632266.6666666667, ans=0.125 2023-11-21 19:33:48,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1632266.6666666667, ans=0.0 2023-11-21 19:33:54,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1632266.6666666667, ans=0.125 2023-11-21 19:33:58,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244850 2023-11-21 19:34:06,588 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:34:16,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1632400.0, ans=0.1 2023-11-21 19:34:27,326 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4400, loss[loss=0.08856, simple_loss=0.115, pruned_loss=0.02034, audio_tagging_loss=0.01072, over 15915.00 frames. ], tot_loss[loss=0.07427, simple_loss=0.09652, pruned_loss=0.01656, audio_tagging_loss=0.009449, over 3041523.24 frames. ], batch size: 59, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:34:52,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.144e+01 8.724e+01 9.531e+01 1.294e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-21 19:34:56,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1632600.0, ans=0.125 2023-11-21 19:35:02,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244900 2023-11-21 19:35:08,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1632666.6666666667, ans=0.125 2023-11-21 19:35:29,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2023-11-21 19:35:32,192 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4450, loss[loss=0.06537, simple_loss=0.08435, pruned_loss=0.01412, audio_tagging_loss=0.009073, over 15375.00 frames. ], tot_loss[loss=0.07387, simple_loss=0.09593, pruned_loss=0.01648, audio_tagging_loss=0.009426, over 3040952.26 frames. 
], batch size: 60, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:35:33,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1632800.0, ans=0.2 2023-11-21 19:35:50,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1632866.6666666667, ans=0.125 2023-11-21 19:36:06,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1632933.3333333333, ans=0.125 2023-11-21 19:36:07,804 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 244950 2023-11-21 19:36:08,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1632933.3333333333, ans=0.2 2023-11-21 19:36:28,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1633066.6666666667, ans=0.125 2023-11-21 19:36:35,219 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4500, loss[loss=0.05507, simple_loss=0.07043, pruned_loss=0.01171, audio_tagging_loss=0.00814, over 15121.00 frames. ], tot_loss[loss=0.07338, simple_loss=0.09541, pruned_loss=0.0163, audio_tagging_loss=0.009378, over 3049328.16 frames. ], batch size: 58, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:36:35,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-11-21 19:36:36,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1633133.3333333333, ans=0.125 2023-11-21 19:36:47,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1633200.0, ans=0.0 2023-11-21 19:36:57,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1633200.0, ans=0.125 2023-11-21 19:36:59,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1633200.0, ans=0.125 2023-11-21 19:37:01,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.155e+01 8.993e+01 9.811e+01 1.324e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-21 19:37:06,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1633266.6666666667, ans=0.125 2023-11-21 19:37:10,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1633266.6666666667, ans=0.1 2023-11-21 19:37:11,790 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245000 2023-11-21 19:37:13,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1633333.3333333333, ans=0.0 2023-11-21 19:37:23,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1633333.3333333333, ans=0.0 2023-11-21 19:37:23,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.33 vs. 
limit=10.0 2023-11-21 19:37:34,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-21 19:37:39,066 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4550, loss[loss=0.09232, simple_loss=0.1256, pruned_loss=0.0239, audio_tagging_loss=0.005624, over 14276.00 frames. ], tot_loss[loss=0.07341, simple_loss=0.09572, pruned_loss=0.01617, audio_tagging_loss=0.00938, over 3043984.78 frames. ], batch size: 51, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:37:50,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-11-21 19:38:12,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1633600.0, ans=0.125 2023-11-21 19:38:15,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1633600.0, ans=0.125 2023-11-21 19:38:16,174 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245050 2023-11-21 19:38:28,284 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 19:38:44,475 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4600, loss[loss=0.08429, simple_loss=0.1195, pruned_loss=0.01664, audio_tagging_loss=0.007891, over 15362.00 frames. ], tot_loss[loss=0.07299, simple_loss=0.09486, pruned_loss=0.01613, audio_tagging_loss=0.009432, over 3047712.27 frames. ], batch size: 57, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:38:51,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-21 19:39:01,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1633866.6666666667, ans=0.125 2023-11-21 19:39:08,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1633933.3333333333, ans=0.125 2023-11-21 19:39:09,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.725e+01 8.114e+01 8.737e+01 9.348e+01 1.187e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 19:39:19,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245100 2023-11-21 19:39:40,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1634066.6666666667, ans=0.125 2023-11-21 19:39:48,071 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4650, loss[loss=0.08064, simple_loss=0.09189, pruned_loss=0.02052, audio_tagging_loss=0.01417, over 15841.00 frames. ], tot_loss[loss=0.07282, simple_loss=0.09434, pruned_loss=0.01611, audio_tagging_loss=0.009543, over 3046288.01 frames. 
], batch size: 58, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:40:06,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1634200.0, ans=0.125 2023-11-21 19:40:06,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1634200.0, ans=0.125 2023-11-21 19:40:12,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1634266.6666666667, ans=0.0 2023-11-21 19:40:23,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1634266.6666666667, ans=0.0 2023-11-21 19:40:24,018 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245150 2023-11-21 19:40:43,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1634400.0, ans=0.125 2023-11-21 19:40:50,945 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4700, loss[loss=0.07095, simple_loss=0.098, pruned_loss=0.0146, audio_tagging_loss=0.007357, over 16542.00 frames. ], tot_loss[loss=0.07309, simple_loss=0.09456, pruned_loss=0.01621, audio_tagging_loss=0.009598, over 3054641.67 frames. ], batch size: 60, lr: 3.27e-03, grad_scale: 32.0 2023-11-21 19:40:51,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1634466.6666666667, ans=0.1 2023-11-21 19:41:07,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1634533.3333333333, ans=0.0 2023-11-21 19:41:17,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.203e+01 8.093e+01 8.785e+01 9.721e+01 1.198e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-21 19:41:27,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245200 2023-11-21 19:41:28,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2023-11-21 19:41:34,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1634666.6666666667, ans=0.0 2023-11-21 19:41:38,567 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:41:42,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1634733.3333333333, ans=0.1 2023-11-21 19:41:48,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1634733.3333333333, ans=0.125 2023-11-21 19:41:53,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1634733.3333333333, ans=0.1 2023-11-21 19:41:55,452 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4750, loss[loss=0.08539, simple_loss=0.1144, pruned_loss=0.01933, audio_tagging_loss=0.008853, over 16271.00 frames. ], tot_loss[loss=0.07331, simple_loss=0.09501, pruned_loss=0.01612, audio_tagging_loss=0.009684, over 3057755.78 frames. 
2023-11-21 19:42:06,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1634800.0, ans=0.05
2023-11-21 19:42:25,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1634933.3333333333, ans=0.125
2023-11-21 19:42:30,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245250
2023-11-21 19:42:31,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1635000.0, ans=0.125
2023-11-21 19:42:52,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1635066.6666666667, ans=0.0
2023-11-21 19:42:53,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0
2023-11-21 19:42:59,121 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4800, loss[loss=0.07787, simple_loss=0.1123, pruned_loss=0.01285, audio_tagging_loss=0.008869, over 15874.00 frames. ], tot_loss[loss=0.07417, simple_loss=0.09611, pruned_loss=0.01632, audio_tagging_loss=0.009796, over 3061898.80 frames. ], batch size: 56, lr: 3.27e-03, grad_scale: 32.0
2023-11-21 19:43:03,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1635133.3333333333, ans=0.0
2023-11-21 19:43:12,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1635200.0, ans=0.1
2023-11-21 19:43:16,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5
2023-11-21 19:43:25,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.541e+01 8.114e+01 8.825e+01 9.753e+01 1.179e+02, threshold=1.765e+02, percent-clipped=0.0
2023-11-21 19:43:34,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245300
2023-11-21 19:43:34,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1635266.6666666667, ans=0.1
2023-11-21 19:43:50,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1635400.0, ans=0.1
2023-11-21 19:44:02,530 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4850, loss[loss=0.06229, simple_loss=0.07294, pruned_loss=0.01534, audio_tagging_loss=0.01048, over 14773.00 frames. ], tot_loss[loss=0.07389, simple_loss=0.09551, pruned_loss=0.01619, audio_tagging_loss=0.009952, over 3054097.94 frames. ], batch size: 57, lr: 3.27e-03, grad_scale: 32.0
2023-11-21 19:44:05,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1635466.6666666667, ans=0.125
2023-11-21 19:44:08,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0
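The scaling.py:213 ScheduledFloat entries record hyper-parameters that are functions of training progress: each named quantity (balancer probabilities, skip rates, dropout_p values) is resolved at the current batch_count and the result is logged as ans. This far into training most of them sit at their final values (0.125, 0.1, 0.0, ...). A plausible minimal reimplementation as piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints shown are illustrative, not the ones the recipe uses:

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count, as suggested by the
        scaling.py:213 log entries; breakpoints here are illustrative."""
        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            (x0, y0), *rest = self.points
            if batch_count <= x0:
                return y0
            for (x1, y1) in rest:
                if batch_count <= x1:
                    # Linear interpolation inside the current segment.
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
                x0, y0 = x1, y1
            return y0  # past the last breakpoint: hold the final value

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(1634466.0))  # 0.1, far past the ramp, as logged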
2023-11-21 19:44:21,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1635533.3333333333, ans=0.125
2023-11-21 19:44:22,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1635533.3333333333, ans=0.125
2023-11-21 19:44:39,418 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245350
2023-11-21 19:44:42,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1635666.6666666667, ans=0.125
2023-11-21 19:44:47,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0
2023-11-21 19:44:48,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.64 vs. limit=15.0
2023-11-21 19:44:49,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1635666.6666666667, ans=0.125
2023-11-21 19:45:07,159 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4900, loss[loss=0.06418, simple_loss=0.09146, pruned_loss=0.01122, audio_tagging_loss=0.007228, over 15260.00 frames. ], tot_loss[loss=0.07335, simple_loss=0.09516, pruned_loss=0.01593, audio_tagging_loss=0.009839, over 3048504.63 frames. ], batch size: 59, lr: 3.27e-03, grad_scale: 32.0
2023-11-21 19:45:18,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0
2023-11-21 19:45:34,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.513e+01 7.987e+01 8.445e+01 9.023e+01 1.315e+02, threshold=1.689e+02, percent-clipped=0.0
2023-11-21 19:45:43,185 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245400
2023-11-21 19:46:12,594 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 4950, loss[loss=0.06196, simple_loss=0.08419, pruned_loss=0.01181, audio_tagging_loss=0.008049, over 15713.00 frames. ], tot_loss[loss=0.0735, simple_loss=0.09545, pruned_loss=0.01603, audio_tagging_loss=0.009748, over 3051315.84 frames. ], batch size: 57, lr: 3.27e-03, grad_scale: 32.0
2023-11-21 19:46:17,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1636133.3333333333, ans=0.1
2023-11-21 19:46:20,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0
2023-11-21 19:46:40,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1636266.6666666667, ans=0.1
2023-11-21 19:46:48,303 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245450
2023-11-21 19:46:52,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1636333.3333333333, ans=0.125
2023-11-21 19:47:02,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1636333.3333333333, ans=0.1
2023-11-21 19:47:16,827 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5000, loss[loss=0.07026, simple_loss=0.09806, pruned_loss=0.01326, audio_tagging_loss=0.007973, over 15134.00 frames. ], tot_loss[loss=0.07414, simple_loss=0.09673, pruned_loss=0.01626, audio_tagging_loss=0.009513, over 3054937.85 frames. ], batch size: 56, lr: 3.27e-03, grad_scale: 32.0
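The scaling.py:1022 Whitening entries are diagnostics from a covariance-whitening regularizer: for each named activation a statistic is computed that is small when the channel covariance is close to a multiple of the identity and grows as the energy concentrates in a few directions, and a corrective gradient is only applied once the metric exceeds its limit, so entries such as metric=8.27 vs. limit=15.0 indicate no intervention. One way such a metric can be formed (an assumption; the exact formula in scaling.py may differ, and for num_groups > 1 it would be computed per channel group):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """Ratio of the mean squared eigenvalue of the channel covariance
        to the squared mean eigenvalue: ~1 for 'white' features, growing
        toward num_channels as the covariance degenerates. An assumed
        formulation, not necessarily the one in scaling.py."""
        num_channels = x.shape[-1]
        x = x - x.mean(dim=0)                 # zero-mean per channel
        cov = (x.T @ x) / x.shape[0]          # channel covariance
        return (cov ** 2).sum() / (cov.diag().mean() ** 2 * num_channels)

    x = torch.randn(1000, 384)                # near-white random features
    print(whitening_metric(x))                # small (near 1) for white x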
2023-11-21 19:47:21,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1636466.6666666667, ans=0.0
2023-11-21 19:47:28,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1636533.3333333333, ans=0.125
2023-11-21 19:47:44,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.789e+01 8.147e+01 8.751e+01 9.572e+01 1.278e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-21 19:47:53,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245500
2023-11-21 19:48:07,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0
2023-11-21 19:48:08,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1636733.3333333333, ans=0.09899494936611666
2023-11-21 19:48:21,025 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5050, loss[loss=0.1082, simple_loss=0.1333, pruned_loss=0.02958, audio_tagging_loss=0.01197, over 14824.00 frames. ], tot_loss[loss=0.07356, simple_loss=0.09587, pruned_loss=0.01614, audio_tagging_loss=0.009484, over 3048082.23 frames. ], batch size: 53, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 19:48:26,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0
2023-11-21 19:48:56,568 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245550
2023-11-21 19:48:58,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1637000.0, ans=0.04949747468305833
2023-11-21 19:49:00,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1637000.0, ans=0.125
2023-11-21 19:49:04,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1637000.0, ans=0.0
2023-11-21 19:49:25,076 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5100, loss[loss=0.06463, simple_loss=0.08073, pruned_loss=0.01378, audio_tagging_loss=0.01049, over 15437.00 frames. ], tot_loss[loss=0.07363, simple_loss=0.09595, pruned_loss=0.01629, audio_tagging_loss=0.009363, over 3052472.34 frames. ], batch size: 58, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 19:49:44,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=12.0
2023-11-21 19:49:49,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1637266.6666666667, ans=0.1
2023-11-21 19:49:51,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.776e+01 7.985e+01 8.659e+01 9.540e+01 1.416e+02, threshold=1.732e+02, percent-clipped=0.0
2023-11-21 19:50:00,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245600
2023-11-21 19:50:02,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1637333.3333333333, ans=0.0
2023-11-21 19:50:09,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0
2023-11-21 19:50:10,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1637333.3333333333, ans=0.125
2023-11-21 19:50:17,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1637400.0, ans=0.125
2023-11-21 19:50:29,313 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5150, loss[loss=0.07378, simple_loss=0.09562, pruned_loss=0.01769, audio_tagging_loss=0.008283, over 15648.00 frames. ], tot_loss[loss=0.07335, simple_loss=0.09559, pruned_loss=0.01619, audio_tagging_loss=0.009365, over 3052810.97 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 19:50:48,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1637533.3333333333, ans=0.125
2023-11-21 19:51:05,562 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245650
2023-11-21 19:51:15,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1637666.6666666667, ans=0.035
2023-11-21 19:51:24,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1637733.3333333333, ans=0.0
2023-11-21 19:51:33,900 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5200, loss[loss=0.07018, simple_loss=0.082, pruned_loss=0.01641, audio_tagging_loss=0.01276, over 15514.00 frames. ], tot_loss[loss=0.07359, simple_loss=0.09579, pruned_loss=0.01634, audio_tagging_loss=0.009361, over 3048135.39 frames. ], batch size: 60, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 19:51:43,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1637800.0, ans=0.2
2023-11-21 19:52:01,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.548e+01 7.936e+01 8.716e+01 9.499e+01 2.064e+02, threshold=1.743e+02, percent-clipped=1.0
2023-11-21 19:52:02,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0
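The optim.py:476 entries document adaptive gradient clipping: the five numbers are evidently the min, 25%, 50%, 75% and max of recent per-batch gradient norms, and in every entry the threshold equals Clipping_scale times the logged median (in the 19:52:01 entry above, 2.0 x 8.716e+01 = 1.743e+02). The max of 2.064e+02 exceeds that threshold, which is why percent-clipped jumps to 1.0 there (plausibly one clipped batch per hundred). A small numeric check of that relationship:

    import numpy as np

    # The five quantile points logged at 19:52:01 (min, 25%, median, 75%, max):
    norms = np.array([65.48, 79.36, 87.16, 94.99, 206.4])
    clipping_scale = 2.0

    threshold = clipping_scale * np.median(norms)
    print(threshold)                  # 174.32 -> logged "threshold=1.743e+02"
    print((norms > threshold).any())  # True: the 206.4 batch was clipped,
                                      # hence "percent-clipped=1.0"

Clipping against a multiple of the recent median adapts to the gradient-norm distribution as training evolves, instead of relying on a fixed constant.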
2023-11-21 19:52:05,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1637933.3333333333, ans=0.125
2023-11-21 19:52:09,258 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245700
2023-11-21 19:52:13,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1638000.0, ans=0.0
2023-11-21 19:52:38,033 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5250, loss[loss=0.07421, simple_loss=0.1003, pruned_loss=0.01469, audio_tagging_loss=0.009393, over 14797.00 frames. ], tot_loss[loss=0.07451, simple_loss=0.09708, pruned_loss=0.0167, audio_tagging_loss=0.009266, over 3043440.33 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 19:52:44,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1638133.3333333333, ans=0.0
2023-11-21 19:52:54,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1638200.0, ans=0.125
2023-11-21 19:53:14,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245750
2023-11-21 19:53:41,719 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5300, loss[loss=0.09287, simple_loss=0.1152, pruned_loss=0.02471, audio_tagging_loss=0.01056, over 15243.00 frames. ], tot_loss[loss=0.07435, simple_loss=0.09698, pruned_loss=0.01661, audio_tagging_loss=0.009256, over 3043632.58 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 19:53:45,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1638466.6666666667, ans=0.1
2023-11-21 19:53:48,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1638466.6666666667, ans=0.1
2023-11-21 19:53:58,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1638533.3333333333, ans=0.125
2023-11-21 19:54:05,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1638533.3333333333, ans=0.125
2023-11-21 19:54:10,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.760e+01 8.198e+01 8.861e+01 9.238e+01 1.215e+02, threshold=1.772e+02, percent-clipped=0.0
2023-11-21 19:54:17,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1638600.0, ans=0.1
2023-11-21 19:54:18,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245800
2023-11-21 19:54:34,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1638733.3333333333, ans=0.125
2023-11-21 19:54:41,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0
2023-11-21 19:54:45,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1638733.3333333333, ans=0.1
2023-11-21 19:54:47,341 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5350, loss[loss=0.08522, simple_loss=0.1173, pruned_loss=0.01831, audio_tagging_loss=0.008255, over 15206.00 frames. ], tot_loss[loss=0.07416, simple_loss=0.09699, pruned_loss=0.01645, audio_tagging_loss=0.009217, over 3041863.43 frames. ], batch size: 54, lr: 3.26e-03, grad_scale: 32.0
], tot_loss[loss=0.07416, simple_loss=0.09699, pruned_loss=0.01645, audio_tagging_loss=0.009217, over 3041863.43 frames. ], batch size: 54, lr: 3.26e-03, grad_scale: 32.0 2023-11-21 19:55:12,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1638933.3333333333, ans=0.95 2023-11-21 19:55:15,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1638933.3333333333, ans=0.0 2023-11-21 19:55:18,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1638933.3333333333, ans=0.0 2023-11-21 19:55:23,176 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245850 2023-11-21 19:55:25,718 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:55:44,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-11-21 19:55:51,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2023-11-21 19:55:52,125 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5400, loss[loss=0.08647, simple_loss=0.1263, pruned_loss=0.01418, audio_tagging_loss=0.009152, over 15649.00 frames. ], tot_loss[loss=0.07418, simple_loss=0.09711, pruned_loss=0.01636, audio_tagging_loss=0.009274, over 3045078.06 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 32.0 2023-11-21 19:55:54,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1639133.3333333333, ans=0.07 2023-11-21 19:56:07,707 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 19:56:16,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1639266.6666666667, ans=0.1 2023-11-21 19:56:19,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.098e+01 8.835e+01 9.334e+01 1.205e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-21 19:56:28,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245900 2023-11-21 19:56:38,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1639333.3333333333, ans=0.1 2023-11-21 19:56:55,944 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5450, loss[loss=0.07279, simple_loss=0.09909, pruned_loss=0.01544, audio_tagging_loss=0.007801, over 15062.00 frames. ], tot_loss[loss=0.07398, simple_loss=0.09665, pruned_loss=0.01636, audio_tagging_loss=0.009294, over 3043791.27 frames. ], batch size: 53, lr: 3.26e-03, grad_scale: 32.0 2023-11-21 19:57:32,999 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 245950 2023-11-21 19:58:00,397 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5500, loss[loss=0.0734, simple_loss=0.09762, pruned_loss=0.01556, audio_tagging_loss=0.009033, over 14882.00 frames. ], tot_loss[loss=0.07376, simple_loss=0.09636, pruned_loss=0.01625, audio_tagging_loss=0.009335, over 3043031.76 frames. 
2023-11-21 19:58:03,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1639800.0, ans=0.0
2023-11-21 19:58:13,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1639866.6666666667, ans=0.125
2023-11-21 19:58:28,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.083e+01 8.649e+01 9.254e+01 1.187e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-21 19:58:31,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0
2023-11-21 19:58:36,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246000
2023-11-21 19:58:37,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0
2023-11-21 19:59:04,546 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5550, loss[loss=0.07804, simple_loss=0.105, pruned_loss=0.01575, audio_tagging_loss=0.009812, over 14186.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.09643, pruned_loss=0.0162, audio_tagging_loss=0.009502, over 3047477.27 frames. ], batch size: 54, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 19:59:25,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1640200.0, ans=0.1
2023-11-21 19:59:27,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1640200.0, ans=0.0
2023-11-21 19:59:31,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=8.0
2023-11-21 19:59:40,109 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246050
2023-11-21 19:59:54,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1640400.0, ans=0.2
2023-11-21 20:00:08,728 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5600, loss[loss=0.0657, simple_loss=0.08716, pruned_loss=0.01259, audio_tagging_loss=0.009537, over 14754.00 frames. ], tot_loss[loss=0.07381, simple_loss=0.09615, pruned_loss=0.01611, audio_tagging_loss=0.009621, over 3045014.47 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 20:00:12,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1640466.6666666667, ans=0.1
2023-11-21 20:00:19,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1640533.3333333333, ans=0.125
2023-11-21 20:00:27,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1640533.3333333333, ans=0.125
2023-11-21 20:00:29,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1640533.3333333333, ans=0.0
2023-11-21 20:00:36,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1640600.0, ans=0.125
2023-11-21 20:00:37,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.951e+01 8.033e+01 8.931e+01 9.623e+01 1.146e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-21 20:00:44,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246100
2023-11-21 20:00:47,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1640666.6666666667, ans=0.04949747468305833
2023-11-21 20:00:53,817 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 20:01:00,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1640733.3333333333, ans=0.09899494936611666
2023-11-21 20:01:11,913 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5650, loss[loss=0.07044, simple_loss=0.09125, pruned_loss=0.0144, audio_tagging_loss=0.01041, over 14987.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.09596, pruned_loss=0.01624, audio_tagging_loss=0.009743, over 3050232.47 frames. ], batch size: 55, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 20:01:13,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1640800.0, ans=0.125
2023-11-21 20:01:21,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1640800.0, ans=0.0
2023-11-21 20:01:38,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1640933.3333333333, ans=0.0
2023-11-21 20:01:40,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1640933.3333333333, ans=0.0
2023-11-21 20:01:43,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1640933.3333333333, ans=10.0
2023-11-21 20:01:48,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246150
2023-11-21 20:01:57,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5
2023-11-21 20:02:16,864 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5700, loss[loss=0.06321, simple_loss=0.09103, pruned_loss=0.01011, audio_tagging_loss=0.007586, over 15017.00 frames. ], tot_loss[loss=0.0736, simple_loss=0.09535, pruned_loss=0.01611, audio_tagging_loss=0.00982, over 3046101.16 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 20:02:19,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1641133.3333333333, ans=0.0
2023-11-21 20:02:26,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1641133.3333333333, ans=0.125
2023-11-21 20:02:31,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1641200.0, ans=0.0
2023-11-21 20:02:43,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1641266.6666666667, ans=0.125
2023-11-21 20:02:44,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.951e+01 8.226e+01 8.873e+01 9.610e+01 1.173e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-21 20:02:52,794 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246200
2023-11-21 20:03:18,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1641400.0, ans=15.0
2023-11-21 20:03:21,874 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5750, loss[loss=0.07708, simple_loss=0.08986, pruned_loss=0.01755, audio_tagging_loss=0.0146, over 15186.00 frames. ], tot_loss[loss=0.07287, simple_loss=0.09433, pruned_loss=0.01599, audio_tagging_loss=0.009717, over 3049095.45 frames. ], batch size: 60, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 20:03:34,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1641533.3333333333, ans=0.0
2023-11-21 20:03:38,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-11-21 20:03:46,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1641600.0, ans=0.125
2023-11-21 20:03:53,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0
2023-11-21 20:03:58,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246250
2023-11-21 20:04:08,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1641666.6666666667, ans=0.125
2023-11-21 20:04:11,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1641666.6666666667, ans=0.0
2023-11-21 20:04:25,618 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5800, loss[loss=0.09315, simple_loss=0.1248, pruned_loss=0.0211, audio_tagging_loss=0.00965, over 15157.00 frames. ], tot_loss[loss=0.07323, simple_loss=0.09498, pruned_loss=0.0162, audio_tagging_loss=0.009539, over 3045964.26 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:04:50,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0
2023-11-21 20:04:54,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1641933.3333333333, ans=0.125
2023-11-21 20:04:55,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.128e+01 8.477e+01 9.291e+01 1.218e+02, threshold=1.695e+02, percent-clipped=0.0
2023-11-21 20:04:57,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1641933.3333333333, ans=0.125
2023-11-21 20:05:01,753 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246300
2023-11-21 20:05:09,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1642000.0, ans=0.125
2023-11-21 20:05:12,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1642000.0, ans=0.5
2023-11-21 20:05:29,873 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5850, loss[loss=0.0482, simple_loss=0.06213, pruned_loss=0.008709, audio_tagging_loss=0.008423, over 14550.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.09513, pruned_loss=0.01615, audio_tagging_loss=0.009507, over 3044182.32 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:05:36,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1642133.3333333333, ans=0.1
2023-11-21 20:05:42,465 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:06:04,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1642266.6666666667, ans=0.0
2023-11-21 20:06:05,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246350
2023-11-21 20:06:30,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1642400.0, ans=0.125
2023-11-21 20:06:34,579 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5900, loss[loss=0.07268, simple_loss=0.09195, pruned_loss=0.0178, audio_tagging_loss=0.008908, over 15534.00 frames. ], tot_loss[loss=0.07376, simple_loss=0.09617, pruned_loss=0.01627, audio_tagging_loss=0.009407, over 3053249.49 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:06:38,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1642466.6666666667, ans=0.2
2023-11-21 20:06:48,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1642533.3333333333, ans=0.0
2023-11-21 20:07:04,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.116e+01 8.807e+01 9.373e+01 1.126e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-21 20:07:10,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246400
2023-11-21 20:07:18,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1642666.6666666667, ans=0.2
2023-11-21 20:07:37,639 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 5950, loss[loss=0.06046, simple_loss=0.07136, pruned_loss=0.01421, audio_tagging_loss=0.01057, over 14793.00 frames. ], tot_loss[loss=0.07318, simple_loss=0.09573, pruned_loss=0.01602, audio_tagging_loss=0.009298, over 3052312.63 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:08:10,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0
2023-11-21 20:08:14,382 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246450
2023-11-21 20:08:19,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1643000.0, ans=0.2
2023-11-21 20:08:30,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1643066.6666666667, ans=0.1
2023-11-21 20:08:42,277 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6000, loss[loss=0.09443, simple_loss=0.1277, pruned_loss=0.02484, audio_tagging_loss=0.005756, over 15466.00 frames. ], tot_loss[loss=0.07363, simple_loss=0.09617, pruned_loss=0.0162, audio_tagging_loss=0.009339, over 3054553.29 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 32.0
2023-11-21 20:08:42,278 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 20:09:23,089 INFO [train_asr.py:1253] (1/4) Epoch 21, validation: loss=0.05951, simple_loss=0.05205, pruned_loss=0.005242, audio_tagging_loss=0.02825, over 4681554.00 frames.
2023-11-21 20:09:23,090 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 20:09:34,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=22.5
2023-11-21 20:09:45,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0
2023-11-21 20:09:52,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.622e+01 7.949e+01 8.551e+01 9.154e+01 1.132e+02, threshold=1.710e+02, percent-clipped=0.0
2023-11-21 20:09:58,757 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246500
2023-11-21 20:10:01,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1643333.3333333333, ans=0.125
2023-11-21 20:10:09,450 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 20:10:26,630 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6050, loss[loss=0.06832, simple_loss=0.08739, pruned_loss=0.01217, audio_tagging_loss=0.01245, over 15842.00 frames. ], tot_loss[loss=0.07318, simple_loss=0.09562, pruned_loss=0.01601, audio_tagging_loss=0.009365, over 3053265.14 frames. ], batch size: 60, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:10:38,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1643533.3333333333, ans=0.2
2023-11-21 20:10:57,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1643600.0, ans=0.0
2023-11-21 20:11:02,845 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246550
2023-11-21 20:11:08,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0
2023-11-21 20:11:11,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1643666.6666666667, ans=0.0
2023-11-21 20:11:30,727 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6100, loss[loss=0.06128, simple_loss=0.0819, pruned_loss=0.01032, audio_tagging_loss=0.01001, over 15374.00 frames. ], tot_loss[loss=0.07339, simple_loss=0.09601, pruned_loss=0.01604, audio_tagging_loss=0.009347, over 3051527.72 frames. ], batch size: 59, lr: 3.26e-03, grad_scale: 8.0
2023-11-21 20:11:32,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1643800.0, ans=0.0
2023-11-21 20:11:40,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=15.0
2023-11-21 20:11:59,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=15.0
2023-11-21 20:12:03,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.714e+01 7.866e+01 8.426e+01 9.427e+01 1.141e+02, threshold=1.685e+02, percent-clipped=0.0
2023-11-21 20:12:06,080 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246600
2023-11-21 20:12:07,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0
2023-11-21 20:12:25,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0
2023-11-21 20:12:26,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1644066.6666666667, ans=0.1
2023-11-21 20:12:29,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1644066.6666666667, ans=0.1
2023-11-21 20:12:34,400 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6150, loss[loss=0.05915, simple_loss=0.07241, pruned_loss=0.01414, audio_tagging_loss=0.008809, over 14879.00 frames. ], tot_loss[loss=0.07354, simple_loss=0.0962, pruned_loss=0.01609, audio_tagging_loss=0.009348, over 3052942.52 frames. ], batch size: 57, lr: 3.26e-03, grad_scale: 8.0
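The grad_scale value reported with each tot_loss line is the dynamic fp16 loss scale from mixed-precision training. Its trajectory here (32.0 earlier in the epoch, 16.0 and then 8.0 through batches 6100-6350, later climbing back) matches the standard scheme: halve the scale whenever scaled gradients overflow and skip that update, then grow it again after a run of overflow-free steps. A toy sketch of that dynamic; the growth interval and factors are illustrative, and the training script itself would rely on a GradScaler-style helper rather than this class:

    class LossScaler:
        """Dynamic fp16 loss scaling: halve on overflow, double after
        `growth_interval` consecutive overflow-free steps. Illustrative
        reconstruction, not the script's actual implementation."""
        def __init__(self, scale: float = 32.0, growth_interval: int = 100):
            self.scale = scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def step(self, found_inf: bool) -> None:
            if found_inf:              # overflow: shrink, skip the update
                self.scale *= 0.5
                self._good_steps = 0
            else:                      # stable: occasionally grow back
                self._good_steps += 1
                if self._good_steps == self.growth_interval:
                    self.scale *= 2.0
                    self._good_steps = 0

    scaler = LossScaler(scale=32.0)
    scaler.step(found_inf=True)   # 32.0 -> 16.0
    scaler.step(found_inf=True)   # 16.0 -> 8.0, as logged around batch 6100
    print(scaler.scale)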
2023-11-21 20:12:42,676 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:13:09,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1644266.6666666667, ans=0.125
2023-11-21 20:13:10,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246650
2023-11-21 20:13:11,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1644333.3333333333, ans=0.125
2023-11-21 20:13:14,057 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:13:37,953 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6200, loss[loss=0.06358, simple_loss=0.08998, pruned_loss=0.01153, audio_tagging_loss=0.007064, over 15057.00 frames. ], tot_loss[loss=0.0738, simple_loss=0.09647, pruned_loss=0.01617, audio_tagging_loss=0.009391, over 3056705.07 frames. ], batch size: 58, lr: 3.26e-03, grad_scale: 8.0
2023-11-21 20:13:49,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1644466.6666666667, ans=0.125
2023-11-21 20:13:55,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1644533.3333333333, ans=0.125
2023-11-21 20:14:05,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1644600.0, ans=0.125
2023-11-21 20:14:11,063 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.261e+01 8.724e+01 9.317e+01 1.600e+02, threshold=1.745e+02, percent-clipped=0.0
2023-11-21 20:14:13,599 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246700
2023-11-21 20:14:17,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1644666.6666666667, ans=0.125
2023-11-21 20:14:42,103 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6250, loss[loss=0.07811, simple_loss=0.1072, pruned_loss=0.01717, audio_tagging_loss=0.00733, over 14152.00 frames. ], tot_loss[loss=0.0741, simple_loss=0.09677, pruned_loss=0.01629, audio_tagging_loss=0.009429, over 3053150.98 frames. ], batch size: 53, lr: 3.26e-03, grad_scale: 8.0
2023-11-21 20:15:10,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=10.0
2023-11-21 20:15:17,635 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246750
2023-11-21 20:15:20,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1645000.0, ans=0.0
2023-11-21 20:15:20,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1645000.0, ans=0.125
2023-11-21 20:15:28,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1645000.0, ans=0.1
2023-11-21 20:15:34,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1645066.6666666667, ans=0.125
2023-11-21 20:15:38,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1645066.6666666667, ans=0.0
2023-11-21 20:15:45,979 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6300, loss[loss=0.06023, simple_loss=0.08541, pruned_loss=0.009946, audio_tagging_loss=0.007579, over 16365.00 frames. ], tot_loss[loss=0.07351, simple_loss=0.09552, pruned_loss=0.01618, audio_tagging_loss=0.009577, over 3049898.13 frames. ], batch size: 62, lr: 3.26e-03, grad_scale: 8.0
2023-11-21 20:16:05,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1645200.0, ans=0.09899494936611666
2023-11-21 20:16:09,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0
2023-11-21 20:16:15,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1645266.6666666667, ans=0.125
2023-11-21 20:16:19,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.125e+01 8.837e+01 9.702e+01 1.272e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-21 20:16:21,586 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246800
2023-11-21 20:16:50,190 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6350, loss[loss=0.06732, simple_loss=0.0842, pruned_loss=0.01518, audio_tagging_loss=0.01005, over 14020.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.0962, pruned_loss=0.01639, audio_tagging_loss=0.009666, over 3042719.34 frames. ], batch size: 53, lr: 3.26e-03, grad_scale: 8.0
2023-11-21 20:16:57,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1645466.6666666667, ans=0.0
2023-11-21 20:17:16,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1645600.0, ans=0.2
2023-11-21 20:17:23,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1645600.0, ans=0.2
2023-11-21 20:17:26,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246850
2023-11-21 20:17:46,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1645733.3333333333, ans=0.125
2023-11-21 20:17:55,204 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6400, loss[loss=0.08738, simple_loss=0.1084, pruned_loss=0.0218, audio_tagging_loss=0.01139, over 14634.00 frames. ], tot_loss[loss=0.07434, simple_loss=0.09649, pruned_loss=0.01645, audio_tagging_loss=0.009646, over 3043890.35 frames. ], batch size: 56, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:18:07,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1645866.6666666667, ans=0.125
2023-11-21 20:18:18,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1645866.6666666667, ans=0.125
2023-11-21 20:18:19,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1645933.3333333333, ans=0.125
2023-11-21 20:18:27,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 7.956e+01 8.478e+01 9.124e+01 1.124e+02, threshold=1.696e+02, percent-clipped=0.0
2023-11-21 20:18:30,227 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246900
2023-11-21 20:18:32,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0
2023-11-21 20:18:50,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1646066.6666666667, ans=0.125
2023-11-21 20:18:58,454 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6450, loss[loss=0.06064, simple_loss=0.0768, pruned_loss=0.01139, audio_tagging_loss=0.01085, over 15332.00 frames. ], tot_loss[loss=0.0741, simple_loss=0.09624, pruned_loss=0.01626, audio_tagging_loss=0.009721, over 3041383.84 frames. ], batch size: 60, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:19:06,672 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:19:31,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1646266.6666666667, ans=0.0
2023-11-21 20:19:34,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.53 vs. limit=22.5
2023-11-21 20:19:34,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 246950
2023-11-21 20:19:43,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0
2023-11-21 20:20:02,047 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6500, loss[loss=0.06678, simple_loss=0.0908, pruned_loss=0.01358, audio_tagging_loss=0.007794, over 14216.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.09518, pruned_loss=0.01595, audio_tagging_loss=0.009728, over 3034502.30 frames. ], batch size: 55, lr: 3.26e-03, grad_scale: 16.0
2023-11-21 20:20:11,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5
2023-11-21 20:20:27,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0
2023-11-21 20:20:35,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.211e+01 8.118e+01 8.877e+01 9.588e+01 1.151e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-21 20:20:38,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247000
2023-11-21 20:20:40,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1646666.6666666667, ans=0.125
2023-11-21 20:20:52,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1646733.3333333333, ans=0.125
2023-11-21 20:21:00,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1646733.3333333333, ans=0.2
2023-11-21 20:21:07,044 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6550, loss[loss=0.0836, simple_loss=0.1118, pruned_loss=0.01923, audio_tagging_loss=0.008452, over 15472.00 frames. ], tot_loss[loss=0.07349, simple_loss=0.09555, pruned_loss=0.01609, audio_tagging_loss=0.00963, over 3039661.54 frames. ], batch size: 60, lr: 3.25e-03, grad_scale: 16.0
2023-11-21 20:21:15,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1646800.0, ans=0.0
2023-11-21 20:21:19,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1646866.6666666667, ans=0.125
2023-11-21 20:21:29,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1646866.6666666667, ans=0.125
2023-11-21 20:21:43,207 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247050
2023-11-21 20:21:45,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1647000.0, ans=0.125
2023-11-21 20:21:56,649 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:22:11,491 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6600, loss[loss=0.07029, simple_loss=0.09463, pruned_loss=0.01428, audio_tagging_loss=0.008698, over 14800.00 frames. ], tot_loss[loss=0.07378, simple_loss=0.09596, pruned_loss=0.01625, audio_tagging_loss=0.009546, over 3040053.32 frames. ], batch size: 57, lr: 3.25e-03, grad_scale: 16.0
2023-11-21 20:22:24,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1647200.0, ans=0.04949747468305833
2023-11-21 20:22:44,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.213e+01 8.717e+01 9.748e+01 1.869e+02, threshold=1.743e+02, percent-clipped=1.0
2023-11-21 20:22:48,253 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247100
2023-11-21 20:22:49,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1647333.3333333333, ans=0.0
2023-11-21 20:22:53,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1647333.3333333333, ans=0.0
2023-11-21 20:23:07,379 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:23:15,531 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6650, loss[loss=0.06148, simple_loss=0.07815, pruned_loss=0.01166, audio_tagging_loss=0.01075, over 14709.00 frames. ], tot_loss[loss=0.07304, simple_loss=0.09511, pruned_loss=0.01602, audio_tagging_loss=0.009462, over 3045933.82 frames. ], batch size: 58, lr: 3.25e-03, grad_scale: 16.0
2023-11-21 20:23:19,642 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:23:28,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1647533.3333333333, ans=0.0
2023-11-21 20:23:49,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1647600.0, ans=0.125
2023-11-21 20:23:50,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247150
2023-11-21 20:24:02,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1647666.6666666667, ans=0.125
2023-11-21 20:24:04,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1647733.3333333333, ans=0.2
2023-11-21 20:24:13,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1647733.3333333333, ans=0.0
2023-11-21 20:24:16,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1647733.3333333333, ans=0.2
2023-11-21 20:24:18,044 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6700, loss[loss=0.07277, simple_loss=0.09626, pruned_loss=0.01514, audio_tagging_loss=0.009494, over 15057.00 frames. ], tot_loss[loss=0.07321, simple_loss=0.09537, pruned_loss=0.01604, audio_tagging_loss=0.009486, over 3042132.58 frames. ], batch size: 55, lr: 3.25e-03, grad_scale: 16.0
2023-11-21 20:24:18,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1647800.0, ans=0.125
2023-11-21 20:24:23,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1647800.0, ans=0.125
2023-11-21 20:24:27,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1647800.0, ans=0.0
2023-11-21 20:24:45,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0
2023-11-21 20:24:51,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.787e+01 8.102e+01 8.678e+01 9.415e+01 1.201e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-21 20:24:54,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247200
2023-11-21 20:25:00,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1648000.0, ans=0.0
2023-11-21 20:25:05,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1648000.0, ans=0.1
2023-11-21 20:25:23,178 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6750, loss[loss=0.04898, simple_loss=0.05182, pruned_loss=0.01017, audio_tagging_loss=0.0129, over 13885.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.09498, pruned_loss=0.01602, audio_tagging_loss=0.009424, over 3040818.35 frames. ], batch size: 57, lr: 3.25e-03, grad_scale: 16.0
2023-11-21 20:25:32,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1648133.3333333333, ans=0.0
2023-11-21 20:25:58,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247250
2023-11-21 20:26:04,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0
2023-11-21 20:26:05,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1648333.3333333333, ans=0.0
2023-11-21 20:26:26,360 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6800, loss[loss=0.05925, simple_loss=0.08448, pruned_loss=0.009071, audio_tagging_loss=0.007941, over 14746.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.09539, pruned_loss=0.01608, audio_tagging_loss=0.009446, over 3042082.74 frames. ], batch size: 56, lr: 3.25e-03, grad_scale: 32.0
2023-11-21 20:26:31,446 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 20:26:59,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 7.919e+01 8.607e+01 9.295e+01 1.340e+02, threshold=1.721e+02, percent-clipped=0.0
2023-11-21 20:27:02,150 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247300
2023-11-21 20:27:15,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1648666.6666666667, ans=0.0
2023-11-21 20:27:29,480 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6850, loss[loss=0.07819, simple_loss=0.1015, pruned_loss=0.01723, audio_tagging_loss=0.01022, over 14475.00 frames. ], tot_loss[loss=0.07279, simple_loss=0.09501, pruned_loss=0.01591, audio_tagging_loss=0.009377, over 3042106.81 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 32.0
], tot_loss[loss=0.07279, simple_loss=0.09501, pruned_loss=0.01591, audio_tagging_loss=0.009377, over 3042106.81 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 32.0 2023-11-21 20:27:35,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1648800.0, ans=0.05 2023-11-21 20:28:05,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1648933.3333333333, ans=0.125 2023-11-21 20:28:06,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247350 2023-11-21 20:28:13,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1649000.0, ans=0.125 2023-11-21 20:28:33,861 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6900, loss[loss=0.05422, simple_loss=0.07626, pruned_loss=0.006527, audio_tagging_loss=0.009564, over 14190.00 frames. ], tot_loss[loss=0.07275, simple_loss=0.09499, pruned_loss=0.0158, audio_tagging_loss=0.009462, over 3048544.80 frames. ], batch size: 56, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:28:48,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1649200.0, ans=0.1 2023-11-21 20:28:52,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1649200.0, ans=0.125 2023-11-21 20:29:08,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.008e+01 8.614e+01 9.356e+01 1.357e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-21 20:29:08,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247400 2023-11-21 20:29:23,596 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 20:29:37,996 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 6950, loss[loss=0.07391, simple_loss=0.1033, pruned_loss=0.01388, audio_tagging_loss=0.008366, over 14005.00 frames. ], tot_loss[loss=0.07356, simple_loss=0.09638, pruned_loss=0.01601, audio_tagging_loss=0.009369, over 3056413.19 frames. ], batch size: 52, lr: 3.25e-03, grad_scale: 8.0 2023-11-21 20:29:38,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-11-21 20:29:56,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-21 20:30:06,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.49 vs. 
limit=10.0 2023-11-21 20:30:13,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247450 2023-11-21 20:30:20,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1649666.6666666667, ans=0.125 2023-11-21 20:30:22,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1649666.6666666667, ans=0.2 2023-11-21 20:30:24,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1649666.6666666667, ans=0.05 2023-11-21 20:30:30,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1649733.3333333333, ans=0.125 2023-11-21 20:30:41,553 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7000, loss[loss=0.08217, simple_loss=0.1115, pruned_loss=0.01751, audio_tagging_loss=0.008901, over 14602.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.09589, pruned_loss=0.01603, audio_tagging_loss=0.009453, over 3050856.62 frames. ], batch size: 53, lr: 3.25e-03, grad_scale: 8.0 2023-11-21 20:30:41,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1649800.0, ans=0.125 2023-11-21 20:31:11,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1649933.3333333333, ans=0.125 2023-11-21 20:31:13,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1649933.3333333333, ans=0.125 2023-11-21 20:31:18,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 7.928e+01 8.493e+01 9.155e+01 1.182e+02, threshold=1.699e+02, percent-clipped=0.0 2023-11-21 20:31:18,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247500 2023-11-21 20:31:20,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1650000.0, ans=0.125 2023-11-21 20:31:24,055 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 20:31:26,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1650000.0, ans=0.125 2023-11-21 20:31:32,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.55 vs. limit=22.5 2023-11-21 20:31:33,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-21 20:31:34,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2023-11-21 20:31:40,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1650066.6666666667, ans=0.025 2023-11-21 20:31:42,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. 
limit=6.0 2023-11-21 20:31:47,106 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7050, loss[loss=0.08332, simple_loss=0.1123, pruned_loss=0.02021, audio_tagging_loss=0.006967, over 15339.00 frames. ], tot_loss[loss=0.07377, simple_loss=0.0964, pruned_loss=0.01615, audio_tagging_loss=0.009422, over 3058848.88 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 8.0 2023-11-21 20:31:59,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1650200.0, ans=0.1 2023-11-21 20:31:59,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1650200.0, ans=0.0 2023-11-21 20:32:02,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1650200.0, ans=0.125 2023-11-21 20:32:22,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247550 2023-11-21 20:32:25,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1650333.3333333333, ans=0.125 2023-11-21 20:32:44,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1650400.0, ans=0.2 2023-11-21 20:32:51,796 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7100, loss[loss=0.07281, simple_loss=0.09134, pruned_loss=0.01571, audio_tagging_loss=0.01143, over 14692.00 frames. ], tot_loss[loss=0.07289, simple_loss=0.09493, pruned_loss=0.01582, audio_tagging_loss=0.009596, over 3062750.87 frames. ], batch size: 56, lr: 3.25e-03, grad_scale: 8.0 2023-11-21 20:32:57,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2023-11-21 20:32:58,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1650466.6666666667, ans=0.0 2023-11-21 20:33:19,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.02 vs. limit=10.0 2023-11-21 20:33:27,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.401e+01 8.093e+01 8.713e+01 9.346e+01 1.337e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-21 20:33:27,273 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247600 2023-11-21 20:33:36,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1650666.6666666667, ans=0.125 2023-11-21 20:33:55,486 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7150, loss[loss=0.08207, simple_loss=0.1044, pruned_loss=0.02271, audio_tagging_loss=0.007141, over 15475.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.0953, pruned_loss=0.0159, audio_tagging_loss=0.009616, over 3057069.74 frames. ], batch size: 58, lr: 3.25e-03, grad_scale: 8.0 2023-11-21 20:33:57,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2023-11-21 20:33:58,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. 
limit=6.0 2023-11-21 20:34:01,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1650800.0, ans=0.125 2023-11-21 20:34:17,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1650866.6666666667, ans=0.0 2023-11-21 20:34:28,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1650933.3333333333, ans=0.125 2023-11-21 20:34:32,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247650 2023-11-21 20:34:38,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-21 20:34:44,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1651000.0, ans=0.125 2023-11-21 20:34:47,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1651066.6666666667, ans=0.0 2023-11-21 20:35:00,016 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7200, loss[loss=0.07736, simple_loss=0.1066, pruned_loss=0.01613, audio_tagging_loss=0.0079, over 14948.00 frames. ], tot_loss[loss=0.07354, simple_loss=0.09569, pruned_loss=0.01605, audio_tagging_loss=0.009647, over 3054351.69 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:35:15,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1651200.0, ans=0.2 2023-11-21 20:35:29,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1651266.6666666667, ans=0.0 2023-11-21 20:35:35,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.484e+01 9.159e+01 1.010e+02 1.295e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-21 20:35:35,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247700 2023-11-21 20:35:39,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1651333.3333333333, ans=0.125 2023-11-21 20:35:41,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1651333.3333333333, ans=0.125 2023-11-21 20:35:55,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1651400.0, ans=0.0 2023-11-21 20:35:55,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1651400.0, ans=0.0 2023-11-21 20:36:03,919 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7250, loss[loss=0.07122, simple_loss=0.09348, pruned_loss=0.01464, audio_tagging_loss=0.009846, over 14900.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.09557, pruned_loss=0.01589, audio_tagging_loss=0.009693, over 3054691.04 frames. 
], batch size: 57, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:36:12,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1651466.6666666667, ans=0.125 2023-11-21 20:36:19,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1651533.3333333333, ans=0.125 2023-11-21 20:36:39,787 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247750 2023-11-21 20:36:44,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2023-11-21 20:36:48,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1651666.6666666667, ans=0.125 2023-11-21 20:37:07,608 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7300, loss[loss=0.1004, simple_loss=0.1333, pruned_loss=0.02619, audio_tagging_loss=0.007529, over 16195.00 frames. ], tot_loss[loss=0.07344, simple_loss=0.09555, pruned_loss=0.01602, audio_tagging_loss=0.00964, over 3055647.45 frames. ], batch size: 59, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:37:13,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-21 20:37:42,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1651933.3333333333, ans=0.1 2023-11-21 20:37:43,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.100e+01 8.051e+01 8.700e+01 9.385e+01 1.347e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-21 20:37:43,624 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247800 2023-11-21 20:37:54,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1652000.0, ans=0.125 2023-11-21 20:38:04,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1652066.6666666667, ans=0.125 2023-11-21 20:38:12,569 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7350, loss[loss=0.07537, simple_loss=0.1077, pruned_loss=0.01376, audio_tagging_loss=0.007761, over 14118.00 frames. ], tot_loss[loss=0.07261, simple_loss=0.0945, pruned_loss=0.01584, audio_tagging_loss=0.009518, over 3050228.28 frames. ], batch size: 52, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:38:14,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1652133.3333333333, ans=0.2 2023-11-21 20:38:22,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1652133.3333333333, ans=0.1 2023-11-21 20:38:43,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2023-11-21 20:38:47,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247850 2023-11-21 20:38:48,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. 
limit=15.0 2023-11-21 20:38:55,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1652333.3333333333, ans=0.0 2023-11-21 20:38:59,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1652333.3333333333, ans=0.125 2023-11-21 20:39:02,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1652400.0, ans=0.125 2023-11-21 20:39:02,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1652400.0, ans=0.0 2023-11-21 20:39:11,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1652400.0, ans=0.2 2023-11-21 20:39:15,989 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7400, loss[loss=0.07424, simple_loss=0.09467, pruned_loss=0.01811, audio_tagging_loss=0.008795, over 15810.00 frames. ], tot_loss[loss=0.07286, simple_loss=0.09515, pruned_loss=0.0159, audio_tagging_loss=0.009379, over 3052973.52 frames. ], batch size: 59, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:39:26,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1652466.6666666667, ans=0.125 2023-11-21 20:39:51,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.103e+01 8.809e+01 9.626e+01 1.321e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-21 20:39:51,720 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247900 2023-11-21 20:39:52,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-21 20:39:53,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0 2023-11-21 20:40:07,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1652733.3333333333, ans=0.2 2023-11-21 20:40:16,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1652733.3333333333, ans=0.09899494936611666 2023-11-21 20:40:19,642 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7450, loss[loss=0.07785, simple_loss=0.09458, pruned_loss=0.01997, audio_tagging_loss=0.01059, over 15868.00 frames. ], tot_loss[loss=0.07219, simple_loss=0.09404, pruned_loss=0.01579, audio_tagging_loss=0.009382, over 3052957.58 frames. ], batch size: 60, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:40:53,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=6.0 2023-11-21 20:40:55,213 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 247950 2023-11-21 20:41:14,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.41 vs. 
limit=10.0 2023-11-21 20:41:16,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1653066.6666666667, ans=0.0 2023-11-21 20:41:22,595 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7500, loss[loss=0.06033, simple_loss=0.07748, pruned_loss=0.01248, audio_tagging_loss=0.009108, over 14147.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09535, pruned_loss=0.016, audio_tagging_loss=0.009233, over 3054143.59 frames. ], batch size: 53, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:41:34,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1653200.0, ans=0.125 2023-11-21 20:41:34,556 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 20:41:46,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1653200.0, ans=0.0 2023-11-21 20:41:57,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.788e+01 8.210e+01 8.790e+01 9.485e+01 1.319e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-21 20:41:58,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248000 2023-11-21 20:42:16,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1653400.0, ans=0.125 2023-11-21 20:42:26,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2023-11-21 20:42:29,568 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7550, loss[loss=0.06833, simple_loss=0.08967, pruned_loss=0.01404, audio_tagging_loss=0.009452, over 15921.00 frames. ], tot_loss[loss=0.07232, simple_loss=0.09448, pruned_loss=0.01583, audio_tagging_loss=0.009246, over 3053896.09 frames. ], batch size: 57, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:42:39,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-21 20:42:52,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1653533.3333333333, ans=0.0 2023-11-21 20:42:55,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1653600.0, ans=0.125 2023-11-21 20:43:05,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248050 2023-11-21 20:43:06,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1653666.6666666667, ans=0.0 2023-11-21 20:43:12,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1653666.6666666667, ans=0.1 2023-11-21 20:43:20,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=12.0 2023-11-21 20:43:32,794 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7600, loss[loss=0.06075, simple_loss=0.08315, pruned_loss=0.008603, audio_tagging_loss=0.01058, over 15812.00 frames. ], tot_loss[loss=0.07185, simple_loss=0.09369, pruned_loss=0.0157, audio_tagging_loss=0.009304, over 3048172.71 frames. 
], batch size: 59, lr: 3.25e-03, grad_scale: 32.0 2023-11-21 20:43:37,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1653800.0, ans=0.025 2023-11-21 20:43:40,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1653800.0, ans=0.0 2023-11-21 20:43:51,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1653866.6666666667, ans=0.1 2023-11-21 20:43:56,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1653866.6666666667, ans=0.1 2023-11-21 20:44:03,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1653933.3333333333, ans=0.0 2023-11-21 20:44:08,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 8.151e+01 8.758e+01 9.560e+01 1.334e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-21 20:44:08,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248100 2023-11-21 20:44:08,521 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 20:44:33,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2023-11-21 20:44:36,143 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7650, loss[loss=0.06107, simple_loss=0.08754, pruned_loss=0.01016, audio_tagging_loss=0.007133, over 13994.00 frames. ], tot_loss[loss=0.07163, simple_loss=0.09292, pruned_loss=0.01575, audio_tagging_loss=0.009428, over 3039935.27 frames. ], batch size: 53, lr: 3.25e-03, grad_scale: 32.0 2023-11-21 20:44:36,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-21 20:44:44,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5 2023-11-21 20:44:45,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1654133.3333333333, ans=0.1 2023-11-21 20:45:11,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248150 2023-11-21 20:45:40,003 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7700, loss[loss=0.07887, simple_loss=0.1067, pruned_loss=0.01577, audio_tagging_loss=0.009743, over 15780.00 frames. ], tot_loss[loss=0.07164, simple_loss=0.09316, pruned_loss=0.01568, audio_tagging_loss=0.00938, over 3038558.85 frames. 
], batch size: 58, lr: 3.25e-03, grad_scale: 32.0 2023-11-21 20:45:41,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1654466.6666666667, ans=0.0 2023-11-21 20:45:50,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1654466.6666666667, ans=10.0 2023-11-21 20:45:51,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1654533.3333333333, ans=0.125 2023-11-21 20:46:15,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.008e+01 8.521e+01 9.342e+01 1.167e+02, threshold=1.704e+02, percent-clipped=0.0 2023-11-21 20:46:16,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248200 2023-11-21 20:46:36,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1654733.3333333333, ans=0.0 2023-11-21 20:46:42,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1654800.0, ans=0.125 2023-11-21 20:46:43,451 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7750, loss[loss=0.08007, simple_loss=0.1138, pruned_loss=0.01705, audio_tagging_loss=0.006128, over 14984.00 frames. ], tot_loss[loss=0.07267, simple_loss=0.09482, pruned_loss=0.01587, audio_tagging_loss=0.009395, over 3040921.07 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 32.0 2023-11-21 20:47:11,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-11-21 20:47:13,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1654933.3333333333, ans=0.1 2023-11-21 20:47:19,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248250 2023-11-21 20:47:19,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1654933.3333333333, ans=0.0 2023-11-21 20:47:46,955 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7800, loss[loss=0.1048, simple_loss=0.1472, pruned_loss=0.02513, audio_tagging_loss=0.006076, over 14873.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.0947, pruned_loss=0.01579, audio_tagging_loss=0.009411, over 3041307.56 frames. 
], batch size: 54, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:47:50,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1655133.3333333333, ans=0.125 2023-11-21 20:47:50,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1655133.3333333333, ans=0.04949747468305833 2023-11-21 20:47:51,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1655133.3333333333, ans=0.125 2023-11-21 20:47:55,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1655133.3333333333, ans=0.1 2023-11-21 20:47:56,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1655133.3333333333, ans=0.2 2023-11-21 20:48:23,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248300 2023-11-21 20:48:24,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.033e+01 8.572e+01 9.400e+01 1.167e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-21 20:48:25,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1655333.3333333333, ans=0.1 2023-11-21 20:48:29,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1655333.3333333333, ans=0.125 2023-11-21 20:48:51,005 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7850, loss[loss=0.08186, simple_loss=0.1039, pruned_loss=0.02074, audio_tagging_loss=0.009175, over 14724.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09437, pruned_loss=0.0157, audio_tagging_loss=0.009468, over 3040190.37 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:49:10,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1655533.3333333333, ans=0.125 2023-11-21 20:49:25,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248350 2023-11-21 20:49:30,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1655666.6666666667, ans=0.125 2023-11-21 20:49:32,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2023-11-21 20:49:53,838 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7900, loss[loss=0.07213, simple_loss=0.09416, pruned_loss=0.01343, audio_tagging_loss=0.01162, over 15721.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.0945, pruned_loss=0.01572, audio_tagging_loss=0.009484, over 3046729.76 frames. 
], batch size: 57, lr: 3.25e-03, grad_scale: 8.0 2023-11-21 20:50:02,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1655800.0, ans=0.1 2023-11-21 20:50:11,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1655866.6666666667, ans=0.0 2023-11-21 20:50:25,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1655933.3333333333, ans=0.125 2023-11-21 20:50:29,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248400 2023-11-21 20:50:32,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.166e+01 8.800e+01 9.422e+01 1.274e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-21 20:50:39,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1656000.0, ans=0.125 2023-11-21 20:50:39,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2023-11-21 20:50:46,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.30 vs. limit=5.0 2023-11-21 20:50:52,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-11-21 20:50:56,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1656133.3333333333, ans=0.1 2023-11-21 20:50:57,273 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 7950, loss[loss=0.07512, simple_loss=0.09939, pruned_loss=0.01583, audio_tagging_loss=0.009593, over 15114.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09427, pruned_loss=0.01576, audio_tagging_loss=0.009636, over 3050091.28 frames. ], batch size: 55, lr: 3.25e-03, grad_scale: 8.0 2023-11-21 20:51:00,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1656133.3333333333, ans=0.2 2023-11-21 20:51:01,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1656133.3333333333, ans=0.2 2023-11-21 20:51:01,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-21 20:51:13,908 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 20:51:26,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1656266.6666666667, ans=0.125 2023-11-21 20:51:34,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248450 2023-11-21 20:52:02,357 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8000, loss[loss=0.06217, simple_loss=0.07337, pruned_loss=0.0147, audio_tagging_loss=0.01078, over 16302.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09311, pruned_loss=0.01563, audio_tagging_loss=0.009745, over 3050029.69 frames. ], batch size: 63, lr: 3.25e-03, grad_scale: 16.0 2023-11-21 20:52:37,005 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248500 2023-11-21 20:52:37,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1656600.0, ans=0.1 2023-11-21 20:52:39,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.541e+01 8.016e+01 8.965e+01 9.786e+01 1.299e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-21 20:52:42,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1656666.6666666667, ans=0.125 2023-11-21 20:52:59,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1656733.3333333333, ans=0.125 2023-11-21 20:53:05,619 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8050, loss[loss=0.09018, simple_loss=0.1111, pruned_loss=0.02308, audio_tagging_loss=0.01157, over 15345.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09246, pruned_loss=0.01575, audio_tagging_loss=0.009854, over 3050153.20 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 20:53:14,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1656800.0, ans=10.0 2023-11-21 20:53:19,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1656866.6666666667, ans=0.125 2023-11-21 20:53:20,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1656866.6666666667, ans=0.125 2023-11-21 20:53:27,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0 2023-11-21 20:53:27,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1656866.6666666667, ans=0.125 2023-11-21 20:53:32,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1656933.3333333333, ans=0.125 2023-11-21 20:53:39,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1656933.3333333333, ans=0.1 2023-11-21 20:53:41,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248550 2023-11-21 20:53:55,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. 
limit=15.0 2023-11-21 20:53:58,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1657066.6666666667, ans=0.0 2023-11-21 20:54:08,577 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8100, loss[loss=0.08457, simple_loss=0.113, pruned_loss=0.01876, audio_tagging_loss=0.009295, over 15358.00 frames. ], tot_loss[loss=0.07232, simple_loss=0.09328, pruned_loss=0.01588, audio_tagging_loss=0.009803, over 3048317.33 frames. ], batch size: 56, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 20:54:12,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1657133.3333333333, ans=0.125 2023-11-21 20:54:39,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1657266.6666666667, ans=0.0 2023-11-21 20:54:41,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1657266.6666666667, ans=0.0 2023-11-21 20:54:44,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248600 2023-11-21 20:54:47,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.022e+01 8.595e+01 9.139e+01 1.165e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-21 20:55:06,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1657400.0, ans=0.07 2023-11-21 20:55:12,701 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8150, loss[loss=0.06338, simple_loss=0.09421, pruned_loss=0.009883, audio_tagging_loss=0.006391, over 15547.00 frames. ], tot_loss[loss=0.07227, simple_loss=0.09362, pruned_loss=0.01583, audio_tagging_loss=0.009632, over 3051928.61 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 20:55:22,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1657466.6666666667, ans=0.2 2023-11-21 20:55:39,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1657600.0, ans=0.125 2023-11-21 20:55:41,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1657600.0, ans=0.125 2023-11-21 20:55:42,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1657600.0, ans=0.125 2023-11-21 20:55:47,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248650 2023-11-21 20:56:00,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2023-11-21 20:56:12,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1657733.3333333333, ans=0.0 2023-11-21 20:56:16,184 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8200, loss[loss=0.07458, simple_loss=0.09836, pruned_loss=0.01697, audio_tagging_loss=0.008424, over 16383.00 frames. ], tot_loss[loss=0.07267, simple_loss=0.09471, pruned_loss=0.0158, audio_tagging_loss=0.00952, over 3049462.61 frames. ], batch size: 61, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 20:56:16,247 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 20:56:23,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1657800.0, ans=0.125 2023-11-21 20:56:24,737 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 20:56:25,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1657800.0, ans=0.1 2023-11-21 20:56:32,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=1657866.6666666667, ans=12.0 2023-11-21 20:56:43,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1657933.3333333333, ans=0.125 2023-11-21 20:56:50,613 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248700 2023-11-21 20:56:52,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.429e+01 8.006e+01 8.771e+01 9.726e+01 1.237e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-21 20:57:04,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1658000.0, ans=0.125 2023-11-21 20:57:08,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-11-21 20:57:12,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1658066.6666666667, ans=0.125 2023-11-21 20:57:17,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1658133.3333333333, ans=0.1 2023-11-21 20:57:18,314 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8250, loss[loss=0.0941, simple_loss=0.1201, pruned_loss=0.02494, audio_tagging_loss=0.009092, over 14897.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09489, pruned_loss=0.01602, audio_tagging_loss=0.009448, over 3046385.36 frames. ], batch size: 55, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 20:57:21,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2023-11-21 20:57:29,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1658200.0, ans=0.1 2023-11-21 20:57:40,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1658200.0, ans=0.1 2023-11-21 20:57:54,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248750 2023-11-21 20:58:16,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1658400.0, ans=0.0 2023-11-21 20:58:22,525 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8300, loss[loss=0.07958, simple_loss=0.09721, pruned_loss=0.01878, audio_tagging_loss=0.0122, over 15621.00 frames. 
], tot_loss[loss=0.07326, simple_loss=0.09567, pruned_loss=0.01602, audio_tagging_loss=0.009404, over 3048449.31 frames. ], batch size: 59, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 20:58:38,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1658533.3333333333, ans=0.5 2023-11-21 20:58:47,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1658600.0, ans=0.125 2023-11-21 20:58:58,063 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248800 2023-11-21 20:59:00,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.249e+01 8.152e+01 8.713e+01 9.319e+01 1.326e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-21 20:59:07,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1658666.6666666667, ans=0.0 2023-11-21 20:59:07,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2023-11-21 20:59:13,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1658733.3333333333, ans=0.5 2023-11-21 20:59:15,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1658733.3333333333, ans=0.125 2023-11-21 20:59:26,937 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8350, loss[loss=0.06621, simple_loss=0.09187, pruned_loss=0.01349, audio_tagging_loss=0.006789, over 14116.00 frames. ], tot_loss[loss=0.07338, simple_loss=0.09618, pruned_loss=0.01594, audio_tagging_loss=0.009357, over 3048868.44 frames. ], batch size: 54, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 20:59:34,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1658800.0, ans=15.0 2023-11-21 20:59:48,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1658866.6666666667, ans=0.0 2023-11-21 20:59:50,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1658933.3333333333, ans=0.0 2023-11-21 20:59:58,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1658933.3333333333, ans=0.125 2023-11-21 21:00:02,774 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248850 2023-11-21 21:00:12,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1659000.0, ans=0.0 2023-11-21 21:00:30,752 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8400, loss[loss=0.06585, simple_loss=0.08541, pruned_loss=0.01328, audio_tagging_loss=0.009861, over 15887.00 frames. ], tot_loss[loss=0.07349, simple_loss=0.09634, pruned_loss=0.01603, audio_tagging_loss=0.009294, over 3055650.74 frames. 
], batch size: 61, lr: 3.24e-03, grad_scale: 32.0 2023-11-21 21:01:06,700 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248900 2023-11-21 21:01:10,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.422e+01 7.990e+01 8.773e+01 9.433e+01 1.119e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-21 21:01:11,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1659333.3333333333, ans=0.04949747468305833 2023-11-21 21:01:23,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1659400.0, ans=0.2 2023-11-21 21:01:25,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1659400.0, ans=0.0 2023-11-21 21:01:33,138 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8450, loss[loss=0.06025, simple_loss=0.07622, pruned_loss=0.0119, audio_tagging_loss=0.01024, over 14818.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.09537, pruned_loss=0.01598, audio_tagging_loss=0.009451, over 3051650.50 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:01:37,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-11-21 21:01:45,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2023-11-21 21:01:46,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1659533.3333333333, ans=0.95 2023-11-21 21:01:48,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1659533.3333333333, ans=0.2 2023-11-21 21:01:49,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1659533.3333333333, ans=0.0 2023-11-21 21:02:08,939 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 248950 2023-11-21 21:02:12,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1659666.6666666667, ans=0.0 2023-11-21 21:02:36,328 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8500, loss[loss=0.0824, simple_loss=0.1197, pruned_loss=0.01437, audio_tagging_loss=0.008182, over 15654.00 frames. ], tot_loss[loss=0.07375, simple_loss=0.09624, pruned_loss=0.01615, audio_tagging_loss=0.009482, over 3048302.66 frames. ], batch size: 58, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:02:51,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1659866.6666666667, ans=0.0 2023-11-21 21:02:51,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0 2023-11-21 21:02:54,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. 
limit=15.0 2023-11-21 21:03:03,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1659933.3333333333, ans=0.125 2023-11-21 21:03:12,526 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249000 2023-11-21 21:03:16,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.238e+01 7.932e+01 8.481e+01 9.270e+01 1.196e+02, threshold=1.696e+02, percent-clipped=0.0 2023-11-21 21:03:33,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1660066.6666666667, ans=0.125 2023-11-21 21:03:41,016 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8550, loss[loss=0.08588, simple_loss=0.1305, pruned_loss=0.01634, audio_tagging_loss=0.004308, over 16225.00 frames. ], tot_loss[loss=0.07306, simple_loss=0.09562, pruned_loss=0.01582, audio_tagging_loss=0.009426, over 3053095.96 frames. ], batch size: 60, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:03:47,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1660133.3333333333, ans=0.95 2023-11-21 21:03:51,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1660133.3333333333, ans=0.125 2023-11-21 21:03:52,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1660200.0, ans=0.125 2023-11-21 21:04:07,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2023-11-21 21:04:12,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1660266.6666666667, ans=0.1 2023-11-21 21:04:12,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1660266.6666666667, ans=0.125 2023-11-21 21:04:16,339 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249050 2023-11-21 21:04:17,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1660333.3333333333, ans=0.0 2023-11-21 21:04:21,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1660333.3333333333, ans=0.125 2023-11-21 21:04:30,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1660400.0, ans=0.125 2023-11-21 21:04:43,476 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8600, loss[loss=0.07998, simple_loss=0.1133, pruned_loss=0.01588, audio_tagging_loss=0.007451, over 15130.00 frames. ], tot_loss[loss=0.07319, simple_loss=0.09558, pruned_loss=0.01594, audio_tagging_loss=0.009462, over 3047955.46 frames. ], batch size: 55, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:05:06,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-21 21:05:20,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249100 2023-11-21 21:05:24,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. 
limit=6.0 2023-11-21 21:05:24,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.123e+01 8.590e+01 9.454e+01 1.253e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-21 21:05:30,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1660666.6666666667, ans=0.125 2023-11-21 21:05:31,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1660666.6666666667, ans=0.025 2023-11-21 21:05:47,747 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8650, loss[loss=0.07036, simple_loss=0.09566, pruned_loss=0.01436, audio_tagging_loss=0.008171, over 15592.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.09586, pruned_loss=0.01601, audio_tagging_loss=0.009478, over 3049727.02 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:06:00,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0 2023-11-21 21:06:01,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1660866.6666666667, ans=0.1 2023-11-21 21:06:09,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2023-11-21 21:06:17,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1660933.3333333333, ans=0.0 2023-11-21 21:06:20,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-21 21:06:23,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249150 2023-11-21 21:06:47,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1661066.6666666667, ans=0.125 2023-11-21 21:06:51,489 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8700, loss[loss=0.1078, simple_loss=0.1383, pruned_loss=0.03082, audio_tagging_loss=0.007776, over 15889.00 frames. ], tot_loss[loss=0.07379, simple_loss=0.09617, pruned_loss=0.01623, audio_tagging_loss=0.009475, over 3053732.31 frames. ], batch size: 56, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:06:51,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1661133.3333333333, ans=0.125 2023-11-21 21:07:02,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.47 vs. 
limit=22.5 2023-11-21 21:07:08,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1661200.0, ans=0.0 2023-11-21 21:07:11,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1661200.0, ans=0.04949747468305833 2023-11-21 21:07:27,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249200 2023-11-21 21:07:32,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.673e+01 8.120e+01 8.963e+01 9.729e+01 1.356e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-21 21:07:55,268 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8750, loss[loss=0.07224, simple_loss=0.09369, pruned_loss=0.01666, audio_tagging_loss=0.008734, over 16046.00 frames. ], tot_loss[loss=0.07433, simple_loss=0.09687, pruned_loss=0.01638, audio_tagging_loss=0.00951, over 3052285.97 frames. ], batch size: 60, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:07:55,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1661466.6666666667, ans=0.0 2023-11-21 21:08:11,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-21 21:08:28,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1661600.0, ans=0.2 2023-11-21 21:08:31,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249250 2023-11-21 21:08:31,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1661600.0, ans=0.0 2023-11-21 21:08:37,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1661666.6666666667, ans=0.125 2023-11-21 21:08:47,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1661733.3333333333, ans=0.0 2023-11-21 21:08:50,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1661733.3333333333, ans=0.2 2023-11-21 21:08:59,671 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8800, loss[loss=0.06331, simple_loss=0.08394, pruned_loss=0.01124, audio_tagging_loss=0.0101, over 15095.00 frames. ], tot_loss[loss=0.07498, simple_loss=0.09773, pruned_loss=0.01653, audio_tagging_loss=0.00958, over 3057152.45 frames. ], batch size: 59, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:09:31,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1661933.3333333333, ans=0.2 2023-11-21 21:09:35,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249300 2023-11-21 21:09:36,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1662000.0, ans=0.2 2023-11-21 21:09:40,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. 
limit=15.0 2023-11-21 21:09:40,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.249e+01 8.864e+01 9.702e+01 1.215e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-21 21:10:02,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1662133.3333333333, ans=0.0 2023-11-21 21:10:03,852 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8850, loss[loss=0.06675, simple_loss=0.08776, pruned_loss=0.012, audio_tagging_loss=0.01087, over 16711.00 frames. ], tot_loss[loss=0.07481, simple_loss=0.09762, pruned_loss=0.01639, audio_tagging_loss=0.009612, over 3057278.54 frames. ], batch size: 64, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:10:15,548 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 21:10:20,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1662200.0, ans=0.0 2023-11-21 21:10:23,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1662200.0, ans=0.1 2023-11-21 21:10:25,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1662200.0, ans=0.1 2023-11-21 21:10:39,598 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249350 2023-11-21 21:10:43,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1662333.3333333333, ans=15.0 2023-11-21 21:11:06,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1662466.6666666667, ans=0.125 2023-11-21 21:11:07,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.33 vs. limit=15.0 2023-11-21 21:11:07,398 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8900, loss[loss=0.09924, simple_loss=0.1391, pruned_loss=0.02102, audio_tagging_loss=0.008661, over 15664.00 frames. ], tot_loss[loss=0.07508, simple_loss=0.09812, pruned_loss=0.01654, audio_tagging_loss=0.009478, over 3056219.71 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:11:33,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2023-11-21 21:11:43,065 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249400 2023-11-21 21:11:49,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.502e+01 8.199e+01 8.684e+01 9.586e+01 1.559e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-21 21:12:12,027 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 8950, loss[loss=0.0793, simple_loss=0.1033, pruned_loss=0.01859, audio_tagging_loss=0.009078, over 15497.00 frames. ], tot_loss[loss=0.07433, simple_loss=0.09742, pruned_loss=0.01629, audio_tagging_loss=0.009326, over 3049770.82 frames. 
], batch size: 59, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:12:18,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1662800.0, ans=0.1 2023-11-21 21:12:23,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1662866.6666666667, ans=0.125 2023-11-21 21:12:23,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1662866.6666666667, ans=0.0 2023-11-21 21:12:23,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1662866.6666666667, ans=0.125 2023-11-21 21:12:44,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1662933.3333333333, ans=15.0 2023-11-21 21:12:47,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249450 2023-11-21 21:13:14,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1663133.3333333333, ans=0.125 2023-11-21 21:13:15,722 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9000, loss[loss=0.08399, simple_loss=0.1108, pruned_loss=0.0185, audio_tagging_loss=0.01007, over 15731.00 frames. ], tot_loss[loss=0.07437, simple_loss=0.09767, pruned_loss=0.01631, audio_tagging_loss=0.00922, over 3059276.61 frames. ], batch size: 58, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:13:15,723 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 21:13:56,851 INFO [train_asr.py:1253] (1/4) Epoch 21, validation: loss=0.06003, simple_loss=0.05194, pruned_loss=0.005168, audio_tagging_loss=0.0289, over 4681554.00 frames. 2023-11-21 21:13:56,852 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 21:14:05,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1663133.3333333333, ans=0.1 2023-11-21 21:14:15,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1663200.0, ans=0.125 2023-11-21 21:14:30,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1663266.6666666667, ans=0.0 2023-11-21 21:14:32,782 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249500 2023-11-21 21:14:35,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1663333.3333333333, ans=0.125 2023-11-21 21:14:38,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 7.983e+01 8.777e+01 1.012e+02 1.174e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-21 21:14:48,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1663400.0, ans=0.125 2023-11-21 21:14:57,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1663400.0, ans=22.5 2023-11-21 21:14:58,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. 
limit=6.0 2023-11-21 21:15:01,138 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9050, loss[loss=0.06722, simple_loss=0.0821, pruned_loss=0.01376, audio_tagging_loss=0.01241, over 14608.00 frames. ], tot_loss[loss=0.07383, simple_loss=0.09674, pruned_loss=0.01619, audio_tagging_loss=0.009259, over 3056881.66 frames. ], batch size: 58, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:15:05,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1663466.6666666667, ans=0.0 2023-11-21 21:15:15,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1663533.3333333333, ans=0.125 2023-11-21 21:15:24,103 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:15:36,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249550 2023-11-21 21:15:43,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1663666.6666666667, ans=0.09899494936611666 2023-11-21 21:16:04,943 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9100, loss[loss=0.08602, simple_loss=0.1178, pruned_loss=0.01707, audio_tagging_loss=0.01003, over 15363.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.09629, pruned_loss=0.0161, audio_tagging_loss=0.009179, over 3063247.57 frames. ], batch size: 56, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:16:08,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.71 vs. limit=12.0 2023-11-21 21:16:22,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1663866.6666666667, ans=0.125 2023-11-21 21:16:32,504 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:16:32,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1663933.3333333333, ans=0.125 2023-11-21 21:16:40,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249600 2023-11-21 21:16:42,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2023-11-21 21:16:43,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1664000.0, ans=0.125 2023-11-21 21:16:44,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1664000.0, ans=0.125 2023-11-21 21:16:47,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.138e+01 8.810e+01 9.417e+01 1.142e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-21 21:16:48,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. 
limit=15.0 2023-11-21 21:16:51,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1664000.0, ans=0.1 2023-11-21 21:17:07,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1664133.3333333333, ans=0.125 2023-11-21 21:17:08,741 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9150, loss[loss=0.05753, simple_loss=0.07163, pruned_loss=0.01238, audio_tagging_loss=0.009329, over 14877.00 frames. ], tot_loss[loss=0.07311, simple_loss=0.09597, pruned_loss=0.01598, audio_tagging_loss=0.009148, over 3061401.87 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:17:09,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2023-11-21 21:17:45,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249650 2023-11-21 21:17:47,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1664333.3333333333, ans=0.125 2023-11-21 21:17:59,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1664400.0, ans=0.125 2023-11-21 21:18:14,557 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9200, loss[loss=0.07069, simple_loss=0.09523, pruned_loss=0.01375, audio_tagging_loss=0.009318, over 16316.00 frames. ], tot_loss[loss=0.07271, simple_loss=0.09542, pruned_loss=0.01587, audio_tagging_loss=0.009127, over 3068036.24 frames. ], batch size: 62, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:18:49,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249700 2023-11-21 21:18:56,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.394e+01 8.125e+01 8.694e+01 9.340e+01 1.162e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 21:19:04,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1664666.6666666667, ans=0.125 2023-11-21 21:19:19,589 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9250, loss[loss=0.0676, simple_loss=0.0929, pruned_loss=0.0107, audio_tagging_loss=0.01044, over 14840.00 frames. ], tot_loss[loss=0.07222, simple_loss=0.09463, pruned_loss=0.01568, audio_tagging_loss=0.009226, over 3071628.87 frames. ], batch size: 55, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:19:20,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1664800.0, ans=0.125 2023-11-21 21:19:24,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1664800.0, ans=0.125 2023-11-21 21:19:33,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. 
limit=10.0 2023-11-21 21:19:38,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1664866.6666666667, ans=0.0 2023-11-21 21:19:41,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1664866.6666666667, ans=0.125 2023-11-21 21:19:55,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249750 2023-11-21 21:20:22,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1665133.3333333333, ans=0.125 2023-11-21 21:20:23,908 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9300, loss[loss=0.04658, simple_loss=0.0524, pruned_loss=0.009877, audio_tagging_loss=0.01051, over 15306.00 frames. ], tot_loss[loss=0.07198, simple_loss=0.09432, pruned_loss=0.01556, audio_tagging_loss=0.009261, over 3068270.84 frames. ], batch size: 61, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:20:38,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1665200.0, ans=0.1 2023-11-21 21:20:52,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1665266.6666666667, ans=0.0 2023-11-21 21:20:59,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1665266.6666666667, ans=0.0 2023-11-21 21:21:00,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249800 2023-11-21 21:21:06,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.134e+01 8.648e+01 9.186e+01 1.314e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-21 21:21:06,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1665333.3333333333, ans=0.1 2023-11-21 21:21:12,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=12.0 2023-11-21 21:21:16,899 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:21:29,040 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9350, loss[loss=0.0763, simple_loss=0.1023, pruned_loss=0.01757, audio_tagging_loss=0.007593, over 15806.00 frames. ], tot_loss[loss=0.07243, simple_loss=0.09474, pruned_loss=0.01577, audio_tagging_loss=0.009288, over 3072902.83 frames. 
], batch size: 58, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:21:35,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1665466.6666666667, ans=0.1 2023-11-21 21:21:42,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1665533.3333333333, ans=0.1 2023-11-21 21:21:47,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1665533.3333333333, ans=0.125 2023-11-21 21:21:48,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1665533.3333333333, ans=0.0 2023-11-21 21:21:57,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1665600.0, ans=0.0 2023-11-21 21:22:04,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249850 2023-11-21 21:22:18,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1665666.6666666667, ans=0.1 2023-11-21 21:22:33,837 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9400, loss[loss=0.09621, simple_loss=0.1247, pruned_loss=0.02258, audio_tagging_loss=0.0113, over 15084.00 frames. ], tot_loss[loss=0.07292, simple_loss=0.09503, pruned_loss=0.01597, audio_tagging_loss=0.009432, over 3056030.36 frames. ], batch size: 57, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:22:35,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1665800.0, ans=0.125 2023-11-21 21:23:09,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249900 2023-11-21 21:23:13,614 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:23:16,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 7.941e+01 8.716e+01 9.449e+01 2.230e+02, threshold=1.743e+02, percent-clipped=1.0 2023-11-21 21:23:34,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1666066.6666666667, ans=0.125 2023-11-21 21:23:35,587 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 21:23:35,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1666066.6666666667, ans=0.1 2023-11-21 21:23:35,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1666066.6666666667, ans=0.125 2023-11-21 21:23:38,006 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9450, loss[loss=0.05856, simple_loss=0.07725, pruned_loss=0.01078, audio_tagging_loss=0.009154, over 15044.00 frames. ], tot_loss[loss=0.07293, simple_loss=0.09535, pruned_loss=0.01583, audio_tagging_loss=0.009421, over 3059103.56 frames. 
], batch size: 58, lr: 3.24e-03, grad_scale: 16.0 2023-11-21 21:23:50,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1666200.0, ans=0.125 2023-11-21 21:23:54,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1666200.0, ans=0.025 2023-11-21 21:23:59,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1666200.0, ans=0.1 2023-11-21 21:24:14,527 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 249950 2023-11-21 21:24:19,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0 2023-11-21 21:24:28,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1666400.0, ans=10.0 2023-11-21 21:24:41,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. limit=10.0 2023-11-21 21:24:42,637 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9500, loss[loss=0.07604, simple_loss=0.1072, pruned_loss=0.01458, audio_tagging_loss=0.007843, over 16058.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09423, pruned_loss=0.01564, audio_tagging_loss=0.009602, over 3055603.17 frames. ], batch size: 60, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:24:45,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1666466.6666666667, ans=0.05 2023-11-21 21:24:47,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.31 vs. 
limit=22.5 2023-11-21 21:24:50,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1666466.6666666667, ans=0.125 2023-11-21 21:24:51,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1666466.6666666667, ans=0.0 2023-11-21 21:24:52,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1666466.6666666667, ans=0.1 2023-11-21 21:24:56,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1666533.3333333333, ans=0.0 2023-11-21 21:24:56,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1666533.3333333333, ans=0.125 2023-11-21 21:24:57,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1666533.3333333333, ans=0.2 2023-11-21 21:25:00,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1666533.3333333333, ans=0.125 2023-11-21 21:25:02,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1666533.3333333333, ans=0.1 2023-11-21 21:25:03,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1666533.3333333333, ans=0.125 2023-11-21 21:25:04,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-21 21:25:18,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250000 2023-11-21 21:25:23,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-21 21:25:26,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.539e+01 8.975e+01 9.633e+01 1.238e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-21 21:25:27,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1666666.6666666667, ans=0.125 2023-11-21 21:25:32,358 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:25:48,001 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9550, loss[loss=0.08615, simple_loss=0.1233, pruned_loss=0.01551, audio_tagging_loss=0.009015, over 15043.00 frames. ], tot_loss[loss=0.07282, simple_loss=0.09499, pruned_loss=0.01567, audio_tagging_loss=0.009662, over 3050384.68 frames. 
], batch size: 56, lr: 3.24e-03, grad_scale: 8.0 2023-11-21 21:25:58,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1666800.0, ans=15.0 2023-11-21 21:26:03,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1666866.6666666667, ans=0.1 2023-11-21 21:26:24,910 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250050 2023-11-21 21:26:53,391 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9600, loss[loss=0.05573, simple_loss=0.0742, pruned_loss=0.009225, audio_tagging_loss=0.009406, over 14995.00 frames. ], tot_loss[loss=0.07284, simple_loss=0.09489, pruned_loss=0.0157, audio_tagging_loss=0.009696, over 3054686.65 frames. ], batch size: 58, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:26:58,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5 2023-11-21 21:27:00,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-21 21:27:07,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1667200.0, ans=0.1 2023-11-21 21:27:15,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-21 21:27:20,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1667266.6666666667, ans=0.125 2023-11-21 21:27:30,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250100 2023-11-21 21:27:33,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2023-11-21 21:27:35,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-21 21:27:36,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1667333.3333333333, ans=0.125 2023-11-21 21:27:38,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.555e+01 8.305e+01 8.780e+01 9.351e+01 2.082e+02, threshold=1.756e+02, percent-clipped=1.0 2023-11-21 21:27:43,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1667333.3333333333, ans=0.125 2023-11-21 21:27:58,568 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9650, loss[loss=0.08052, simple_loss=0.09807, pruned_loss=0.02347, audio_tagging_loss=0.00801, over 15815.00 frames. ], tot_loss[loss=0.07293, simple_loss=0.09458, pruned_loss=0.01592, audio_tagging_loss=0.009721, over 3048677.01 frames. 
], batch size: 57, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:28:09,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1667466.6666666667, ans=0.125 2023-11-21 21:28:34,857 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250150 2023-11-21 21:28:39,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=1667666.6666666667, ans=0.2 2023-11-21 21:28:50,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1667733.3333333333, ans=0.0 2023-11-21 21:28:58,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1667733.3333333333, ans=0.125 2023-11-21 21:29:03,957 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9700, loss[loss=0.0823, simple_loss=0.1126, pruned_loss=0.01944, audio_tagging_loss=0.006553, over 15154.00 frames. ], tot_loss[loss=0.07278, simple_loss=0.09456, pruned_loss=0.01588, audio_tagging_loss=0.009622, over 3049700.68 frames. ], batch size: 54, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:29:12,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1667800.0, ans=0.0 2023-11-21 21:29:21,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1667866.6666666667, ans=0.125 2023-11-21 21:29:40,206 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250200 2023-11-21 21:29:44,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1668000.0, ans=0.0 2023-11-21 21:29:47,778 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.673e+01 8.250e+01 8.691e+01 9.260e+01 1.203e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-21 21:29:57,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1668066.6666666667, ans=10.0 2023-11-21 21:30:09,300 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9750, loss[loss=0.09079, simple_loss=0.1255, pruned_loss=0.01885, audio_tagging_loss=0.009171, over 16850.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09489, pruned_loss=0.01592, audio_tagging_loss=0.009546, over 3049366.34 frames. ], batch size: 61, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:30:18,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1668133.3333333333, ans=0.2 2023-11-21 21:30:28,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1668200.0, ans=0.1 2023-11-21 21:30:46,073 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250250 2023-11-21 21:31:01,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1668400.0, ans=0.125 2023-11-21 21:31:09,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1668400.0, ans=0.125 2023-11-21 21:31:14,289 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9800, loss[loss=0.06104, simple_loss=0.07648, pruned_loss=0.01368, audio_tagging_loss=0.009123, over 15195.00 frames. 
], tot_loss[loss=0.0725, simple_loss=0.09429, pruned_loss=0.01581, audio_tagging_loss=0.009541, over 3046744.62 frames. ], batch size: 58, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:31:23,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1668466.6666666667, ans=0.125 2023-11-21 21:31:45,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1668600.0, ans=0.125 2023-11-21 21:31:50,388 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250300 2023-11-21 21:31:53,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1668666.6666666667, ans=0.0 2023-11-21 21:31:58,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 7.987e+01 8.487e+01 9.363e+01 1.387e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-21 21:32:11,925 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 21:32:19,229 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9850, loss[loss=0.06208, simple_loss=0.07872, pruned_loss=0.01181, audio_tagging_loss=0.01091, over 15479.00 frames. ], tot_loss[loss=0.07232, simple_loss=0.09422, pruned_loss=0.0157, audio_tagging_loss=0.009505, over 3050522.92 frames. ], batch size: 60, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:32:19,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1668800.0, ans=0.125 2023-11-21 21:32:53,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1668933.3333333333, ans=0.0 2023-11-21 21:32:55,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250350 2023-11-21 21:33:23,699 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9900, loss[loss=0.09538, simple_loss=0.127, pruned_loss=0.02531, audio_tagging_loss=0.006576, over 14986.00 frames. ], tot_loss[loss=0.07338, simple_loss=0.09569, pruned_loss=0.0161, audio_tagging_loss=0.009437, over 3054074.20 frames. ], batch size: 54, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:33:50,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1669266.6666666667, ans=0.0 2023-11-21 21:33:59,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. 
limit=22.5 2023-11-21 21:34:00,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250400 2023-11-21 21:34:07,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1669333.3333333333, ans=0.125 2023-11-21 21:34:08,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.181e+01 8.919e+01 9.368e+01 2.171e+02, threshold=1.784e+02, percent-clipped=1.0 2023-11-21 21:34:12,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1669333.3333333333, ans=0.0 2023-11-21 21:34:23,948 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:34:28,651 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 9950, loss[loss=0.08313, simple_loss=0.1099, pruned_loss=0.01826, audio_tagging_loss=0.009905, over 15928.00 frames. ], tot_loss[loss=0.07346, simple_loss=0.09591, pruned_loss=0.01609, audio_tagging_loss=0.009412, over 3058811.19 frames. ], batch size: 61, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:34:28,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1669466.6666666667, ans=0.0 2023-11-21 21:34:40,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-11-21 21:35:02,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1669600.0, ans=0.2 2023-11-21 21:35:04,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250450 2023-11-21 21:35:09,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2023-11-21 21:35:09,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1669666.6666666667, ans=0.1 2023-11-21 21:35:12,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1669666.6666666667, ans=0.125 2023-11-21 21:35:17,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1669666.6666666667, ans=0.1 2023-11-21 21:35:20,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1669733.3333333333, ans=0.2 2023-11-21 21:35:21,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1669733.3333333333, ans=0.1 2023-11-21 21:35:33,074 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10000, loss[loss=0.08984, simple_loss=0.1138, pruned_loss=0.02144, audio_tagging_loss=0.01149, over 15583.00 frames. ], tot_loss[loss=0.07358, simple_loss=0.09625, pruned_loss=0.01607, audio_tagging_loss=0.009384, over 3067316.28 frames. ], batch size: 57, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:35:51,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.83 vs. 
limit=15.0 2023-11-21 21:36:05,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1669933.3333333333, ans=0.125 2023-11-21 21:36:09,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250500 2023-11-21 21:36:17,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.244e+01 8.958e+01 9.849e+01 1.581e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-21 21:36:37,529 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10050, loss[loss=0.07754, simple_loss=0.1065, pruned_loss=0.0153, audio_tagging_loss=0.008982, over 16116.00 frames. ], tot_loss[loss=0.0735, simple_loss=0.09605, pruned_loss=0.01617, audio_tagging_loss=0.009305, over 3062595.73 frames. ], batch size: 57, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:36:48,882 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:36:50,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1670200.0, ans=0.125 2023-11-21 21:36:55,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1670200.0, ans=0.2 2023-11-21 21:37:11,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1670266.6666666667, ans=0.05 2023-11-21 21:37:13,270 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250550 2023-11-21 21:37:20,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1670333.3333333333, ans=0.125 2023-11-21 21:37:33,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1670400.0, ans=0.95 2023-11-21 21:37:40,407 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10100, loss[loss=0.09296, simple_loss=0.1202, pruned_loss=0.02409, audio_tagging_loss=0.008785, over 16005.00 frames. ], tot_loss[loss=0.07308, simple_loss=0.09521, pruned_loss=0.01605, audio_tagging_loss=0.009423, over 3058088.90 frames. 
], batch size: 59, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:37:47,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1670466.6666666667, ans=0.0 2023-11-21 21:38:08,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1670600.0, ans=0.125 2023-11-21 21:38:15,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1670600.0, ans=0.125 2023-11-21 21:38:16,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250600 2023-11-21 21:38:18,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1670666.6666666667, ans=0.1 2023-11-21 21:38:19,999 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:38:24,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1670666.6666666667, ans=0.125 2023-11-21 21:38:25,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.286e+01 8.086e+01 8.639e+01 9.260e+01 1.282e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-21 21:38:31,772 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 21:38:45,774 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10150, loss[loss=0.07821, simple_loss=0.09222, pruned_loss=0.02133, audio_tagging_loss=0.01077, over 14988.00 frames. ], tot_loss[loss=0.07375, simple_loss=0.0958, pruned_loss=0.01637, audio_tagging_loss=0.009483, over 3060699.91 frames. ], batch size: 56, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:39:05,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5 2023-11-21 21:39:14,465 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-21 21:39:19,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1670933.3333333333, ans=0.125 2023-11-21 21:39:20,735 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250650 2023-11-21 21:39:20,866 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:39:35,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1671000.0, ans=0.2 2023-11-21 21:39:43,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1671066.6666666667, ans=0.125 2023-11-21 21:39:47,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1671066.6666666667, ans=0.0 2023-11-21 21:39:49,976 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10200, loss[loss=0.08097, simple_loss=0.1002, pruned_loss=0.02103, audio_tagging_loss=0.009857, over 15085.00 frames. ], tot_loss[loss=0.07314, simple_loss=0.09519, pruned_loss=0.01611, audio_tagging_loss=0.009434, over 3059850.79 frames. ], batch size: 59, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:39:57,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1671133.3333333333, ans=0.125 2023-11-21 21:40:12,568 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 21:40:24,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2023-11-21 21:40:25,332 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:40:26,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250700 2023-11-21 21:40:30,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1671333.3333333333, ans=0.0 2023-11-21 21:40:35,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.067e+01 8.565e+01 9.348e+01 1.300e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-21 21:40:54,060 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10250, loss[loss=0.06586, simple_loss=0.07763, pruned_loss=0.01494, audio_tagging_loss=0.0121, over 14634.00 frames. ], tot_loss[loss=0.07359, simple_loss=0.09586, pruned_loss=0.0162, audio_tagging_loss=0.009459, over 3061577.71 frames. 
], batch size: 57, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:40:54,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1671466.6666666667, ans=0.125 2023-11-21 21:41:01,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1671466.6666666667, ans=0.04949747468305833 2023-11-21 21:41:09,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=12.0 2023-11-21 21:41:30,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250750 2023-11-21 21:41:35,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=22.5 2023-11-21 21:41:43,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1671733.3333333333, ans=0.125 2023-11-21 21:41:57,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1671800.0, ans=0.2 2023-11-21 21:41:58,088 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10300, loss[loss=0.05498, simple_loss=0.07846, pruned_loss=0.005853, audio_tagging_loss=0.009898, over 15918.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09512, pruned_loss=0.01611, audio_tagging_loss=0.009618, over 3054241.03 frames. ], batch size: 59, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:42:27,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1671933.3333333333, ans=0.2 2023-11-21 21:42:33,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250800 2023-11-21 21:42:36,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1672000.0, ans=0.2 2023-11-21 21:42:42,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.124e+01 8.842e+01 9.369e+01 1.260e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-21 21:43:03,278 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10350, loss[loss=0.0906, simple_loss=0.1131, pruned_loss=0.02513, audio_tagging_loss=0.008922, over 14911.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.09589, pruned_loss=0.01636, audio_tagging_loss=0.009618, over 3047503.45 frames. ], batch size: 56, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:43:09,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1672133.3333333333, ans=0.1 2023-11-21 21:43:16,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1672200.0, ans=0.125 2023-11-21 21:43:18,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.91 vs. 
limit=12.0 2023-11-21 21:43:32,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1672266.6666666667, ans=0.2 2023-11-21 21:43:39,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250850 2023-11-21 21:44:01,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1672400.0, ans=0.0 2023-11-21 21:44:04,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1672400.0, ans=0.0 2023-11-21 21:44:07,035 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10400, loss[loss=0.05431, simple_loss=0.06919, pruned_loss=0.007434, audio_tagging_loss=0.01228, over 14405.00 frames. ], tot_loss[loss=0.07379, simple_loss=0.09548, pruned_loss=0.01638, audio_tagging_loss=0.009671, over 3039672.97 frames. ], batch size: 55, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:44:33,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1672600.0, ans=0.2 2023-11-21 21:44:33,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1672600.0, ans=0.125 2023-11-21 21:44:43,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250900 2023-11-21 21:44:48,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2023-11-21 21:44:52,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.219e+01 8.868e+01 9.725e+01 1.469e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-21 21:44:55,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1672666.6666666667, ans=0.1 2023-11-21 21:45:01,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1672733.3333333333, ans=0.0 2023-11-21 21:45:12,302 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10450, loss[loss=0.06517, simple_loss=0.08709, pruned_loss=0.0121, audio_tagging_loss=0.00952, over 16393.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.09628, pruned_loss=0.01639, audio_tagging_loss=0.009617, over 3037316.36 frames. ], batch size: 63, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:45:20,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1672800.0, ans=0.0 2023-11-21 21:45:22,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1672800.0, ans=0.025 2023-11-21 21:45:25,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=15.0 2023-11-21 21:45:34,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1672866.6666666667, ans=0.0 2023-11-21 21:45:37,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1672933.3333333333, ans=0.0 2023-11-21 21:45:38,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1672933.3333333333, ans=0.2 2023-11-21 21:45:39,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2023-11-21 21:45:48,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 250950 2023-11-21 21:45:54,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1673000.0, ans=0.2 2023-11-21 21:46:09,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1673066.6666666667, ans=0.125 2023-11-21 21:46:11,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1673066.6666666667, ans=0.0 2023-11-21 21:46:14,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1673066.6666666667, ans=0.0 2023-11-21 21:46:17,689 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10500, loss[loss=0.07767, simple_loss=0.108, pruned_loss=0.01726, audio_tagging_loss=0.006405, over 15594.00 frames. ], tot_loss[loss=0.07386, simple_loss=0.09607, pruned_loss=0.01634, audio_tagging_loss=0.009485, over 3042372.61 frames. ], batch size: 57, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:46:32,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1673200.0, ans=0.0 2023-11-21 21:46:41,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1673266.6666666667, ans=0.2 2023-11-21 21:46:54,155 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251000 2023-11-21 21:47:01,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1673333.3333333333, ans=0.125 2023-11-21 21:47:05,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.294e+01 8.913e+01 9.437e+01 1.275e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-21 21:47:12,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2023-11-21 21:47:23,403 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10550, loss[loss=0.07759, simple_loss=0.1046, pruned_loss=0.01826, audio_tagging_loss=0.007026, over 15121.00 frames. ], tot_loss[loss=0.07307, simple_loss=0.09525, pruned_loss=0.01599, audio_tagging_loss=0.009456, over 3040906.31 frames. 
], batch size: 55, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:47:52,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1673600.0, ans=0.2 2023-11-21 21:47:59,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1673600.0, ans=0.125 2023-11-21 21:48:00,772 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251050 2023-11-21 21:48:00,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1673600.0, ans=0.1 2023-11-21 21:48:02,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2023-11-21 21:48:08,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.27 vs. limit=12.0 2023-11-21 21:48:09,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1673666.6666666667, ans=0.125 2023-11-21 21:48:27,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1673800.0, ans=0.0 2023-11-21 21:48:28,707 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10600, loss[loss=0.09073, simple_loss=0.1246, pruned_loss=0.02082, audio_tagging_loss=0.007615, over 14433.00 frames. ], tot_loss[loss=0.07271, simple_loss=0.09472, pruned_loss=0.01583, audio_tagging_loss=0.009516, over 3046201.72 frames. ], batch size: 53, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:48:34,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1673800.0, ans=0.125 2023-11-21 21:48:55,442 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.330e-02 2023-11-21 21:48:57,971 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.565e-02 2023-11-21 21:49:05,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251100 2023-11-21 21:49:15,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.229e+01 8.713e+01 9.313e+01 1.246e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-21 21:49:27,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1674066.6666666667, ans=0.0 2023-11-21 21:49:33,275 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10650, loss[loss=0.09082, simple_loss=0.1254, pruned_loss=0.02145, audio_tagging_loss=0.00666, over 15971.00 frames. ], tot_loss[loss=0.07304, simple_loss=0.09516, pruned_loss=0.01601, audio_tagging_loss=0.009442, over 3049098.56 frames. 
], batch size: 56, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:49:46,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1674200.0, ans=0.125 2023-11-21 21:50:02,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1674266.6666666667, ans=0.125 2023-11-21 21:50:04,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1674266.6666666667, ans=0.125 2023-11-21 21:50:10,216 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251150 2023-11-21 21:50:29,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1674400.0, ans=0.0 2023-11-21 21:50:29,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2023-11-21 21:50:38,630 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10700, loss[loss=0.06323, simple_loss=0.07854, pruned_loss=0.01018, audio_tagging_loss=0.01378, over 14808.00 frames. ], tot_loss[loss=0.07265, simple_loss=0.09458, pruned_loss=0.01586, audio_tagging_loss=0.009497, over 3042741.65 frames. ], batch size: 55, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:50:58,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1674533.3333333333, ans=0.1 2023-11-21 21:51:02,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1674533.3333333333, ans=0.125 2023-11-21 21:51:14,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251200 2023-11-21 21:51:25,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.625e+01 8.099e+01 8.796e+01 9.421e+01 3.240e+02, threshold=1.759e+02, percent-clipped=1.0 2023-11-21 21:51:35,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1674733.3333333333, ans=0.0 2023-11-21 21:51:35,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. limit=10.0 2023-11-21 21:51:36,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1674733.3333333333, ans=0.0 2023-11-21 21:51:43,386 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10750, loss[loss=0.06932, simple_loss=0.09266, pruned_loss=0.01417, audio_tagging_loss=0.008816, over 16074.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.09328, pruned_loss=0.0156, audio_tagging_loss=0.009488, over 3044823.23 frames. 
], batch size: 59, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:51:43,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1674800.0, ans=0.125 2023-11-21 21:52:02,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1674866.6666666667, ans=0.07 2023-11-21 21:52:19,730 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251250 2023-11-21 21:52:29,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1675000.0, ans=0.125 2023-11-21 21:52:29,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1675000.0, ans=0.0 2023-11-21 21:52:38,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1675066.6666666667, ans=15.0 2023-11-21 21:52:47,602 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10800, loss[loss=0.06457, simple_loss=0.07071, pruned_loss=0.01673, audio_tagging_loss=0.01248, over 15306.00 frames. ], tot_loss[loss=0.07196, simple_loss=0.0939, pruned_loss=0.01566, audio_tagging_loss=0.009353, over 3052523.39 frames. ], batch size: 56, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:52:47,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1675133.3333333333, ans=0.125 2023-11-21 21:53:09,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1675200.0, ans=0.125 2023-11-21 21:53:11,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2023-11-21 21:53:24,273 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251300 2023-11-21 21:53:24,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1675266.6666666667, ans=0.0 2023-11-21 21:53:34,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.560e+01 8.266e+01 8.735e+01 9.337e+01 1.123e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-21 21:53:43,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1675400.0, ans=0.0 2023-11-21 21:53:46,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1675400.0, ans=0.125 2023-11-21 21:53:52,955 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10850, loss[loss=0.06927, simple_loss=0.08717, pruned_loss=0.01565, audio_tagging_loss=0.01004, over 15023.00 frames. ], tot_loss[loss=0.07269, simple_loss=0.0947, pruned_loss=0.01592, audio_tagging_loss=0.009417, over 3049106.16 frames. 
], batch size: 58, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:54:19,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1675600.0, ans=0.0 2023-11-21 21:54:19,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1675600.0, ans=0.125 2023-11-21 21:54:23,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1675600.0, ans=0.0 2023-11-21 21:54:28,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251350 2023-11-21 21:54:34,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1675666.6666666667, ans=0.2 2023-11-21 21:54:41,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0 2023-11-21 21:54:52,751 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 21:54:56,559 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10900, loss[loss=0.07934, simple_loss=0.1018, pruned_loss=0.01612, audio_tagging_loss=0.01233, over 13879.00 frames. ], tot_loss[loss=0.0728, simple_loss=0.09484, pruned_loss=0.01593, audio_tagging_loss=0.009447, over 3045614.15 frames. ], batch size: 57, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:54:59,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1675800.0, ans=0.0 2023-11-21 21:55:01,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1675800.0, ans=0.1 2023-11-21 21:55:02,875 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:55:33,598 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251400 2023-11-21 21:55:38,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-21 21:55:43,676 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.153e+01 8.721e+01 9.251e+01 1.356e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-21 21:55:56,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.56 vs. limit=22.5 2023-11-21 21:55:58,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1676066.6666666667, ans=0.1 2023-11-21 21:56:00,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1676066.6666666667, ans=0.0 2023-11-21 21:56:02,092 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 10950, loss[loss=0.06952, simple_loss=0.09132, pruned_loss=0.01238, audio_tagging_loss=0.01148, over 15411.00 frames. 
], tot_loss[loss=0.07269, simple_loss=0.09469, pruned_loss=0.01592, audio_tagging_loss=0.009417, over 3046332.05 frames. ], batch size: 61, lr: 3.23e-03, grad_scale: 32.0 2023-11-21 21:56:11,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1676133.3333333333, ans=0.125 2023-11-21 21:56:16,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1676200.0, ans=0.0 2023-11-21 21:56:30,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-11-21 21:56:37,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251450 2023-11-21 21:56:38,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1676266.6666666667, ans=0.125 2023-11-21 21:56:53,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1676400.0, ans=0.1 2023-11-21 21:56:56,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1676400.0, ans=0.125 2023-11-21 21:56:57,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1676400.0, ans=0.125 2023-11-21 21:56:59,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1676400.0, ans=0.125 2023-11-21 21:57:06,332 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11000, loss[loss=0.05731, simple_loss=0.07037, pruned_loss=0.0115, audio_tagging_loss=0.01062, over 15193.00 frames. ], tot_loss[loss=0.07315, simple_loss=0.09518, pruned_loss=0.01605, audio_tagging_loss=0.009507, over 3047322.45 frames. ], batch size: 58, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:57:12,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-21 21:57:16,901 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 21:57:23,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1676533.3333333333, ans=0.2 2023-11-21 21:57:28,051 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:57:30,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. 
2023-11-21 21:57:38,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1676600.0, ans=0.125 2023-11-21 21:57:41,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251500 2023-11-21 21:57:53,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.207e+01 8.781e+01 9.534e+01 1.219e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-21 21:58:09,759 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11050, loss[loss=0.08205, simple_loss=0.1057, pruned_loss=0.01844, audio_tagging_loss=0.01078, over 15417.00 frames. ], tot_loss[loss=0.07318, simple_loss=0.09493, pruned_loss=0.01604, audio_tagging_loss=0.009676, over 3047218.55 frames. ], batch size: 58, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:58:14,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1676800.0, ans=0.0 2023-11-21 21:58:24,195 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 21:58:43,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1676933.3333333333, ans=0.125 2023-11-21 21:58:45,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251550 2023-11-21 21:59:11,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1677066.6666666667, ans=0.125 2023-11-21 21:59:14,241 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11100, loss[loss=0.07855, simple_loss=0.1038, pruned_loss=0.01496, audio_tagging_loss=0.01166, over 15860.00 frames. ], tot_loss[loss=0.07359, simple_loss=0.09538, pruned_loss=0.01622, audio_tagging_loss=0.009681, over 3052876.96 frames. ], batch size: 59, lr: 3.23e-03, grad_scale: 16.0 2023-11-21 21:59:15,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1677133.3333333333, ans=0.1 2023-11-21 21:59:17,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1677133.3333333333, ans=0.0 2023-11-21 21:59:20,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1677133.3333333333, ans=0.1 2023-11-21 21:59:31,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1677200.0, ans=0.5 2023-11-21 21:59:34,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1677200.0, ans=0.2 2023-11-21 21:59:41,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1677266.6666666667, ans=0.0 2023-11-21 21:59:50,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251600 2023-11-21 22:00:02,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.524e+01 9.104e+01 1.000e+02 2.938e+02, threshold=1.821e+02, percent-clipped=1.0 2023-11-21 22:00:18,947 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11150, loss[loss=0.06974, simple_loss=0.09289, pruned_loss=0.0161, audio_tagging_loss=0.007189, over 15539.00 frames. ], tot_loss[loss=0.07399, simple_loss=0.09606, pruned_loss=0.0163, audio_tagging_loss=0.009652, over 3056547.49 frames.
], batch size: 58, lr: 3.22e-03, grad_scale: 16.0 2023-11-21 22:00:24,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=12.0 2023-11-21 22:00:38,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1677533.3333333333, ans=0.125 2023-11-21 22:00:40,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1677533.3333333333, ans=0.125 2023-11-21 22:00:55,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251650 2023-11-21 22:00:56,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1677666.6666666667, ans=0.0 2023-11-21 22:01:02,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1677666.6666666667, ans=0.125 2023-11-21 22:01:08,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1677666.6666666667, ans=0.125 2023-11-21 22:01:16,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1677733.3333333333, ans=0.125 2023-11-21 22:01:22,747 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11200, loss[loss=0.06992, simple_loss=0.09218, pruned_loss=0.01304, audio_tagging_loss=0.01079, over 14899.00 frames. ], tot_loss[loss=0.07331, simple_loss=0.09511, pruned_loss=0.01602, audio_tagging_loss=0.009734, over 3054394.57 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:01:51,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1677933.3333333333, ans=0.0 2023-11-21 22:01:59,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251700 2023-11-21 22:02:10,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-21 22:02:10,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.694e+01 7.989e+01 8.616e+01 9.280e+01 1.646e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-21 22:02:28,508 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11250, loss[loss=0.05608, simple_loss=0.06793, pruned_loss=0.01056, audio_tagging_loss=0.01155, over 14723.00 frames. ], tot_loss[loss=0.07295, simple_loss=0.09461, pruned_loss=0.01589, audio_tagging_loss=0.009757, over 3047431.80 frames. ], batch size: 60, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:02:55,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1678266.6666666667, ans=0.0 2023-11-21 22:02:58,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1678266.6666666667, ans=0.1 2023-11-21 22:03:01,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1678266.6666666667, ans=0.125 2023-11-21 22:03:03,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251750 2023-11-21 22:03:03,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. 
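limit=15.0

Most scaling.py lines in this log print ScheduledFloat values: hyperparameters of the Zipformer regularization modules (skip rates, dropout probabilities, balancer probabilities, bypass scale floors) that are functions of the global batch count rather than constants, which is why each is logged with a batch_count and an ans. A sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with clamping outside the range; the breakpoints below are illustrative, not the recipe's:

    from bisect import bisect_right

    # Piecewise-linear schedule over the global batch count, in the spirit of
    # scaling.py's ScheduledFloat (illustrative breakpoints).
    def scheduled_float(batch_count: float, points: list) -> float:
        xs = [x for x, _ in points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]       # before the first breakpoint
        if i == len(points):
            return points[-1][1]      # after the last breakpoint
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A dropout decaying from 0.3 to 0.1 over the first 20k batches sits pinned
    # at its endpoint by batch_count=1678333.3, deep into training.
    print(scheduled_float(1678333.3, [(0.0, 0.3), (20000.0, 0.1)]))

This is consistent with the ans=... values in these late-training lines holding steady at schedule endpoints from one batch_count to the next.
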
2023-11-21 22:03:13,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1678333.3333333333, ans=0.2 2023-11-21 22:03:21,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1678400.0, ans=0.05 2023-11-21 22:03:23,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1678400.0, ans=0.125 2023-11-21 22:03:24,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2023-11-21 22:03:31,990 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11300, loss[loss=0.0895, simple_loss=0.1138, pruned_loss=0.02219, audio_tagging_loss=0.01042, over 15178.00 frames. ], tot_loss[loss=0.07266, simple_loss=0.09454, pruned_loss=0.01583, audio_tagging_loss=0.009558, over 3046341.13 frames. ], batch size: 59, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:03:42,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1678466.6666666667, ans=0.125 2023-11-21 22:03:54,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1678533.3333333333, ans=0.2 2023-11-21 22:04:08,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251800 2023-11-21 22:04:19,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.032e+01 8.626e+01 9.245e+01 1.282e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-21 22:04:33,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=22.5 2023-11-21 22:04:36,718 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11350, loss[loss=0.07028, simple_loss=0.08427, pruned_loss=0.01643, audio_tagging_loss=0.01172, over 15206.00 frames. ], tot_loss[loss=0.07275, simple_loss=0.09477, pruned_loss=0.01596, audio_tagging_loss=0.009412, over 3049437.41 frames. ], batch size: 59, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:04:56,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1678866.6666666667, ans=0.0 2023-11-21 22:05:03,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1678933.3333333333, ans=0.1 2023-11-21 22:05:12,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251850 2023-11-21 22:05:26,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1679066.6666666667, ans=0.0 2023-11-21 22:05:34,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0 2023-11-21 22:05:40,496 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11400, loss[loss=0.09223, simple_loss=0.1236, pruned_loss=0.02145, audio_tagging_loss=0.008992, over 15619.00 frames. ], tot_loss[loss=0.07308, simple_loss=0.09517, pruned_loss=0.01614, audio_tagging_loss=0.009357, over 3049067.77 frames.
], batch size: 59, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:05:40,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1679133.3333333333, ans=0.0 2023-11-21 22:05:42,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-21 22:05:45,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1679133.3333333333, ans=0.0 2023-11-21 22:06:13,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1679266.6666666667, ans=0.09899494936611666 2023-11-21 22:06:15,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251900 2023-11-21 22:06:15,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1679266.6666666667, ans=0.0 2023-11-21 22:06:27,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.296e+01 8.125e+01 8.630e+01 9.501e+01 1.337e+02, threshold=1.726e+02, percent-clipped=0.0 2023-11-21 22:06:42,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=22.5 2023-11-21 22:06:44,333 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11450, loss[loss=0.07754, simple_loss=0.09648, pruned_loss=0.0187, audio_tagging_loss=0.0106, over 15488.00 frames. ], tot_loss[loss=0.0734, simple_loss=0.09566, pruned_loss=0.01632, audio_tagging_loss=0.009256, over 3044103.35 frames. ], batch size: 58, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:06:54,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1679466.6666666667, ans=0.0 2023-11-21 22:07:05,811 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 22:07:09,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1679600.0, ans=0.125 2023-11-21 22:07:21,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 251950 2023-11-21 22:07:21,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1679600.0, ans=0.95 2023-11-21 22:07:46,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1679733.3333333333, ans=0.2 2023-11-21 22:07:48,705 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11500, loss[loss=0.07865, simple_loss=0.1059, pruned_loss=0.0171, audio_tagging_loss=0.008613, over 15002.00 frames. ], tot_loss[loss=0.07321, simple_loss=0.09558, pruned_loss=0.01609, audio_tagging_loss=0.009326, over 3050941.39 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:08:25,021 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252000 2023-11-21 22:08:31,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.88 vs. 
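limit=15.0

The optim.py Clipping_scale lines summarize the distribution of recent gradient norms: the five numbers read as min, 25th percentile, median, 75th percentile, and max, and the printed threshold is clipping_scale times the running median. On the entry above, 2.0 * 8.630e+01 = 1.726e+02, exactly the logged threshold (the next Clipping_scale line below fits the same pattern: 2.0 * 8.947e+01 = 1.789e+02), with percent-clipped counting how often that ceiling was hit. A minimal sketch of median-based clipping under that reading; clip_threshold is an illustrative helper, not optim.py's function:

    import torch

    # Derive the clipping threshold from recent gradient norms as
    # clipping_scale * median, matching the quartiles/threshold relationship
    # visible in the optim.py lines.
    def clip_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
        return clipping_scale * recent_grad_norms.median().item()

    quartiles = torch.tensor([62.96, 81.25, 86.30, 95.01, 133.7])
    print(round(clip_threshold(quartiles), 1))  # 172.6, i.e. the logged 1.726e+02
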
2023-11-21 22:08:39,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 7.996e+01 8.554e+01 9.237e+01 1.174e+02, threshold=1.711e+02, percent-clipped=0.0 2023-11-21 22:08:40,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1680000.0, ans=15.0 2023-11-21 22:08:44,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.62 vs. limit=10.0 2023-11-21 22:08:56,376 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11550, loss[loss=0.06519, simple_loss=0.07618, pruned_loss=0.01729, audio_tagging_loss=0.009815, over 14840.00 frames. ], tot_loss[loss=0.07268, simple_loss=0.09482, pruned_loss=0.01587, audio_tagging_loss=0.009393, over 3047267.57 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:08:59,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1680133.3333333333, ans=0.1 2023-11-21 22:09:05,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1680133.3333333333, ans=0.1 2023-11-21 22:09:09,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1680200.0, ans=0.125 2023-11-21 22:09:31,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252050 2023-11-21 22:09:34,292 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-21 22:09:42,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2023-11-21 22:09:58,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1680400.0, ans=0.125 2023-11-21 22:10:00,335 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11600, loss[loss=0.06536, simple_loss=0.08839, pruned_loss=0.01206, audio_tagging_loss=0.009109, over 15610.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09453, pruned_loss=0.0158, audio_tagging_loss=0.009483, over 3043129.25 frames.
], batch size: 61, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:10:19,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1680533.3333333333, ans=0.125 2023-11-21 22:10:36,275 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252100 2023-11-21 22:10:47,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.258e+01 8.947e+01 9.707e+01 1.482e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-21 22:10:51,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1680733.3333333333, ans=0.125 2023-11-21 22:11:04,539 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11650, loss[loss=0.09153, simple_loss=0.1201, pruned_loss=0.02319, audio_tagging_loss=0.008312, over 14815.00 frames. ], tot_loss[loss=0.07269, simple_loss=0.0945, pruned_loss=0.01589, audio_tagging_loss=0.009546, over 3040381.63 frames. ], batch size: 54, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:11:12,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1680800.0, ans=0.2 2023-11-21 22:11:29,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-21 22:11:40,815 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252150 2023-11-21 22:11:45,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1681000.0, ans=0.2 2023-11-21 22:11:50,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1681000.0, ans=0.0 2023-11-21 22:12:07,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1681133.3333333333, ans=0.0 2023-11-21 22:12:08,141 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11700, loss[loss=0.06388, simple_loss=0.09087, pruned_loss=0.01047, audio_tagging_loss=0.007977, over 15422.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09449, pruned_loss=0.01581, audio_tagging_loss=0.009637, over 3036226.54 frames. ], batch size: 59, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:12:31,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1681200.0, ans=0.0 2023-11-21 22:12:32,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1681200.0, ans=0.1 2023-11-21 22:12:39,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1681266.6666666667, ans=0.0 2023-11-21 22:12:45,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252200 2023-11-21 22:12:56,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.227e+01 8.797e+01 9.747e+01 1.315e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-21 22:13:14,115 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11750, loss[loss=0.07568, simple_loss=0.1067, pruned_loss=0.01576, audio_tagging_loss=0.006541, over 14250.00 frames. ], tot_loss[loss=0.0728, simple_loss=0.09464, pruned_loss=0.01583, audio_tagging_loss=0.009652, over 3040971.94 frames. 
], batch size: 52, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:13:30,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1681533.3333333333, ans=0.125 2023-11-21 22:13:30,987 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-21 22:13:50,106 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252250 2023-11-21 22:13:52,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1681666.6666666667, ans=0.125 2023-11-21 22:13:57,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1681666.6666666667, ans=0.125 2023-11-21 22:13:57,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1681666.6666666667, ans=0.125 2023-11-21 22:14:00,306 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 22:14:05,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1681733.3333333333, ans=0.125 2023-11-21 22:14:14,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=12.0 2023-11-21 22:14:15,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1681733.3333333333, ans=0.0 2023-11-21 22:14:15,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1681733.3333333333, ans=0.125 2023-11-21 22:14:19,029 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11800, loss[loss=0.06403, simple_loss=0.08199, pruned_loss=0.01316, audio_tagging_loss=0.009874, over 15007.00 frames. ], tot_loss[loss=0.07282, simple_loss=0.09457, pruned_loss=0.01588, audio_tagging_loss=0.009644, over 3047743.61 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 16.0 2023-11-21 22:14:24,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1681800.0, ans=0.125 2023-11-21 22:14:25,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1681800.0, ans=0.2 2023-11-21 22:14:50,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1681933.3333333333, ans=0.125 2023-11-21 22:14:54,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252300 2023-11-21 22:14:57,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2023-11-21 22:14:58,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=22.5 2023-11-21 22:15:03,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.90 vs. 
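limit=22.5

The Whitening lines compare a statistic of a module's output covariance against a configured ceiling (here 16.90 vs. 22.5): while the metric stays below the limit the activations are left alone, and when it exceeds the limit the module nudges them back toward a whiter, more isotropic covariance. As a hedged illustration rather than scaling.py's exact formula, one such metric is the eigenvalue spread d * sum(lambda_i^2) / (sum(lambda_i))^2 of the covariance, which is 1.0 for perfectly white features and grows as a few directions dominate:

    import torch

    # Illustrative whitening metric over feature covariance eigenvalues:
    # equals 1.0 when the covariance is a multiple of the identity, and
    # grows as the feature distribution collapses onto fewer directions.
    def whitening_metric(x: torch.Tensor) -> float:
        x = x - x.mean(dim=0)              # x: (num_frames, num_channels)
        cov = x.T @ x / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        d = eigs.numel()
        return (d * (eigs ** 2).sum() / eigs.sum() ** 2).item()

    # Slightly above 1.0 for sampled white noise; far larger for collapsed features.
    print(whitening_metric(torch.randn(4000, 256)))
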
2023-11-21 22:15:06,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1682000.0, ans=0.1 2023-11-21 22:15:07,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 7.961e+01 8.555e+01 9.322e+01 1.275e+02, threshold=1.711e+02, percent-clipped=0.0 2023-11-21 22:15:22,861 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11850, loss[loss=0.1031, simple_loss=0.1291, pruned_loss=0.03104, audio_tagging_loss=0.007534, over 14559.00 frames. ], tot_loss[loss=0.07265, simple_loss=0.09417, pruned_loss=0.01588, audio_tagging_loss=0.009684, over 3047697.67 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 16.0 2023-11-21 22:15:33,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1682133.3333333333, ans=0.0 2023-11-21 22:15:51,151 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 22:15:59,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1682266.6666666667, ans=0.0 2023-11-21 22:16:00,037 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252350 2023-11-21 22:16:00,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1682266.6666666667, ans=0.0 2023-11-21 22:16:02,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=15.0 2023-11-21 22:16:27,840 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11900, loss[loss=0.06521, simple_loss=0.08951, pruned_loss=0.01055, audio_tagging_loss=0.009909, over 15841.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09523, pruned_loss=0.01599, audio_tagging_loss=0.009688, over 3046375.54 frames. ], batch size: 60, lr: 3.22e-03, grad_scale: 16.0 2023-11-21 22:16:34,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1682466.6666666667, ans=0.125 2023-11-21 22:16:37,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1682466.6666666667, ans=0.2 2023-11-21 22:17:04,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252400 2023-11-21 22:17:17,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.240e+01 7.885e+01 8.457e+01 9.193e+01 1.226e+02, threshold=1.691e+02, percent-clipped=0.0 2023-11-21 22:17:31,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1682733.3333333333, ans=0.0 2023-11-21 22:17:33,583 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 11950, loss[loss=0.078, simple_loss=0.1035, pruned_loss=0.01851, audio_tagging_loss=0.007747, over 14593.00 frames. ], tot_loss[loss=0.07307, simple_loss=0.0947, pruned_loss=0.01587, audio_tagging_loss=0.009848, over 3040828.94 frames.
], batch size: 54, lr: 3.22e-03, grad_scale: 16.0 2023-11-21 22:18:09,260 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252450 2023-11-21 22:18:14,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1683000.0, ans=0.125 2023-11-21 22:18:14,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2023-11-21 22:18:25,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1683066.6666666667, ans=0.2 2023-11-21 22:18:26,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1683066.6666666667, ans=0.125 2023-11-21 22:18:36,446 INFO [train_asr.py:1221] (1/4) Epoch 21, batch 12000, loss[loss=0.06734, simple_loss=0.0853, pruned_loss=0.016, audio_tagging_loss=0.008694, over 15435.00 frames. ], tot_loss[loss=0.07313, simple_loss=0.09482, pruned_loss=0.01586, audio_tagging_loss=0.009864, over 3046871.28 frames. ], batch size: 58, lr: 3.22e-03, grad_scale: 32.0 2023-11-21 22:18:36,446 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 22:19:19,206 INFO [train_asr.py:1253] (1/4) Epoch 21, validation: loss=0.05938, simple_loss=0.05195, pruned_loss=0.005201, audio_tagging_loss=0.02821, over 4681554.00 frames. 2023-11-21 22:19:19,207 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 22:19:27,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1683133.3333333333, ans=0.1 2023-11-21 22:19:33,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1683200.0, ans=0.125 2023-11-21 22:19:39,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1683200.0, ans=0.125 2023-11-21 22:20:22,654 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 0, loss[loss=0.07435, simple_loss=0.08495, pruned_loss=0.01086, audio_tagging_loss=0.02102, over 15713.00 frames. ], tot_loss[loss=0.07435, simple_loss=0.08495, pruned_loss=0.01086, audio_tagging_loss=0.02102, over 15713.00 frames. ], batch size: 59, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:20:22,655 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-21 22:20:59,031 INFO [train_asr.py:1253] (1/4) Epoch 22, validation: loss=0.05904, simple_loss=0.0519, pruned_loss=0.0051, audio_tagging_loss=0.02799, over 4681554.00 frames. 
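This stretch crosses an epoch boundary. A validation pass runs at batch 12000 of epoch 21 and again as epoch 22 opens; epoch 22 then starts at batch 0, where the running tot_loss restarts and therefore equals the single-batch loss (both read 0.07435), and the learning rate steps down from 3.22e-03 to 3.14e-03 under the per-epoch schedule. The "Maximum memory allocated" figure that follows is presumably read from PyTorch's CUDA peak-allocation counter; a minimal sketch of how such a line can be produced (max_memory_mb is an illustrative helper):

    import torch

    # Report the peak CUDA allocation in MB, in the style of the
    # "Maximum memory allocated so far is 25607MB" lines.
    def max_memory_mb(device: int = 0) -> int:
        return torch.cuda.max_memory_allocated(device) // (1024 * 1024)

    if torch.cuda.is_available():
        print(f"Maximum memory allocated so far is {max_memory_mb()}MB")
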
2023-11-21 22:20:59,032 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-21 22:21:03,952 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252500 2023-11-21 22:21:10,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1683360.0, ans=0.05 2023-11-21 22:21:16,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.182e+01 9.159e+01 9.846e+01 1.292e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-21 22:21:25,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1683426.6666666667, ans=0.1 2023-11-21 22:21:26,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1683426.6666666667, ans=0.125 2023-11-21 22:21:31,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.07 vs. limit=10.0 2023-11-21 22:21:33,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1683426.6666666667, ans=0.04949747468305833 2023-11-21 22:22:02,744 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 50, loss[loss=0.0882, simple_loss=0.1121, pruned_loss=0.01661, audio_tagging_loss=0.01553, over 14779.00 frames. ], tot_loss[loss=0.08236, simple_loss=0.09714, pruned_loss=0.01603, audio_tagging_loss=0.01776, over 692664.71 frames. ], batch size: 54, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:22:07,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252550 2023-11-21 22:22:35,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-11-21 22:23:04,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.33 vs. limit=5.0 2023-11-21 22:23:05,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1683893.3333333333, ans=0.1 2023-11-21 22:23:05,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1683893.3333333333, ans=0.1 2023-11-21 22:23:07,450 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 100, loss[loss=0.1073, simple_loss=0.1389, pruned_loss=0.0273, audio_tagging_loss=0.01055, over 14624.00 frames. ], tot_loss[loss=0.0826, simple_loss=0.09799, pruned_loss=0.01648, audio_tagging_loss=0.01713, over 1214654.34 frames. 
], batch size: 53, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:23:12,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252600 2023-11-21 22:23:16,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1683960.0, ans=10.0 2023-11-21 22:23:25,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.773e+01 9.405e+01 1.016e+02 1.413e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-21 22:23:25,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1684026.6666666667, ans=0.125 2023-11-21 22:23:42,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1684093.3333333333, ans=0.0 2023-11-21 22:23:43,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1684093.3333333333, ans=0.125 2023-11-21 22:23:43,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2023-11-21 22:23:49,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1684160.0, ans=0.125 2023-11-21 22:23:49,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1684160.0, ans=0.0 2023-11-21 22:23:50,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2023-11-21 22:24:11,238 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 150, loss[loss=0.08084, simple_loss=0.1052, pruned_loss=0.01599, audio_tagging_loss=0.01224, over 15236.00 frames. ], tot_loss[loss=0.08155, simple_loss=0.09937, pruned_loss=0.01684, audio_tagging_loss=0.01503, over 1621983.81 frames. ], batch size: 57, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:24:16,691 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252650 2023-11-21 22:24:29,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1684360.0, ans=0.125 2023-11-21 22:24:42,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1684426.6666666667, ans=0.0 2023-11-21 22:25:15,869 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 200, loss[loss=0.05464, simple_loss=0.07286, pruned_loss=0.007933, audio_tagging_loss=0.01028, over 15583.00 frames. ], tot_loss[loss=0.07884, simple_loss=0.09821, pruned_loss=0.01644, audio_tagging_loss=0.01329, over 1934896.75 frames. 
], batch size: 59, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:25:20,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252700 2023-11-21 22:25:24,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1684626.6666666667, ans=0.09899494936611666 2023-11-21 22:25:30,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1684693.3333333333, ans=0.125 2023-11-21 22:25:34,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.951e+01 8.318e+01 8.693e+01 9.716e+01 1.201e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-21 22:25:36,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1684693.3333333333, ans=0.2 2023-11-21 22:25:49,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1684760.0, ans=0.2 2023-11-21 22:26:01,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1684826.6666666667, ans=0.04949747468305833 2023-11-21 22:26:09,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2023-11-21 22:26:15,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.34 vs. limit=10.0 2023-11-21 22:26:21,131 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 250, loss[loss=0.05999, simple_loss=0.0831, pruned_loss=0.01038, audio_tagging_loss=0.008063, over 15906.00 frames. ], tot_loss[loss=0.07693, simple_loss=0.09703, pruned_loss=0.01628, audio_tagging_loss=0.01214, over 2181325.37 frames. ], batch size: 60, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:26:26,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252750 2023-11-21 22:26:57,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1685093.3333333333, ans=0.125 2023-11-21 22:27:10,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-21 22:27:15,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1685226.6666666667, ans=0.1 2023-11-21 22:27:24,805 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 300, loss[loss=0.08453, simple_loss=0.1199, pruned_loss=0.01637, audio_tagging_loss=0.008211, over 16286.00 frames. ], tot_loss[loss=0.07627, simple_loss=0.09755, pruned_loss=0.01616, audio_tagging_loss=0.01134, over 2378927.18 frames. 
], batch size: 58, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:27:30,629 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252800 2023-11-21 22:27:43,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.749e+01 8.171e+01 8.868e+01 9.880e+01 1.210e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-21 22:27:59,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1685426.6666666667, ans=0.125 2023-11-21 22:28:05,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1685493.3333333333, ans=0.2 2023-11-21 22:28:06,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1685493.3333333333, ans=0.2 2023-11-21 22:28:22,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1685560.0, ans=0.1 2023-11-21 22:28:30,573 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 350, loss[loss=0.081, simple_loss=0.105, pruned_loss=0.01802, audio_tagging_loss=0.01047, over 15604.00 frames. ], tot_loss[loss=0.07593, simple_loss=0.09724, pruned_loss=0.01647, audio_tagging_loss=0.01084, over 2525836.01 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:28:36,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252850 2023-11-21 22:28:36,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1685626.6666666667, ans=0.125 2023-11-21 22:28:51,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-11-21 22:28:54,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1685693.3333333333, ans=0.0 2023-11-21 22:29:12,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1685826.6666666667, ans=0.2 2023-11-21 22:29:27,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1685893.3333333333, ans=10.0 2023-11-21 22:29:36,490 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 400, loss[loss=0.04636, simple_loss=0.05463, pruned_loss=0.006896, audio_tagging_loss=0.01215, over 14517.00 frames. ], tot_loss[loss=0.07568, simple_loss=0.09751, pruned_loss=0.01651, audio_tagging_loss=0.01042, over 2648223.57 frames. 
], batch size: 59, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:29:42,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252900 2023-11-21 22:29:45,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1685960.0, ans=0.0 2023-11-21 22:29:49,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1686026.6666666667, ans=0.0 2023-11-21 22:29:55,041 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.919e+01 8.135e+01 8.839e+01 9.372e+01 1.171e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-21 22:29:59,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1686026.6666666667, ans=0.125 2023-11-21 22:30:09,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1686093.3333333333, ans=0.09899494936611666 2023-11-21 22:30:13,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1686093.3333333333, ans=0.125 2023-11-21 22:30:23,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-21 22:30:42,530 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 450, loss[loss=0.06638, simple_loss=0.089, pruned_loss=0.01073, audio_tagging_loss=0.01115, over 14691.00 frames. ], tot_loss[loss=0.07502, simple_loss=0.09707, pruned_loss=0.0163, audio_tagging_loss=0.01018, over 2735419.06 frames. ], batch size: 54, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:30:48,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 252950 2023-11-21 22:30:56,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1686360.0, ans=0.2 2023-11-21 22:31:31,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1686493.3333333333, ans=0.125 2023-11-21 22:31:32,981 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-21 22:31:48,764 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 500, loss[loss=0.08654, simple_loss=0.1075, pruned_loss=0.02121, audio_tagging_loss=0.01157, over 14143.00 frames. ], tot_loss[loss=0.07457, simple_loss=0.09657, pruned_loss=0.01625, audio_tagging_loss=0.01004, over 2806572.25 frames. ], batch size: 54, lr: 3.14e-03, grad_scale: 32.0 2023-11-21 22:31:50,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. 
2023-11-21 22:31:53,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253000
2023-11-21 22:31:59,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1686626.6666666667, ans=0.0
2023-11-21 22:32:04,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1686693.3333333333, ans=0.125
2023-11-21 22:32:06,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.743e+01 8.271e+01 8.753e+01 9.475e+01 1.190e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-21 22:32:23,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0
2023-11-21 22:32:39,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0
2023-11-21 22:32:47,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1686893.3333333333, ans=0.2
2023-11-21 22:32:48,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1686893.3333333333, ans=0.0
2023-11-21 22:32:52,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1686893.3333333333, ans=0.125
2023-11-21 22:32:53,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0
2023-11-21 22:32:54,282 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 550, loss[loss=0.07566, simple_loss=0.09764, pruned_loss=0.01905, audio_tagging_loss=0.007794, over 14662.00 frames. ], tot_loss[loss=0.07436, simple_loss=0.09628, pruned_loss=0.01633, audio_tagging_loss=0.009885, over 2859035.08 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:33:00,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253050
2023-11-21 22:33:10,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0
2023-11-21 22:33:22,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=22.5
2023-11-21 22:34:00,164 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 600, loss[loss=0.06989, simple_loss=0.09922, pruned_loss=0.01314, audio_tagging_loss=0.007138, over 15705.00 frames. ], tot_loss[loss=0.074, simple_loss=0.09588, pruned_loss=0.01624, audio_tagging_loss=0.009821, over 2901358.53 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:34:05,297 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253100
2023-11-21 22:34:15,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5
2023-11-21 22:34:18,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.004e+01 8.758e+01 9.434e+01 1.366e+02, threshold=1.752e+02, percent-clipped=0.0
2023-11-21 22:34:22,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0
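A note on the [scaling.py:1022] Whitening entries: each reports a metric for a module's activations against a limit (e.g. metric=12.08 vs. limit=15.0). One plausible reading, sketched below, is an isotropy measure of the feature covariance that equals 1.0 when the covariance is a multiple of the identity ("white") and grows with the eigenvalue spread; this definition is an assumption for illustration, not necessarily the module's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels); channels are split into groups, as in
        # the "num_groups=4, num_channels=128" entries above.
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)                      # zero-mean per group
        covar = torch.matmul(x.transpose(1, 2), x) / num_frames  # (groups, c, c)
        c = covar.shape[-1]
        mean_eig = covar.diagonal(dim1=1, dim2=2).sum(dim=1) / c  # trace(C)/c
        mean_eig_sq = (covar * covar).sum(dim=(1, 2)) / c         # trace(C @ C)/c
        # mean(eig^2) / mean(eig)^2 >= 1, with equality iff C is isotropic.
        return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).mean()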
2023-11-21 22:34:24,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1687426.6666666667, ans=0.95
2023-11-21 22:34:29,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1687426.6666666667, ans=0.125
2023-11-21 22:34:35,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1687426.6666666667, ans=0.125
2023-11-21 22:34:38,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1687493.3333333333, ans=0.1
2023-11-21 22:34:42,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1687493.3333333333, ans=0.1
2023-11-21 22:34:48,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=1687493.3333333333, ans=15.0
2023-11-21 22:35:05,437 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 650, loss[loss=0.08287, simple_loss=0.1095, pruned_loss=0.02074, audio_tagging_loss=0.007387, over 15196.00 frames. ], tot_loss[loss=0.07409, simple_loss=0.09582, pruned_loss=0.01639, audio_tagging_loss=0.009795, over 2932726.42 frames. ], batch size: 57, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:35:08,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0
2023-11-21 22:35:10,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253150
2023-11-21 22:35:16,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1687693.3333333333, ans=0.125
2023-11-21 22:35:57,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1687893.3333333333, ans=0.0
2023-11-21 22:36:01,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.36 vs. limit=10.0
2023-11-21 22:36:09,019 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 700, loss[loss=0.07633, simple_loss=0.1048, pruned_loss=0.01405, audio_tagging_loss=0.009873, over 15135.00 frames. ], tot_loss[loss=0.07471, simple_loss=0.09713, pruned_loss=0.01643, audio_tagging_loss=0.009712, over 2957688.16 frames. ], batch size: 54, lr: 3.14e-03, grad_scale: 16.0
2023-11-21 22:36:13,983 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253200
2023-11-21 22:36:19,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0
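A note on the [scaling.py:213] ScheduledFloat entries: each logs the current value (ans=...) of a named hyperparameter at the current batch_count, and many of these (dropout_p, skip_rate, scale_min, prob) are held or annealed as training progresses, consistent with a piecewise-linear schedule over batch count. A minimal sketch under that assumption; the class name and the schedule points are illustrative:

    from bisect import bisect_right

    class PiecewiseLinearSchedule:
        """Value of a hyperparameter as a piecewise-linear function of batch_count
        (assumed semantics of the ScheduledFloat entries above)."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            i = bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative: a dropout_p annealed from 0.3 down to a floor of 0.1; late in
    # training (batch_count ~ 1.69e6) it sits at the floor, like the ans=0.1 entries.
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p(1687493.33) == 0.1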
2023-11-21 22:36:28,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.561e+01 8.099e+01 8.868e+01 9.467e+01 1.457e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-21 22:36:39,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1688093.3333333333, ans=0.125
2023-11-21 22:36:41,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1688093.3333333333, ans=0.0
2023-11-21 22:36:45,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1688093.3333333333, ans=0.09899494936611666
2023-11-21 22:36:51,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-21 22:36:51,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0
2023-11-21 22:37:13,515 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 750, loss[loss=0.1012, simple_loss=0.1338, pruned_loss=0.02477, audio_tagging_loss=0.009558, over 15222.00 frames. ], tot_loss[loss=0.07554, simple_loss=0.09815, pruned_loss=0.01671, audio_tagging_loss=0.00976, over 2980596.45 frames. ], batch size: 54, lr: 3.14e-03, grad_scale: 16.0
2023-11-21 22:37:14,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0
2023-11-21 22:37:18,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253250
2023-11-21 22:37:21,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1688293.3333333333, ans=0.0
2023-11-21 22:37:41,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0
2023-11-21 22:37:46,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1688426.6666666667, ans=0.0
2023-11-21 22:37:50,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1688493.3333333333, ans=0.0
2023-11-21 22:37:52,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1688493.3333333333, ans=0.0
2023-11-21 22:38:02,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1688493.3333333333, ans=0.05
2023-11-21 22:38:14,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1688560.0, ans=0.035
2023-11-21 22:38:17,555 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 800, loss[loss=0.07076, simple_loss=0.09047, pruned_loss=0.01727, audio_tagging_loss=0.008252, over 15053.00 frames. ], tot_loss[loss=0.07555, simple_loss=0.0983, pruned_loss=0.01671, audio_tagging_loss=0.009697, over 2994084.62 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 32.0
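A note on the grad_scale field: it halves from 32.0 to 16.0 around batch 700 and is back at 32.0 by batch 800 (later it dips to 8.0 before recovering), the signature of dynamic loss scaling in fp16 training: halve the scale when a step produces inf/nan gradients, grow it back after a run of clean steps. A minimal sketch using torch's standard AMP scaler; the hyperparameter values are illustrative, not read from this trainer:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # first grad_scale value logged in this section
        backoff_factor=0.5,   # halve on overflow: 32.0 -> 16.0 -> 8.0
        growth_factor=2.0,    # double after growth_interval clean steps
        growth_interval=2000,
    )

    # Typical training step (model, optimizer, and loss assumed defined elsewhere):
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
    #   current_scale = scaler.get_scale()   # the value reported as grad_scale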
2023-11-21 22:38:19,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1688626.6666666667, ans=0.1
2023-11-21 22:38:20,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1688626.6666666667, ans=0.125
2023-11-21 22:38:22,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253300
2023-11-21 22:38:36,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.171e+01 8.697e+01 9.274e+01 1.215e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-21 22:38:46,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0
2023-11-21 22:38:55,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2023-11-21 22:39:03,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5
2023-11-21 22:39:20,972 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 850, loss[loss=0.07696, simple_loss=0.09173, pruned_loss=0.02084, audio_tagging_loss=0.01025, over 14942.00 frames. ], tot_loss[loss=0.07488, simple_loss=0.09743, pruned_loss=0.01645, audio_tagging_loss=0.009721, over 3011835.23 frames. ], batch size: 54, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:39:25,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253350
2023-11-21 22:39:26,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5
2023-11-21 22:40:04,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1689160.0, ans=0.0
2023-11-21 22:40:10,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1689226.6666666667, ans=0.125
2023-11-21 22:40:17,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1689226.6666666667, ans=0.1
2023-11-21 22:40:20,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5
2023-11-21 22:40:24,928 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 900, loss[loss=0.06879, simple_loss=0.09372, pruned_loss=0.01577, audio_tagging_loss=0.006162, over 16421.00 frames. ], tot_loss[loss=0.07493, simple_loss=0.09757, pruned_loss=0.01649, audio_tagging_loss=0.009652, over 3018058.63 frames. ], batch size: 62, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:40:30,374 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253400
2023-11-21 22:40:44,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.175e+01 9.057e+01 9.659e+01 1.593e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-21 22:40:56,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1689426.6666666667, ans=0.2
2023-11-21 22:41:00,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1689426.6666666667, ans=0.0
2023-11-21 22:41:01,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=22.5
2023-11-21 22:41:06,262 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 22:41:30,289 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 950, loss[loss=0.08211, simple_loss=0.1017, pruned_loss=0.02204, audio_tagging_loss=0.009219, over 16571.00 frames. ], tot_loss[loss=0.07508, simple_loss=0.09788, pruned_loss=0.01657, audio_tagging_loss=0.009563, over 3020625.13 frames. ], batch size: 65, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:41:35,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253450
2023-11-21 22:41:47,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1689693.3333333333, ans=0.2
2023-11-21 22:41:50,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1689693.3333333333, ans=0.1
2023-11-21 22:42:02,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1689760.0, ans=0.04949747468305833
2023-11-21 22:42:05,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1689760.0, ans=0.0
2023-11-21 22:42:17,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1689826.6666666667, ans=0.0
2023-11-21 22:42:19,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0
2023-11-21 22:42:24,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1689893.3333333333, ans=0.125
2023-11-21 22:42:32,964 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1000, loss[loss=0.08324, simple_loss=0.1041, pruned_loss=0.02204, audio_tagging_loss=0.009164, over 14336.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.09678, pruned_loss=0.0163, audio_tagging_loss=0.009431, over 3021093.37 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:42:37,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253500
2023-11-21 22:42:41,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1689960.0, ans=0.0
2023-11-21 22:42:42,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1689960.0, ans=0.0
2023-11-21 22:42:48,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1690026.6666666667, ans=0.125
2023-11-21 22:42:51,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.522e+01 8.090e+01 8.690e+01 9.714e+01 1.265e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-21 22:42:53,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1690026.6666666667, ans=0.125
2023-11-21 22:42:56,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1690026.6666666667, ans=0.2
2023-11-21 22:42:59,988 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 22:43:00,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1690093.3333333333, ans=0.0
2023-11-21 22:43:37,585 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1050, loss[loss=0.07565, simple_loss=0.09623, pruned_loss=0.01733, audio_tagging_loss=0.0102, over 15327.00 frames. ], tot_loss[loss=0.07375, simple_loss=0.09618, pruned_loss=0.01622, audio_tagging_loss=0.009447, over 3031813.68 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 32.0
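A note on the WARNING Exclude cut entries: each excluded AudioSet cut carries placeholder text with 24 BPE tokens but only 23 frames after subsampling, and a transducer-style loss cannot align a label sequence longer than the frame sequence, so such cuts are dropped. A minimal sketch of that filter; the function name and the exact subsampling arithmetic (100 -> 23 frames here) are assumptions for illustration:

    def keep_cut(num_frames_before: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
        # Approximate the encoder's ~4x subsampling; the precise formula depends
        # on the convolutional frontend, but 100 input frames -> 23 output frames
        # as in the warnings above.
        frames_after = (num_frames_before - 7) // subsampling_factor
        return frames_after >= num_tokens

    assert keep_cut(100, 24) is False   # the dummy-text cuts: 23 frames < 24 tokens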
2023-11-21 22:43:37,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1690293.3333333333, ans=0.125
2023-11-21 22:43:41,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1690293.3333333333, ans=0.1
2023-11-21 22:43:42,636 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253550
2023-11-21 22:43:43,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1690293.3333333333, ans=10.0
2023-11-21 22:44:05,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1690426.6666666667, ans=0.2
2023-11-21 22:44:12,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1690426.6666666667, ans=0.125
2023-11-21 22:44:14,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1690426.6666666667, ans=0.02
2023-11-21 22:44:14,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1690426.6666666667, ans=0.125
2023-11-21 22:44:17,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1690493.3333333333, ans=0.0
2023-11-21 22:44:17,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1690493.3333333333, ans=0.125
2023-11-21 22:44:19,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1690493.3333333333, ans=0.0
2023-11-21 22:44:40,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1690560.0, ans=0.125
2023-11-21 22:44:42,861 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1100, loss[loss=0.07638, simple_loss=0.09289, pruned_loss=0.01712, audio_tagging_loss=0.01281, over 14796.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.09541, pruned_loss=0.01608, audio_tagging_loss=0.009429, over 3033361.64 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 32.0
2023-11-21 22:44:45,360 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 22:44:46,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1690626.6666666667, ans=0.125
2023-11-21 22:44:47,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253600
2023-11-21 22:45:01,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.128e+01 8.851e+01 9.302e+01 1.185e+02, threshold=1.770e+02, percent-clipped=0.0
2023-11-21 22:45:23,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1690826.6666666667, ans=0.04949747468305833
2023-11-21 22:45:26,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1690826.6666666667, ans=0.0
2023-11-21 22:45:36,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1690893.3333333333, ans=0.125
2023-11-21 22:45:43,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1690893.3333333333, ans=0.1
2023-11-21 22:45:46,079 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1150, loss[loss=0.07422, simple_loss=0.1011, pruned_loss=0.01655, audio_tagging_loss=0.007117, over 15835.00 frames. ], tot_loss[loss=0.07282, simple_loss=0.09494, pruned_loss=0.01597, audio_tagging_loss=0.009384, over 3038070.14 frames. ], batch size: 58, lr: 3.14e-03, grad_scale: 8.0
2023-11-21 22:45:48,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1690960.0, ans=0.125
2023-11-21 22:45:51,094 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253650
2023-11-21 22:45:58,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0
2023-11-21 22:46:07,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=22.5
2023-11-21 22:46:13,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1691093.3333333333, ans=0.0
2023-11-21 22:46:28,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1691160.0, ans=0.0
2023-11-21 22:46:32,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1691160.0, ans=0.02
2023-11-21 22:46:50,820 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1200, loss[loss=0.08051, simple_loss=0.1036, pruned_loss=0.01928, audio_tagging_loss=0.009451, over 14978.00 frames. ], tot_loss[loss=0.07225, simple_loss=0.09431, pruned_loss=0.01574, audio_tagging_loss=0.009353, over 3037664.57 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 16.0
2023-11-21 22:46:55,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253700
2023-11-21 22:47:07,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1691360.0, ans=0.2
2023-11-21 22:47:12,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.044e+01 8.657e+01 9.227e+01 1.115e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-21 22:47:21,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1691426.6666666667, ans=0.125
2023-11-21 22:47:48,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1691560.0, ans=0.2
2023-11-21 22:47:54,542 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1250, loss[loss=0.08554, simple_loss=0.1108, pruned_loss=0.01839, audio_tagging_loss=0.01177, over 15243.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.0936, pruned_loss=0.01568, audio_tagging_loss=0.009363, over 3029681.56 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 16.0
2023-11-21 22:48:00,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253750
2023-11-21 22:48:16,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0
2023-11-21 22:48:25,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1691760.0, ans=0.125
2023-11-21 22:48:39,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1691826.6666666667, ans=0.125
2023-11-21 22:48:42,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1691826.6666666667, ans=0.0
2023-11-21 22:48:59,683 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1300, loss[loss=0.07435, simple_loss=0.08887, pruned_loss=0.0183, audio_tagging_loss=0.01161, over 15768.00 frames. ], tot_loss[loss=0.07211, simple_loss=0.0942, pruned_loss=0.01576, audio_tagging_loss=0.009247, over 3030810.97 frames. ], batch size: 60, lr: 3.14e-03, grad_scale: 16.0
2023-11-21 22:49:04,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253800
2023-11-21 22:49:16,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1692026.6666666667, ans=0.1
2023-11-21 22:49:21,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1692026.6666666667, ans=0.5
2023-11-21 22:49:22,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 7.957e+01 8.431e+01 9.281e+01 1.105e+02, threshold=1.686e+02, percent-clipped=0.0
2023-11-21 22:50:04,322 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1350, loss[loss=0.08194, simple_loss=0.1129, pruned_loss=0.01854, audio_tagging_loss=0.006956, over 16452.00 frames. ], tot_loss[loss=0.07202, simple_loss=0.09423, pruned_loss=0.01564, audio_tagging_loss=0.009264, over 3032559.79 frames. ], batch size: 57, lr: 3.14e-03, grad_scale: 8.0
2023-11-21 22:50:10,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253850
2023-11-21 22:50:16,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0
2023-11-21 22:50:24,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1692360.0, ans=0.0
2023-11-21 22:50:40,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0
2023-11-21 22:50:46,435 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 22:50:51,080 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 22:51:03,757 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 22:51:09,536 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1400, loss[loss=0.07524, simple_loss=0.0955, pruned_loss=0.01853, audio_tagging_loss=0.008954, over 15399.00 frames. ], tot_loss[loss=0.07351, simple_loss=0.09613, pruned_loss=0.01615, audio_tagging_loss=0.009292, over 3037534.53 frames. ], batch size: 57, lr: 3.14e-03, grad_scale: 8.0
2023-11-21 22:51:15,203 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253900
2023-11-21 22:51:32,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.193e+01 8.723e+01 9.483e+01 1.296e+02, threshold=1.745e+02, percent-clipped=0.0
2023-11-21 22:51:39,383 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 22:51:39,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5
2023-11-21 22:51:48,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1692826.6666666667, ans=0.2
2023-11-21 22:52:15,005 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1450, loss[loss=0.0886, simple_loss=0.1182, pruned_loss=0.01982, audio_tagging_loss=0.009676, over 15659.00 frames. ], tot_loss[loss=0.0733, simple_loss=0.09584, pruned_loss=0.01598, audio_tagging_loss=0.009402, over 3034351.37 frames. ], batch size: 60, lr: 3.13e-03, grad_scale: 8.0
2023-11-21 22:52:19,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 253950
2023-11-21 22:52:21,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1692960.0, ans=0.0
2023-11-21 22:52:48,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1693093.3333333333, ans=0.0
2023-11-21 22:52:48,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0
2023-11-21 22:52:57,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1693160.0, ans=0.125
2023-11-21 22:53:14,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1693226.6666666667, ans=0.0
2023-11-21 22:53:19,090 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1500, loss[loss=0.06725, simple_loss=0.0842, pruned_loss=0.01481, audio_tagging_loss=0.01035, over 14056.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09628, pruned_loss=0.01606, audio_tagging_loss=0.009502, over 3038010.07 frames. ], batch size: 53, lr: 3.13e-03, grad_scale: 8.0
2023-11-21 22:53:24,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254000
2023-11-21 22:53:24,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1693293.3333333333, ans=0.2
2023-11-21 22:53:26,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1693293.3333333333, ans=0.0
2023-11-21 22:53:27,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0
2023-11-21 22:53:37,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1693360.0, ans=0.2
2023-11-21 22:53:38,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1693360.0, ans=0.0
2023-11-21 22:53:42,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.377e+01 8.863e+01 9.600e+01 1.162e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-21 22:53:50,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0
2023-11-21 22:54:16,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1693560.0, ans=0.125
2023-11-21 22:54:23,593 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1550, loss[loss=0.06641, simple_loss=0.08715, pruned_loss=0.01192, audio_tagging_loss=0.01092, over 14712.00 frames. ], tot_loss[loss=0.07368, simple_loss=0.09606, pruned_loss=0.01609, audio_tagging_loss=0.009556, over 3035302.97 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 8.0
2023-11-21 22:54:28,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254050
2023-11-21 22:54:41,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1693693.3333333333, ans=0.0
2023-11-21 22:54:51,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1693760.0, ans=0.0
2023-11-21 22:55:03,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1693826.6666666667, ans=0.125
2023-11-21 22:55:15,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1693893.3333333333, ans=0.125
2023-11-21 22:55:15,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1693893.3333333333, ans=10.0
2023-11-21 22:55:16,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1693893.3333333333, ans=0.0
2023-11-21 22:55:24,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1693893.3333333333, ans=0.125
2023-11-21 22:55:27,438 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1600, loss[loss=0.07691, simple_loss=0.0949, pruned_loss=0.01951, audio_tagging_loss=0.009944, over 14362.00 frames. ], tot_loss[loss=0.07347, simple_loss=0.09574, pruned_loss=0.01596, audio_tagging_loss=0.00964, over 3038368.22 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 22:55:32,303 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254100
2023-11-21 22:55:35,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1693960.0, ans=0.1
2023-11-21 22:55:39,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1694026.6666666667, ans=0.0
2023-11-21 22:55:41,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1694026.6666666667, ans=0.0
2023-11-21 22:55:50,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 8.195e+01 8.712e+01 9.406e+01 1.124e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-21 22:55:55,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0
2023-11-21 22:56:11,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1694160.0, ans=0.125
2023-11-21 22:56:30,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1694293.3333333333, ans=0.2
2023-11-21 22:56:31,732 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1650, loss[loss=0.08866, simple_loss=0.1157, pruned_loss=0.02187, audio_tagging_loss=0.008954, over 16114.00 frames. ], tot_loss[loss=0.0733, simple_loss=0.09558, pruned_loss=0.0159, audio_tagging_loss=0.009619, over 3041566.19 frames. ], batch size: 59, lr: 3.13e-03, grad_scale: 16.0
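A note on the lr field: it decays very slowly, from 3.14e-03 to 3.13e-03 across this stretch of epoch 22, consistent with a schedule that decays polynomially in both batch index and epoch (such as the Eden schedule used with Zipformer recipes). A sketch of that shape; base_lr, lr_batches, and lr_epochs below are assumed constants, not values read from this section:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden-style decay: smooth ~batch^-0.5 and ~epoch^-0.5 falloff at large values.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With an assumed base_lr of 0.045, eden_lr(0.045, 254100, 21.5) ~= 3.1e-03,
    # the same magnitude as the "lr: 3.13e-03" entries above.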
2023-11-21 22:56:36,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254150
2023-11-21 22:56:46,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1694360.0, ans=0.125
2023-11-21 22:56:48,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1694360.0, ans=0.1
2023-11-21 22:56:56,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0
2023-11-21 22:57:01,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1694426.6666666667, ans=0.2
2023-11-21 22:57:06,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1694426.6666666667, ans=0.1
2023-11-21 22:57:23,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1694560.0, ans=0.0
2023-11-21 22:57:33,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=12.0
2023-11-21 22:57:36,309 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1700, loss[loss=0.05455, simple_loss=0.06805, pruned_loss=0.009972, audio_tagging_loss=0.01056, over 15999.00 frames. ], tot_loss[loss=0.07332, simple_loss=0.09569, pruned_loss=0.01584, audio_tagging_loss=0.009632, over 3047555.20 frames. ], batch size: 64, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 22:57:36,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1694626.6666666667, ans=0.07
2023-11-21 22:57:41,402 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254200
2023-11-21 22:57:57,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1694693.3333333333, ans=0.09899494936611666
2023-11-21 22:57:59,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.322e+01 8.804e+01 9.363e+01 1.175e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-21 22:58:04,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1694760.0, ans=0.0
2023-11-21 22:58:16,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1694826.6666666667, ans=0.0
2023-11-21 22:58:41,010 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1750, loss[loss=0.0694, simple_loss=0.09734, pruned_loss=0.01394, audio_tagging_loss=0.006797, over 16078.00 frames. ], tot_loss[loss=0.07336, simple_loss=0.09569, pruned_loss=0.01595, audio_tagging_loss=0.009563, over 3049270.59 frames. ], batch size: 59, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 22:58:41,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1694960.0, ans=0.125
2023-11-21 22:58:45,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1694960.0, ans=0.05
2023-11-21 22:58:45,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254250
2023-11-21 22:58:47,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1694960.0, ans=0.125
2023-11-21 22:59:22,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1695160.0, ans=0.1
2023-11-21 22:59:34,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0
2023-11-21 22:59:44,796 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1800, loss[loss=0.05337, simple_loss=0.07192, pruned_loss=0.00827, audio_tagging_loss=0.009144, over 16283.00 frames. ], tot_loss[loss=0.07306, simple_loss=0.09524, pruned_loss=0.01594, audio_tagging_loss=0.009495, over 3042656.91 frames. ], batch size: 61, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 22:59:50,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254300
2023-11-21 22:59:53,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1695293.3333333333, ans=0.1
2023-11-21 23:00:08,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.157e+01 8.741e+01 9.334e+01 1.416e+02, threshold=1.748e+02, percent-clipped=0.0
2023-11-21 23:00:33,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1695493.3333333333, ans=0.125
2023-11-21 23:00:34,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1695560.0, ans=0.2
2023-11-21 23:00:44,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1695560.0, ans=0.125
2023-11-21 23:00:49,767 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1850, loss[loss=0.0759, simple_loss=0.1052, pruned_loss=0.01543, audio_tagging_loss=0.007888, over 15011.00 frames. ], tot_loss[loss=0.07321, simple_loss=0.09554, pruned_loss=0.01603, audio_tagging_loss=0.009411, over 3044874.01 frames. ], batch size: 54, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 23:00:54,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254350
2023-11-21 23:00:59,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1695626.6666666667, ans=0.0
2023-11-21 23:01:04,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1695693.3333333333, ans=0.125
2023-11-21 23:01:06,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1695693.3333333333, ans=0.125
2023-11-21 23:01:22,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1695760.0, ans=0.125
2023-11-21 23:01:41,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1695893.3333333333, ans=0.2
2023-11-21 23:01:54,067 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1900, loss[loss=0.09585, simple_loss=0.1248, pruned_loss=0.02472, audio_tagging_loss=0.008727, over 15429.00 frames. ], tot_loss[loss=0.07292, simple_loss=0.09558, pruned_loss=0.01581, audio_tagging_loss=0.009322, over 3046753.89 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 23:01:55,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1695960.0, ans=0.125
2023-11-21 23:01:59,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254400
2023-11-21 23:02:16,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.534e+01 8.079e+01 8.685e+01 9.740e+01 1.437e+02, threshold=1.737e+02, percent-clipped=0.0
2023-11-21 23:02:23,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1696093.3333333333, ans=0.125
2023-11-21 23:02:26,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1696093.3333333333, ans=0.125
2023-11-21 23:02:29,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1696093.3333333333, ans=0.0
2023-11-21 23:02:40,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1696160.0, ans=0.125
2023-11-21 23:02:43,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0
2023-11-21 23:02:55,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.26 vs. limit=22.5
2023-11-21 23:02:58,607 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 1950, loss[loss=0.09393, simple_loss=0.1253, pruned_loss=0.02375, audio_tagging_loss=0.007524, over 15736.00 frames. ], tot_loss[loss=0.07321, simple_loss=0.096, pruned_loss=0.01594, audio_tagging_loss=0.009271, over 3047081.58 frames. ], batch size: 57, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 23:03:03,579 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254450
2023-11-21 23:03:30,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0
2023-11-21 23:03:38,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1696493.3333333333, ans=0.05
2023-11-21 23:03:47,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1696493.3333333333, ans=0.2
2023-11-21 23:04:04,683 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2000, loss[loss=0.07137, simple_loss=0.09359, pruned_loss=0.01793, audio_tagging_loss=0.006646, over 15131.00 frames. ], tot_loss[loss=0.07316, simple_loss=0.09596, pruned_loss=0.01583, audio_tagging_loss=0.009346, over 3041121.88 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:04:09,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254500
2023-11-21 23:04:24,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1696693.3333333333, ans=0.125
2023-11-21 23:04:27,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.375e+01 7.983e+01 8.610e+01 9.428e+01 1.340e+02, threshold=1.722e+02, percent-clipped=0.0
2023-11-21 23:04:42,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1696826.6666666667, ans=0.0
2023-11-21 23:04:56,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0
2023-11-21 23:05:05,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1696893.3333333333, ans=0.0
2023-11-21 23:05:09,018 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2050, loss[loss=0.07279, simple_loss=0.09592, pruned_loss=0.01464, audio_tagging_loss=0.01019, over 15303.00 frames. ], tot_loss[loss=0.07268, simple_loss=0.09516, pruned_loss=0.01572, audio_tagging_loss=0.009379, over 3038735.09 frames. ], batch size: 57, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:05:14,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254550
2023-11-21 23:05:22,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1697026.6666666667, ans=0.0
2023-11-21 23:05:25,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1697026.6666666667, ans=0.0
2023-11-21 23:05:38,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1697093.3333333333, ans=0.0
2023-11-21 23:05:46,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1697093.3333333333, ans=0.125
2023-11-21 23:06:02,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1697226.6666666667, ans=0.035
2023-11-21 23:06:13,800 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2100, loss[loss=0.06409, simple_loss=0.07813, pruned_loss=0.01474, audio_tagging_loss=0.01028, over 16395.00 frames. ], tot_loss[loss=0.07313, simple_loss=0.09559, pruned_loss=0.01597, audio_tagging_loss=0.009359, over 3042198.08 frames. ], batch size: 64, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:06:17,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1697293.3333333333, ans=0.1
2023-11-21 23:06:18,723 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254600
2023-11-21 23:06:37,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 8.098e+01 8.657e+01 9.385e+01 1.571e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-21 23:06:55,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1697493.3333333333, ans=0.0
2023-11-21 23:07:00,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1697493.3333333333, ans=0.125
2023-11-21 23:07:05,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1697560.0, ans=0.125
2023-11-21 23:07:18,033 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2150, loss[loss=0.05594, simple_loss=0.06633, pruned_loss=0.01074, audio_tagging_loss=0.01203, over 14112.00 frames. ], tot_loss[loss=0.07287, simple_loss=0.09572, pruned_loss=0.01574, audio_tagging_loss=0.009273, over 3038717.02 frames. ], batch size: 54, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:07:21,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1697626.6666666667, ans=0.0
2023-11-21 23:07:23,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5
2023-11-21 23:07:23,704 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254650
2023-11-21 23:07:48,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1697760.0, ans=0.125
2023-11-21 23:07:55,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1697826.6666666667, ans=0.1
2023-11-21 23:07:56,833 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 23:07:57,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1697826.6666666667, ans=0.0
2023-11-21 23:08:11,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1697893.3333333333, ans=0.125
2023-11-21 23:08:11,645 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:08:23,089 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2200, loss[loss=0.05837, simple_loss=0.06817, pruned_loss=0.01279, audio_tagging_loss=0.0115, over 13716.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09451, pruned_loss=0.01563, audio_tagging_loss=0.009397, over 3045007.00 frames. ], batch size: 54, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:08:23,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1697960.0, ans=0.2
2023-11-21 23:08:28,752 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254700
2023-11-21 23:08:31,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1697960.0, ans=0.125
2023-11-21 23:08:44,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1698026.6666666667, ans=0.2
2023-11-21 23:08:47,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.502e+01 8.027e+01 8.507e+01 9.488e+01 1.436e+02, threshold=1.701e+02, percent-clipped=0.0
2023-11-21 23:08:51,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.97 vs. limit=10.0
2023-11-21 23:09:15,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:09:27,597 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2250, loss[loss=0.08392, simple_loss=0.1073, pruned_loss=0.02107, audio_tagging_loss=0.009214, over 15292.00 frames. ], tot_loss[loss=0.07247, simple_loss=0.09494, pruned_loss=0.01571, audio_tagging_loss=0.009287, over 3040656.46 frames. ], batch size: 58, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 23:09:32,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254750
2023-11-21 23:09:32,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1698293.3333333333, ans=0.0
2023-11-21 23:09:47,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0
2023-11-21 23:09:57,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1698426.6666666667, ans=0.2
2023-11-21 23:10:10,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1698493.3333333333, ans=0.125
2023-11-21 23:10:12,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1698493.3333333333, ans=0.125
2023-11-21 23:10:16,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1698493.3333333333, ans=0.5
2023-11-21 23:10:22,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1698560.0, ans=0.1
2023-11-21 23:10:32,071 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2300, loss[loss=0.07725, simple_loss=0.1064, pruned_loss=0.01669, audio_tagging_loss=0.007357, over 15461.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.09551, pruned_loss=0.01588, audio_tagging_loss=0.009306, over 3034560.12 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 23:10:38,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254800
2023-11-21 23:10:51,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1698693.3333333333, ans=0.0
2023-11-21 23:10:57,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.992e+01 8.213e+01 8.771e+01 9.442e+01 1.167e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-21 23:11:03,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.33 vs. limit=22.5
2023-11-21 23:11:14,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1698826.6666666667, ans=0.0
2023-11-21 23:11:29,826 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 23:11:37,145 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2350, loss[loss=0.04967, simple_loss=0.06145, pruned_loss=0.009196, audio_tagging_loss=0.009751, over 14185.00 frames. ], tot_loss[loss=0.07288, simple_loss=0.09524, pruned_loss=0.01585, audio_tagging_loss=0.009411, over 3031347.85 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 16.0
2023-11-21 23:11:41,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1698960.0, ans=0.125
2023-11-21 23:11:43,438 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254850
2023-11-21 23:11:44,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1698960.0, ans=0.0
2023-11-21 23:11:48,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1698960.0, ans=0.1
2023-11-21 23:12:26,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0
2023-11-21 23:12:30,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=22.5
2023-11-21 23:12:37,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1699226.6666666667, ans=0.0
2023-11-21 23:12:42,283 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2400, loss[loss=0.09088, simple_loss=0.1266, pruned_loss=0.0206, audio_tagging_loss=0.006977, over 14968.00 frames. ], tot_loss[loss=0.07271, simple_loss=0.09473, pruned_loss=0.01585, audio_tagging_loss=0.009504, over 3031852.30 frames. ], batch size: 54, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:12:47,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254900
2023-11-21 23:12:53,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1699360.0, ans=0.0
2023-11-21 23:12:57,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1699360.0, ans=0.125
2023-11-21 23:13:06,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 7.976e+01 8.691e+01 9.517e+01 1.439e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-21 23:13:09,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0
2023-11-21 23:13:22,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1699493.3333333333, ans=0.125
2023-11-21 23:13:26,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1699493.3333333333, ans=0.0
2023-11-21 23:13:32,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1699560.0, ans=0.125
2023-11-21 23:13:38,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1699560.0, ans=0.1
2023-11-21 23:13:41,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=22.5
2023-11-21 23:13:45,586 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2450, loss[loss=0.05725, simple_loss=0.07637, pruned_loss=0.01205, audio_tagging_loss=0.007014, over 14950.00 frames. ], tot_loss[loss=0.07306, simple_loss=0.09491, pruned_loss=0.01598, audio_tagging_loss=0.009622, over 3036677.83 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:13:51,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 254950
2023-11-21 23:13:54,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1699626.6666666667, ans=0.125
2023-11-21 23:14:18,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1699760.0, ans=0.2
2023-11-21 23:14:22,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1699760.0, ans=15.0
2023-11-21 23:14:50,011 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2500, loss[loss=0.06656, simple_loss=0.08402, pruned_loss=0.01371, audio_tagging_loss=0.01083, over 15293.00 frames. ], tot_loss[loss=0.07285, simple_loss=0.09473, pruned_loss=0.0158, audio_tagging_loss=0.009681, over 3035674.58 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 32.0
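
The optim.py:476 lines print the (min, 25%, 50%, 75%, max) quartiles of recently observed gradient norms together with a clipping threshold. In every entry here the threshold is 2.0x the median (e.g. 2.0 * 8.507e+01 = 1.701e+02, 2.0 * 8.771e+01 = 1.754e+02), which suggests threshold = Clipping_scale * median. A minimal sketch under that assumption; the exact bookkeeping in the optimizer may differ:

import torch

def clip_factor(grad_norm, recent_norms, clipping_scale=2.0):
    # Quartiles of the recent gradient norms, as printed in the log.
    quartiles = torch.quantile(
        torch.tensor(recent_norms),
        torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    # Assumption: threshold = clipping_scale * median.
    threshold = clipping_scale * quartiles[2].item()
    # Norms above the threshold are scaled down; the fraction of batches
    # where this fires is what the log reports as percent-clipped.
    return min(1.0, threshold / grad_norm)
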
2023-11-21 23:14:54,954 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255000
2023-11-21 23:15:01,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1699960.0, ans=0.1
2023-11-21 23:15:14,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.862e+01 8.176e+01 8.833e+01 9.682e+01 1.336e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-21 23:15:22,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0
2023-11-21 23:15:27,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1700160.0, ans=0.0
2023-11-21 23:15:31,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1700160.0, ans=0.2
2023-11-21 23:15:35,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0
2023-11-21 23:15:55,698 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2550, loss[loss=0.08037, simple_loss=0.1026, pruned_loss=0.01996, audio_tagging_loss=0.009116, over 16239.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09444, pruned_loss=0.01579, audio_tagging_loss=0.009681, over 3032131.42 frames. ], batch size: 60, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:15:56,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1700293.3333333333, ans=0.125
2023-11-21 23:16:00,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255050
2023-11-21 23:16:03,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0
2023-11-21 23:16:18,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1700360.0, ans=0.2
2023-11-21 23:16:27,774 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:16:28,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1700426.6666666667, ans=0.0
2023-11-21 23:16:32,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1700493.3333333333, ans=0.125
2023-11-21 23:16:42,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1700493.3333333333, ans=0.0
2023-11-21 23:16:51,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1700560.0, ans=0.1
2023-11-21 23:17:00,025 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2600, loss[loss=0.07042, simple_loss=0.09957, pruned_loss=0.01263, audio_tagging_loss=0.008006, over 15199.00 frames. ], tot_loss[loss=0.07264, simple_loss=0.09457, pruned_loss=0.01583, audio_tagging_loss=0.009525, over 3038968.14 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:17:05,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255100
2023-11-21 23:17:16,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1700693.3333333333, ans=0.125
2023-11-21 23:17:21,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1700693.3333333333, ans=0.0
2023-11-21 23:17:24,813 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.256e+01 8.734e+01 9.365e+01 1.832e+02, threshold=1.747e+02, percent-clipped=1.0
2023-11-21 23:17:45,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1700826.6666666667, ans=0.1
2023-11-21 23:17:54,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1700893.3333333333, ans=0.0
2023-11-21 23:17:55,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1700893.3333333333, ans=0.2
2023-11-21 23:18:05,194 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2650, loss[loss=0.08901, simple_loss=0.1216, pruned_loss=0.02058, audio_tagging_loss=0.007656, over 15628.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.09545, pruned_loss=0.01592, audio_tagging_loss=0.009522, over 3041085.33 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:18:09,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1700960.0, ans=0.125
2023-11-21 23:18:10,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255150
2023-11-21 23:18:12,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1700960.0, ans=0.125
2023-11-21 23:18:13,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1700960.0, ans=0.1
2023-11-21 23:18:18,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1701026.6666666667, ans=6.0
2023-11-21 23:18:19,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1701026.6666666667, ans=0.0
2023-11-21 23:18:58,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1701226.6666666667, ans=0.125
2023-11-21 23:18:59,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1701226.6666666667, ans=0.1
2023-11-21 23:19:08,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1701293.3333333333, ans=0.0
2023-11-21 23:19:09,743 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2700, loss[loss=0.08457, simple_loss=0.1098, pruned_loss=0.02106, audio_tagging_loss=0.008587, over 13999.00 frames. ], tot_loss[loss=0.07309, simple_loss=0.09531, pruned_loss=0.01591, audio_tagging_loss=0.009533, over 3035991.61 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 32.0
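
The scaling.py:213 lines each report the current value ("ans") of a ScheduledFloat parameter as a function of batch_count. A plausible minimal reading is a piecewise-linear schedule over the batch counter; the breakpoints in this sketch are illustrative, not the recipe's actual schedule points:

class PiecewiseLinear:
    """Sketch of a batch-count-keyed schedule like the ScheduledFloat
    values logged above; points are (batch_count, value), sorted."""
    def __init__(self, *points):
        self.points = points
    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout that decays from 0.3 to 0.1 and then stays flat: by
# batch_count ~1.7e6 it sits on its floor, matching the ans=0.1 entries.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(1700960.0))  # 0.1
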
2023-11-21 23:19:14,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255200
2023-11-21 23:19:16,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1701293.3333333333, ans=0.125
2023-11-21 23:19:34,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.128e+01 8.561e+01 9.314e+01 1.273e+02, threshold=1.712e+02, percent-clipped=0.0
2023-11-21 23:20:12,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0
2023-11-21 23:20:13,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5
2023-11-21 23:20:15,118 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2750, loss[loss=0.07715, simple_loss=0.1102, pruned_loss=0.01414, audio_tagging_loss=0.007917, over 15800.00 frames. ], tot_loss[loss=0.07265, simple_loss=0.09461, pruned_loss=0.01584, audio_tagging_loss=0.009503, over 3032658.97 frames. ], batch size: 57, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:20:19,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1701626.6666666667, ans=0.125
2023-11-21 23:20:20,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255250
2023-11-21 23:20:49,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1701760.0, ans=0.1
2023-11-21 23:20:50,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1701760.0, ans=0.125
2023-11-21 23:21:00,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1701826.6666666667, ans=0.0
2023-11-21 23:21:07,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1701893.3333333333, ans=0.95
2023-11-21 23:21:08,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1701893.3333333333, ans=0.2
2023-11-21 23:21:10,825 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 23:21:15,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1701893.3333333333, ans=0.125
2023-11-21 23:21:20,688 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2800, loss[loss=0.09086, simple_loss=0.1271, pruned_loss=0.02017, audio_tagging_loss=0.007151, over 15088.00 frames. ], tot_loss[loss=0.07224, simple_loss=0.09382, pruned_loss=0.0158, audio_tagging_loss=0.009529, over 3031556.60 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 32.0
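
The scaling.py:1022 lines compare a whitening "metric" for a module's output against a limit; in this excerpt every metric stays under its limit, which is presumably why these are informational prints only. One plausible reading (an assumption, the exact formula in scaling.py may differ) is a covariance-anisotropy score that equals 1.0 for perfectly white features and grows as the covariance concentrates, with a penalty applied only above the limit:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Hedged sketch of the logged 'metric'. x: (num_frames, num_channels)."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # Ratio of mean squared eigenvalue to squared mean eigenvalue (>= 1.0);
    # equals 1.0 exactly when all eigenvalues agree, i.e. white features.
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()
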
2023-11-21 23:21:22,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1701960.0, ans=0.2
2023-11-21 23:21:23,557 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:21:25,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255300
2023-11-21 23:21:35,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1702026.6666666667, ans=0.09899494936611666
2023-11-21 23:21:37,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1702026.6666666667, ans=0.125
2023-11-21 23:21:43,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1702026.6666666667, ans=0.125
2023-11-21 23:21:44,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 7.892e+01 8.472e+01 9.341e+01 1.172e+02, threshold=1.694e+02, percent-clipped=0.0
2023-11-21 23:21:48,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1702093.3333333333, ans=0.0
2023-11-21 23:22:01,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1702160.0, ans=0.0
2023-11-21 23:22:13,779 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:22:25,124 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2850, loss[loss=0.07959, simple_loss=0.1072, pruned_loss=0.01725, audio_tagging_loss=0.008733, over 15040.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09388, pruned_loss=0.01591, audio_tagging_loss=0.0095, over 3040117.68 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:22:30,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255350
2023-11-21 23:22:40,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1702360.0, ans=0.1
2023-11-21 23:22:49,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1702426.6666666667, ans=0.125
2023-11-21 23:22:50,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1702426.6666666667, ans=0.2
2023-11-21 23:23:13,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1702493.3333333333, ans=0.125
2023-11-21 23:23:27,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1702626.6666666667, ans=0.05
2023-11-21 23:23:28,842 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2900, loss[loss=0.07683, simple_loss=0.09351, pruned_loss=0.01824, audio_tagging_loss=0.01183, over 15843.00 frames. ], tot_loss[loss=0.07194, simple_loss=0.09378, pruned_loss=0.01566, audio_tagging_loss=0.009394, over 3040149.07 frames. ], batch size: 59, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:23:30,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1702626.6666666667, ans=0.0
2023-11-21 23:23:34,420 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255400
2023-11-21 23:23:54,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.570e+01 8.449e+01 9.076e+01 9.852e+01 1.265e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-21 23:23:55,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1702760.0, ans=0.1
2023-11-21 23:24:18,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1702826.6666666667, ans=0.125
2023-11-21 23:24:21,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0
2023-11-21 23:24:24,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1702893.3333333333, ans=0.0
2023-11-21 23:24:33,956 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 2950, loss[loss=0.06864, simple_loss=0.08898, pruned_loss=0.01152, audio_tagging_loss=0.01264, over 15266.00 frames. ], tot_loss[loss=0.07305, simple_loss=0.0953, pruned_loss=0.01597, audio_tagging_loss=0.009433, over 3042514.33 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:24:34,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1702960.0, ans=0.0
2023-11-21 23:24:38,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255450
2023-11-21 23:24:39,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1702960.0, ans=0.0
2023-11-21 23:25:37,732 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3000, loss[loss=0.08142, simple_loss=0.1048, pruned_loss=0.02086, audio_tagging_loss=0.008145, over 15973.00 frames. ], tot_loss[loss=0.07284, simple_loss=0.09478, pruned_loss=0.01593, audio_tagging_loss=0.009521, over 3045470.76 frames. ], batch size: 58, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:25:37,733 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-21 23:26:10,087 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0138, 5.9199, 5.7197, 5.6602], device='cuda:1')
2023-11-21 23:26:16,400 INFO [train_asr.py:1253] (1/4) Epoch 22, validation: loss=0.05907, simple_loss=0.0519, pruned_loss=0.005126, audio_tagging_loss=0.02799, over 4681554.00 frames.
2023-11-21 23:26:16,401 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-21 23:26:19,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0
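
At batch 3000 the loop pauses for the periodic validation pass logged above: it evaluates the model on held-out frames, prints attention-weight entropies as a diagnostic, and reports the peak GPU memory. A hedged sketch of that flow; the model/valid_loader interface here is an assumed stand-in, and the real train_asr.py also aggregates per-component losses across the four workers:

import torch

@torch.no_grad()
def run_validation(model, valid_loader, device):
    model.eval()
    total, frames = 0.0, 0
    for batch in valid_loader:
        loss, num_frames = model(batch)   # assumed interface
        total += loss.item() * num_frames
        frames += num_frames
    model.train()
    # Matches the 'Maximum memory allocated so far is ...MB' line.
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return total / frames, peak_mb
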
2023-11-21 23:26:21,985 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255500
2023-11-21 23:26:27,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1703293.3333333333, ans=0.125
2023-11-21 23:26:41,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 7.989e+01 8.698e+01 9.491e+01 1.260e+02, threshold=1.740e+02, percent-clipped=0.0
2023-11-21 23:26:41,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1703426.6666666667, ans=0.125
2023-11-21 23:26:43,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2023-11-21 23:26:50,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1703426.6666666667, ans=0.0
2023-11-21 23:27:02,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0
2023-11-21 23:27:08,154 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:27:19,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1703560.0, ans=0.0
2023-11-21 23:27:20,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1703626.6666666667, ans=0.125
2023-11-21 23:27:21,951 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3050, loss[loss=0.06258, simple_loss=0.08262, pruned_loss=0.01211, audio_tagging_loss=0.009155, over 15331.00 frames. ], tot_loss[loss=0.07381, simple_loss=0.09636, pruned_loss=0.01621, audio_tagging_loss=0.009426, over 3057063.39 frames. ], batch size: 58, lr: 3.13e-03, grad_scale: 32.0
2023-11-21 23:27:26,977 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255550
2023-11-21 23:27:29,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1703626.6666666667, ans=0.0
2023-11-21 23:27:41,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0
2023-11-21 23:27:49,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1703760.0, ans=0.0
2023-11-21 23:27:57,674 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 23:28:03,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1703826.6666666667, ans=0.0
2023-11-21 23:28:05,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0
2023-11-21 23:28:12,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1703893.3333333333, ans=0.125
2023-11-21 23:28:13,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1703893.3333333333, ans=0.125
2023-11-21 23:28:25,770 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3100, loss[loss=0.0608, simple_loss=0.07677, pruned_loss=0.01177, audio_tagging_loss=0.01065, over 15015.00 frames. ], tot_loss[loss=0.07454, simple_loss=0.09708, pruned_loss=0.01649, audio_tagging_loss=0.009513, over 3054193.33 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:28:30,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255600
2023-11-21 23:28:47,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1704026.6666666667, ans=0.125
2023-11-21 23:28:51,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 7.990e+01 8.561e+01 9.403e+01 1.205e+02, threshold=1.712e+02, percent-clipped=0.0
2023-11-21 23:29:24,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1704226.6666666667, ans=0.05
2023-11-21 23:29:29,643 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3150, loss[loss=0.0618, simple_loss=0.07539, pruned_loss=0.01244, audio_tagging_loss=0.01167, over 14057.00 frames. ], tot_loss[loss=0.0744, simple_loss=0.09701, pruned_loss=0.01632, audio_tagging_loss=0.009577, over 3053296.08 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:29:34,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255650
2023-11-21 23:29:46,632 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:30:35,324 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3200, loss[loss=0.0756, simple_loss=0.09639, pruned_loss=0.01601, audio_tagging_loss=0.0114, over 15553.00 frames. ], tot_loss[loss=0.07466, simple_loss=0.09751, pruned_loss=0.01633, audio_tagging_loss=0.009579, over 3056708.27 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 32.0
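
The displayed learning rate ticks down from 3.13e-03 to 3.12e-03 around batch 3100. A hedged worked check against an Eden-style schedule, assuming the configured base rate of 0.045 and the usual lr_batches/lr_epochs form; the exact epoch and batch counters the scheduler is fed are an assumption:

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Assumed Eden-style schedule: smooth decay in both the global batch
    index and the (possibly fractional) epoch count."""
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With batch ~255600 and epoch ~21 this gives ~3.12e-03, in line with
# the values printed above.
print(eden_lr(0.045, 255600, 21))
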
2023-11-21 23:30:40,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255700
2023-11-21 23:30:49,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1704693.3333333333, ans=0.125
2023-11-21 23:30:53,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1704693.3333333333, ans=0.125
2023-11-21 23:31:01,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.275e+01 8.900e+01 9.462e+01 1.368e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-21 23:31:11,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1704826.6666666667, ans=0.0
2023-11-21 23:31:28,527 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:31:29,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1704893.3333333333, ans=0.125
2023-11-21 23:31:30,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1704893.3333333333, ans=0.0
2023-11-21 23:31:34,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1704893.3333333333, ans=0.0
2023-11-21 23:31:39,784 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3250, loss[loss=0.05564, simple_loss=0.0745, pruned_loss=0.008549, audio_tagging_loss=0.009841, over 13939.00 frames. ], tot_loss[loss=0.07424, simple_loss=0.09682, pruned_loss=0.01615, audio_tagging_loss=0.009675, over 3048282.21 frames. ], batch size: 56, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:31:44,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255750
2023-11-21 23:31:57,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1705026.6666666667, ans=0.125
2023-11-21 23:32:06,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0
2023-11-21 23:32:09,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1705093.3333333333, ans=0.0
2023-11-21 23:32:17,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1705160.0, ans=0.125
2023-11-21 23:32:28,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1705160.0, ans=0.125
2023-11-21 23:32:33,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1705226.6666666667, ans=0.125
2023-11-21 23:32:39,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1705226.6666666667, ans=0.0
2023-11-21 23:32:43,657 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3300, loss[loss=0.0947, simple_loss=0.1234, pruned_loss=0.02311, audio_tagging_loss=0.009918, over 14750.00 frames. ], tot_loss[loss=0.07411, simple_loss=0.09633, pruned_loss=0.01618, audio_tagging_loss=0.009769, over 3050979.14 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 16.0
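
grad_scale toggles between 16.0 and 32.0 in these entries (16.0 at batch 3100, back to 32.0 at batch 3200), which is the behaviour of fp16 dynamic loss scaling: the scale is halved when gradients overflow and doubled again after a stretch of clean steps. A hedged sketch with torch's GradScaler; the growth/backoff parameters here are illustrative, not the recipe's:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)          # assumed stand-in interface
    scaler.scale(loss).backward()
    scaler.step(optimizer)           # skips the step if gradients overflowed
    scaler.update()                  # halves the scale on overflow, else may double it
    return scaler.get_scale()        # the value logged as grad_scale
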
2023-11-21 23:32:48,622 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255800
2023-11-21 23:33:11,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.185e+01 8.316e+01 8.844e+01 9.343e+01 1.264e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-21 23:33:39,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1705560.0, ans=0.0
2023-11-21 23:33:40,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.07 vs. limit=10.0
2023-11-21 23:33:47,415 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3350, loss[loss=0.06319, simple_loss=0.08049, pruned_loss=0.01347, audio_tagging_loss=0.009471, over 13751.00 frames. ], tot_loss[loss=0.07363, simple_loss=0.0959, pruned_loss=0.01607, audio_tagging_loss=0.009608, over 3053140.88 frames. ], batch size: 50, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:33:52,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255850
2023-11-21 23:34:07,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1705693.3333333333, ans=0.0
2023-11-21 23:34:20,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0
2023-11-21 23:34:52,869 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3400, loss[loss=0.07349, simple_loss=0.1091, pruned_loss=0.01234, audio_tagging_loss=0.00661, over 16060.00 frames. ], tot_loss[loss=0.0733, simple_loss=0.09599, pruned_loss=0.01592, audio_tagging_loss=0.009389, over 3052534.33 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:34:57,785 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255900
2023-11-21 23:35:00,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1705960.0, ans=0.0
2023-11-21 23:35:10,074 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:35:12,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1706026.6666666667, ans=0.125
2023-11-21 23:35:18,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.503e+01 8.132e+01 8.792e+01 9.497e+01 1.160e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-21 23:35:28,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1706093.3333333333, ans=0.125
2023-11-21 23:35:33,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1706160.0, ans=0.0
2023-11-21 23:35:45,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0
2023-11-21 23:35:46,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1706226.6666666667, ans=0.125
2023-11-21 23:35:51,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1706226.6666666667, ans=0.2
2023-11-21 23:35:56,090 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3450, loss[loss=0.06397, simple_loss=0.08267, pruned_loss=0.01275, audio_tagging_loss=0.009891, over 14717.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.09452, pruned_loss=0.01555, audio_tagging_loss=0.009393, over 3052015.68 frames. ], batch size: 56, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:35:58,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1706293.3333333333, ans=0.125
2023-11-21 23:36:00,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1706293.3333333333, ans=0.0
2023-11-21 23:36:01,115 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 255950
2023-11-21 23:36:02,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1706293.3333333333, ans=0.125
2023-11-21 23:36:17,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1706360.0, ans=0.95
2023-11-21 23:36:27,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0
2023-11-21 23:36:33,042 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:36:36,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1706493.3333333333, ans=0.2
2023-11-21 23:36:59,414 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3500, loss[loss=0.08379, simple_loss=0.1186, pruned_loss=0.01623, audio_tagging_loss=0.008265, over 15163.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.09447, pruned_loss=0.01563, audio_tagging_loss=0.009332, over 3051457.79 frames. ], batch size: 56, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:37:05,527 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256000
2023-11-21 23:37:18,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1706693.3333333333, ans=0.125
2023-11-21 23:37:29,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1706760.0, ans=0.1
2023-11-21 23:37:31,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 7.990e+01 8.553e+01 9.404e+01 1.412e+02, threshold=1.711e+02, percent-clipped=0.0
2023-11-21 23:37:32,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1706760.0, ans=0.09899494936611666
2023-11-21 23:37:37,252 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
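
The scaling.py:1118 WithLoss lines attach a name to the self-attention weights and report a penalty sum, which is 0.000e+00 throughout this excerpt. A plausible mechanism (an assumption, not a quote from scaling.py) is an identity op that passes the tensor through unchanged while making an auxiliary penalty behave as if it were added to the training loss:

import torch

class WithLossFn(torch.autograd.Function):
    """Hedged sketch of a WithLoss-style op: returns x unchanged but gives
    the scalar penalty y a gradient of 1, as if it were added to the loss."""
    @staticmethod
    def forward(ctx, x, y):
        ctx.y_shape = y.shape
        return x
    @staticmethod
    def backward(ctx, x_grad):
        # d(total_loss)/dy = 1: the penalty's gradient flows even when its
        # forward value (the logged loss-sum) is 0.0, as in the lines above.
        return x_grad, torch.ones(ctx.y_shape, dtype=x_grad.dtype,
                                  device=x_grad.device)

def attach_penalty(attn_weights, penalty):
    return WithLossFn.apply(attn_weights, penalty)
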
2023-11-21 23:38:01,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1706893.3333333333, ans=0.0
2023-11-21 23:38:08,318 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3550, loss[loss=0.04549, simple_loss=0.03957, pruned_loss=0.01153, audio_tagging_loss=0.01417, over 13583.00 frames. ], tot_loss[loss=0.07203, simple_loss=0.09408, pruned_loss=0.01557, audio_tagging_loss=0.009416, over 3050913.44 frames. ], batch size: 53, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:38:13,915 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256050
2023-11-21 23:38:21,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1707026.6666666667, ans=0.0
2023-11-21 23:38:24,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1707026.6666666667, ans=0.1
2023-11-21 23:38:31,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1707026.6666666667, ans=0.0
2023-11-21 23:39:09,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1707226.6666666667, ans=0.025
2023-11-21 23:39:12,512 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3600, loss[loss=0.07386, simple_loss=0.1028, pruned_loss=0.01568, audio_tagging_loss=0.006781, over 14778.00 frames. ], tot_loss[loss=0.07134, simple_loss=0.09324, pruned_loss=0.01534, audio_tagging_loss=0.009389, over 3044790.62 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:39:16,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1707293.3333333333, ans=0.0
2023-11-21 23:39:16,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1707293.3333333333, ans=0.0
2023-11-21 23:39:17,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256100
2023-11-21 23:39:17,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1707293.3333333333, ans=0.0
2023-11-21 23:39:22,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1707293.3333333333, ans=0.0
2023-11-21 23:39:33,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1707360.0, ans=0.2
2023-11-21 23:39:34,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1707360.0, ans=0.0
2023-11-21 23:39:39,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.698e+01 8.079e+01 8.522e+01 9.097e+01 1.276e+02, threshold=1.704e+02, percent-clipped=0.0
2023-11-21 23:39:39,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1707426.6666666667, ans=0.2
2023-11-21 23:39:43,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1707426.6666666667, ans=0.125
2023-11-21 23:39:55,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1707493.3333333333, ans=0.125
2023-11-21 23:40:16,166 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3650, loss[loss=0.07187, simple_loss=0.09051, pruned_loss=0.01716, audio_tagging_loss=0.009447, over 14531.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09312, pruned_loss=0.0154, audio_tagging_loss=0.009377, over 3047733.93 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:40:21,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256150
2023-11-21 23:40:24,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0
2023-11-21 23:40:48,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1707760.0, ans=0.07
2023-11-21 23:41:21,379 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3700, loss[loss=0.06997, simple_loss=0.1002, pruned_loss=0.01347, audio_tagging_loss=0.006371, over 15564.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.0935, pruned_loss=0.01555, audio_tagging_loss=0.009401, over 3049761.87 frames. ], batch size: 56, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:41:24,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1707960.0, ans=0.125
2023-11-21 23:41:27,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256200
2023-11-21 23:41:36,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0
2023-11-21 23:41:50,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.207e+01 8.757e+01 9.528e+01 1.164e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-21 23:42:15,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1708226.6666666667, ans=0.125
2023-11-21 23:42:26,872 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3750, loss[loss=0.07482, simple_loss=0.09232, pruned_loss=0.01729, audio_tagging_loss=0.01138, over 16232.00 frames. ], tot_loss[loss=0.07288, simple_loss=0.09498, pruned_loss=0.01598, audio_tagging_loss=0.009405, over 3052837.40 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:42:31,908 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256250
2023-11-21 23:42:55,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1708426.6666666667, ans=0.125
2023-11-21 23:43:06,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0
2023-11-21 23:43:11,590 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 23:43:29,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5
2023-11-21 23:43:30,484 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3800, loss[loss=0.07668, simple_loss=0.09877, pruned_loss=0.01712, audio_tagging_loss=0.01018, over 15305.00 frames. ], tot_loss[loss=0.07332, simple_loss=0.09565, pruned_loss=0.01608, audio_tagging_loss=0.009413, over 3051516.21 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:43:36,272 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256300
2023-11-21 23:43:37,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1708626.6666666667, ans=0.1
2023-11-21 23:43:59,424 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.560e+01 8.403e+01 8.924e+01 9.481e+01 1.120e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-21 23:44:06,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1708760.0, ans=0.09899494936611666
2023-11-21 23:44:35,821 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3850, loss[loss=0.06041, simple_loss=0.07175, pruned_loss=0.01395, audio_tagging_loss=0.01059, over 14598.00 frames. ], tot_loss[loss=0.07305, simple_loss=0.09555, pruned_loss=0.01583, audio_tagging_loss=0.009443, over 3050868.60 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:44:40,824 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256350
2023-11-21 23:44:43,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1708960.0, ans=0.1
2023-11-21 23:44:49,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0
2023-11-21 23:45:01,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0
2023-11-21 23:45:05,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1709093.3333333333, ans=0.1
2023-11-21 23:45:10,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1709093.3333333333, ans=0.125
2023-11-21 23:45:10,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1709093.3333333333, ans=0.0
2023-11-21 23:45:12,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. limit=6.0
2023-11-21 23:45:26,866 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:45:29,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1709226.6666666667, ans=0.5
2023-11-21 23:45:33,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1709226.6666666667, ans=0.0
2023-11-21 23:45:40,210 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3900, loss[loss=0.07479, simple_loss=0.09727, pruned_loss=0.0137, audio_tagging_loss=0.01246, over 15804.00 frames. ], tot_loss[loss=0.073, simple_loss=0.09499, pruned_loss=0.01578, audio_tagging_loss=0.009718, over 3051832.62 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:45:40,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1709293.3333333333, ans=0.125
2023-11-21 23:45:45,194 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256400
2023-11-21 23:46:05,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1709426.6666666667, ans=0.125
2023-11-21 23:46:09,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.019e+01 8.673e+01 9.477e+01 1.418e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-21 23:46:11,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=15.0
2023-11-21 23:46:19,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1709493.3333333333, ans=0.125
2023-11-21 23:46:26,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1709493.3333333333, ans=0.125
2023-11-21 23:46:45,106 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 3950, loss[loss=0.08258, simple_loss=0.1003, pruned_loss=0.01995, audio_tagging_loss=0.01246, over 15161.00 frames. ], tot_loss[loss=0.0733, simple_loss=0.09524, pruned_loss=0.0159, audio_tagging_loss=0.009776, over 3051766.87 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 16.0
2023-11-21 23:46:48,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1709626.6666666667, ans=0.125
2023-11-21 23:46:50,110 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256450
2023-11-21 23:46:58,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1709693.3333333333, ans=0.07
2023-11-21 23:47:03,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1709693.3333333333, ans=0.09899494936611666
2023-11-21 23:47:17,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1709760.0, ans=0.0
2023-11-21 23:47:21,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1709760.0, ans=0.125
2023-11-21 23:47:49,831 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4000, loss[loss=0.06022, simple_loss=0.07587, pruned_loss=0.0123, audio_tagging_loss=0.009983, over 15541.00 frames. ], tot_loss[loss=0.07293, simple_loss=0.09467, pruned_loss=0.01585, audio_tagging_loss=0.009744, over 3049975.69 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:47:54,752 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256500
2023-11-21 23:48:01,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1710026.6666666667, ans=0.125
2023-11-21 23:48:04,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1710026.6666666667, ans=0.125
2023-11-21 23:48:06,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0
2023-11-21 23:48:15,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1710093.3333333333, ans=0.0
2023-11-21 23:48:16,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1710093.3333333333, ans=0.125
2023-11-21 23:48:17,181 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.007e+01 8.495e+01 8.892e+01 9.387e+01 1.240e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-21 23:48:21,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1710093.3333333333, ans=0.015
2023-11-21 23:48:23,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1710093.3333333333, ans=0.0
2023-11-21 23:48:28,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1710160.0, ans=0.125
2023-11-21 23:48:38,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1710160.0, ans=0.125
2023-11-21 23:48:42,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1710226.6666666667, ans=0.1
2023-11-21 23:48:48,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1710226.6666666667, ans=0.2
2023-11-21 23:48:52,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1710293.3333333333, ans=0.07
2023-11-21 23:48:53,489 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4050, loss[loss=0.07011, simple_loss=0.09819, pruned_loss=0.01234, audio_tagging_loss=0.008676, over 15219.00 frames. ], tot_loss[loss=0.07335, simple_loss=0.09533, pruned_loss=0.01597, audio_tagging_loss=0.009717, over 3046333.19 frames. ], batch size: 58, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:48:57,223 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 23:48:58,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256550
2023-11-21 23:48:58,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1710293.3333333333, ans=0.125
2023-11-21 23:49:05,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1710360.0, ans=0.0
2023-11-21 23:49:32,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1710493.3333333333, ans=0.1
2023-11-21 23:49:34,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1710493.3333333333, ans=0.125
2023-11-21 23:49:57,002 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4100, loss[loss=0.05881, simple_loss=0.07961, pruned_loss=0.01027, audio_tagging_loss=0.008738, over 16545.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09472, pruned_loss=0.0158, audio_tagging_loss=0.009759, over 3040928.36 frames. ], batch size: 63, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:50:01,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1710626.6666666667, ans=0.0
2023-11-21 23:50:02,552 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256600
2023-11-21 23:50:16,724 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:50:25,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1710760.0, ans=0.0
2023-11-21 23:50:26,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.295e+01 8.154e+01 8.730e+01 9.411e+01 1.441e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-21 23:50:27,310 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:50:33,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1710760.0, ans=0.125
2023-11-21 23:51:03,033 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4150, loss[loss=0.06403, simple_loss=0.08457, pruned_loss=0.01147, audio_tagging_loss=0.01028, over 14861.00 frames. ], tot_loss[loss=0.07292, simple_loss=0.09496, pruned_loss=0.01583, audio_tagging_loss=0.009608, over 3040044.59 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:51:08,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256650
2023-11-21 23:51:20,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1711026.6666666667, ans=0.125
2023-11-21 23:51:30,179 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:51:31,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1711093.3333333333, ans=0.125
2023-11-21 23:51:38,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0
2023-11-21 23:51:41,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1711160.0, ans=0.0
2023-11-21 23:51:42,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1711160.0, ans=0.125
2023-11-21 23:51:50,781 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-21 23:52:08,117 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4200, loss[loss=0.08601, simple_loss=0.1062, pruned_loss=0.02154, audio_tagging_loss=0.01139, over 15432.00 frames. ], tot_loss[loss=0.07298, simple_loss=0.09507, pruned_loss=0.01586, audio_tagging_loss=0.009587, over 3038822.18 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:52:13,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256700
2023-11-21 23:52:13,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1711293.3333333333, ans=0.125
2023-11-21 23:52:27,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0
2023-11-21 23:52:34,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1711426.6666666667, ans=0.0
2023-11-21 23:52:34,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1711426.6666666667, ans=0.125
2023-11-21 23:52:36,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.790e+01 7.996e+01 8.840e+01 9.314e+01 1.225e+02, threshold=1.768e+02, percent-clipped=0.0
2023-11-21 23:52:48,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1711493.3333333333, ans=0.125
2023-11-21 23:52:49,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1711493.3333333333, ans=0.1
2023-11-21 23:52:50,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1711493.3333333333, ans=0.1
2023-11-21 23:53:11,823 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4250, loss[loss=0.06828, simple_loss=0.08912, pruned_loss=0.01421, audio_tagging_loss=0.009502, over 15365.00 frames. ], tot_loss[loss=0.07323, simple_loss=0.09578, pruned_loss=0.01594, audio_tagging_loss=0.009396, over 3044853.37 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:53:16,993 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256750
2023-11-21 23:53:19,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0
2023-11-21 23:53:35,987 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:53:43,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1711760.0, ans=0.1
2023-11-21 23:53:48,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0
2023-11-21 23:54:04,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1711893.3333333333, ans=0.1
2023-11-21 23:54:10,848 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:54:12,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0
2023-11-21 23:54:16,090 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4300, loss[loss=0.07539, simple_loss=0.09946, pruned_loss=0.01765, audio_tagging_loss=0.008013, over 14994.00 frames. ], tot_loss[loss=0.07372, simple_loss=0.09645, pruned_loss=0.0161, audio_tagging_loss=0.009399, over 3046289.41 frames. ], batch size: 54, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:54:16,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1711960.0, ans=0.125
2023-11-21 23:54:20,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1711960.0, ans=0.125
2023-11-21 23:54:22,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256800
2023-11-21 23:54:22,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1711960.0, ans=0.0
2023-11-21 23:54:28,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1712026.6666666667, ans=0.125
2023-11-21 23:54:31,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1712026.6666666667, ans=0.0
2023-11-21 23:54:45,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.276e+01 8.988e+01 9.758e+01 2.160e+02, threshold=1.798e+02, percent-clipped=1.0
2023-11-21 23:54:51,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1712093.3333333333, ans=0.125
2023-11-21 23:55:09,577 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-21 23:55:22,071 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4350, loss[loss=0.04856, simple_loss=0.05814, pruned_loss=0.01, audio_tagging_loss=0.009486, over 15053.00 frames. ], tot_loss[loss=0.07304, simple_loss=0.09522, pruned_loss=0.01597, audio_tagging_loss=0.009464, over 3040134.91 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:55:26,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256850
2023-11-21 23:55:31,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1712293.3333333333, ans=0.0
2023-11-21 23:55:39,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.30 vs. limit=10.0
2023-11-21 23:55:46,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1712426.6666666667, ans=0.1
2023-11-21 23:56:12,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.57 vs. limit=10.0
2023-11-21 23:56:14,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1712560.0, ans=0.0
2023-11-21 23:56:25,327 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4400, loss[loss=0.06665, simple_loss=0.08404, pruned_loss=0.01574, audio_tagging_loss=0.008888, over 15502.00 frames. ], tot_loss[loss=0.07333, simple_loss=0.09579, pruned_loss=0.01603, audio_tagging_loss=0.009404, over 3035607.77 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 32.0
2023-11-21 23:56:26,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1712626.6666666667, ans=0.125
2023-11-21 23:56:30,381 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256900
2023-11-21 23:56:34,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1712626.6666666667, ans=0.125
2023-11-21 23:56:37,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1712693.3333333333, ans=0.125
2023-11-21 23:56:46,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1712693.3333333333, ans=0.125
2023-11-21 23:56:55,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.141e+01 8.812e+01 9.440e+01 1.228e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-21 23:56:55,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1712760.0, ans=0.125
2023-11-21 23:56:57,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0
2023-11-21 23:56:59,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1712760.0, ans=0.125
2023-11-21 23:57:01,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.45 vs.
limit=15.0 2023-11-21 23:57:06,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1712826.6666666667, ans=0.125 2023-11-21 23:57:25,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1712893.3333333333, ans=0.2 2023-11-21 23:57:29,767 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4450, loss[loss=0.06062, simple_loss=0.07424, pruned_loss=0.01174, audio_tagging_loss=0.01176, over 15702.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.09584, pruned_loss=0.01601, audio_tagging_loss=0.009342, over 3038886.31 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 32.0 2023-11-21 23:57:35,208 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 256950 2023-11-21 23:57:48,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1713026.6666666667, ans=0.2 2023-11-21 23:58:00,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2023-11-21 23:58:00,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-21 23:58:20,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1713226.6666666667, ans=0.125 2023-11-21 23:58:34,704 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4500, loss[loss=0.07176, simple_loss=0.1008, pruned_loss=0.01173, audio_tagging_loss=0.00966, over 15320.00 frames. ], tot_loss[loss=0.07325, simple_loss=0.09614, pruned_loss=0.01589, audio_tagging_loss=0.009291, over 3048341.30 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 32.0 2023-11-21 23:58:40,211 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257000 2023-11-21 23:58:43,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-21 23:58:44,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1713293.3333333333, ans=0.125 2023-11-21 23:59:01,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1713426.6666666667, ans=0.0 2023-11-21 23:59:03,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.360e+01 9.049e+01 9.861e+01 1.318e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-21 23:59:12,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1713493.3333333333, ans=0.1 2023-11-21 23:59:13,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1713493.3333333333, ans=0.125 2023-11-21 23:59:20,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1713493.3333333333, ans=0.0 2023-11-21 23:59:23,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.93 vs. 
limit=22.5 2023-11-21 23:59:28,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1713560.0, ans=0.5 2023-11-21 23:59:32,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2023-11-21 23:59:36,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.28 vs. limit=15.0 2023-11-21 23:59:39,441 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4550, loss[loss=0.05864, simple_loss=0.07609, pruned_loss=0.009619, audio_tagging_loss=0.01097, over 15236.00 frames. ], tot_loss[loss=0.07264, simple_loss=0.09521, pruned_loss=0.01564, audio_tagging_loss=0.009394, over 3042902.27 frames. ], batch size: 59, lr: 3.12e-03, grad_scale: 32.0 2023-11-21 23:59:44,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257050 2023-11-22 00:00:22,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1713826.6666666667, ans=0.5 2023-11-22 00:00:27,886 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 00:00:42,548 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4600, loss[loss=0.06166, simple_loss=0.07642, pruned_loss=0.01506, audio_tagging_loss=0.008386, over 15150.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.09461, pruned_loss=0.01572, audio_tagging_loss=0.009443, over 3043275.67 frames. 
], batch size: 59, lr: 3.12e-03, grad_scale: 32.0 2023-11-22 00:00:48,788 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257100 2023-11-22 00:00:50,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1713960.0, ans=0.0 2023-11-22 00:01:04,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1714026.6666666667, ans=0.0 2023-11-22 00:01:12,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1714093.3333333333, ans=0.1 2023-11-22 00:01:13,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.884e+01 7.906e+01 8.473e+01 9.187e+01 1.183e+02, threshold=1.695e+02, percent-clipped=0.0 2023-11-22 00:01:19,527 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:01:36,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1714226.6666666667, ans=0.125 2023-11-22 00:01:44,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1714226.6666666667, ans=0.125 2023-11-22 00:01:46,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1714293.3333333333, ans=0.125 2023-11-22 00:01:47,027 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4650, loss[loss=0.06529, simple_loss=0.08743, pruned_loss=0.009437, audio_tagging_loss=0.01214, over 15135.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09457, pruned_loss=0.01554, audio_tagging_loss=0.009541, over 3048613.46 frames. ], batch size: 54, lr: 3.12e-03, grad_scale: 32.0 2023-11-22 00:01:49,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.47 vs. limit=15.0 2023-11-22 00:01:53,240 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257150 2023-11-22 00:01:55,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1714293.3333333333, ans=0.2 2023-11-22 00:01:58,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1714293.3333333333, ans=0.2 2023-11-22 00:02:14,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1714426.6666666667, ans=0.125 2023-11-22 00:02:20,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1714426.6666666667, ans=0.0 2023-11-22 00:02:45,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1714560.0, ans=0.2 2023-11-22 00:02:51,267 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4700, loss[loss=0.08519, simple_loss=0.1081, pruned_loss=0.02081, audio_tagging_loss=0.01031, over 14448.00 frames. ], tot_loss[loss=0.07281, simple_loss=0.09483, pruned_loss=0.01581, audio_tagging_loss=0.00958, over 3050487.40 frames. 
], batch size: 54, lr: 3.12e-03, grad_scale: 32.0 2023-11-22 00:02:56,388 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257200 2023-11-22 00:03:19,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1714760.0, ans=0.015 2023-11-22 00:03:21,297 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.941e+01 8.209e+01 8.969e+01 9.577e+01 1.305e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-22 00:03:27,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1714760.0, ans=0.0 2023-11-22 00:03:36,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2023-11-22 00:03:56,074 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4750, loss[loss=0.08663, simple_loss=0.1112, pruned_loss=0.02117, audio_tagging_loss=0.009858, over 15834.00 frames. ], tot_loss[loss=0.07261, simple_loss=0.09437, pruned_loss=0.01568, audio_tagging_loss=0.009744, over 3043665.00 frames. ], batch size: 56, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:03:59,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-11-22 00:04:01,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257250 2023-11-22 00:04:09,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1715026.6666666667, ans=0.125 2023-11-22 00:04:48,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-22 00:05:00,628 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4800, loss[loss=0.08401, simple_loss=0.1137, pruned_loss=0.01896, audio_tagging_loss=0.008221, over 14869.00 frames. ], tot_loss[loss=0.07295, simple_loss=0.09443, pruned_loss=0.0159, audio_tagging_loss=0.009842, over 3042176.22 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:05:05,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257300 2023-11-22 00:05:05,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1715293.3333333333, ans=0.0 2023-11-22 00:05:10,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1715293.3333333333, ans=0.125 2023-11-22 00:05:13,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1715360.0, ans=0.2 2023-11-22 00:05:24,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. 
limit=15.0 2023-11-22 00:05:30,009 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.329e+01 8.144e+01 8.815e+01 9.467e+01 1.388e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-22 00:05:42,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1715493.3333333333, ans=0.0 2023-11-22 00:05:55,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1715560.0, ans=0.0 2023-11-22 00:05:58,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2023-11-22 00:06:05,299 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4850, loss[loss=0.07484, simple_loss=0.0858, pruned_loss=0.0181, audio_tagging_loss=0.01383, over 15008.00 frames. ], tot_loss[loss=0.07335, simple_loss=0.095, pruned_loss=0.0159, audio_tagging_loss=0.009944, over 3048075.55 frames. ], batch size: 59, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:06:07,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1715626.6666666667, ans=0.0 2023-11-22 00:06:10,257 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257350 2023-11-22 00:06:23,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1715693.3333333333, ans=0.125 2023-11-22 00:06:40,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0 2023-11-22 00:06:49,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1715826.6666666667, ans=0.0 2023-11-22 00:07:00,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1715893.3333333333, ans=0.2 2023-11-22 00:07:04,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1715893.3333333333, ans=0.1 2023-11-22 00:07:05,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1715893.3333333333, ans=0.0 2023-11-22 00:07:07,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2023-11-22 00:07:09,103 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4900, loss[loss=0.05807, simple_loss=0.07105, pruned_loss=0.01254, audio_tagging_loss=0.01001, over 16576.00 frames. ], tot_loss[loss=0.07334, simple_loss=0.09493, pruned_loss=0.01604, audio_tagging_loss=0.009832, over 3049357.99 frames. 
], batch size: 67, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:07:14,209 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257400 2023-11-22 00:07:27,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1716026.6666666667, ans=0.0 2023-11-22 00:07:29,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1716026.6666666667, ans=0.125 2023-11-22 00:07:38,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.481e+01 8.110e+01 8.831e+01 9.694e+01 1.184e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-22 00:07:59,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1716226.6666666667, ans=0.0 2023-11-22 00:08:14,174 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 4950, loss[loss=0.06839, simple_loss=0.0889, pruned_loss=0.01668, audio_tagging_loss=0.007264, over 14606.00 frames. ], tot_loss[loss=0.07292, simple_loss=0.09455, pruned_loss=0.01605, audio_tagging_loss=0.009586, over 3046434.87 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:08:17,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-22 00:08:19,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257450 2023-11-22 00:08:40,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1716426.6666666667, ans=15.0 2023-11-22 00:09:11,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1716560.0, ans=0.1 2023-11-22 00:09:13,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1716560.0, ans=0.125 2023-11-22 00:09:18,242 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5000, loss[loss=0.07179, simple_loss=0.09122, pruned_loss=0.01579, audio_tagging_loss=0.01039, over 15324.00 frames. ], tot_loss[loss=0.07273, simple_loss=0.09471, pruned_loss=0.0159, audio_tagging_loss=0.009473, over 3043993.98 frames. ], batch size: 60, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:09:23,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257500 2023-11-22 00:09:28,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1716626.6666666667, ans=0.0 2023-11-22 00:09:48,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.146e+01 8.929e+01 9.625e+01 1.123e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-22 00:09:49,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1716760.0, ans=0.1 2023-11-22 00:10:00,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1716826.6666666667, ans=0.125 2023-11-22 00:10:15,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1716893.3333333333, ans=0.0 2023-11-22 00:10:22,306 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5050, loss[loss=0.08811, simple_loss=0.116, pruned_loss=0.02116, audio_tagging_loss=0.008945, over 15804.00 frames. 
], tot_loss[loss=0.0721, simple_loss=0.09385, pruned_loss=0.01567, audio_tagging_loss=0.009512, over 3046872.13 frames. ], batch size: 58, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:10:27,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257550 2023-11-22 00:10:46,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2023-11-22 00:10:46,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-22 00:11:04,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1717160.0, ans=0.125 2023-11-22 00:11:17,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1717226.6666666667, ans=0.07 2023-11-22 00:11:19,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1717226.6666666667, ans=0.0 2023-11-22 00:11:28,000 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5100, loss[loss=0.0647, simple_loss=0.08678, pruned_loss=0.01186, audio_tagging_loss=0.009442, over 16259.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.0933, pruned_loss=0.01561, audio_tagging_loss=0.009436, over 3043502.58 frames. ], batch size: 61, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:11:33,101 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257600 2023-11-22 00:11:38,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.64 vs. limit=10.0 2023-11-22 00:11:47,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1717360.0, ans=0.1 2023-11-22 00:11:58,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.309e+01 7.923e+01 8.751e+01 9.756e+01 1.347e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-22 00:12:13,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1717493.3333333333, ans=0.125 2023-11-22 00:12:20,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1717560.0, ans=0.0 2023-11-22 00:12:34,137 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5150, loss[loss=0.09947, simple_loss=0.1332, pruned_loss=0.02417, audio_tagging_loss=0.008703, over 15614.00 frames. ], tot_loss[loss=0.07203, simple_loss=0.09392, pruned_loss=0.01574, audio_tagging_loss=0.009321, over 3048065.10 frames. ], batch size: 58, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:12:39,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257650 2023-11-22 00:12:46,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.19 vs. 
limit=12.0 2023-11-22 00:13:12,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1717826.6666666667, ans=0.2 2023-11-22 00:13:34,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1717893.3333333333, ans=0.07 2023-11-22 00:13:39,403 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5200, loss[loss=0.07371, simple_loss=0.09598, pruned_loss=0.01695, audio_tagging_loss=0.008768, over 15668.00 frames. ], tot_loss[loss=0.07273, simple_loss=0.09537, pruned_loss=0.01582, audio_tagging_loss=0.009222, over 3039303.71 frames. ], batch size: 59, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:13:44,437 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257700 2023-11-22 00:13:49,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1717960.0, ans=0.125 2023-11-22 00:13:55,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1718026.6666666667, ans=15.0 2023-11-22 00:14:01,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1718026.6666666667, ans=0.125 2023-11-22 00:14:05,508 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:14:09,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.727e+01 8.143e+01 8.728e+01 9.361e+01 2.351e+02, threshold=1.746e+02, percent-clipped=1.0 2023-11-22 00:14:28,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-22 00:14:37,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1718226.6666666667, ans=0.0 2023-11-22 00:14:43,496 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5250, loss[loss=0.07111, simple_loss=0.09093, pruned_loss=0.01483, audio_tagging_loss=0.01081, over 14934.00 frames. ], tot_loss[loss=0.07231, simple_loss=0.09452, pruned_loss=0.01583, audio_tagging_loss=0.009213, over 3040760.90 frames. ], batch size: 56, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:14:44,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=12.0 2023-11-22 00:14:49,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257750 2023-11-22 00:14:50,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1718293.3333333333, ans=0.125 2023-11-22 00:15:07,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1718426.6666666667, ans=0.2 2023-11-22 00:15:09,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1718426.6666666667, ans=0.0 2023-11-22 00:15:12,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1718426.6666666667, ans=0.09899494936611666 2023-11-22 00:15:25,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1718493.3333333333, ans=0.0 2023-11-22 00:15:30,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1718493.3333333333, ans=0.0 2023-11-22 00:15:39,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1718560.0, ans=0.125 2023-11-22 00:15:48,171 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5300, loss[loss=0.06188, simple_loss=0.07573, pruned_loss=0.01302, audio_tagging_loss=0.01099, over 13400.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.09485, pruned_loss=0.0159, audio_tagging_loss=0.009256, over 3044139.32 frames. ], batch size: 53, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:15:53,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257800 2023-11-22 00:15:56,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1718626.6666666667, ans=0.0 2023-11-22 00:16:12,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1718760.0, ans=0.125 2023-11-22 00:16:17,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.827e+01 8.246e+01 8.912e+01 9.804e+01 1.159e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-22 00:16:20,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1718760.0, ans=0.125 2023-11-22 00:16:30,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2023-11-22 00:16:37,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1718826.6666666667, ans=22.5 2023-11-22 00:16:52,153 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5350, loss[loss=0.05459, simple_loss=0.06652, pruned_loss=0.009799, audio_tagging_loss=0.01153, over 14205.00 frames. ], tot_loss[loss=0.07236, simple_loss=0.09424, pruned_loss=0.01588, audio_tagging_loss=0.009368, over 3036242.11 frames. 
], batch size: 55, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:16:57,245 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257850 2023-11-22 00:17:03,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1718960.0, ans=0.1 2023-11-22 00:17:16,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.66 vs. limit=15.0 2023-11-22 00:17:17,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1719093.3333333333, ans=0.2 2023-11-22 00:17:44,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1719226.6666666667, ans=0.125 2023-11-22 00:17:57,071 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5400, loss[loss=0.07166, simple_loss=0.08636, pruned_loss=0.0163, audio_tagging_loss=0.01218, over 14904.00 frames. ], tot_loss[loss=0.07248, simple_loss=0.09449, pruned_loss=0.01587, audio_tagging_loss=0.009367, over 3034179.35 frames. ], batch size: 57, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:18:02,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257900 2023-11-22 00:18:07,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2023-11-22 00:18:08,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1719360.0, ans=0.2 2023-11-22 00:18:27,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 7.882e+01 8.565e+01 9.189e+01 1.192e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-22 00:18:30,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1719426.6666666667, ans=0.125 2023-11-22 00:18:40,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1719493.3333333333, ans=0.125 2023-11-22 00:18:42,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1719493.3333333333, ans=0.0 2023-11-22 00:19:01,111 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5450, loss[loss=0.05735, simple_loss=0.07206, pruned_loss=0.01095, audio_tagging_loss=0.01037, over 15097.00 frames. ], tot_loss[loss=0.07296, simple_loss=0.09489, pruned_loss=0.01601, audio_tagging_loss=0.00951, over 3032428.01 frames. ], batch size: 56, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:19:04,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1719626.6666666667, ans=0.0 2023-11-22 00:19:06,706 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 257950 2023-11-22 00:19:11,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1719626.6666666667, ans=0.125 2023-11-22 00:20:05,636 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5500, loss[loss=0.06132, simple_loss=0.0713, pruned_loss=0.01434, audio_tagging_loss=0.01134, over 14600.00 frames. ], tot_loss[loss=0.07376, simple_loss=0.09605, pruned_loss=0.01624, audio_tagging_loss=0.009497, over 3040854.49 frames. 
], batch size: 57, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:20:10,542 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258000 2023-11-22 00:20:19,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1720026.6666666667, ans=0.125 2023-11-22 00:20:36,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.285e+01 8.796e+01 9.521e+01 1.194e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 00:20:59,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=22.5 2023-11-22 00:21:08,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1720293.3333333333, ans=0.125 2023-11-22 00:21:09,037 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5550, loss[loss=0.0805, simple_loss=0.1106, pruned_loss=0.01346, audio_tagging_loss=0.01174, over 15527.00 frames. ], tot_loss[loss=0.07304, simple_loss=0.09509, pruned_loss=0.01589, audio_tagging_loss=0.009603, over 3033769.42 frames. ], batch size: 57, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:21:11,809 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:21:14,678 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258050 2023-11-22 00:21:37,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1720426.6666666667, ans=0.0 2023-11-22 00:21:39,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1720426.6666666667, ans=0.125 2023-11-22 00:22:13,632 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5600, loss[loss=0.05738, simple_loss=0.078, pruned_loss=0.0105, audio_tagging_loss=0.007886, over 14434.00 frames. ], tot_loss[loss=0.07279, simple_loss=0.09453, pruned_loss=0.01582, audio_tagging_loss=0.009701, over 3043224.24 frames. ], batch size: 54, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:22:19,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258100 2023-11-22 00:22:21,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1720626.6666666667, ans=0.2 2023-11-22 00:22:24,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.96 vs. 
limit=15.0 2023-11-22 00:22:30,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1720693.3333333333, ans=0.0 2023-11-22 00:22:41,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=1720760.0, ans=10.0 2023-11-22 00:22:43,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.795e+01 7.788e+01 8.528e+01 9.166e+01 1.353e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-22 00:22:57,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1720826.6666666667, ans=0.2 2023-11-22 00:22:57,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1720826.6666666667, ans=0.0 2023-11-22 00:22:58,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1720826.6666666667, ans=0.125 2023-11-22 00:22:59,883 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 00:23:12,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-22 00:23:17,566 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5650, loss[loss=0.08213, simple_loss=0.1028, pruned_loss=0.01859, audio_tagging_loss=0.01214, over 14837.00 frames. ], tot_loss[loss=0.07283, simple_loss=0.09455, pruned_loss=0.01574, audio_tagging_loss=0.009818, over 3049142.09 frames. ], batch size: 56, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:23:20,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1720960.0, ans=0.015 2023-11-22 00:23:22,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258150 2023-11-22 00:23:30,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-11-22 00:23:33,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1721026.6666666667, ans=0.125 2023-11-22 00:23:35,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1721026.6666666667, ans=0.125 2023-11-22 00:23:39,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1721026.6666666667, ans=0.0 2023-11-22 00:23:47,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.33 vs. 
limit=15.0 2023-11-22 00:23:51,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1721093.3333333333, ans=0.0 2023-11-22 00:23:55,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1721160.0, ans=0.125 2023-11-22 00:24:09,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1721226.6666666667, ans=0.1 2023-11-22 00:24:21,005 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5700, loss[loss=0.07484, simple_loss=0.1078, pruned_loss=0.01385, audio_tagging_loss=0.007095, over 15392.00 frames. ], tot_loss[loss=0.07302, simple_loss=0.0949, pruned_loss=0.01582, audio_tagging_loss=0.009748, over 3048913.35 frames. ], batch size: 56, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:24:25,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258200 2023-11-22 00:24:41,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1721360.0, ans=0.2 2023-11-22 00:24:52,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.074e+01 8.821e+01 9.592e+01 1.274e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 00:24:58,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1721426.6666666667, ans=0.2 2023-11-22 00:25:15,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1721560.0, ans=0.1 2023-11-22 00:25:15,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1721560.0, ans=0.0 2023-11-22 00:25:26,446 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5750, loss[loss=0.05698, simple_loss=0.06451, pruned_loss=0.01043, audio_tagging_loss=0.01429, over 15435.00 frames. ], tot_loss[loss=0.0728, simple_loss=0.09452, pruned_loss=0.01583, audio_tagging_loss=0.009707, over 3056684.53 frames. ], batch size: 59, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:25:31,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258250 2023-11-22 00:25:51,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1721760.0, ans=0.1 2023-11-22 00:25:57,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1721760.0, ans=0.125 2023-11-22 00:26:13,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1721826.6666666667, ans=0.0 2023-11-22 00:26:29,997 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5800, loss[loss=0.04545, simple_loss=0.04743, pruned_loss=0.007686, audio_tagging_loss=0.01405, over 15513.00 frames. ], tot_loss[loss=0.0723, simple_loss=0.09381, pruned_loss=0.01577, audio_tagging_loss=0.009617, over 3051147.44 frames. ], batch size: 62, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:26:31,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. 
limit=15.0 2023-11-22 00:26:34,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258300 2023-11-22 00:26:42,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-22 00:26:43,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1722026.6666666667, ans=0.0 2023-11-22 00:26:43,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2023-11-22 00:26:44,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1722026.6666666667, ans=0.125 2023-11-22 00:26:55,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1722093.3333333333, ans=0.04949747468305833 2023-11-22 00:26:59,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1722093.3333333333, ans=0.2 2023-11-22 00:27:01,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.125e+01 8.820e+01 9.414e+01 1.363e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 00:27:04,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1722093.3333333333, ans=0.0 2023-11-22 00:27:09,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1722160.0, ans=0.125 2023-11-22 00:27:26,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1722226.6666666667, ans=0.09899494936611666 2023-11-22 00:27:29,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1722226.6666666667, ans=0.125 2023-11-22 00:27:33,622 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5850, loss[loss=0.05179, simple_loss=0.06945, pruned_loss=0.007267, audio_tagging_loss=0.009795, over 15875.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09394, pruned_loss=0.01581, audio_tagging_loss=0.00959, over 3049938.32 frames. ], batch size: 63, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:27:33,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1722293.3333333333, ans=0.09899494936611666 2023-11-22 00:27:38,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258350 2023-11-22 00:27:43,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=22.5 2023-11-22 00:27:50,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.11 vs. 
limit=15.0 2023-11-22 00:28:01,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1722426.6666666667, ans=0.125 2023-11-22 00:28:07,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1722426.6666666667, ans=0.125 2023-11-22 00:28:09,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1722426.6666666667, ans=0.1 2023-11-22 00:28:22,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1722493.3333333333, ans=0.125 2023-11-22 00:28:38,106 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5900, loss[loss=0.07708, simple_loss=0.1007, pruned_loss=0.01798, audio_tagging_loss=0.008775, over 16059.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09436, pruned_loss=0.01585, audio_tagging_loss=0.009503, over 3050194.92 frames. ], batch size: 59, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:28:43,187 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258400 2023-11-22 00:28:54,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1722693.3333333333, ans=0.125 2023-11-22 00:29:08,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2023-11-22 00:29:10,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.841e+01 8.273e+01 8.751e+01 9.457e+01 2.979e+02, threshold=1.750e+02, percent-clipped=1.0 2023-11-22 00:29:25,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1722826.6666666667, ans=0.0 2023-11-22 00:29:41,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1722893.3333333333, ans=0.1 2023-11-22 00:29:43,306 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 5950, loss[loss=0.05872, simple_loss=0.07482, pruned_loss=0.01186, audio_tagging_loss=0.009454, over 14670.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.09316, pruned_loss=0.01568, audio_tagging_loss=0.009442, over 3050193.34 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 16.0 2023-11-22 00:29:48,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258450 2023-11-22 00:29:54,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1723026.6666666667, ans=0.0 2023-11-22 00:29:58,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1723026.6666666667, ans=0.2 2023-11-22 00:30:14,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1723093.3333333333, ans=0.2 2023-11-22 00:30:14,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1723093.3333333333, ans=0.125 2023-11-22 00:30:18,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.56 vs. 
limit=12.0 2023-11-22 00:30:18,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=15.0 2023-11-22 00:30:32,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1723160.0, ans=0.125 2023-11-22 00:30:44,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1723226.6666666667, ans=0.07 2023-11-22 00:30:46,871 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6000, loss[loss=0.06001, simple_loss=0.0729, pruned_loss=0.01204, audio_tagging_loss=0.01152, over 17930.00 frames. ], tot_loss[loss=0.07211, simple_loss=0.0939, pruned_loss=0.01578, audio_tagging_loss=0.009379, over 3053930.46 frames. ], batch size: 69, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:30:46,872 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 00:31:27,174 INFO [train_asr.py:1253] (1/4) Epoch 22, validation: loss=0.05958, simple_loss=0.05193, pruned_loss=0.005178, audio_tagging_loss=0.02843, over 4681554.00 frames. 2023-11-22 00:31:27,175 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 00:31:32,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258500 2023-11-22 00:31:32,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1723293.3333333333, ans=0.1 2023-11-22 00:31:43,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1723360.0, ans=0.0 2023-11-22 00:31:48,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1723360.0, ans=0.125 2023-11-22 00:31:51,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1723426.6666666667, ans=0.2 2023-11-22 00:31:58,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.255e+01 8.695e+01 9.611e+01 1.390e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 00:31:59,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1723426.6666666667, ans=0.5 2023-11-22 00:32:04,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-22 00:32:14,258 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 00:32:17,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. 
limit=15.0 2023-11-22 00:32:18,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1723560.0, ans=0.0 2023-11-22 00:32:31,170 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6050, loss[loss=0.09548, simple_loss=0.1202, pruned_loss=0.02761, audio_tagging_loss=0.007754, over 14929.00 frames. ], tot_loss[loss=0.0723, simple_loss=0.0946, pruned_loss=0.01577, audio_tagging_loss=0.009234, over 3054632.79 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:32:36,078 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258550 2023-11-22 00:32:44,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1723693.3333333333, ans=0.125 2023-11-22 00:32:47,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1723693.3333333333, ans=0.2 2023-11-22 00:32:47,391 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:33:10,904 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:33:30,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1723893.3333333333, ans=0.125 2023-11-22 00:33:34,419 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6100, loss[loss=0.06976, simple_loss=0.09395, pruned_loss=0.01391, audio_tagging_loss=0.008881, over 15073.00 frames. ], tot_loss[loss=0.07296, simple_loss=0.09554, pruned_loss=0.01604, audio_tagging_loss=0.009146, over 3053643.41 frames. ], batch size: 57, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:33:39,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258600 2023-11-22 00:33:39,534 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:33:57,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1724026.6666666667, ans=0.125 2023-11-22 00:33:59,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-22 00:34:07,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.673e+01 8.187e+01 8.838e+01 9.553e+01 1.618e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-22 00:34:32,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1724226.6666666667, ans=0.0 2023-11-22 00:34:39,404 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6150, loss[loss=0.06242, simple_loss=0.07197, pruned_loss=0.01249, audio_tagging_loss=0.01395, over 14463.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09551, pruned_loss=0.01593, audio_tagging_loss=0.009225, over 3053552.68 frames. ], batch size: 57, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:34:42,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. 
limit=15.0 2023-11-22 00:34:44,419 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258650 2023-11-22 00:34:51,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1724360.0, ans=0.0 2023-11-22 00:35:08,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.55 vs. limit=10.0 2023-11-22 00:35:15,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-22 00:35:26,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1724493.3333333333, ans=0.125 2023-11-22 00:35:43,773 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6200, loss[loss=0.09375, simple_loss=0.1086, pruned_loss=0.02553, audio_tagging_loss=0.01394, over 14467.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09522, pruned_loss=0.01601, audio_tagging_loss=0.009288, over 3052372.17 frames. ], batch size: 54, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:35:48,833 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258700 2023-11-22 00:36:10,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1724760.0, ans=0.125 2023-11-22 00:36:15,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.702e+01 8.055e+01 8.630e+01 9.332e+01 1.528e+02, threshold=1.726e+02, percent-clipped=0.0 2023-11-22 00:36:23,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1724826.6666666667, ans=0.0 2023-11-22 00:36:30,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1724826.6666666667, ans=0.125 2023-11-22 00:36:44,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1724893.3333333333, ans=0.0 2023-11-22 00:36:47,998 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6250, loss[loss=0.08642, simple_loss=0.1174, pruned_loss=0.02135, audio_tagging_loss=0.006359, over 15272.00 frames. ], tot_loss[loss=0.07296, simple_loss=0.09526, pruned_loss=0.01592, audio_tagging_loss=0.009415, over 3057125.41 frames. ], batch size: 58, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:36:53,061 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258750 2023-11-22 00:37:10,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1725026.6666666667, ans=0.125 2023-11-22 00:37:43,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-11-22 00:37:52,613 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6300, loss[loss=0.05469, simple_loss=0.06675, pruned_loss=0.01027, audio_tagging_loss=0.01105, over 13756.00 frames. ], tot_loss[loss=0.07302, simple_loss=0.09521, pruned_loss=0.01585, audio_tagging_loss=0.00956, over 3053791.87 frames. ], batch size: 54, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:37:55,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.28 vs. 
limit=10.0 2023-11-22 00:37:58,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258800 2023-11-22 00:38:01,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=12.0 2023-11-22 00:38:03,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1725293.3333333333, ans=0.0 2023-11-22 00:38:10,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1725360.0, ans=0.125 2023-11-22 00:38:18,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1725426.6666666667, ans=0.125 2023-11-22 00:38:25,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.059e+01 8.458e+01 8.852e+01 9.578e+01 1.235e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-22 00:38:30,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2023-11-22 00:38:35,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1725493.3333333333, ans=0.1 2023-11-22 00:38:35,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1725493.3333333333, ans=0.125 2023-11-22 00:38:58,134 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6350, loss[loss=0.06324, simple_loss=0.07732, pruned_loss=0.0115, audio_tagging_loss=0.01309, over 14736.00 frames. ], tot_loss[loss=0.07347, simple_loss=0.09589, pruned_loss=0.01599, audio_tagging_loss=0.00953, over 3050026.57 frames. ], batch size: 54, lr: 3.11e-03, grad_scale: 32.0 2023-11-22 00:39:03,143 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258850 2023-11-22 00:39:04,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1725626.6666666667, ans=0.125 2023-11-22 00:39:10,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1725693.3333333333, ans=0.0 2023-11-22 00:39:19,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1725693.3333333333, ans=0.2 2023-11-22 00:39:37,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2023-11-22 00:39:50,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2023-11-22 00:39:53,096 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:39:53,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1725893.3333333333, ans=0.1 2023-11-22 00:40:01,465 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6400, loss[loss=0.06729, simple_loss=0.09024, pruned_loss=0.0112, audio_tagging_loss=0.01097, over 15319.00 frames. ], tot_loss[loss=0.07349, simple_loss=0.0957, pruned_loss=0.01596, audio_tagging_loss=0.009679, over 3043811.73 frames. 
], batch size: 57, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 00:40:02,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1725960.0, ans=0.125 2023-11-22 00:40:07,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258900 2023-11-22 00:40:13,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1726026.6666666667, ans=0.2 2023-11-22 00:40:15,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2023-11-22 00:40:26,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1726093.3333333333, ans=0.09899494936611666 2023-11-22 00:40:29,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-22 00:40:34,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.092e+01 8.951e+01 9.607e+01 2.279e+02, threshold=1.790e+02, percent-clipped=1.0 2023-11-22 00:40:52,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1726226.6666666667, ans=0.125 2023-11-22 00:40:52,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1726226.6666666667, ans=0.125 2023-11-22 00:40:53,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0 2023-11-22 00:41:05,269 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6450, loss[loss=0.07378, simple_loss=0.0836, pruned_loss=0.01722, audio_tagging_loss=0.01476, over 15309.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.09535, pruned_loss=0.01601, audio_tagging_loss=0.00973, over 3042389.79 frames. ], batch size: 57, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 00:41:11,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 258950 2023-11-22 00:41:23,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5 2023-11-22 00:41:28,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. 
limit=15.0 2023-11-22 00:41:31,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=1726426.6666666667, ans=0.2 2023-11-22 00:41:32,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1726426.6666666667, ans=0.0 2023-11-22 00:41:44,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1726493.3333333333, ans=0.0 2023-11-22 00:42:03,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1726560.0, ans=0.125 2023-11-22 00:42:05,751 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:42:10,285 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6500, loss[loss=0.06579, simple_loss=0.08733, pruned_loss=0.01633, audio_tagging_loss=0.005799, over 14565.00 frames. ], tot_loss[loss=0.07324, simple_loss=0.09529, pruned_loss=0.01592, audio_tagging_loss=0.009677, over 3047284.78 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 00:42:13,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1726626.6666666667, ans=0.125 2023-11-22 00:42:15,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259000 2023-11-22 00:42:20,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1726626.6666666667, ans=0.2 2023-11-22 00:42:27,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1726693.3333333333, ans=0.125 2023-11-22 00:42:30,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1726693.3333333333, ans=0.125 2023-11-22 00:42:42,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1726760.0, ans=0.2 2023-11-22 00:42:42,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.155e+01 8.144e+01 8.732e+01 9.307e+01 1.366e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-22 00:43:02,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2023-11-22 00:43:15,081 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6550, loss[loss=0.07863, simple_loss=0.09366, pruned_loss=0.02021, audio_tagging_loss=0.01159, over 16322.00 frames. ], tot_loss[loss=0.07234, simple_loss=0.09427, pruned_loss=0.01562, audio_tagging_loss=0.009584, over 3049490.48 frames. 
], batch size: 61, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:43:15,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1726960.0, ans=0.125 2023-11-22 00:43:20,268 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259050 2023-11-22 00:43:56,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1727160.0, ans=0.0 2023-11-22 00:44:07,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1727226.6666666667, ans=22.5 2023-11-22 00:44:18,558 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6600, loss[loss=0.08578, simple_loss=0.117, pruned_loss=0.0195, audio_tagging_loss=0.007799, over 15747.00 frames. ], tot_loss[loss=0.07243, simple_loss=0.0947, pruned_loss=0.01564, audio_tagging_loss=0.009443, over 3047015.28 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:44:20,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1727293.3333333333, ans=0.1 2023-11-22 00:44:24,028 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259100 2023-11-22 00:44:31,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2023-11-22 00:44:39,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1727360.0, ans=0.125 2023-11-22 00:44:52,435 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.795e+01 8.221e+01 8.740e+01 9.559e+01 1.333e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 00:45:08,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1727560.0, ans=0.125 2023-11-22 00:45:11,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2023-11-22 00:45:15,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2023-11-22 00:45:23,539 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6650, loss[loss=0.07046, simple_loss=0.0914, pruned_loss=0.0148, audio_tagging_loss=0.009962, over 13947.00 frames. ], tot_loss[loss=0.07271, simple_loss=0.09524, pruned_loss=0.01567, audio_tagging_loss=0.009424, over 3049767.07 frames. ], batch size: 53, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:45:25,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1727626.6666666667, ans=0.125 2023-11-22 00:45:29,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259150 2023-11-22 00:45:41,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1727693.3333333333, ans=0.0 2023-11-22 00:45:55,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1727760.0, ans=0.95 2023-11-22 00:46:00,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2023-11-22 00:46:02,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1727826.6666666667, ans=0.0 2023-11-22 00:46:18,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1727893.3333333333, ans=0.125 2023-11-22 00:46:20,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1727893.3333333333, ans=0.0 2023-11-22 00:46:27,267 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6700, loss[loss=0.05994, simple_loss=0.07398, pruned_loss=0.008824, audio_tagging_loss=0.01412, over 14784.00 frames. ], tot_loss[loss=0.07328, simple_loss=0.09578, pruned_loss=0.01596, audio_tagging_loss=0.009439, over 3047549.71 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 8.0 2023-11-22 00:46:32,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259200 2023-11-22 00:46:55,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-22 00:47:01,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 7.937e+01 8.531e+01 9.262e+01 1.112e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-22 00:47:02,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1728093.3333333333, ans=0.05 2023-11-22 00:47:03,392 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:47:05,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1728160.0, ans=0.0 2023-11-22 00:47:07,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1728160.0, ans=0.125 2023-11-22 00:47:07,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1728160.0, ans=0.125 2023-11-22 00:47:30,772 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6750, loss[loss=0.05256, simple_loss=0.076, pruned_loss=0.007559, audio_tagging_loss=0.007005, over 15285.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09452, pruned_loss=0.01562, audio_tagging_loss=0.009391, over 3046438.33 frames. ], batch size: 60, lr: 3.10e-03, grad_scale: 8.0 2023-11-22 00:47:34,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1728293.3333333333, ans=0.0 2023-11-22 00:47:35,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259250 2023-11-22 00:47:47,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1728360.0, ans=0.95 2023-11-22 00:47:50,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1728360.0, ans=0.0 2023-11-22 00:48:05,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.94 vs. 
limit=12.0 2023-11-22 00:48:12,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1728493.3333333333, ans=0.1 2023-11-22 00:48:26,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0 2023-11-22 00:48:32,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1728560.0, ans=0.125 2023-11-22 00:48:33,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1728560.0, ans=0.0 2023-11-22 00:48:36,250 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6800, loss[loss=0.08717, simple_loss=0.1106, pruned_loss=0.02259, audio_tagging_loss=0.009301, over 15541.00 frames. ], tot_loss[loss=0.07195, simple_loss=0.09389, pruned_loss=0.01564, audio_tagging_loss=0.009362, over 3038038.74 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:48:41,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259300 2023-11-22 00:48:57,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1728693.3333333333, ans=0.0 2023-11-22 00:48:59,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1728693.3333333333, ans=0.2 2023-11-22 00:49:09,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.092e+01 8.012e+01 8.646e+01 9.288e+01 1.206e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-22 00:49:12,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1728826.6666666667, ans=0.0 2023-11-22 00:49:40,198 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6850, loss[loss=0.04838, simple_loss=0.05147, pruned_loss=0.008127, audio_tagging_loss=0.01452, over 14011.00 frames. ], tot_loss[loss=0.07267, simple_loss=0.09497, pruned_loss=0.01582, audio_tagging_loss=0.009367, over 3037675.27 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:49:42,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1728960.0, ans=0.125 2023-11-22 00:49:45,227 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259350 2023-11-22 00:50:14,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2023-11-22 00:50:15,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1729093.3333333333, ans=0.0 2023-11-22 00:50:17,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1729160.0, ans=0.125 2023-11-22 00:50:27,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.06 vs. 
limit=22.5 2023-11-22 00:50:42,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1729293.3333333333, ans=0.1 2023-11-22 00:50:43,748 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6900, loss[loss=0.04936, simple_loss=0.05464, pruned_loss=0.006535, audio_tagging_loss=0.0155, over 14419.00 frames. ], tot_loss[loss=0.07277, simple_loss=0.09487, pruned_loss=0.01595, audio_tagging_loss=0.00939, over 3039823.54 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:50:48,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259400 2023-11-22 00:51:12,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1729426.6666666667, ans=0.0 2023-11-22 00:51:19,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.107e+01 8.289e+01 8.755e+01 9.420e+01 1.385e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-22 00:51:35,602 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 00:51:38,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1729560.0, ans=0.0 2023-11-22 00:51:49,228 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 6950, loss[loss=0.06112, simple_loss=0.07801, pruned_loss=0.01407, audio_tagging_loss=0.008044, over 14189.00 frames. ], tot_loss[loss=0.07225, simple_loss=0.09412, pruned_loss=0.01574, audio_tagging_loss=0.009458, over 3041156.77 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:51:52,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1729626.6666666667, ans=0.0 2023-11-22 00:51:54,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259450 2023-11-22 00:52:10,312 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:52:30,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1729826.6666666667, ans=0.0 2023-11-22 00:52:36,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.10 vs. limit=15.0 2023-11-22 00:52:39,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1729893.3333333333, ans=0.125 2023-11-22 00:52:49,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1729893.3333333333, ans=0.2 2023-11-22 00:52:52,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1729893.3333333333, ans=0.0 2023-11-22 00:52:54,423 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7000, loss[loss=0.071, simple_loss=0.08755, pruned_loss=0.01642, audio_tagging_loss=0.0108, over 15028.00 frames. 
], tot_loss[loss=0.07204, simple_loss=0.09392, pruned_loss=0.01558, audio_tagging_loss=0.009509, over 3046157.96 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:52:54,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1729960.0, ans=0.125 2023-11-22 00:52:59,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259500 2023-11-22 00:53:00,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1729960.0, ans=0.125 2023-11-22 00:53:09,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1730026.6666666667, ans=0.1 2023-11-22 00:53:10,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-22 00:53:28,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.568e+01 7.948e+01 8.560e+01 9.274e+01 1.120e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-22 00:53:47,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1730226.6666666667, ans=0.0 2023-11-22 00:53:48,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1730226.6666666667, ans=0.1 2023-11-22 00:53:58,449 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7050, loss[loss=0.04941, simple_loss=0.04946, pruned_loss=0.006885, audio_tagging_loss=0.0178, over 14228.00 frames. ], tot_loss[loss=0.07219, simple_loss=0.0941, pruned_loss=0.01557, audio_tagging_loss=0.009572, over 3047261.08 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:54:03,491 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259550 2023-11-22 00:54:10,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1730360.0, ans=0.125 2023-11-22 00:54:14,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1730360.0, ans=0.0 2023-11-22 00:54:19,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1730360.0, ans=0.125 2023-11-22 00:54:33,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1730426.6666666667, ans=0.1 2023-11-22 00:54:43,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1730493.3333333333, ans=0.125 2023-11-22 00:54:59,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1730560.0, ans=0.0 2023-11-22 00:55:02,333 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7100, loss[loss=0.08148, simple_loss=0.1142, pruned_loss=0.01627, audio_tagging_loss=0.008098, over 14823.00 frames. ], tot_loss[loss=0.07218, simple_loss=0.09375, pruned_loss=0.01566, audio_tagging_loss=0.009649, over 3038755.04 frames. 
], batch size: 58, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:55:05,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1730626.6666666667, ans=0.0 2023-11-22 00:55:07,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259600 2023-11-22 00:55:12,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1730626.6666666667, ans=0.035 2023-11-22 00:55:13,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2023-11-22 00:55:14,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1730693.3333333333, ans=0.0 2023-11-22 00:55:37,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.901e+01 8.168e+01 8.717e+01 9.423e+01 1.311e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-22 00:55:44,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2023-11-22 00:56:07,809 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7150, loss[loss=0.07112, simple_loss=0.09966, pruned_loss=0.01211, audio_tagging_loss=0.009171, over 15135.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09379, pruned_loss=0.01571, audio_tagging_loss=0.00975, over 3043880.12 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 00:56:12,779 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259650 2023-11-22 00:56:24,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1731026.6666666667, ans=0.0 2023-11-22 00:56:30,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1731026.6666666667, ans=0.125 2023-11-22 00:56:37,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1731093.3333333333, ans=0.125 2023-11-22 00:56:41,335 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 00:56:54,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1731160.0, ans=0.0 2023-11-22 00:57:00,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1731226.6666666667, ans=0.125 2023-11-22 00:57:02,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=12.0 2023-11-22 00:57:11,635 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7200, loss[loss=0.06971, simple_loss=0.09753, pruned_loss=0.01277, audio_tagging_loss=0.008175, over 14621.00 frames. ], tot_loss[loss=0.07272, simple_loss=0.0946, pruned_loss=0.01569, audio_tagging_loss=0.009731, over 3044374.14 frames. 
], batch size: 56, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 00:57:16,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259700 2023-11-22 00:57:19,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1731293.3333333333, ans=0.0 2023-11-22 00:57:46,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.390e+01 8.081e+01 8.874e+01 9.710e+01 1.298e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-22 00:58:08,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1731560.0, ans=0.025 2023-11-22 00:58:15,114 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7250, loss[loss=0.08408, simple_loss=0.1065, pruned_loss=0.02064, audio_tagging_loss=0.01018, over 15161.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09518, pruned_loss=0.01582, audio_tagging_loss=0.009889, over 3044222.70 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 00:58:21,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259750 2023-11-22 00:58:28,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1731693.3333333333, ans=0.1 2023-11-22 00:58:46,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1731760.0, ans=0.0 2023-11-22 00:58:48,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1731760.0, ans=0.0 2023-11-22 00:59:19,754 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7300, loss[loss=0.1011, simple_loss=0.1273, pruned_loss=0.02822, audio_tagging_loss=0.009216, over 16479.00 frames. ], tot_loss[loss=0.07363, simple_loss=0.09591, pruned_loss=0.01596, audio_tagging_loss=0.009716, over 3050523.84 frames. ], batch size: 61, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 00:59:24,711 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259800 2023-11-22 00:59:34,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-22 00:59:50,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1732093.3333333333, ans=0.125 2023-11-22 00:59:50,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1732093.3333333333, ans=0.0 2023-11-22 00:59:52,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1732093.3333333333, ans=0.125 2023-11-22 00:59:53,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.552e+01 8.104e+01 8.588e+01 9.277e+01 1.159e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-22 01:00:04,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2023-11-22 01:00:23,343 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7350, loss[loss=0.06403, simple_loss=0.08454, pruned_loss=0.01447, audio_tagging_loss=0.007298, over 14139.00 frames. ], tot_loss[loss=0.07324, simple_loss=0.09564, pruned_loss=0.0159, audio_tagging_loss=0.009524, over 3039920.51 frames. 
], batch size: 54, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 01:00:23,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1732293.3333333333, ans=0.1 2023-11-22 01:00:28,264 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259850 2023-11-22 01:00:30,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1732293.3333333333, ans=0.0 2023-11-22 01:00:59,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1732426.6666666667, ans=0.1 2023-11-22 01:01:07,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.58 vs. limit=10.0 2023-11-22 01:01:11,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-22 01:01:14,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1732560.0, ans=0.0 2023-11-22 01:01:19,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1732560.0, ans=0.125 2023-11-22 01:01:26,423 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7400, loss[loss=0.07037, simple_loss=0.09173, pruned_loss=0.01641, audio_tagging_loss=0.008089, over 14746.00 frames. ], tot_loss[loss=0.07315, simple_loss=0.09541, pruned_loss=0.01596, audio_tagging_loss=0.009485, over 3047606.79 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 01:01:32,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259900 2023-11-22 01:01:58,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1732760.0, ans=0.125 2023-11-22 01:02:02,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.170e+01 8.751e+01 9.458e+01 1.086e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-22 01:02:17,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2023-11-22 01:02:22,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1732893.3333333333, ans=0.125 2023-11-22 01:02:23,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2023-11-22 01:02:30,995 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7450, loss[loss=0.07638, simple_loss=0.1026, pruned_loss=0.01576, audio_tagging_loss=0.009313, over 15433.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09471, pruned_loss=0.0158, audio_tagging_loss=0.009379, over 3044414.79 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 01:02:36,551 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 259950 2023-11-22 01:03:10,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1733160.0, ans=0.1 2023-11-22 01:03:29,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. 
limit=6.0 2023-11-22 01:03:35,332 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7500, loss[loss=0.0616, simple_loss=0.07004, pruned_loss=0.01156, audio_tagging_loss=0.01502, over 14773.00 frames. ], tot_loss[loss=0.07299, simple_loss=0.09553, pruned_loss=0.01589, audio_tagging_loss=0.009333, over 3050802.47 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 01:03:40,962 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260000 2023-11-22 01:03:41,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-11-22 01:04:04,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1733426.6666666667, ans=0.125 2023-11-22 01:04:14,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.000e+01 8.690e+01 9.325e+01 1.229e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-22 01:04:16,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1733493.3333333333, ans=0.125 2023-11-22 01:04:25,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1733493.3333333333, ans=0.0 2023-11-22 01:04:27,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1733493.3333333333, ans=0.1 2023-11-22 01:04:42,063 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7550, loss[loss=0.06062, simple_loss=0.07801, pruned_loss=0.01267, audio_tagging_loss=0.008945, over 14103.00 frames. ], tot_loss[loss=0.07313, simple_loss=0.09585, pruned_loss=0.01595, audio_tagging_loss=0.00926, over 3059360.05 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 01:04:47,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260050 2023-11-22 01:05:09,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1733760.0, ans=0.125 2023-11-22 01:05:24,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1733826.6666666667, ans=0.1 2023-11-22 01:05:35,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1733893.3333333333, ans=0.125 2023-11-22 01:05:45,970 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7600, loss[loss=0.06884, simple_loss=0.08601, pruned_loss=0.01311, audio_tagging_loss=0.01273, over 14721.00 frames. ], tot_loss[loss=0.07252, simple_loss=0.09463, pruned_loss=0.01583, audio_tagging_loss=0.009373, over 3052560.70 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:05:51,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260100 2023-11-22 01:05:59,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=6.0 2023-11-22 01:05:59,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.49 vs. 
limit=10.0 2023-11-22 01:06:21,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.674e+01 8.289e+01 8.897e+01 9.780e+01 1.167e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-22 01:06:47,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1734226.6666666667, ans=0.1 2023-11-22 01:06:49,651 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7650, loss[loss=0.04567, simple_loss=0.05742, pruned_loss=0.007281, audio_tagging_loss=0.009677, over 14461.00 frames. ], tot_loss[loss=0.07231, simple_loss=0.09437, pruned_loss=0.0158, audio_tagging_loss=0.009329, over 3050967.50 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:06:54,510 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260150 2023-11-22 01:06:54,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1734293.3333333333, ans=0.0 2023-11-22 01:07:50,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1734560.0, ans=0.1 2023-11-22 01:07:53,018 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7700, loss[loss=0.06748, simple_loss=0.0868, pruned_loss=0.01322, audio_tagging_loss=0.01086, over 15591.00 frames. ], tot_loss[loss=0.07227, simple_loss=0.09447, pruned_loss=0.01566, audio_tagging_loss=0.009382, over 3048950.62 frames. ], batch size: 57, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:07:57,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260200 2023-11-22 01:08:05,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1734693.3333333333, ans=0.125 2023-11-22 01:08:07,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1734693.3333333333, ans=0.0 2023-11-22 01:08:13,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1734693.3333333333, ans=0.2 2023-11-22 01:08:15,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1734693.3333333333, ans=0.125 2023-11-22 01:08:25,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1734760.0, ans=0.125 2023-11-22 01:08:29,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 7.976e+01 8.890e+01 9.341e+01 1.175e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 01:08:36,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1734826.6666666667, ans=10.0 2023-11-22 01:08:57,770 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7750, loss[loss=0.07218, simple_loss=0.09638, pruned_loss=0.01695, audio_tagging_loss=0.007041, over 14501.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09455, pruned_loss=0.01573, audio_tagging_loss=0.009365, over 3053313.22 frames. 
], batch size: 56, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:09:02,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260250 2023-11-22 01:09:15,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1735026.6666666667, ans=0.125 2023-11-22 01:09:27,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1735093.3333333333, ans=0.95 2023-11-22 01:09:36,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1735160.0, ans=0.125 2023-11-22 01:09:53,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1735226.6666666667, ans=0.1 2023-11-22 01:09:55,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1735226.6666666667, ans=0.0 2023-11-22 01:09:57,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1735226.6666666667, ans=0.0 2023-11-22 01:09:59,040 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 01:10:01,237 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7800, loss[loss=0.08355, simple_loss=0.1129, pruned_loss=0.01835, audio_tagging_loss=0.008752, over 15859.00 frames. ], tot_loss[loss=0.07275, simple_loss=0.0952, pruned_loss=0.01581, audio_tagging_loss=0.009347, over 3055516.05 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:10:02,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1735293.3333333333, ans=0.05 2023-11-22 01:10:06,224 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260300 2023-11-22 01:10:10,118 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 01:10:21,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1735360.0, ans=0.0 2023-11-22 01:10:24,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1735360.0, ans=6.0 2023-11-22 01:10:37,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.748e+01 8.146e+01 9.007e+01 9.749e+01 1.228e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-22 01:10:37,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1735426.6666666667, ans=0.0 2023-11-22 01:10:45,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0 2023-11-22 01:11:03,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1735626.6666666667, ans=0.05 2023-11-22 01:11:04,766 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7850, loss[loss=0.07667, simple_loss=0.08776, pruned_loss=0.01985, audio_tagging_loss=0.01294, over 13802.00 frames. ], tot_loss[loss=0.07277, simple_loss=0.09507, pruned_loss=0.01578, audio_tagging_loss=0.009451, over 3047146.43 frames. 
], batch size: 53, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:11:09,870 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260350 2023-11-22 01:11:30,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1735760.0, ans=0.125 2023-11-22 01:11:35,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1735760.0, ans=0.125 2023-11-22 01:11:40,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=1735760.0, ans=0.2 2023-11-22 01:12:09,861 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7900, loss[loss=0.05519, simple_loss=0.07377, pruned_loss=0.01176, audio_tagging_loss=0.00654, over 15471.00 frames. ], tot_loss[loss=0.07234, simple_loss=0.09399, pruned_loss=0.01566, audio_tagging_loss=0.009684, over 3052052.46 frames. ], batch size: 60, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 01:12:13,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1735960.0, ans=0.0 2023-11-22 01:12:14,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260400 2023-11-22 01:12:22,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1736026.6666666667, ans=0.125 2023-11-22 01:12:24,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1736026.6666666667, ans=0.125 2023-11-22 01:12:25,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-22 01:12:46,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.144e+01 8.401e+01 8.796e+01 9.438e+01 1.194e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 01:12:49,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1736160.0, ans=0.125 2023-11-22 01:13:10,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1736226.6666666667, ans=0.0 2023-11-22 01:13:13,832 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 7950, loss[loss=0.09341, simple_loss=0.1183, pruned_loss=0.0245, audio_tagging_loss=0.009773, over 16683.00 frames. ], tot_loss[loss=0.07219, simple_loss=0.09387, pruned_loss=0.01555, audio_tagging_loss=0.009708, over 3052329.27 frames. ], batch size: 63, lr: 3.10e-03, grad_scale: 16.0 2023-11-22 01:13:18,747 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260450 2023-11-22 01:13:28,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1736360.0, ans=0.0 2023-11-22 01:13:29,905 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 01:13:30,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1736360.0, ans=0.125 2023-11-22 01:13:34,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2023-11-22 01:13:36,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1736360.0, ans=0.0 2023-11-22 01:13:54,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1736493.3333333333, ans=0.1 2023-11-22 01:14:16,697 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8000, loss[loss=0.07046, simple_loss=0.1014, pruned_loss=0.0112, audio_tagging_loss=0.008537, over 14046.00 frames. ], tot_loss[loss=0.07236, simple_loss=0.09398, pruned_loss=0.01562, audio_tagging_loss=0.009743, over 3041596.08 frames. ], batch size: 54, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:14:17,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-22 01:14:21,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260500 2023-11-22 01:14:52,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1736760.0, ans=0.1 2023-11-22 01:14:54,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.689e+01 7.877e+01 8.568e+01 9.236e+01 1.291e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-22 01:15:08,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1736893.3333333333, ans=0.125 2023-11-22 01:15:20,836 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8050, loss[loss=0.07317, simple_loss=0.09799, pruned_loss=0.01486, audio_tagging_loss=0.00932, over 14472.00 frames. ], tot_loss[loss=0.07357, simple_loss=0.09583, pruned_loss=0.01597, audio_tagging_loss=0.009683, over 3049844.12 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 32.0 2023-11-22 01:15:26,915 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260550 2023-11-22 01:15:30,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1736960.0, ans=0.0 2023-11-22 01:15:32,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2023-11-22 01:16:08,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1737160.0, ans=15.0 2023-11-22 01:16:08,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.89 vs. 
2023-11-22 01:16:11,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1737226.6666666667, ans=0.125
2023-11-22 01:16:19,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1737226.6666666667, ans=0.125
2023-11-22 01:16:25,999 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8100, loss[loss=0.06329, simple_loss=0.08532, pruned_loss=0.01257, audio_tagging_loss=0.008052, over 14840.00 frames. ], tot_loss[loss=0.0732, simple_loss=0.09528, pruned_loss=0.01589, audio_tagging_loss=0.009673, over 3055011.16 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:16:30,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260600
2023-11-22 01:16:37,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1737360.0, ans=0.1
2023-11-22 01:17:02,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1737426.6666666667, ans=0.0
2023-11-22 01:17:04,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.436e+01 7.880e+01 8.583e+01 9.266e+01 1.211e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-22 01:17:09,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1737493.3333333333, ans=0.125
2023-11-22 01:17:16,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.34 vs. limit=22.5
2023-11-22 01:17:20,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1737560.0, ans=0.0
2023-11-22 01:17:25,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.01 vs. limit=10.0
2023-11-22 01:17:27,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1737560.0, ans=0.025
2023-11-22 01:17:30,007 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8150, loss[loss=0.04593, simple_loss=0.05477, pruned_loss=0.0089, audio_tagging_loss=0.009643, over 14707.00 frames. ], tot_loss[loss=0.07288, simple_loss=0.09507, pruned_loss=0.01583, audio_tagging_loss=0.009518, over 3046678.40 frames. ], batch size: 57, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:17:32,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1737626.6666666667, ans=0.125
2023-11-22 01:17:34,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260650
2023-11-22 01:17:53,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1737693.3333333333, ans=0.09899494936611666
2023-11-22 01:17:57,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1737760.0, ans=0.125
2023-11-22 01:17:57,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1737760.0, ans=0.2
2023-11-22 01:17:58,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1737760.0, ans=0.04949747468305833
2023-11-22 01:17:59,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0
2023-11-22 01:18:23,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1737893.3333333333, ans=0.0
2023-11-22 01:18:26,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1737893.3333333333, ans=0.125
2023-11-22 01:18:32,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1737960.0, ans=15.0
2023-11-22 01:18:33,273 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8200, loss[loss=0.08058, simple_loss=0.1052, pruned_loss=0.01773, audio_tagging_loss=0.01023, over 15104.00 frames. ], tot_loss[loss=0.07309, simple_loss=0.09555, pruned_loss=0.01588, audio_tagging_loss=0.009435, over 3049468.38 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:18:34,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=12.0
2023-11-22 01:18:36,359 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
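The WARNING above shows the training loop excluding an AudioSet cut whose placeholder transcript is longer in BPE tokens (24) than the encoder output is in frames (23 after subsampling a 100-frame input); the transducer loss is undefined when there are fewer encoder frames than output tokens. A sketch of the implied length check follows; the subsampling formula is an assumption (two stride-2 convolutions in the front end) that happens to reproduce the 100 -> 23 figures logged here.

```python
# Sketch of the length filter implied by the WARNING above.  The
# subsampling formula is an assumption; it reproduces the logged
# before/after frame counts (100 -> 23).
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The transducer loss needs at least one encoder frame per output
    # token, so cuts with too few frames are excluded from training.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded AudioSet cut above
```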
2023-11-22 01:18:38,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1737960.0, ans=0.0
2023-11-22 01:18:39,498 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260700
2023-11-22 01:18:43,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1737960.0, ans=0.2
2023-11-22 01:18:55,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1738026.6666666667, ans=0.0
2023-11-22 01:18:59,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1738093.3333333333, ans=0.1
2023-11-22 01:19:00,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0
2023-11-22 01:19:12,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.128e+01 8.787e+01 9.499e+01 1.177e+02, threshold=1.757e+02, percent-clipped=0.0
2023-11-22 01:19:29,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1738226.6666666667, ans=0.0
2023-11-22 01:19:34,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1738226.6666666667, ans=0.125
2023-11-22 01:19:38,530 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8250, loss[loss=0.07443, simple_loss=0.1012, pruned_loss=0.01696, audio_tagging_loss=0.006881, over 16449.00 frames. ], tot_loss[loss=0.07321, simple_loss=0.09586, pruned_loss=0.01598, audio_tagging_loss=0.009298, over 3050639.19 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:19:43,497 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260750
2023-11-22 01:19:45,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0
2023-11-22 01:19:50,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1738360.0, ans=0.0
2023-11-22 01:20:06,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1738426.6666666667, ans=0.0
2023-11-22 01:20:16,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0
2023-11-22 01:20:22,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1738493.3333333333, ans=0.1
2023-11-22 01:20:41,968 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8300, loss[loss=0.1036, simple_loss=0.1481, pruned_loss=0.02226, audio_tagging_loss=0.007316, over 16203.00 frames. ], tot_loss[loss=0.07281, simple_loss=0.09517, pruned_loss=0.01589, audio_tagging_loss=0.009339, over 3047291.32 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:20:43,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1738626.6666666667, ans=0.1
2023-11-22 01:20:46,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260800
2023-11-22 01:20:47,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1738626.6666666667, ans=0.07
2023-11-22 01:20:47,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2023-11-22 01:20:48,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1738626.6666666667, ans=0.0
2023-11-22 01:21:17,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1738760.0, ans=0.0
2023-11-22 01:21:20,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5
2023-11-22 01:21:21,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.003e+01 8.893e+01 9.644e+01 1.813e+02, threshold=1.779e+02, percent-clipped=1.0
2023-11-22 01:21:22,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1738826.6666666667, ans=0.0
2023-11-22 01:21:27,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1738826.6666666667, ans=0.2
2023-11-22 01:21:46,168 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8350, loss[loss=0.09691, simple_loss=0.135, pruned_loss=0.02253, audio_tagging_loss=0.006881, over 16103.00 frames. ], tot_loss[loss=0.07217, simple_loss=0.09458, pruned_loss=0.01557, audio_tagging_loss=0.009317, over 3051170.59 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:21:48,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1738960.0, ans=0.0
2023-11-22 01:21:51,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260850
2023-11-22 01:21:51,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1738960.0, ans=0.2
2023-11-22 01:22:20,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1739093.3333333333, ans=0.125
2023-11-22 01:22:44,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1739226.6666666667, ans=0.125
2023-11-22 01:22:50,537 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8400, loss[loss=0.05619, simple_loss=0.07031, pruned_loss=0.0105, audio_tagging_loss=0.01054, over 14302.00 frames. ], tot_loss[loss=0.07227, simple_loss=0.09456, pruned_loss=0.01567, audio_tagging_loss=0.009322, over 3049041.30 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:22:56,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260900
2023-11-22 01:23:06,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1739360.0, ans=0.0
2023-11-22 01:23:06,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1739360.0, ans=0.125
2023-11-22 01:23:13,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1739360.0, ans=0.0
2023-11-22 01:23:23,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1739426.6666666667, ans=0.125
2023-11-22 01:23:28,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.555e+01 7.994e+01 8.596e+01 9.193e+01 1.208e+02, threshold=1.719e+02, percent-clipped=0.0
2023-11-22 01:23:31,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0
2023-11-22 01:23:35,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1739493.3333333333, ans=0.0
2023-11-22 01:23:47,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0
2023-11-22 01:23:54,880 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8450, loss[loss=0.07487, simple_loss=0.1013, pruned_loss=0.01546, audio_tagging_loss=0.008742, over 16585.00 frames. ], tot_loss[loss=0.07247, simple_loss=0.09485, pruned_loss=0.01574, audio_tagging_loss=0.00931, over 3045771.87 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:23:59,806 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 260950
2023-11-22 01:24:12,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1739693.3333333333, ans=0.125
2023-11-22 01:24:13,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1739693.3333333333, ans=0.2
2023-11-22 01:24:30,072 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 01:24:35,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0
2023-11-22 01:24:45,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1739893.3333333333, ans=0.0
2023-11-22 01:24:51,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1739893.3333333333, ans=0.125
2023-11-22 01:24:58,329 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8500, loss[loss=0.06846, simple_loss=0.09037, pruned_loss=0.01473, audio_tagging_loss=0.008555, over 15751.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09413, pruned_loss=0.01546, audio_tagging_loss=0.009373, over 3051098.70 frames. ], batch size: 60, lr: 3.09e-03, grad_scale: 32.0
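The optim.py:476 lines report five order statistics of recently observed gradient norms (plausibly min, 25th, 50th, 75th percentile, max) together with a clipping threshold; in every entry the threshold is almost exactly Clipping_scale (2.0) times the middle value, e.g. 2.0 x 8.568e+01 is close to the logged threshold=1.714e+02. A hedged sketch of such a report follows; this is not ScaledAdam's actual implementation.

```python
import torch

def grad_norm_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Sketch of the optim.py report (illustrative, not the real code).

    recent_norms: 1-D float tensor of gradient norms from recent steps.
    Returns (quantiles, threshold, percent_clipped).  The threshold is
    clipping_scale * median, consistent with the logged numbers.
    """
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                       # 2.0 * median
    percent_clipped = (recent_norms > threshold).float().mean() * 100.0
    return q, threshold, percent_clipped
```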
2023-11-22 01:25:03,332 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261000
2023-11-22 01:25:11,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1740026.6666666667, ans=0.125
2023-11-22 01:25:12,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1740026.6666666667, ans=0.1
2023-11-22 01:25:28,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=22.5
2023-11-22 01:25:36,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.714e+01 8.117e+01 8.646e+01 9.659e+01 1.248e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-22 01:25:39,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1740160.0, ans=0.125
2023-11-22 01:25:51,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0
2023-11-22 01:25:55,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1740226.6666666667, ans=0.125
2023-11-22 01:26:02,752 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8550, loss[loss=0.09593, simple_loss=0.1234, pruned_loss=0.02502, audio_tagging_loss=0.009232, over 15311.00 frames. ], tot_loss[loss=0.07201, simple_loss=0.09396, pruned_loss=0.01556, audio_tagging_loss=0.009463, over 3046967.29 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:26:08,528 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261050
2023-11-22 01:26:11,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1740293.3333333333, ans=0.1
2023-11-22 01:26:21,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1740360.0, ans=0.2
2023-11-22 01:26:35,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1740426.6666666667, ans=0.125
2023-11-22 01:26:38,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1740426.6666666667, ans=0.0
2023-11-22 01:27:06,906 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8600, loss[loss=0.08309, simple_loss=0.1087, pruned_loss=0.01812, audio_tagging_loss=0.0106, over 14912.00 frames. ], tot_loss[loss=0.07229, simple_loss=0.09411, pruned_loss=0.01571, audio_tagging_loss=0.00952, over 3048701.95 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:27:12,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261100
2023-11-22 01:27:29,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1740693.3333333333, ans=0.0
2023-11-22 01:27:31,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5
2023-11-22 01:27:35,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1740760.0, ans=0.2
2023-11-22 01:27:35,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1740760.0, ans=0.125
2023-11-22 01:27:36,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1740760.0, ans=0.2
2023-11-22 01:27:41,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1740760.0, ans=0.125
2023-11-22 01:27:45,976 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.851e+01 8.115e+01 8.738e+01 9.336e+01 1.223e+02, threshold=1.748e+02, percent-clipped=0.0
2023-11-22 01:27:53,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1740826.6666666667, ans=0.1
2023-11-22 01:27:57,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1740893.3333333333, ans=0.2
2023-11-22 01:28:10,849 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8650, loss[loss=0.07917, simple_loss=0.09683, pruned_loss=0.01847, audio_tagging_loss=0.01229, over 15208.00 frames. ], tot_loss[loss=0.07249, simple_loss=0.09437, pruned_loss=0.01569, audio_tagging_loss=0.009611, over 3047223.73 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:28:15,810 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261150
2023-11-22 01:28:18,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1740960.0, ans=0.125
2023-11-22 01:28:20,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1740960.0, ans=0.0
2023-11-22 01:28:46,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1741093.3333333333, ans=0.125
2023-11-22 01:28:51,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2023-11-22 01:28:59,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1741160.0, ans=0.0
2023-11-22 01:29:05,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1741226.6666666667, ans=0.2
2023-11-22 01:29:08,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1741226.6666666667, ans=0.0
2023-11-22 01:29:15,056 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8700, loss[loss=0.08915, simple_loss=0.1114, pruned_loss=0.02297, audio_tagging_loss=0.01048, over 14294.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.09573, pruned_loss=0.01594, audio_tagging_loss=0.00962, over 3049286.82 frames. ], batch size: 54, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:29:20,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261200
2023-11-22 01:29:29,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1741360.0, ans=0.125
2023-11-22 01:29:36,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1741360.0, ans=0.125
2023-11-22 01:29:38,287 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 01:29:39,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1741426.6666666667, ans=0.0
2023-11-22 01:29:49,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1741426.6666666667, ans=0.04949747468305833
2023-11-22 01:29:50,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1741426.6666666667, ans=0.125
2023-11-22 01:29:52,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1741493.3333333333, ans=0.125
2023-11-22 01:29:54,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.323e+01 9.034e+01 9.944e+01 3.984e+02, threshold=1.807e+02, percent-clipped=2.0
2023-11-22 01:29:55,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1741493.3333333333, ans=0.125
2023-11-22 01:29:55,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1741493.3333333333, ans=0.125
2023-11-22 01:30:03,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0
2023-11-22 01:30:15,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5
2023-11-22 01:30:19,264 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8750, loss[loss=0.07204, simple_loss=0.09124, pruned_loss=0.01642, audio_tagging_loss=0.01, over 15055.00 frames. ], tot_loss[loss=0.07413, simple_loss=0.09649, pruned_loss=0.01619, audio_tagging_loss=0.009688, over 3044415.79 frames. ], batch size: 54, lr: 3.09e-03, grad_scale: 16.0
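Most entries in this log are ScheduledFloat values: regularization hyperparameters (dropout probabilities, skip rates, balancer targets) that follow a piecewise-linear schedule in batch_count. By batch_count around 1.74e6 they have long since settled at their final values (ans=0.1, ans=0.125, and so on). A minimal sketch of such a schedule follows; the breakpoints are illustrative, not the real ones from scaling.py.

```python
# Sketch of a piecewise-linear hyperparameter schedule over batch_count,
# in the spirit of ScheduledFloat (breakpoints below are illustrative).
def scheduled_float(batch_count: float, *points: tuple) -> float:
    xs, ys = zip(*points)            # points are (batch_count, value) pairs
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:  # linear interpolation inside a segment
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout decaying from 0.3 to 0.1 over the first 20k batches is pinned
# at its final value 0.1 by batch_count=1736760, matching ans=0.1 above.
assert scheduled_float(1736760.0, (0.0, 0.3), (20000.0, 0.1)) == 0.1
```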
2023-11-22 01:30:20,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1741626.6666666667, ans=0.125
2023-11-22 01:30:24,150 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261250
2023-11-22 01:30:40,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1741693.3333333333, ans=0.2
2023-11-22 01:31:02,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1741826.6666666667, ans=0.0
2023-11-22 01:31:04,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1741826.6666666667, ans=0.0
2023-11-22 01:31:23,446 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8800, loss[loss=0.07003, simple_loss=0.09087, pruned_loss=0.01506, audio_tagging_loss=0.009541, over 15108.00 frames. ], tot_loss[loss=0.0746, simple_loss=0.09713, pruned_loss=0.01633, audio_tagging_loss=0.009706, over 3051589.53 frames. ], batch size: 57, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:31:28,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261300
2023-11-22 01:32:01,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1742160.0, ans=0.125
2023-11-22 01:32:03,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.285e+01 8.942e+01 9.871e+01 1.431e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-22 01:32:25,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.59 vs. limit=10.0
2023-11-22 01:32:28,143 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8850, loss[loss=0.08585, simple_loss=0.1126, pruned_loss=0.02147, audio_tagging_loss=0.008102, over 14699.00 frames. ], tot_loss[loss=0.07484, simple_loss=0.09761, pruned_loss=0.01639, audio_tagging_loss=0.009651, over 3046582.43 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:32:33,127 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261350
2023-11-22 01:32:42,101 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 01:32:44,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0
2023-11-22 01:32:49,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1742360.0, ans=0.0
2023-11-22 01:32:51,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0
2023-11-22 01:32:57,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5
2023-11-22 01:33:09,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1742493.3333333333, ans=0.125
2023-11-22 01:33:23,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1742560.0, ans=0.1
2023-11-22 01:33:27,438 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 01:33:31,951 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8900, loss[loss=0.06323, simple_loss=0.08409, pruned_loss=0.008409, audio_tagging_loss=0.01277, over 15705.00 frames. ], tot_loss[loss=0.07378, simple_loss=0.09623, pruned_loss=0.01617, audio_tagging_loss=0.009494, over 3045272.46 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:33:36,907 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261400
2023-11-22 01:33:50,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1742693.3333333333, ans=0.0
2023-11-22 01:33:54,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0
2023-11-22 01:34:10,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1742826.6666666667, ans=0.1
2023-11-22 01:34:10,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.07 vs. limit=10.0
2023-11-22 01:34:13,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.357e+01 8.809e+01 9.586e+01 1.233e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-22 01:34:14,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1742826.6666666667, ans=0.1
2023-11-22 01:34:15,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1742826.6666666667, ans=0.2
2023-11-22 01:34:15,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1742826.6666666667, ans=0.125
2023-11-22 01:34:20,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1742826.6666666667, ans=0.0
2023-11-22 01:34:30,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1742893.3333333333, ans=0.1
2023-11-22 01:34:35,073 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 8950, loss[loss=0.07631, simple_loss=0.09975, pruned_loss=0.01619, audio_tagging_loss=0.01024, over 15103.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.09645, pruned_loss=0.0164, audio_tagging_loss=0.009342, over 3046008.54 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:34:37,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1742960.0, ans=10.0
2023-11-22 01:34:40,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261450
2023-11-22 01:34:44,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=22.5
2023-11-22 01:34:45,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1742960.0, ans=0.2
2023-11-22 01:34:47,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=22.5
2023-11-22 01:35:14,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1743160.0, ans=0.125
2023-11-22 01:35:32,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1743226.6666666667, ans=0.0
2023-11-22 01:35:39,531 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9000, loss[loss=0.07315, simple_loss=0.09318, pruned_loss=0.01609, audio_tagging_loss=0.01047, over 15648.00 frames. ], tot_loss[loss=0.07364, simple_loss=0.09632, pruned_loss=0.01623, audio_tagging_loss=0.009251, over 3053855.78 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:35:39,532 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 01:35:59,145 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8856, 4.7202, 4.0514, 4.5768], device='cuda:1')
2023-11-22 01:36:20,051 INFO [train_asr.py:1253] (1/4) Epoch 22, validation: loss=0.0605, simple_loss=0.05183, pruned_loss=0.005175, audio_tagging_loss=0.02941, over 4681554.00 frames.
2023-11-22 01:36:20,052 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-22 01:36:22,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1743293.3333333333, ans=0.125
2023-11-22 01:36:24,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261500
2023-11-22 01:36:33,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1743360.0, ans=0.05
2023-11-22 01:36:51,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1743426.6666666667, ans=0.0
2023-11-22 01:37:01,166 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.315e+01 8.919e+01 9.797e+01 1.182e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-22 01:37:03,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0
2023-11-22 01:37:10,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1743560.0, ans=0.0
2023-11-22 01:37:18,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1743560.0, ans=0.015
2023-11-22 01:37:22,986 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9050, loss[loss=0.06278, simple_loss=0.08347, pruned_loss=0.0127, audio_tagging_loss=0.008343, over 15896.00 frames. ], tot_loss[loss=0.07378, simple_loss=0.09651, pruned_loss=0.01633, audio_tagging_loss=0.009202, over 3058363.23 frames. ], batch size: 57, lr: 3.09e-03, grad_scale: 16.0
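At batch 9000 the loop pauses, computes a validation loss over held-out cuts, and reports peak GPU memory. A sketch of that step is below; the model(batch) interface is hypothetical, and the memory line is assumed to come from torch.cuda.max_memory_allocated(), which is consistent with the 25607MB figure above.

```python
import torch

@torch.no_grad()
def validate(model, valid_dl, device) -> float:
    # Sketch only: the model(batch) -> (loss, num_frames) interface is
    # hypothetical, standing in for the real validation forward pass.
    model.eval()
    tot, frames = 0.0, 0.0
    for batch in valid_dl:
        loss, num_frames = model(batch)
        tot += loss.item() * num_frames
        frames += num_frames
    model.train()
    # Assumed source of the "Maximum memory allocated" line above.
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
    return tot / frames
```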
2023-11-22 01:37:27,908 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261550
2023-11-22 01:37:30,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1743626.6666666667, ans=0.0
2023-11-22 01:37:33,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1743626.6666666667, ans=0.125
2023-11-22 01:37:34,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1743693.3333333333, ans=0.1
2023-11-22 01:37:40,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0
2023-11-22 01:37:44,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1743693.3333333333, ans=0.1
2023-11-22 01:37:55,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1743760.0, ans=0.125
2023-11-22 01:38:02,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1743826.6666666667, ans=0.125
2023-11-22 01:38:27,017 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9100, loss[loss=0.08696, simple_loss=0.1052, pruned_loss=0.02344, audio_tagging_loss=0.01091, over 15740.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.09567, pruned_loss=0.01615, audio_tagging_loss=0.009236, over 3059887.34 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:38:31,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261600
2023-11-22 01:39:06,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.063e+01 8.742e+01 9.387e+01 1.203e+02, threshold=1.748e+02, percent-clipped=0.0
2023-11-22 01:39:06,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0
2023-11-22 01:39:16,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1744226.6666666667, ans=0.125
2023-11-22 01:39:29,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1744293.3333333333, ans=0.0
2023-11-22 01:39:30,541 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9150, loss[loss=0.07053, simple_loss=0.09187, pruned_loss=0.01579, audio_tagging_loss=0.008808, over 14419.00 frames. ], tot_loss[loss=0.07318, simple_loss=0.09564, pruned_loss=0.01608, audio_tagging_loss=0.009279, over 3047403.10 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:39:35,510 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261650
2023-11-22 01:39:40,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1744293.3333333333, ans=0.1
2023-11-22 01:39:46,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1744360.0, ans=0.125
2023-11-22 01:40:17,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1744493.3333333333, ans=0.07
2023-11-22 01:40:29,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1744560.0, ans=0.125
2023-11-22 01:40:33,658 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9200, loss[loss=0.0689, simple_loss=0.09305, pruned_loss=0.01358, audio_tagging_loss=0.008796, over 15029.00 frames. ], tot_loss[loss=0.07279, simple_loss=0.09504, pruned_loss=0.01595, audio_tagging_loss=0.00932, over 3046285.89 frames. ], batch size: 57, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:40:38,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261700
2023-11-22 01:40:41,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1744626.6666666667, ans=0.125
2023-11-22 01:40:50,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1744693.3333333333, ans=0.0
2023-11-22 01:41:14,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.733e+01 8.066e+01 8.592e+01 9.287e+01 1.258e+02, threshold=1.718e+02, percent-clipped=0.0
2023-11-22 01:41:19,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=8.0
2023-11-22 01:41:28,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1744893.3333333333, ans=0.125
2023-11-22 01:41:37,058 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9250, loss[loss=0.06666, simple_loss=0.08199, pruned_loss=0.01717, audio_tagging_loss=0.008493, over 15153.00 frames. ], tot_loss[loss=0.07221, simple_loss=0.09404, pruned_loss=0.01588, audio_tagging_loss=0.009305, over 3051703.35 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:41:43,184 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261750
2023-11-22 01:41:44,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1744960.0, ans=0.125
2023-11-22 01:41:48,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1744960.0, ans=0.0
2023-11-22 01:41:53,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1745026.6666666667, ans=0.0
2023-11-22 01:42:03,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=12.0
2023-11-22 01:42:05,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1745093.3333333333, ans=0.2
2023-11-22 01:42:22,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1745160.0, ans=0.2
2023-11-22 01:42:41,994 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9300, loss[loss=0.08731, simple_loss=0.1194, pruned_loss=0.02141, audio_tagging_loss=0.006191, over 15963.00 frames. ], tot_loss[loss=0.07174, simple_loss=0.09356, pruned_loss=0.01568, audio_tagging_loss=0.009288, over 3050101.30 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:42:46,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261800
2023-11-22 01:42:54,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1745360.0, ans=0.1
2023-11-22 01:42:55,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0
2023-11-22 01:42:56,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2023-11-22 01:43:19,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0
2023-11-22 01:43:24,443 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.431e+01 7.861e+01 8.461e+01 9.251e+01 1.268e+02, threshold=1.692e+02, percent-clipped=0.0
2023-11-22 01:43:24,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1745493.3333333333, ans=0.125
2023-11-22 01:43:29,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0
2023-11-22 01:43:37,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1745560.0, ans=0.2
2023-11-22 01:43:45,722 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9350, loss[loss=0.07378, simple_loss=0.09176, pruned_loss=0.01832, audio_tagging_loss=0.00958, over 15349.00 frames. ], tot_loss[loss=0.07225, simple_loss=0.09416, pruned_loss=0.01586, audio_tagging_loss=0.009306, over 3055154.74 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:43:46,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1745626.6666666667, ans=0.1
2023-11-22 01:43:50,727 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261850
2023-11-22 01:44:00,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1745693.3333333333, ans=15.0
2023-11-22 01:44:02,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1745693.3333333333, ans=0.0
2023-11-22 01:44:09,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5
2023-11-22 01:44:23,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1745826.6666666667, ans=0.0
2023-11-22 01:44:49,922 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9400, loss[loss=0.08502, simple_loss=0.1187, pruned_loss=0.01752, audio_tagging_loss=0.008167, over 15317.00 frames. ], tot_loss[loss=0.07247, simple_loss=0.09439, pruned_loss=0.01588, audio_tagging_loss=0.009393, over 3055020.07 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:44:55,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261900
2023-11-22 01:45:06,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1746026.6666666667, ans=0.07
2023-11-22 01:45:30,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1746160.0, ans=0.2
2023-11-22 01:45:32,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.919e+01 8.299e+01 8.911e+01 9.594e+01 2.055e+02, threshold=1.782e+02, percent-clipped=1.0
2023-11-22 01:45:41,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1746226.6666666667, ans=0.125
2023-11-22 01:45:46,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1746226.6666666667, ans=0.2
2023-11-22 01:45:56,227 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9450, loss[loss=0.08294, simple_loss=0.1052, pruned_loss=0.01928, audio_tagging_loss=0.01106, over 15370.00 frames. ], tot_loss[loss=0.07288, simple_loss=0.09492, pruned_loss=0.01592, audio_tagging_loss=0.009489, over 3051887.30 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:45:56,300 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 01:46:01,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 261950
2023-11-22 01:46:12,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=22.5
2023-11-22 01:46:19,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1746360.0, ans=0.1
2023-11-22 01:46:30,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1746426.6666666667, ans=0.125
2023-11-22 01:46:30,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1746426.6666666667, ans=0.0
2023-11-22 01:46:43,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0
2023-11-22 01:46:59,826 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9500, loss[loss=0.08172, simple_loss=0.1061, pruned_loss=0.01906, audio_tagging_loss=0.00963, over 14703.00 frames. ], tot_loss[loss=0.07277, simple_loss=0.09477, pruned_loss=0.01588, audio_tagging_loss=0.009505, over 3050122.77 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:47:04,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262000
2023-11-22 01:47:42,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.239e+01 8.709e+01 9.390e+01 1.179e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-22 01:47:46,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1746826.6666666667, ans=0.125
2023-11-22 01:47:51,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1746893.3333333333, ans=0.0
2023-11-22 01:48:03,598 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9550, loss[loss=0.07189, simple_loss=0.0881, pruned_loss=0.01689, audio_tagging_loss=0.01095, over 15433.00 frames. ], tot_loss[loss=0.07261, simple_loss=0.09452, pruned_loss=0.01573, audio_tagging_loss=0.009623, over 3051541.14 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 16.0
2023-11-22 01:48:07,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1746960.0, ans=0.1
2023-11-22 01:48:09,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262050
2023-11-22 01:48:18,919 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-22 01:48:50,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1747160.0, ans=0.0
2023-11-22 01:48:55,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1747226.6666666667, ans=0.0
2023-11-22 01:48:55,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1747226.6666666667, ans=0.125
2023-11-22 01:49:08,515 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9600, loss[loss=0.06504, simple_loss=0.08005, pruned_loss=0.01267, audio_tagging_loss=0.01234, over 14870.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09442, pruned_loss=0.01579, audio_tagging_loss=0.0097, over 3048566.19 frames. ], batch size: 57, lr: 3.09e-03, grad_scale: 32.0
2023-11-22 01:49:12,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0
limit=12.0 2023-11-22 01:49:14,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262100 2023-11-22 01:49:41,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1747426.6666666667, ans=0.125 2023-11-22 01:49:49,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1747493.3333333333, ans=0.2 2023-11-22 01:49:51,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.503e+01 8.033e+01 8.722e+01 9.429e+01 2.117e+02, threshold=1.744e+02, percent-clipped=1.0 2023-11-22 01:49:53,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1747493.3333333333, ans=0.125 2023-11-22 01:49:58,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1747560.0, ans=0.125 2023-11-22 01:50:03,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1747560.0, ans=0.0 2023-11-22 01:50:13,331 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9650, loss[loss=0.0691, simple_loss=0.08925, pruned_loss=0.01539, audio_tagging_loss=0.009086, over 14018.00 frames. ], tot_loss[loss=0.07242, simple_loss=0.09411, pruned_loss=0.01567, audio_tagging_loss=0.0097, over 3043957.78 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 32.0 2023-11-22 01:50:18,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262150 2023-11-22 01:50:22,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1747626.6666666667, ans=0.1 2023-11-22 01:50:24,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1747693.3333333333, ans=0.125 2023-11-22 01:51:16,677 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9700, loss[loss=0.05367, simple_loss=0.06871, pruned_loss=0.008058, audio_tagging_loss=0.01126, over 16358.00 frames. ], tot_loss[loss=0.07278, simple_loss=0.095, pruned_loss=0.01577, audio_tagging_loss=0.009511, over 3042958.67 frames. 
], batch size: 62, lr: 3.09e-03, grad_scale: 32.0 2023-11-22 01:51:22,199 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262200 2023-11-22 01:51:29,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1748026.6666666667, ans=0.125 2023-11-22 01:51:55,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1748160.0, ans=0.125 2023-11-22 01:51:56,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1748160.0, ans=0.1 2023-11-22 01:52:00,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.573e+01 8.169e+01 8.750e+01 9.625e+01 1.305e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-22 01:52:12,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1748226.6666666667, ans=0.0 2023-11-22 01:52:15,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1748226.6666666667, ans=0.125 2023-11-22 01:52:21,637 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9750, loss[loss=0.1097, simple_loss=0.1481, pruned_loss=0.02641, audio_tagging_loss=0.00925, over 15497.00 frames. ], tot_loss[loss=0.07347, simple_loss=0.09619, pruned_loss=0.01601, audio_tagging_loss=0.00937, over 3050484.30 frames. ], batch size: 56, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 01:52:27,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262250 2023-11-22 01:52:34,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1748360.0, ans=0.1 2023-11-22 01:52:47,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1748426.6666666667, ans=0.1 2023-11-22 01:52:51,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1748426.6666666667, ans=0.0 2023-11-22 01:53:14,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.10 vs. limit=22.5 2023-11-22 01:53:24,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-22 01:53:26,046 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9800, loss[loss=0.08414, simple_loss=0.1187, pruned_loss=0.01591, audio_tagging_loss=0.008876, over 14869.00 frames. ], tot_loss[loss=0.07356, simple_loss=0.09647, pruned_loss=0.01603, audio_tagging_loss=0.0093, over 3045412.93 frames. ], batch size: 53, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 01:53:29,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-22 01:53:31,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262300 2023-11-22 01:53:34,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.33 vs. 
limit=15.0 2023-11-22 01:53:57,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1748760.0, ans=0.0 2023-11-22 01:53:57,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1748760.0, ans=0.1 2023-11-22 01:54:09,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1748826.6666666667, ans=0.1 2023-11-22 01:54:10,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.436e+01 9.164e+01 9.957e+01 1.259e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-22 01:54:15,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1748826.6666666667, ans=10.0 2023-11-22 01:54:19,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2023-11-22 01:54:26,525 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 01:54:26,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1748893.3333333333, ans=0.2 2023-11-22 01:54:29,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1748960.0, ans=0.0 2023-11-22 01:54:30,296 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9850, loss[loss=0.08217, simple_loss=0.1069, pruned_loss=0.01848, audio_tagging_loss=0.01022, over 15776.00 frames. ], tot_loss[loss=0.07397, simple_loss=0.09707, pruned_loss=0.01625, audio_tagging_loss=0.009189, over 3041601.15 frames. ], batch size: 60, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 01:54:35,319 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262350 2023-11-22 01:54:46,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.20 vs. limit=22.5 2023-11-22 01:54:49,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.25 vs. 
limit=10.0 2023-11-22 01:54:55,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1749093.3333333333, ans=0.1 2023-11-22 01:55:05,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1749093.3333333333, ans=0.05 2023-11-22 01:55:17,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1749160.0, ans=0.125 2023-11-22 01:55:24,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1749226.6666666667, ans=0.125 2023-11-22 01:55:35,352 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9900, loss[loss=0.08755, simple_loss=0.1129, pruned_loss=0.02391, audio_tagging_loss=0.007182, over 15417.00 frames. ], tot_loss[loss=0.07365, simple_loss=0.09672, pruned_loss=0.01608, audio_tagging_loss=0.009208, over 3039398.51 frames. ], batch size: 54, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 01:55:40,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262400 2023-11-22 01:56:09,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1749426.6666666667, ans=0.2 2023-11-22 01:56:19,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.153e+01 9.023e+01 9.668e+01 1.710e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-22 01:56:22,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1749493.3333333333, ans=0.0 2023-11-22 01:56:23,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1749493.3333333333, ans=0.0 2023-11-22 01:56:39,861 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 9950, loss[loss=0.06843, simple_loss=0.0799, pruned_loss=0.01773, audio_tagging_loss=0.01075, over 14601.00 frames. ], tot_loss[loss=0.07304, simple_loss=0.09599, pruned_loss=0.01578, audio_tagging_loss=0.009269, over 3037407.13 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 01:56:44,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262450 2023-11-22 01:56:48,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1749626.6666666667, ans=0.125 2023-11-22 01:56:52,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-22 01:57:22,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1749826.6666666667, ans=0.125 2023-11-22 01:57:43,631 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10000, loss[loss=0.0913, simple_loss=0.1173, pruned_loss=0.02246, audio_tagging_loss=0.01021, over 15316.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.09607, pruned_loss=0.01592, audio_tagging_loss=0.009216, over 3033030.69 frames. 
2023-11-22 01:57:48,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262500 2023-11-22 01:57:49,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1749960.0, ans=0.2 2023-11-22 01:58:21,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1750160.0, ans=0.125 2023-11-22 01:58:28,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.075e+01 8.690e+01 9.364e+01 1.327e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-22 01:58:32,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1750160.0, ans=0.125 2023-11-22 01:58:44,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1750226.6666666667, ans=0.125 2023-11-22 01:58:47,714 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10050, loss[loss=0.04703, simple_loss=0.053, pruned_loss=0.009482, audio_tagging_loss=0.01105, over 16153.00 frames. ], tot_loss[loss=0.07242, simple_loss=0.09491, pruned_loss=0.01568, audio_tagging_loss=0.009288, over 3035205.81 frames. ], batch size: 64, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 01:58:53,387 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262550 2023-11-22 01:59:07,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1750360.0, ans=0.125 2023-11-22 01:59:16,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1750426.6666666667, ans=0.0 2023-11-22 01:59:20,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1750426.6666666667, ans=0.125 2023-11-22 01:59:45,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1750560.0, ans=0.1 2023-11-22 01:59:52,859 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10100, loss[loss=0.06569, simple_loss=0.09106, pruned_loss=0.01222, audio_tagging_loss=0.007937, over 15507.00 frames. ], tot_loss[loss=0.07266, simple_loss=0.09558, pruned_loss=0.01559, audio_tagging_loss=0.009288, over 3041983.05 frames. ], batch size: 57, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 01:59:57,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262600 2023-11-22 02:00:28,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1750760.0, ans=0.125 2023-11-22 02:00:38,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.193e+01 8.828e+01 9.579e+01 1.146e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-22 02:00:47,427 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
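The WARNING lines from train_asr.py:1462 are a length sanity check on the AudioSet cuts: the audio-tagging clips carry only a dummy transcript, and a 1-second cut (100 fbank frames) yields fewer encoder frames after the 4x subsampling (23) than BPE tokens (24), which a transducer loss cannot align. A sketch of such a filter; the frame-count formula below is an assumption about the conv front end that happens to reproduce the logged 100 -> 23:

    # Keep a cut only if it has at least as many encoder frames as tokens.
    # T' = ((T - 7) // 2 + 1) // 2 is an assumed subsampling formula that
    # matches "before subsampling: 100 ... after subsampling: 23" above.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))   # False -> excluded, as in the warning above
    print(keep_cut(1500, 30))  # True for a typical full-length utterance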
2023-11-22 02:00:57,212 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10150, loss[loss=0.07172, simple_loss=0.09834, pruned_loss=0.01346, audio_tagging_loss=0.00909, over 15569.00 frames. ], tot_loss[loss=0.07361, simple_loss=0.09706, pruned_loss=0.01583, audio_tagging_loss=0.009248, over 3051524.00 frames. ], batch size: 57, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:01:01,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262650 2023-11-22 02:01:05,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2023-11-22 02:01:17,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1751026.6666666667, ans=0.125 2023-11-22 02:01:30,323 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 02:01:34,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1751093.3333333333, ans=0.035 2023-11-22 02:01:54,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1751226.6666666667, ans=0.125 2023-11-22 02:01:57,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1751226.6666666667, ans=0.2 2023-11-22 02:02:01,524 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10200, loss[loss=0.1023, simple_loss=0.1306, pruned_loss=0.02749, audio_tagging_loss=0.009439, over 14651.00 frames. ], tot_loss[loss=0.07325, simple_loss=0.09616, pruned_loss=0.01574, audio_tagging_loss=0.009432, over 3055925.79 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:02:03,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0 2023-11-22 02:02:07,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262700 2023-11-22 02:02:24,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1751360.0, ans=0.125 2023-11-22 02:02:27,327 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
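The train_asr.py:1221 loss[...] entries decompose the per-batch objective. With the configured scales (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0), the printed totals are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss once warm-up (warm_step=2000) is long past; batch 10150 above checks out: 0.5 * 0.09834 + 0.01346 + 0.00909 is approximately 0.07172. As a sketch:

    # Recombining the logged components; reproduces the loss[...] totals in
    # this log under the post-warm-up weighting assumed here.
    def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Batch 10150 from the log reports loss=0.07172:
    print(total_loss(0.09834, 0.01346, 0.00909))  # ~0.07172

The tot_loss[...] bracket is the same decomposition averaged over the recent window of batches (note the "over 3M frames" count), which is why it moves much more smoothly than the per-batch loss[...] values.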
2023-11-22 02:02:29,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1751426.6666666667, ans=0.0 2023-11-22 02:02:38,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1751493.3333333333, ans=0.09899494936611666 2023-11-22 02:02:38,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1751493.3333333333, ans=0.0 2023-11-22 02:02:41,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1751493.3333333333, ans=0.125 2023-11-22 02:02:46,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 7.988e+01 8.675e+01 9.636e+01 1.224e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-22 02:02:53,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1751560.0, ans=0.0 2023-11-22 02:03:06,730 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10250, loss[loss=0.05265, simple_loss=0.06443, pruned_loss=0.01048, audio_tagging_loss=0.009953, over 15106.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.09601, pruned_loss=0.01589, audio_tagging_loss=0.009475, over 3057602.96 frames. ], batch size: 59, lr: 3.08e-03, grad_scale: 8.0 2023-11-22 02:03:08,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1751626.6666666667, ans=0.0 2023-11-22 02:03:10,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1751626.6666666667, ans=0.125 2023-11-22 02:03:11,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262750 2023-11-22 02:03:52,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1751826.6666666667, ans=0.0 2023-11-22 02:04:00,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. limit=10.0 2023-11-22 02:04:03,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-11-22 02:04:05,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1751893.3333333333, ans=0.125 2023-11-22 02:04:09,764 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10300, loss[loss=0.07205, simple_loss=0.09392, pruned_loss=0.01459, audio_tagging_loss=0.0105, over 14343.00 frames. ], tot_loss[loss=0.07332, simple_loss=0.09574, pruned_loss=0.01597, audio_tagging_loss=0.009478, over 3053883.41 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 8.0
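The scaling.py:1022 Whitening lines are diagnostics from modules that penalize activations whose channel covariance drifts too far from isotropic: the printed metric is compared against a whitening limit, and a gradient penalty is applied only when the limit is exceeded. The exact icefall metric is not reproduced here; the assumed variant below captures the idea, evaluating to 1.0 for perfectly white features and growing as a few directions start to dominate, which fits the logged "metric=14.03 vs. limit=15.0" style of output:

    import torch

    # Assumed isotropy measure: n * ||C||_F^2 / trace(C)^2 over the channel
    # covariance C; equals 1.0 when C is a multiple of the identity.
    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        n = cov.shape[0]
        return (n * (cov * cov).sum() / cov.trace() ** 2).item()

    white = torch.randn(1000, 256)
    print(whitening_metric(white))          # ~1.0 plus sampling noise
    collapsed = white * torch.linspace(0.01, 1.0, 256)
    print(whitening_metric(collapsed))      # noticeably larger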
2023-11-22 02:04:11,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1751960.0, ans=0.0 2023-11-22 02:04:14,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262800 2023-11-22 02:04:56,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.259e+01 8.792e+01 9.429e+01 1.253e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 02:05:02,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1752226.6666666667, ans=0.09899494936611666 2023-11-22 02:05:11,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1752226.6666666667, ans=0.09899494936611666 2023-11-22 02:05:14,247 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10350, loss[loss=0.08, simple_loss=0.1136, pruned_loss=0.01453, audio_tagging_loss=0.008674, over 14666.00 frames. ], tot_loss[loss=0.0729, simple_loss=0.09531, pruned_loss=0.01574, audio_tagging_loss=0.009503, over 3051767.10 frames. ], batch size: 54, lr: 3.08e-03, grad_scale: 8.0 2023-11-22 02:05:20,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262850 2023-11-22 02:05:26,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1752360.0, ans=0.07 2023-11-22 02:05:44,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1752426.6666666667, ans=10.0 2023-11-22 02:05:56,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=12.0 2023-11-22 02:06:19,260 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10400, loss[loss=0.06681, simple_loss=0.08458, pruned_loss=0.01389, audio_tagging_loss=0.01063, over 14966.00 frames. ], tot_loss[loss=0.07316, simple_loss=0.09545, pruned_loss=0.01583, audio_tagging_loss=0.009602, over 3051928.46 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:06:24,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262900 2023-11-22 02:06:25,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1752626.6666666667, ans=0.125 2023-11-22 02:06:26,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=12.0 2023-11-22 02:07:04,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.068e+01 8.726e+01 9.431e+01 1.284e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-22 02:07:06,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1752826.6666666667, ans=0.035 2023-11-22 02:07:10,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1752893.3333333333, ans=0.125 2023-11-22 02:07:14,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1752893.3333333333, ans=0.09899494936611666 2023-11-22 02:07:20,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.32 vs. limit=10.0
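The grad_scale field in the loss lines (8.0, 16.0 and 32.0 at different points in this stretch) is the fp16 dynamic loss scale; the run was started with use_fp16=True, and such a scale halves whenever a step overflows and doubles again after a run of clean steps. A standard PyTorch pattern with the same behaviour is sketched below; this is an assumed equivalent, not the trainer's actual code:

    import torch

    # Dynamic loss scaling: scale halves on inf/nan gradients and doubles
    # after growth_interval consecutive clean steps.
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)            # forward in fp16 where safe
        scaler.scale(loss).backward()      # backward on the scaled loss
        scaler.step(optimizer)             # unscales; skips step on overflow
        scaler.update()                    # adjusts the scale for next step
        return loss.detach()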
2023-11-22 02:07:22,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1752960.0, ans=15.0 2023-11-22 02:07:22,669 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10450, loss[loss=0.05547, simple_loss=0.07028, pruned_loss=0.01073, audio_tagging_loss=0.009603, over 15296.00 frames. ], tot_loss[loss=0.07339, simple_loss=0.0959, pruned_loss=0.0159, audio_tagging_loss=0.009547, over 3053271.74 frames. ], batch size: 59, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:07:24,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2023-11-22 02:07:27,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 262950 2023-11-22 02:07:47,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1753093.3333333333, ans=0.1 2023-11-22 02:07:50,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-22 02:07:52,442 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:07:53,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1753093.3333333333, ans=0.025 2023-11-22 02:08:00,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5 2023-11-22 02:08:25,960 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10500, loss[loss=0.05147, simple_loss=0.07166, pruned_loss=0.007091, audio_tagging_loss=0.008553, over 15115.00 frames. ], tot_loss[loss=0.07316, simple_loss=0.09591, pruned_loss=0.01581, audio_tagging_loss=0.009388, over 3046043.25 frames. ], batch size: 59, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:08:27,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2023-11-22 02:08:30,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1753293.3333333333, ans=0.07 2023-11-22 02:08:31,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263000 2023-11-22 02:08:40,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-11-22 02:08:40,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. limit=6.0 2023-11-22 02:08:47,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1753360.0, ans=0.1 2023-11-22 02:08:48,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-22 02:08:53,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.86 vs.
limit=22.5 2023-11-22 02:09:04,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1753493.3333333333, ans=0.1 2023-11-22 02:09:12,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.420e+01 8.257e+01 8.868e+01 9.626e+01 1.177e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 02:09:12,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1753493.3333333333, ans=0.125 2023-11-22 02:09:31,584 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10550, loss[loss=0.0719, simple_loss=0.0917, pruned_loss=0.01512, audio_tagging_loss=0.01093, over 14439.00 frames. ], tot_loss[loss=0.07262, simple_loss=0.09544, pruned_loss=0.01563, audio_tagging_loss=0.009276, over 3043028.81 frames. ], batch size: 54, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:09:37,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263050 2023-11-22 02:09:45,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1753693.3333333333, ans=0.0 2023-11-22 02:09:53,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1753693.3333333333, ans=0.125 2023-11-22 02:09:53,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1753693.3333333333, ans=0.5 2023-11-22 02:10:00,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1753760.0, ans=0.1 2023-11-22 02:10:04,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1753760.0, ans=0.125 2023-11-22 02:10:12,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1753826.6666666667, ans=0.125 2023-11-22 02:10:26,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1753893.3333333333, ans=0.0 2023-11-22 02:10:35,913 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10600, loss[loss=0.06774, simple_loss=0.08753, pruned_loss=0.01341, audio_tagging_loss=0.01056, over 15154.00 frames. ], tot_loss[loss=0.07283, simple_loss=0.09567, pruned_loss=0.01575, audio_tagging_loss=0.009244, over 3042910.98 frames. 
], batch size: 57, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:10:36,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1753960.0, ans=0.1 2023-11-22 02:10:40,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263100 2023-11-22 02:10:57,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1754026.6666666667, ans=0.125 2023-11-22 02:10:57,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1754026.6666666667, ans=0.125 2023-11-22 02:10:58,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1754026.6666666667, ans=0.0 2023-11-22 02:11:17,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1754160.0, ans=0.2 2023-11-22 02:11:19,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1754160.0, ans=0.1 2023-11-22 02:11:21,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.379e+01 8.070e+01 8.614e+01 9.337e+01 1.249e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-22 02:11:22,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1754160.0, ans=0.0 2023-11-22 02:11:22,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1754160.0, ans=0.125 2023-11-22 02:11:29,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1754226.6666666667, ans=0.125 2023-11-22 02:11:38,648 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10650, loss[loss=0.05978, simple_loss=0.0732, pruned_loss=0.01246, audio_tagging_loss=0.01073, over 14880.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.09594, pruned_loss=0.01573, audio_tagging_loss=0.009233, over 3040829.75 frames. ], batch size: 57, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:11:43,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263150 2023-11-22 02:12:32,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1754560.0, ans=0.0 2023-11-22 02:12:36,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-22 02:12:42,107 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10700, loss[loss=0.0877, simple_loss=0.1181, pruned_loss=0.02275, audio_tagging_loss=0.005907, over 16525.00 frames. ], tot_loss[loss=0.07307, simple_loss=0.0959, pruned_loss=0.01584, audio_tagging_loss=0.009278, over 3032962.40 frames. 
], batch size: 59, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:12:42,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1754626.6666666667, ans=0.0 2023-11-22 02:12:46,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1754626.6666666667, ans=0.2 2023-11-22 02:12:47,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263200 2023-11-22 02:13:18,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1754760.0, ans=0.125 2023-11-22 02:13:28,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=12.0 2023-11-22 02:13:29,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.759e+01 7.890e+01 8.587e+01 9.339e+01 1.239e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-22 02:13:47,141 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10750, loss[loss=0.08613, simple_loss=0.1231, pruned_loss=0.01695, audio_tagging_loss=0.007604, over 15328.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09657, pruned_loss=0.01583, audio_tagging_loss=0.009174, over 3043153.90 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:13:52,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263250 2023-11-22 02:14:04,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1755026.6666666667, ans=0.0 2023-11-22 02:14:26,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1755160.0, ans=0.125 2023-11-22 02:14:46,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2023-11-22 02:14:49,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5 2023-11-22 02:14:49,801 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10800, loss[loss=0.08942, simple_loss=0.1105, pruned_loss=0.02302, audio_tagging_loss=0.01113, over 16429.00 frames. ], tot_loss[loss=0.0732, simple_loss=0.09607, pruned_loss=0.01598, audio_tagging_loss=0.009186, over 3048367.34 frames. 
], batch size: 62, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:14:49,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1755293.3333333333, ans=0.0 2023-11-22 02:14:51,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1755293.3333333333, ans=0.125 2023-11-22 02:14:54,764 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263300 2023-11-22 02:15:11,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1755360.0, ans=0.04949747468305833 2023-11-22 02:15:37,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.483e+01 8.311e+01 8.848e+01 9.855e+01 1.832e+02, threshold=1.770e+02, percent-clipped=2.0 2023-11-22 02:15:44,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1755560.0, ans=0.1 2023-11-22 02:15:48,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1755560.0, ans=0.125 2023-11-22 02:15:54,654 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10850, loss[loss=0.05991, simple_loss=0.07257, pruned_loss=0.0129, audio_tagging_loss=0.01072, over 15095.00 frames. ], tot_loss[loss=0.07254, simple_loss=0.09538, pruned_loss=0.01572, audio_tagging_loss=0.009126, over 3051711.87 frames. ], batch size: 58, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:15:59,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263350 2023-11-22 02:16:33,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1755826.6666666667, ans=0.0 2023-11-22 02:16:54,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1755893.3333333333, ans=0.125 2023-11-22 02:16:57,469 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 02:16:58,685 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10900, loss[loss=0.06958, simple_loss=0.09536, pruned_loss=0.01354, audio_tagging_loss=0.008365, over 15284.00 frames. ], tot_loss[loss=0.07292, simple_loss=0.09579, pruned_loss=0.01588, audio_tagging_loss=0.009143, over 3052487.95 frames. 
], batch size: 54, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:17:04,295 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263400 2023-11-22 02:17:27,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1756093.3333333333, ans=0.1 2023-11-22 02:17:28,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1756093.3333333333, ans=0.1 2023-11-22 02:17:36,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1756160.0, ans=0.1 2023-11-22 02:17:40,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2023-11-22 02:17:46,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.172e+01 8.770e+01 9.474e+01 2.328e+02, threshold=1.754e+02, percent-clipped=1.0 2023-11-22 02:17:52,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2023-11-22 02:18:02,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1756293.3333333333, ans=0.125 2023-11-22 02:18:03,846 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 10950, loss[loss=0.05037, simple_loss=0.06638, pruned_loss=0.006681, audio_tagging_loss=0.0105, over 16148.00 frames. ], tot_loss[loss=0.07289, simple_loss=0.09567, pruned_loss=0.01585, audio_tagging_loss=0.009208, over 3052939.12 frames. ], batch size: 61, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:18:06,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1756293.3333333333, ans=0.0 2023-11-22 02:18:08,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263450 2023-11-22 02:18:10,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1756293.3333333333, ans=0.0 2023-11-22 02:18:10,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1756293.3333333333, ans=0.0 2023-11-22 02:18:17,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1756360.0, ans=0.125 2023-11-22 02:18:17,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1756360.0, ans=0.0 2023-11-22 02:18:22,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. 
limit=15.0 2023-11-22 02:18:27,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1756360.0, ans=0.125 2023-11-22 02:18:34,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1756426.6666666667, ans=0.125 2023-11-22 02:18:48,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1756493.3333333333, ans=0.125 2023-11-22 02:19:07,445 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11000, loss[loss=0.05371, simple_loss=0.07696, pruned_loss=0.00783, audio_tagging_loss=0.007396, over 14410.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.09584, pruned_loss=0.01595, audio_tagging_loss=0.009301, over 3045490.96 frames. ], batch size: 56, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:19:13,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263500 2023-11-22 02:19:21,177 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 02:19:26,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1756693.3333333333, ans=0.125 2023-11-22 02:19:33,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1756760.0, ans=0.125 2023-11-22 02:19:36,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1756760.0, ans=0.125 2023-11-22 02:19:55,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.467e+01 8.140e+01 8.725e+01 9.449e+01 1.514e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-22 02:20:02,087 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:20:12,220 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11050, loss[loss=0.05681, simple_loss=0.0671, pruned_loss=0.01153, audio_tagging_loss=0.01172, over 15125.00 frames. ], tot_loss[loss=0.07331, simple_loss=0.09555, pruned_loss=0.01601, audio_tagging_loss=0.009528, over 3044415.74 frames. ], batch size: 58, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:20:17,126 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263550 2023-11-22 02:20:17,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1756960.0, ans=0.1 2023-11-22 02:20:55,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=22.5 2023-11-22 02:20:58,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1757160.0, ans=0.125 2023-11-22 02:21:16,232 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11100, loss[loss=0.05291, simple_loss=0.06901, pruned_loss=0.008233, audio_tagging_loss=0.01017, over 14641.00 frames. ], tot_loss[loss=0.07301, simple_loss=0.09513, pruned_loss=0.01585, audio_tagging_loss=0.009594, over 3048335.78 frames. 
], batch size: 57, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:21:21,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263600 2023-11-22 02:21:28,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1757360.0, ans=0.2 2023-11-22 02:21:28,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1757360.0, ans=0.125 2023-11-22 02:21:29,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1757360.0, ans=0.125 2023-11-22 02:21:57,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1757493.3333333333, ans=0.1 2023-11-22 02:22:03,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.908e+01 8.113e+01 8.819e+01 9.747e+01 1.374e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 02:22:15,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1757560.0, ans=0.0 2023-11-22 02:22:20,501 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11150, loss[loss=0.07107, simple_loss=0.08985, pruned_loss=0.01316, audio_tagging_loss=0.01298, over 13895.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.0953, pruned_loss=0.01585, audio_tagging_loss=0.009773, over 3049450.16 frames. ], batch size: 54, lr: 3.08e-03, grad_scale: 16.0 2023-11-22 02:22:23,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1757626.6666666667, ans=0.1 2023-11-22 02:22:25,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263650 2023-11-22 02:22:45,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1757760.0, ans=0.1 2023-11-22 02:22:45,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1757760.0, ans=0.125 2023-11-22 02:23:10,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1757893.3333333333, ans=0.0 2023-11-22 02:23:25,252 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11200, loss[loss=0.05871, simple_loss=0.07806, pruned_loss=0.01027, audio_tagging_loss=0.009409, over 15108.00 frames. ], tot_loss[loss=0.07368, simple_loss=0.09607, pruned_loss=0.01598, audio_tagging_loss=0.009675, over 3056152.97 frames. ], batch size: 57, lr: 3.08e-03, grad_scale: 32.0 2023-11-22 02:23:30,083 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263700 2023-11-22 02:23:36,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1758026.6666666667, ans=0.025 2023-11-22 02:23:56,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1758093.3333333333, ans=0.125 2023-11-22 02:23:58,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1758093.3333333333, ans=0.125 2023-11-22 02:24:05,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.43 vs. 
limit=10.0 2023-11-22 02:24:11,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.183e+01 8.908e+01 9.614e+01 1.304e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-22 02:24:21,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1758226.6666666667, ans=0.125 2023-11-22 02:24:25,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1758226.6666666667, ans=0.1 2023-11-22 02:24:27,700 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11250, loss[loss=0.0441, simple_loss=0.0593, pruned_loss=0.006421, audio_tagging_loss=0.008029, over 16677.00 frames. ], tot_loss[loss=0.07305, simple_loss=0.09521, pruned_loss=0.01576, audio_tagging_loss=0.009684, over 3059818.96 frames. ], batch size: 63, lr: 3.08e-03, grad_scale: 32.0 2023-11-22 02:24:27,982 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:24:30,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1758293.3333333333, ans=0.125 2023-11-22 02:24:32,653 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263750 2023-11-22 02:24:46,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.87 vs. limit=10.0 2023-11-22 02:24:53,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1758426.6666666667, ans=0.0 2023-11-22 02:24:58,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1758426.6666666667, ans=0.1 2023-11-22 02:25:25,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1758560.0, ans=0.125 2023-11-22 02:25:28,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2023-11-22 02:25:31,819 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11300, loss[loss=0.0654, simple_loss=0.08712, pruned_loss=0.01216, audio_tagging_loss=0.009683, over 15613.00 frames. ], tot_loss[loss=0.07338, simple_loss=0.09564, pruned_loss=0.01599, audio_tagging_loss=0.009565, over 3052425.15 frames. 
], batch size: 59, lr: 3.08e-03, grad_scale: 32.0 2023-11-22 02:25:36,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263800 2023-11-22 02:25:44,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1758693.3333333333, ans=0.125 2023-11-22 02:25:55,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1758693.3333333333, ans=0.125 2023-11-22 02:26:04,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1758760.0, ans=0.125 2023-11-22 02:26:19,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.464e+01 8.064e+01 8.666e+01 9.719e+01 1.287e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-22 02:26:26,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1758893.3333333333, ans=0.125 2023-11-22 02:26:32,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1758893.3333333333, ans=0.0 2023-11-22 02:26:36,258 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11350, loss[loss=0.0545, simple_loss=0.06961, pruned_loss=0.009492, audio_tagging_loss=0.0102, over 15952.00 frames. ], tot_loss[loss=0.07264, simple_loss=0.09483, pruned_loss=0.01578, audio_tagging_loss=0.00945, over 3044966.35 frames. ], batch size: 60, lr: 3.08e-03, grad_scale: 32.0 2023-11-22 02:26:41,918 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263850 2023-11-22 02:27:27,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1759226.6666666667, ans=0.1 2023-11-22 02:27:39,807 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11400, loss[loss=0.06222, simple_loss=0.08116, pruned_loss=0.01316, audio_tagging_loss=0.00848, over 15062.00 frames. ], tot_loss[loss=0.07258, simple_loss=0.0951, pruned_loss=0.01572, audio_tagging_loss=0.009309, over 3043161.51 frames. ], batch size: 58, lr: 3.08e-03, grad_scale: 32.0 2023-11-22 02:27:43,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1759293.3333333333, ans=0.1 2023-11-22 02:27:44,839 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263900 2023-11-22 02:27:46,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2023-11-22 02:27:49,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2023-11-22 02:27:51,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1759360.0, ans=0.125 2023-11-22 02:28:07,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1759426.6666666667, ans=0.125 2023-11-22 02:28:10,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.31 vs. 
limit=15.0 2023-11-22 02:28:21,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1759493.3333333333, ans=0.125 2023-11-22 02:28:27,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.669e+01 8.049e+01 8.601e+01 9.494e+01 1.206e+02, threshold=1.720e+02, percent-clipped=0.0 2023-11-22 02:28:38,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-11-22 02:28:43,095 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11450, loss[loss=0.0571, simple_loss=0.07772, pruned_loss=0.0109, audio_tagging_loss=0.00734, over 14408.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09537, pruned_loss=0.01586, audio_tagging_loss=0.009365, over 3046667.93 frames. ], batch size: 53, lr: 3.08e-03, grad_scale: 32.0 2023-11-22 02:28:43,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2023-11-22 02:28:48,625 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 263950 2023-11-22 02:28:55,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1759693.3333333333, ans=0.0 2023-11-22 02:29:22,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1759826.6666666667, ans=0.125 2023-11-22 02:29:47,450 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11500, loss[loss=0.08562, simple_loss=0.124, pruned_loss=0.01686, audio_tagging_loss=0.006737, over 15650.00 frames. ], tot_loss[loss=0.07316, simple_loss=0.09578, pruned_loss=0.01591, audio_tagging_loss=0.009355, over 3047035.15 frames. ], batch size: 58, lr: 3.07e-03, grad_scale: 32.0 2023-11-22 02:29:53,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264000 2023-11-22 02:29:59,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1759960.0, ans=0.125 2023-11-22 02:30:07,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1760026.6666666667, ans=0.025 2023-11-22 02:30:34,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1760160.0, ans=0.125 2023-11-22 02:30:38,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.114e+01 8.860e+01 9.560e+01 1.133e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-22 02:30:42,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2023-11-22 02:30:54,559 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11550, loss[loss=0.09045, simple_loss=0.122, pruned_loss=0.02073, audio_tagging_loss=0.00873, over 14508.00 frames. ], tot_loss[loss=0.07298, simple_loss=0.09557, pruned_loss=0.01588, audio_tagging_loss=0.009321, over 3042098.66 frames. 
], batch size: 53, lr: 3.07e-03, grad_scale: 32.0 2023-11-22 02:30:59,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264050 2023-11-22 02:31:03,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1760293.3333333333, ans=0.125 2023-11-22 02:31:32,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1760493.3333333333, ans=0.125 2023-11-22 02:31:37,432 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 02:31:57,966 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11600, loss[loss=0.07396, simple_loss=0.1011, pruned_loss=0.01564, audio_tagging_loss=0.00775, over 14544.00 frames. ], tot_loss[loss=0.07319, simple_loss=0.09569, pruned_loss=0.01603, audio_tagging_loss=0.009318, over 3045560.95 frames. ], batch size: 53, lr: 3.07e-03, grad_scale: 32.0 2023-11-22 02:32:03,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264100 2023-11-22 02:32:10,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1760693.3333333333, ans=0.125 2023-11-22 02:32:14,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1760693.3333333333, ans=0.2 2023-11-22 02:32:25,137 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:32:26,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1760760.0, ans=0.125 2023-11-22 02:32:47,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.841e+01 8.153e+01 8.705e+01 9.370e+01 1.172e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-22 02:32:57,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1760893.3333333333, ans=0.0 2023-11-22 02:33:02,412 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11650, loss[loss=0.0644, simple_loss=0.08553, pruned_loss=0.01436, audio_tagging_loss=0.007273, over 15001.00 frames. ], tot_loss[loss=0.07336, simple_loss=0.09589, pruned_loss=0.01599, audio_tagging_loss=0.009428, over 3050664.87 frames. ], batch size: 55, lr: 3.07e-03, grad_scale: 16.0 2023-11-22 02:33:05,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2023-11-22 02:33:07,483 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264150 2023-11-22 02:33:14,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1761026.6666666667, ans=0.1 2023-11-22 02:33:31,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.42 vs. 
limit=15.0 2023-11-22 02:33:34,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1761093.3333333333, ans=0.0 2023-11-22 02:34:07,599 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11700, loss[loss=0.06507, simple_loss=0.08503, pruned_loss=0.01532, audio_tagging_loss=0.007235, over 14614.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.09579, pruned_loss=0.01588, audio_tagging_loss=0.009396, over 3048627.68 frames. ], batch size: 55, lr: 3.07e-03, grad_scale: 16.0 2023-11-22 02:34:09,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1761293.3333333333, ans=0.05 2023-11-22 02:34:12,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264200 2023-11-22 02:34:12,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1761293.3333333333, ans=0.0 2023-11-22 02:34:16,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-22 02:34:17,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1761293.3333333333, ans=0.0 2023-11-22 02:34:21,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1761360.0, ans=0.0 2023-11-22 02:34:30,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1761360.0, ans=0.125 2023-11-22 02:34:38,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1761426.6666666667, ans=0.125 2023-11-22 02:34:44,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1761426.6666666667, ans=0.0 2023-11-22 02:34:57,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.071e+01 8.694e+01 9.444e+01 1.226e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 02:35:00,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1761560.0, ans=0.125 2023-11-22 02:35:01,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1761560.0, ans=0.125 2023-11-22 02:35:03,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1761560.0, ans=0.125 2023-11-22 02:35:05,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2023-11-22 02:35:11,162 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11750, loss[loss=0.06648, simple_loss=0.0895, pruned_loss=0.01291, audio_tagging_loss=0.008821, over 15401.00 frames. ], tot_loss[loss=0.07332, simple_loss=0.09581, pruned_loss=0.01597, audio_tagging_loss=0.009445, over 3041399.22 frames. 
], batch size: 57, lr: 3.07e-03, grad_scale: 16.0 2023-11-22 02:35:16,067 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264250 2023-11-22 02:35:33,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1761693.3333333333, ans=0.125 2023-11-22 02:35:40,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1761760.0, ans=10.0 2023-11-22 02:36:02,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1761893.3333333333, ans=0.0 2023-11-22 02:36:11,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1761893.3333333333, ans=0.0 2023-11-22 02:36:11,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5 2023-11-22 02:36:15,725 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11800, loss[loss=0.07579, simple_loss=0.09955, pruned_loss=0.01733, audio_tagging_loss=0.008689, over 15684.00 frames. ], tot_loss[loss=0.07323, simple_loss=0.09553, pruned_loss=0.01597, audio_tagging_loss=0.009495, over 3040435.93 frames. ], batch size: 57, lr: 3.07e-03, grad_scale: 16.0 2023-11-22 02:36:16,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1761960.0, ans=0.0 2023-11-22 02:36:21,274 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264300 2023-11-22 02:37:06,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.883e+01 8.285e+01 8.657e+01 9.300e+01 1.154e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-22 02:37:10,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=22.5 2023-11-22 02:37:15,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-11-22 02:37:20,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-11-22 02:37:21,191 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11850, loss[loss=0.07658, simple_loss=0.09839, pruned_loss=0.01711, audio_tagging_loss=0.01027, over 14177.00 frames. ], tot_loss[loss=0.07328, simple_loss=0.09531, pruned_loss=0.01604, audio_tagging_loss=0.009583, over 3045274.39 frames. ], batch size: 55, lr: 3.07e-03, grad_scale: 16.0 2023-11-22 02:37:26,196 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264350 2023-11-22 02:37:32,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1762360.0, ans=0.0 2023-11-22 02:37:37,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1762360.0, ans=0.0 2023-11-22 02:37:43,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1762360.0, ans=0.125 2023-11-22 02:38:08,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.19 vs. 
limit=15.0 2023-11-22 02:38:22,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2023-11-22 02:38:23,846 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11900, loss[loss=0.09492, simple_loss=0.1212, pruned_loss=0.02417, audio_tagging_loss=0.01014, over 15267.00 frames. ], tot_loss[loss=0.07358, simple_loss=0.09584, pruned_loss=0.016, audio_tagging_loss=0.009666, over 3045204.95 frames. ], batch size: 56, lr: 3.07e-03, grad_scale: 8.0 2023-11-22 02:38:27,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1762626.6666666667, ans=0.2 2023-11-22 02:38:28,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264400 2023-11-22 02:38:33,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1762626.6666666667, ans=0.1 2023-11-22 02:38:45,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1762693.3333333333, ans=0.0 2023-11-22 02:38:59,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1762760.0, ans=0.125 2023-11-22 02:39:15,099 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.906e+01 8.188e+01 8.826e+01 9.362e+01 1.336e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-22 02:39:28,019 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 11950, loss[loss=0.05866, simple_loss=0.07337, pruned_loss=0.009165, audio_tagging_loss=0.01281, over 15634.00 frames. ], tot_loss[loss=0.07272, simple_loss=0.09453, pruned_loss=0.01573, audio_tagging_loss=0.009721, over 3048014.60 frames. ], batch size: 60, lr: 3.07e-03, grad_scale: 8.0 2023-11-22 02:39:33,702 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264450 2023-11-22 02:39:34,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1762960.0, ans=0.125 2023-11-22 02:39:35,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.76 vs. 
limit=15.0 2023-11-22 02:39:47,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1763026.6666666667, ans=0.2 2023-11-22 02:39:51,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1763026.6666666667, ans=0.0 2023-11-22 02:39:51,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1763026.6666666667, ans=0.1 2023-11-22 02:39:58,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1763093.3333333333, ans=0.125 2023-11-22 02:40:02,472 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:40:07,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1763160.0, ans=0.0 2023-11-22 02:40:28,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1763226.6666666667, ans=0.125 2023-11-22 02:40:31,116 INFO [train_asr.py:1221] (1/4) Epoch 22, batch 12000, loss[loss=0.06198, simple_loss=0.07899, pruned_loss=0.01401, audio_tagging_loss=0.008481, over 14586.00 frames. ], tot_loss[loss=0.07299, simple_loss=0.095, pruned_loss=0.01568, audio_tagging_loss=0.009808, over 3045515.96 frames. ], batch size: 55, lr: 3.07e-03, grad_scale: 16.0 2023-11-22 02:40:31,117 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 02:40:59,633 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.1471, 3.1315, 2.6979, 2.7596, 3.3198, 3.4116, 3.0148, 3.6291], device='cuda:1') 2023-11-22 02:41:14,433 INFO [train_asr.py:1253] (1/4) Epoch 22, validation: loss=0.05922, simple_loss=0.05191, pruned_loss=0.005254, audio_tagging_loss=0.02801, over 4681554.00 frames. 2023-11-22 02:41:14,434 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 02:41:19,290 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264500 2023-11-22 02:41:39,177 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:41:40,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1763426.6666666667, ans=0.125 2023-11-22 02:42:19,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2023-11-22 02:42:19,902 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 0, loss[loss=0.0672, simple_loss=0.07637, pruned_loss=0.0068, audio_tagging_loss=0.02222, over 14520.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.07637, pruned_loss=0.0068, audio_tagging_loss=0.02222, over 14520.00 frames. ], batch size: 55, lr: 3.00e-03, grad_scale: 32.0 2023-11-22 02:42:19,903 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 02:42:55,355 INFO [train_asr.py:1253] (1/4) Epoch 23, validation: loss=0.05874, simple_loss=0.05183, pruned_loss=0.005194, audio_tagging_loss=0.02763, over 4681554.00 frames. 
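A note on reading the loss entries in this log: the printed total is consistent, to rounding, with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss across both the per-batch tot_loss entries and the validation entries. For the Epoch 23 validation entry just logged, 0.5 * 0.05183 + 0.005194 + 0.02763 = 0.05874, matching the printed loss exactly. A minimal sketch of that combination, with the weights inferred from the printed numbers rather than read out of train_asr.py:

# Sketch of how the logged loss totals appear to combine. The 0.5 weight on
# simple_loss and the unit weights on pruned_loss and audio_tagging_loss are
# inferred from the log entries themselves; illustrative, not verbatim code.
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Check against the Epoch 23 validation entry above:
assert abs(combine_losses(0.05183, 0.005194, 0.02763) - 0.05874) < 5e-5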
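Similarly, in every optim.py entry of the form "Clipping_scale=2.0, grad-norm quartiles ... threshold=...", the five quartile values are the min/25%/median/75%/max of recently observed gradient norms, and the logged threshold equals Clipping_scale times the logged median (for example 2.0 * 8.657e+01 = 1.731e+02). A hedged sketch of that bookkeeping, assuming a simple rolling window; the actual optimizer may buffer and smooth these statistics differently:

import collections
import statistics

# Illustrative rolling-median clipper: the clip threshold is clipping_scale
# times the median of recent gradient norms, which reproduces the thresholds
# printed in this log. The fixed-size window is an assumption of the sketch.
class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)

    def update(self, grad_norm: float) -> float:
        """Record one gradient norm and return the current clip threshold."""
        self.norms.append(grad_norm)
        return self.clipping_scale * statistics.median(self.norms)

    def quartiles(self) -> list:
        """Min/25%/median/75%/max of the recorded norms, as the log prints."""
        s = sorted(self.norms)
        n = len(s) - 1
        return [s[0], s[n // 4], s[n // 2], s[3 * n // 4], s[-1]]

# Feeding the quartile values from the first optim.py entry in this section:
clipper = GradNormClipper()
for g in (68.83, 82.85, 86.57, 93.00, 115.4):
    threshold = clipper.update(g)
assert abs(threshold - 173.14) < 1e-6  # 2.0 * median = 1.731e+02, as logged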
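The many "ScheduledFloat: name=..., batch_count=..., ans=..." entries track module hyperparameters (attention/conv skip rates, balancer probabilities, dropout values) that are annealed as a function of the global batch count; the "ans=" field is the value in effect at that batch_count. A minimal sketch of a schedule with that behavior, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with the endpoint values held constant outside the breakpoints; see scaling.py for the actual class:

import bisect

# Assumed, simplified re-implementation of a batch-count schedule: values are
# interpolated linearly between breakpoints and clamped at the ends, which is
# consistent with the constant "ans=" values seen this late in training.
class ScheduledFloatSketch:
    def __init__(self, *points):  # points: (batch_count, value), ascending
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a skip rate annealed from 0.3 to 0.1 over the first 20k batches
# (hypothetical breakpoints); by batch_count ~1.76M it sits at its endpoint:
skip_rate = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert skip_rate.value(1761693) == 0.1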
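Finally, the WARNING entries further on that exclude one-second AudioSet cuts are best read as a feasibility filter for the transducer loss: each such cut has 100 feature frames, only 23 encoder frames survive the 4x subsampling, and its dummy transcript tokenizes to 24 BPE tokens, so there are more tokens than frames on which to emit them. A sketch of such a filter, with the subsampling formula assumed only because it reproduces the 100 -> 23 figure from the log:

# Sketch of the kind of filter the WARNING lines suggest: drop cuts whose
# encoder output is shorter than the token sequence, since a transducer
# cannot emit more tokens than it has frames. The exact subsampling formula
# is an assumption chosen to reproduce 100 -> 23 from the logged warnings.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded dummy-text cuts: 23 frames < 24 tokens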
2023-11-22 02:42:55,356 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 02:43:05,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2023-11-22 02:43:06,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0 2023-11-22 02:43:13,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.370e+01 9.150e+01 9.856e+01 1.621e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-22 02:43:29,093 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:43:30,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264550 2023-11-22 02:43:44,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-22 02:43:48,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1763740.0, ans=0.0 2023-11-22 02:43:56,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1763740.0, ans=0.0 2023-11-22 02:43:58,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1763806.6666666667, ans=0.1 2023-11-22 02:43:59,736 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 50, loss[loss=0.08593, simple_loss=0.1033, pruned_loss=0.0202, audio_tagging_loss=0.01409, over 15325.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.09675, pruned_loss=0.01611, audio_tagging_loss=0.01801, over 687604.19 frames. ], batch size: 59, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:44:04,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1763806.6666666667, ans=0.1 2023-11-22 02:44:07,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1763806.6666666667, ans=0.0 2023-11-22 02:44:13,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1763873.3333333333, ans=0.2 2023-11-22 02:44:35,203 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264600 2023-11-22 02:45:05,764 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 100, loss[loss=0.0656, simple_loss=0.07154, pruned_loss=0.01577, audio_tagging_loss=0.01407, over 14464.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.09287, pruned_loss=0.01552, audio_tagging_loss=0.01743, over 1216079.90 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:45:23,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 9.017e+01 9.520e+01 1.030e+02 1.259e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-22 02:45:30,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.57 vs. 
limit=12.0 2023-11-22 02:45:40,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264650 2023-11-22 02:46:02,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1764406.6666666667, ans=0.1 2023-11-22 02:46:06,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2023-11-22 02:46:11,379 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 150, loss[loss=0.06648, simple_loss=0.08805, pruned_loss=0.01115, audio_tagging_loss=0.0113, over 16485.00 frames. ], tot_loss[loss=0.07869, simple_loss=0.09418, pruned_loss=0.0158, audio_tagging_loss=0.01581, over 1620252.07 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:46:34,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2023-11-22 02:46:40,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=15.0 2023-11-22 02:46:41,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1764606.6666666667, ans=0.0 2023-11-22 02:46:46,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264700 2023-11-22 02:47:08,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1764740.0, ans=0.0 2023-11-22 02:47:09,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0 2023-11-22 02:47:16,073 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 200, loss[loss=0.06817, simple_loss=0.08609, pruned_loss=0.01429, audio_tagging_loss=0.01083, over 14837.00 frames. ], tot_loss[loss=0.0765, simple_loss=0.09453, pruned_loss=0.01534, audio_tagging_loss=0.01389, over 1944526.96 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:47:33,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.877e+01 8.445e+01 8.941e+01 9.618e+01 1.553e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-22 02:47:35,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1764873.3333333333, ans=0.1 2023-11-22 02:47:40,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1764940.0, ans=0.125 2023-11-22 02:47:50,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264750 2023-11-22 02:47:52,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1764940.0, ans=0.0 2023-11-22 02:47:56,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1765006.6666666667, ans=0.2 2023-11-22 02:48:10,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1765073.3333333333, ans=0.1 2023-11-22 02:48:20,503 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 250, loss[loss=0.06602, simple_loss=0.09088, pruned_loss=0.01116, audio_tagging_loss=0.009419, over 15548.00 frames. 
], tot_loss[loss=0.07602, simple_loss=0.09544, pruned_loss=0.01575, audio_tagging_loss=0.01255, over 2194819.94 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:48:23,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1765140.0, ans=0.0 2023-11-22 02:48:24,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1765140.0, ans=0.0 2023-11-22 02:48:41,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1765206.6666666667, ans=0.0 2023-11-22 02:48:49,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1765273.3333333333, ans=0.2 2023-11-22 02:48:55,727 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264800 2023-11-22 02:48:55,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1765273.3333333333, ans=0.2 2023-11-22 02:49:00,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1765340.0, ans=0.125 2023-11-22 02:49:11,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=12.0 2023-11-22 02:49:26,627 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 300, loss[loss=0.06434, simple_loss=0.08383, pruned_loss=0.014, audio_tagging_loss=0.008419, over 15382.00 frames. ], tot_loss[loss=0.07491, simple_loss=0.09526, pruned_loss=0.01574, audio_tagging_loss=0.01155, over 2377785.83 frames. ], batch size: 57, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:49:30,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2023-11-22 02:49:35,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1765473.3333333333, ans=0.2 2023-11-22 02:49:44,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.292e+01 9.030e+01 9.528e+01 1.689e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-22 02:50:00,655 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 02:50:01,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264850 2023-11-22 02:50:04,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1765673.3333333333, ans=0.0 2023-11-22 02:50:13,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1765673.3333333333, ans=0.125 2023-11-22 02:50:19,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1765740.0, ans=0.0 2023-11-22 02:50:31,833 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 350, loss[loss=0.0632, simple_loss=0.08227, pruned_loss=0.01163, audio_tagging_loss=0.01044, over 14198.00 frames. ], tot_loss[loss=0.07395, simple_loss=0.09469, pruned_loss=0.01567, audio_tagging_loss=0.01094, over 2529194.59 frames. 
], batch size: 53, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:50:38,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1765806.6666666667, ans=0.2 2023-11-22 02:50:56,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1765940.0, ans=0.0 2023-11-22 02:51:01,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1765940.0, ans=0.05 2023-11-22 02:51:07,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264900 2023-11-22 02:51:22,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1766073.3333333333, ans=0.0 2023-11-22 02:51:23,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1766073.3333333333, ans=0.1 2023-11-22 02:51:27,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1766073.3333333333, ans=0.0 2023-11-22 02:51:34,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=22.5 2023-11-22 02:51:36,544 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 400, loss[loss=0.05734, simple_loss=0.06756, pruned_loss=0.01233, audio_tagging_loss=0.01123, over 15156.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.09512, pruned_loss=0.01584, audio_tagging_loss=0.01052, over 2652806.13 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 32.0 2023-11-22 02:51:43,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1766140.0, ans=0.0 2023-11-22 02:51:55,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.193e+01 8.089e+01 8.847e+01 9.636e+01 1.262e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 02:51:56,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-22 02:52:10,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=12.0 2023-11-22 02:52:11,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 264950 2023-11-22 02:52:16,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1766340.0, ans=0.125 2023-11-22 02:52:41,678 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 450, loss[loss=0.06398, simple_loss=0.08042, pruned_loss=0.01313, audio_tagging_loss=0.01064, over 15229.00 frames. ], tot_loss[loss=0.07369, simple_loss=0.09536, pruned_loss=0.01583, audio_tagging_loss=0.01017, over 2739678.94 frames. ], batch size: 59, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:52:42,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.94 vs. 
limit=15.0 2023-11-22 02:53:16,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1766606.6666666667, ans=0.125 2023-11-22 02:53:16,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1766606.6666666667, ans=0.0 2023-11-22 02:53:17,303 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265000 2023-11-22 02:53:22,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1766673.3333333333, ans=0.125 2023-11-22 02:53:24,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2023-11-22 02:53:46,791 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 500, loss[loss=0.04929, simple_loss=0.05399, pruned_loss=0.01067, audio_tagging_loss=0.01162, over 14704.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.09459, pruned_loss=0.01566, audio_tagging_loss=0.009989, over 2810001.71 frames. ], batch size: 59, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:53:51,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1766806.6666666667, ans=0.0 2023-11-22 02:54:06,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.266e+01 9.127e+01 9.856e+01 1.336e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-22 02:54:21,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265050 2023-11-22 02:54:23,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1766940.0, ans=0.0 2023-11-22 02:54:25,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2023-11-22 02:54:26,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1767006.6666666667, ans=0.0 2023-11-22 02:54:28,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1767006.6666666667, ans=0.125 2023-11-22 02:54:36,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2023-11-22 02:54:51,593 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 550, loss[loss=0.07131, simple_loss=0.09445, pruned_loss=0.01576, audio_tagging_loss=0.008327, over 16183.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.09374, pruned_loss=0.01541, audio_tagging_loss=0.009922, over 2855046.58 frames. 
], batch size: 60, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:54:56,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1767140.0, ans=0.125 2023-11-22 02:55:03,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1767206.6666666667, ans=0.125 2023-11-22 02:55:03,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1767206.6666666667, ans=0.2 2023-11-22 02:55:18,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1767273.3333333333, ans=0.125 2023-11-22 02:55:26,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265100 2023-11-22 02:55:41,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1767340.0, ans=0.0 2023-11-22 02:55:55,627 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 600, loss[loss=0.07841, simple_loss=0.1042, pruned_loss=0.01659, audio_tagging_loss=0.009718, over 14958.00 frames. ], tot_loss[loss=0.0725, simple_loss=0.09411, pruned_loss=0.01563, audio_tagging_loss=0.009818, over 2901300.73 frames. ], batch size: 55, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:56:00,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1767473.3333333333, ans=0.125 2023-11-22 02:56:04,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1767473.3333333333, ans=0.125 2023-11-22 02:56:15,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.111e+01 8.755e+01 9.380e+01 1.139e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-22 02:56:31,581 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265150 2023-11-22 02:56:31,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1767606.6666666667, ans=0.09899494936611666 2023-11-22 02:56:32,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2023-11-22 02:57:01,405 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 650, loss[loss=0.07195, simple_loss=0.09451, pruned_loss=0.01595, audio_tagging_loss=0.008743, over 16047.00 frames. ], tot_loss[loss=0.07274, simple_loss=0.09485, pruned_loss=0.0156, audio_tagging_loss=0.009715, over 2929517.63 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:57:26,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=15.0 2023-11-22 02:57:35,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265200 2023-11-22 02:57:44,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1768006.6666666667, ans=0.0 2023-11-22 02:58:06,640 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 700, loss[loss=0.06601, simple_loss=0.08427, pruned_loss=0.01392, audio_tagging_loss=0.009964, over 13991.00 frames. ], tot_loss[loss=0.07262, simple_loss=0.09447, pruned_loss=0.01563, audio_tagging_loss=0.009754, over 2950690.91 frames. 
], batch size: 54, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 02:58:17,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1768140.0, ans=0.0 2023-11-22 02:58:22,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1768206.6666666667, ans=0.125 2023-11-22 02:58:26,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.743e+01 8.011e+01 8.675e+01 9.413e+01 1.282e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-22 02:58:31,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1768273.3333333333, ans=0.2 2023-11-22 02:58:42,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265250 2023-11-22 02:59:11,952 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 750, loss[loss=0.08341, simple_loss=0.108, pruned_loss=0.0217, audio_tagging_loss=0.007731, over 14546.00 frames. ], tot_loss[loss=0.07298, simple_loss=0.09508, pruned_loss=0.0157, audio_tagging_loss=0.009736, over 2975228.17 frames. ], batch size: 54, lr: 3.00e-03, grad_scale: 8.0 2023-11-22 02:59:18,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1768473.3333333333, ans=0.0 2023-11-22 02:59:24,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1768540.0, ans=0.125 2023-11-22 02:59:36,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1768540.0, ans=0.125 2023-11-22 02:59:47,598 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265300 2023-11-22 02:59:54,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1768673.3333333333, ans=0.1 2023-11-22 03:00:12,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1768740.0, ans=0.05 2023-11-22 03:00:17,346 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 800, loss[loss=0.079, simple_loss=0.1099, pruned_loss=0.01744, audio_tagging_loss=0.006631, over 15808.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.09548, pruned_loss=0.01579, audio_tagging_loss=0.009584, over 2992490.30 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 03:00:18,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.43 vs. 
limit=22.5 2023-11-22 03:00:25,028 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:00:39,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.788e+01 8.565e+01 9.192e+01 1.025e+02 1.525e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-22 03:00:53,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265350 2023-11-22 03:01:03,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1769006.6666666667, ans=0.0 2023-11-22 03:01:05,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1769006.6666666667, ans=0.2 2023-11-22 03:01:15,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1769073.3333333333, ans=0.0 2023-11-22 03:01:23,382 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 850, loss[loss=0.06867, simple_loss=0.09048, pruned_loss=0.01407, audio_tagging_loss=0.009359, over 16056.00 frames. ], tot_loss[loss=0.07339, simple_loss=0.09558, pruned_loss=0.01582, audio_tagging_loss=0.009779, over 3006499.90 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 03:01:35,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1769206.6666666667, ans=0.125 2023-11-22 03:01:52,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1769273.3333333333, ans=0.0 2023-11-22 03:01:58,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265400 2023-11-22 03:02:19,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1769406.6666666667, ans=0.125 2023-11-22 03:02:22,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1769406.6666666667, ans=0.125 2023-11-22 03:02:28,382 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 900, loss[loss=0.0694, simple_loss=0.09595, pruned_loss=0.01529, audio_tagging_loss=0.006135, over 15849.00 frames. ], tot_loss[loss=0.07308, simple_loss=0.09495, pruned_loss=0.01581, audio_tagging_loss=0.009803, over 3014842.61 frames. 
], batch size: 60, lr: 3.00e-03, grad_scale: 8.0 2023-11-22 03:02:34,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1769473.3333333333, ans=0.2 2023-11-22 03:02:50,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.531e+01 8.073e+01 8.740e+01 9.361e+01 1.142e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 03:02:56,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1769606.6666666667, ans=0.125 2023-11-22 03:02:59,393 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:03:01,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1769606.6666666667, ans=0.0 2023-11-22 03:03:04,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265450 2023-11-22 03:03:12,561 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:03:24,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1769740.0, ans=0.2 2023-11-22 03:03:33,880 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 950, loss[loss=0.0725, simple_loss=0.09457, pruned_loss=0.01738, audio_tagging_loss=0.007834, over 14933.00 frames. ], tot_loss[loss=0.07334, simple_loss=0.09562, pruned_loss=0.01586, audio_tagging_loss=0.009665, over 3020734.10 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 8.0 2023-11-22 03:03:38,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-11-22 03:04:08,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265500 2023-11-22 03:04:11,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1770006.6666666667, ans=0.0 2023-11-22 03:04:11,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1770006.6666666667, ans=0.2 2023-11-22 03:04:29,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1770073.3333333333, ans=0.125 2023-11-22 03:04:39,205 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1000, loss[loss=0.06911, simple_loss=0.08635, pruned_loss=0.01651, audio_tagging_loss=0.009423, over 15640.00 frames. ], tot_loss[loss=0.07297, simple_loss=0.09505, pruned_loss=0.01585, audio_tagging_loss=0.009589, over 3023555.54 frames. ], batch size: 59, lr: 3.00e-03, grad_scale: 8.0 2023-11-22 03:04:49,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1770140.0, ans=0.1 2023-11-22 03:05:00,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.104e+01 8.710e+01 9.492e+01 1.362e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-22 03:05:05,612 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 03:05:06,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1770273.3333333333, ans=0.125 2023-11-22 03:05:12,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1770273.3333333333, ans=0.025 2023-11-22 03:05:13,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265550 2023-11-22 03:05:28,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1770340.0, ans=0.125 2023-11-22 03:05:37,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1770406.6666666667, ans=15.0 2023-11-22 03:05:43,705 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1050, loss[loss=0.05992, simple_loss=0.07861, pruned_loss=0.0109, audio_tagging_loss=0.009713, over 15392.00 frames. ], tot_loss[loss=0.07279, simple_loss=0.09521, pruned_loss=0.01572, audio_tagging_loss=0.009469, over 3022948.31 frames. ], batch size: 58, lr: 3.00e-03, grad_scale: 8.0 2023-11-22 03:05:43,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1770473.3333333333, ans=0.125 2023-11-22 03:06:19,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265600 2023-11-22 03:06:42,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1770740.0, ans=0.125 2023-11-22 03:06:45,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1770740.0, ans=0.035 2023-11-22 03:06:49,311 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1100, loss[loss=0.07528, simple_loss=0.104, pruned_loss=0.01405, audio_tagging_loss=0.00924, over 15025.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.09527, pruned_loss=0.01562, audio_tagging_loss=0.009339, over 3024541.59 frames. ], batch size: 55, lr: 3.00e-03, grad_scale: 8.0 2023-11-22 03:06:51,778 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 03:06:54,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1770806.6666666667, ans=0.125 2023-11-22 03:07:10,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1770873.3333333333, ans=0.125 2023-11-22 03:07:11,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.139e+01 7.983e+01 8.697e+01 9.396e+01 1.258e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 03:07:22,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1770940.0, ans=0.05 2023-11-22 03:07:23,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1770940.0, ans=0.125 2023-11-22 03:07:24,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265650 2023-11-22 03:07:24,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1770940.0, ans=0.125 2023-11-22 03:07:45,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1771073.3333333333, ans=0.125 2023-11-22 03:07:47,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-22 03:07:54,472 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1150, loss[loss=0.0617, simple_loss=0.08206, pruned_loss=0.01253, audio_tagging_loss=0.008134, over 13841.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.09442, pruned_loss=0.01551, audio_tagging_loss=0.00936, over 3028878.31 frames. ], batch size: 54, lr: 3.00e-03, grad_scale: 8.0 2023-11-22 03:07:58,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1771140.0, ans=0.125 2023-11-22 03:08:03,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1771140.0, ans=0.125 2023-11-22 03:08:08,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1771206.6666666667, ans=0.125 2023-11-22 03:08:08,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1771206.6666666667, ans=0.1 2023-11-22 03:08:09,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1771206.6666666667, ans=0.125 2023-11-22 03:08:29,867 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265700 2023-11-22 03:08:31,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1771273.3333333333, ans=0.125 2023-11-22 03:08:32,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.60 vs. 
limit=22.5 2023-11-22 03:08:37,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1771340.0, ans=0.125 2023-11-22 03:08:39,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=22.5 2023-11-22 03:08:40,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1771340.0, ans=0.125 2023-11-22 03:08:44,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1771340.0, ans=0.125 2023-11-22 03:08:48,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-11-22 03:08:51,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1771406.6666666667, ans=0.0 2023-11-22 03:08:58,919 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:08:59,955 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1200, loss[loss=0.06378, simple_loss=0.07544, pruned_loss=0.01397, audio_tagging_loss=0.01209, over 14536.00 frames. ], tot_loss[loss=0.07209, simple_loss=0.09444, pruned_loss=0.01553, audio_tagging_loss=0.009341, over 3030345.02 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 03:09:21,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.059e+01 8.626e+01 9.376e+01 1.132e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-22 03:09:31,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1771606.6666666667, ans=0.05 2023-11-22 03:09:34,687 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265750 2023-11-22 03:10:01,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1771740.0, ans=0.125 2023-11-22 03:10:04,602 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1250, loss[loss=0.05903, simple_loss=0.06967, pruned_loss=0.01542, audio_tagging_loss=0.008778, over 14207.00 frames. ], tot_loss[loss=0.07214, simple_loss=0.0945, pruned_loss=0.01562, audio_tagging_loss=0.009269, over 3032069.59 frames. ], batch size: 55, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 03:10:07,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-11-22 03:10:12,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1771806.6666666667, ans=0.0 2023-11-22 03:10:25,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1771873.3333333333, ans=0.125 2023-11-22 03:10:39,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265800 2023-11-22 03:11:07,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1772073.3333333333, ans=0.0 2023-11-22 03:11:10,106 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1300, loss[loss=0.06094, simple_loss=0.07268, pruned_loss=0.01146, audio_tagging_loss=0.01314, over 15238.00 frames. 
], tot_loss[loss=0.07235, simple_loss=0.09459, pruned_loss=0.01575, audio_tagging_loss=0.009306, over 3030025.12 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 03:11:32,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.081e+01 8.746e+01 9.574e+01 1.188e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 03:11:45,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265850 2023-11-22 03:11:46,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1772273.3333333333, ans=10.0 2023-11-22 03:11:55,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1772340.0, ans=0.0 2023-11-22 03:11:57,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1772340.0, ans=0.125 2023-11-22 03:12:09,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1772406.6666666667, ans=0.125 2023-11-22 03:12:14,910 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1350, loss[loss=0.09159, simple_loss=0.1326, pruned_loss=0.01862, audio_tagging_loss=0.006663, over 14469.00 frames. ], tot_loss[loss=0.07262, simple_loss=0.09529, pruned_loss=0.01577, audio_tagging_loss=0.009206, over 3028367.77 frames. ], batch size: 52, lr: 3.00e-03, grad_scale: 16.0 2023-11-22 03:12:30,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1772540.0, ans=0.05 2023-11-22 03:12:48,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1772606.6666666667, ans=0.2 2023-11-22 03:12:49,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265900 2023-11-22 03:12:52,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1772673.3333333333, ans=0.125 2023-11-22 03:13:01,462 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 03:13:04,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1772673.3333333333, ans=0.0 2023-11-22 03:13:08,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1772740.0, ans=0.125 2023-11-22 03:13:14,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1772740.0, ans=0.125 2023-11-22 03:13:19,373 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1400, loss[loss=0.05778, simple_loss=0.07235, pruned_loss=0.01189, audio_tagging_loss=0.009708, over 14134.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.09557, pruned_loss=0.01587, audio_tagging_loss=0.009288, over 3042615.00 frames. 
], batch size: 56, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:13:41,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 8.105e+01 8.866e+01 9.820e+01 1.190e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-22 03:13:45,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1772940.0, ans=0.125 2023-11-22 03:13:54,527 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 265950 2023-11-22 03:14:02,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1773006.6666666667, ans=0.0 2023-11-22 03:14:09,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1773073.3333333333, ans=0.125 2023-11-22 03:14:22,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1773140.0, ans=0.5 2023-11-22 03:14:23,820 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1450, loss[loss=0.064, simple_loss=0.07255, pruned_loss=0.01685, audio_tagging_loss=0.01088, over 14722.00 frames. ], tot_loss[loss=0.073, simple_loss=0.09544, pruned_loss=0.01591, audio_tagging_loss=0.009369, over 3044480.93 frames. ], batch size: 56, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:14:31,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1773140.0, ans=0.125 2023-11-22 03:14:33,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1773140.0, ans=0.125 2023-11-22 03:14:34,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1773140.0, ans=0.1 2023-11-22 03:14:48,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1773273.3333333333, ans=0.0 2023-11-22 03:14:57,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1773273.3333333333, ans=0.125 2023-11-22 03:14:58,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266000 2023-11-22 03:15:16,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1773406.6666666667, ans=0.0 2023-11-22 03:15:19,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1773406.6666666667, ans=0.125 2023-11-22 03:15:26,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1773406.6666666667, ans=0.125 2023-11-22 03:15:28,955 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1500, loss[loss=0.06275, simple_loss=0.08079, pruned_loss=0.009759, audio_tagging_loss=0.0126, over 15936.00 frames. ], tot_loss[loss=0.0731, simple_loss=0.09529, pruned_loss=0.01593, audio_tagging_loss=0.009525, over 3046081.90 frames. 
], batch size: 58, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:15:37,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1773473.3333333333, ans=0.125 2023-11-22 03:15:51,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.300e+01 8.986e+01 9.912e+01 1.319e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-22 03:16:00,039 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:16:04,213 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266050 2023-11-22 03:16:25,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-22 03:16:33,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1773806.6666666667, ans=0.0 2023-11-22 03:16:34,335 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1550, loss[loss=0.06542, simple_loss=0.07736, pruned_loss=0.01265, audio_tagging_loss=0.01409, over 15254.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.0954, pruned_loss=0.0159, audio_tagging_loss=0.009528, over 3052072.34 frames. ], batch size: 58, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:16:55,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1773873.3333333333, ans=0.02 2023-11-22 03:16:57,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1773873.3333333333, ans=0.07 2023-11-22 03:17:09,183 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266100 2023-11-22 03:17:19,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1774006.6666666667, ans=0.125 2023-11-22 03:17:24,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1774073.3333333333, ans=0.125 2023-11-22 03:17:37,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1774140.0, ans=0.125 2023-11-22 03:17:37,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1774140.0, ans=0.125 2023-11-22 03:17:38,616 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1600, loss[loss=0.05772, simple_loss=0.07346, pruned_loss=0.01065, audio_tagging_loss=0.01035, over 15549.00 frames. ], tot_loss[loss=0.07239, simple_loss=0.0943, pruned_loss=0.01555, audio_tagging_loss=0.009688, over 3047977.63 frames. 
], batch size: 57, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:18:00,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.210e+01 8.853e+01 9.521e+01 1.233e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-22 03:18:02,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1774206.6666666667, ans=0.1 2023-11-22 03:18:07,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1774273.3333333333, ans=0.1 2023-11-22 03:18:08,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1774273.3333333333, ans=0.1 2023-11-22 03:18:13,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266150 2023-11-22 03:18:24,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1774340.0, ans=0.0 2023-11-22 03:18:32,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1774406.6666666667, ans=0.125 2023-11-22 03:18:43,446 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1650, loss[loss=0.08232, simple_loss=0.09097, pruned_loss=0.02722, audio_tagging_loss=0.009619, over 14418.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09429, pruned_loss=0.01552, audio_tagging_loss=0.009707, over 3042730.81 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:18:53,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1774473.3333333333, ans=0.0 2023-11-22 03:19:18,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266200 2023-11-22 03:19:39,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-11-22 03:19:47,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1774806.6666666667, ans=0.125 2023-11-22 03:19:48,714 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1700, loss[loss=0.07321, simple_loss=0.09752, pruned_loss=0.01426, audio_tagging_loss=0.0102, over 16124.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09464, pruned_loss=0.01551, audio_tagging_loss=0.009699, over 3044869.30 frames. ], batch size: 63, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:19:49,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.34 vs. 
limit=15.0 2023-11-22 03:20:05,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1774873.3333333333, ans=0.0 2023-11-22 03:20:06,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1774873.3333333333, ans=0.125 2023-11-22 03:20:11,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.074e+01 8.655e+01 9.245e+01 1.270e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-22 03:20:24,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266250 2023-11-22 03:20:33,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1775006.6666666667, ans=0.125 2023-11-22 03:20:49,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1775073.3333333333, ans=0.2 2023-11-22 03:20:51,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2023-11-22 03:20:54,026 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1750, loss[loss=0.05651, simple_loss=0.07105, pruned_loss=0.009027, audio_tagging_loss=0.01196, over 15730.00 frames. ], tot_loss[loss=0.07221, simple_loss=0.09443, pruned_loss=0.01538, audio_tagging_loss=0.009616, over 3047833.38 frames. ], batch size: 59, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:21:08,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1775206.6666666667, ans=0.125 2023-11-22 03:21:20,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1775273.3333333333, ans=0.05 2023-11-22 03:21:28,874 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266300 2023-11-22 03:21:30,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1775273.3333333333, ans=0.1 2023-11-22 03:21:41,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1775340.0, ans=0.07 2023-11-22 03:21:58,587 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1800, loss[loss=0.05643, simple_loss=0.07182, pruned_loss=0.01127, audio_tagging_loss=0.009249, over 15255.00 frames. ], tot_loss[loss=0.07254, simple_loss=0.09485, pruned_loss=0.01559, audio_tagging_loss=0.009524, over 3045555.34 frames. 
], batch size: 60, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:22:03,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1775473.3333333333, ans=0.125 2023-11-22 03:22:05,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1775473.3333333333, ans=0.1 2023-11-22 03:22:21,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.483e+01 7.973e+01 8.651e+01 9.483e+01 1.357e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-22 03:22:33,264 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266350 2023-11-22 03:22:52,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1775740.0, ans=0.1 2023-11-22 03:22:58,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1775740.0, ans=0.0 2023-11-22 03:23:02,237 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1850, loss[loss=0.09419, simple_loss=0.1272, pruned_loss=0.02243, audio_tagging_loss=0.008164, over 15558.00 frames. ], tot_loss[loss=0.07301, simple_loss=0.09547, pruned_loss=0.01586, audio_tagging_loss=0.009413, over 3043055.93 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:23:07,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1775806.6666666667, ans=0.125 2023-11-22 03:23:09,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1775806.6666666667, ans=0.125 2023-11-22 03:23:15,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1775873.3333333333, ans=0.0 2023-11-22 03:23:37,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266400 2023-11-22 03:24:06,597 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1900, loss[loss=0.06965, simple_loss=0.09256, pruned_loss=0.01417, audio_tagging_loss=0.009202, over 14458.00 frames. ], tot_loss[loss=0.07266, simple_loss=0.09511, pruned_loss=0.01579, audio_tagging_loss=0.009314, over 3043074.87 frames. ], batch size: 54, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:24:28,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2023-11-22 03:24:30,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.182e+01 8.329e+01 9.047e+01 9.606e+01 1.236e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-22 03:24:30,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1776206.6666666667, ans=0.09899494936611666 2023-11-22 03:24:39,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1776273.3333333333, ans=0.0 2023-11-22 03:24:41,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266450 2023-11-22 03:24:48,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1776340.0, ans=0.1 2023-11-22 03:25:11,704 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 1950, loss[loss=0.07314, simple_loss=0.1085, pruned_loss=0.01077, audio_tagging_loss=0.008101, over 17179.00 frames. 
], tot_loss[loss=0.07263, simple_loss=0.09498, pruned_loss=0.01577, audio_tagging_loss=0.009375, over 3048387.70 frames. ], batch size: 62, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:25:15,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1776473.3333333333, ans=0.125 2023-11-22 03:25:16,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1776473.3333333333, ans=0.1 2023-11-22 03:25:23,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1776540.0, ans=0.1 2023-11-22 03:25:46,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266500 2023-11-22 03:25:50,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1776673.3333333333, ans=0.1 2023-11-22 03:26:14,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.95 vs. limit=10.0 2023-11-22 03:26:16,491 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2000, loss[loss=0.07038, simple_loss=0.09261, pruned_loss=0.01353, audio_tagging_loss=0.01054, over 15093.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.09474, pruned_loss=0.01571, audio_tagging_loss=0.009377, over 3051471.28 frames. ], batch size: 56, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:26:39,141 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.529e+01 7.854e+01 8.461e+01 9.076e+01 1.159e+02, threshold=1.692e+02, percent-clipped=0.0 2023-11-22 03:26:43,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=22.5 2023-11-22 03:26:51,391 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266550 2023-11-22 03:27:07,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1777073.3333333333, ans=0.0 2023-11-22 03:27:21,306 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2050, loss[loss=0.08637, simple_loss=0.1136, pruned_loss=0.02173, audio_tagging_loss=0.007836, over 14587.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.09597, pruned_loss=0.01582, audio_tagging_loss=0.009313, over 3055350.30 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:27:22,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1777140.0, ans=0.125 2023-11-22 03:27:46,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1777273.3333333333, ans=0.0 2023-11-22 03:27:46,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1777273.3333333333, ans=0.125 2023-11-22 03:27:56,182 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266600 2023-11-22 03:27:59,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1777340.0, ans=0.1 2023-11-22 03:28:27,068 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2100, loss[loss=0.06277, simple_loss=0.08712, pruned_loss=0.01051, audio_tagging_loss=0.008705, over 15293.00 frames. 
], tot_loss[loss=0.073, simple_loss=0.09619, pruned_loss=0.01569, audio_tagging_loss=0.009216, over 3051170.19 frames. ], batch size: 59, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:28:34,724 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:28:47,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1777540.0, ans=0.1 2023-11-22 03:28:49,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.317e+01 8.958e+01 9.611e+01 1.215e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-22 03:28:55,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1777606.6666666667, ans=0.125 2023-11-22 03:29:01,618 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266650 2023-11-22 03:29:03,119 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.830e-02 2023-11-22 03:29:15,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1777673.3333333333, ans=0.07 2023-11-22 03:29:20,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1777740.0, ans=0.0 2023-11-22 03:29:25,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1777740.0, ans=0.2 2023-11-22 03:29:31,453 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2150, loss[loss=0.06124, simple_loss=0.07655, pruned_loss=0.0113, audio_tagging_loss=0.01166, over 15945.00 frames. ], tot_loss[loss=0.07262, simple_loss=0.09559, pruned_loss=0.01559, audio_tagging_loss=0.009233, over 3041806.21 frames. ], batch size: 60, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:29:38,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2023-11-22 03:29:40,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2023-11-22 03:29:51,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1777873.3333333333, ans=0.04949747468305833 2023-11-22 03:30:07,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266700 2023-11-22 03:30:10,802 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
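The WARNING above drops a one-second AudioSet cut that carries only a dummy transcript: its 100 feature frames shrink to 23 encoder frames after the 4x subsampling, fewer than its 24 BPE tokens, so no transducer alignment is possible. A minimal sketch of that length check, assuming a convolutional frontend that consumes a few frames before subsampling (the helper and the exact frame arithmetic below are illustrative, not the recipe's code):

```python
# Sketch (assumption based on the warning above, not the exact icefall predicate):
# a cut is kept only if its subsampled length can cover its token sequence.

def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    """False when the encoder output is too short to align with the tokens."""
    # Assumed frontend shrinkage: 100 input frames -> 23 output frames matches
    # (num_frames - 7) // 4, but the "-7" is a guess at the conv context.
    frames_after_subsampling = (num_frames - 7) // subsampling_factor
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded, matching the warning
```

2023-11-22 03:30:36,528 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2200, loss[loss=0.07293, simple_loss=0.08343, pruned_loss=0.02169, audio_tagging_loss=0.009534, over 13863.00 frames. ], tot_loss[loss=0.07288, simple_loss=0.09585, pruned_loss=0.01577, audio_tagging_loss=0.009183, over 3045348.73 frames.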
], batch size: 55, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:31:01,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.446e+01 9.352e+01 1.016e+02 2.574e+02, threshold=1.870e+02, percent-clipped=1.0 2023-11-22 03:31:12,188 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266750 2023-11-22 03:31:17,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1778340.0, ans=10.0 2023-11-22 03:31:25,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1778340.0, ans=0.1 2023-11-22 03:31:36,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1778406.6666666667, ans=0.0 2023-11-22 03:31:41,882 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2250, loss[loss=0.07982, simple_loss=0.11, pruned_loss=0.01812, audio_tagging_loss=0.006699, over 15670.00 frames. ], tot_loss[loss=0.07367, simple_loss=0.09682, pruned_loss=0.01605, audio_tagging_loss=0.009207, over 3050012.35 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:31:48,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1778473.3333333333, ans=0.125 2023-11-22 03:31:50,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1778473.3333333333, ans=0.2 2023-11-22 03:32:03,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5 2023-11-22 03:32:14,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1778606.6666666667, ans=0.0 2023-11-22 03:32:17,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266800 2023-11-22 03:32:25,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1778673.3333333333, ans=0.125 2023-11-22 03:32:33,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2023-11-22 03:32:39,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1778740.0, ans=0.125
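The optim.py:476 lines are consistent with a running-median clipping rule: every logged threshold equals Clipping_scale times the middle quartile (2.0 * 9.352e+01 = 1.870e+02 in the 03:31:01 entry above, where the 2.574e+02 outlier exceeded it and percent-clipped rose to 1.0). A rough sketch under that assumption (the window of recent gradient norms and the function below are illustrative, not the optim.py source):

```python
# Sketch (assumption): report quartiles of recent gradient norms and clip any
# step whose norm exceeds clipping_scale * median, mirroring the log format.
import torch

def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]              # e.g. 2.0 * 93.52 = 187.04
    percent_clipped = (grad_norms > threshold).float().mean() * 100.0
    return q, threshold, percent_clipped

norms = torch.tensor([70.15, 84.46, 93.52, 101.60, 257.40])
quartiles, thr, pct = clip_stats(norms)
print(thr.item())  # 187.04: the 257.40 outlier would be scaled back to this norm
```

2023-11-22 03:32:47,668 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2300, loss[loss=0.06092, simple_loss=0.07708, pruned_loss=0.01281, audio_tagging_loss=0.009561, over 15575.00 frames. ], tot_loss[loss=0.07336, simple_loss=0.09628, pruned_loss=0.01595, audio_tagging_loss=0.009271, over 3045961.46 frames.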
], batch size: 58, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:32:58,975 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:33:09,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1778873.3333333333, ans=0.0 2023-11-22 03:33:11,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.550e+01 8.208e+01 8.733e+01 9.600e+01 1.238e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 03:33:15,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1778940.0, ans=0.1 2023-11-22 03:33:20,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1778940.0, ans=0.0 2023-11-22 03:33:22,336 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266850 2023-11-22 03:33:43,842 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 03:33:52,746 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2350, loss[loss=0.06731, simple_loss=0.08237, pruned_loss=0.01282, audio_tagging_loss=0.0133, over 15313.00 frames. ], tot_loss[loss=0.07329, simple_loss=0.09598, pruned_loss=0.01588, audio_tagging_loss=0.00942, over 3043632.44 frames. ], batch size: 57, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:33:58,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1779140.0, ans=0.2 2023-11-22 03:34:28,381 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266900 2023-11-22 03:34:37,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1779340.0, ans=0.125 2023-11-22 03:34:50,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2023-11-22 03:34:57,447 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2400, loss[loss=0.07001, simple_loss=0.09061, pruned_loss=0.01265, audio_tagging_loss=0.01206, over 15009.00 frames. ], tot_loss[loss=0.07332, simple_loss=0.09588, pruned_loss=0.01583, audio_tagging_loss=0.00955, over 3045440.30 frames. 
], batch size: 56, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:35:20,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1779540.0, ans=0.125 2023-11-22 03:35:22,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 7.925e+01 8.589e+01 9.306e+01 1.063e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-22 03:35:30,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1779606.6666666667, ans=0.0 2023-11-22 03:35:33,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 266950 2023-11-22 03:35:41,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1779673.3333333333, ans=0.125 2023-11-22 03:35:58,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1779740.0, ans=0.0 2023-11-22 03:36:03,420 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2450, loss[loss=0.06699, simple_loss=0.08633, pruned_loss=0.013, audio_tagging_loss=0.01082, over 14416.00 frames. ], tot_loss[loss=0.07269, simple_loss=0.09479, pruned_loss=0.01562, audio_tagging_loss=0.009676, over 3045216.09 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:36:23,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1779873.3333333333, ans=0.1 2023-11-22 03:36:36,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1779940.0, ans=0.125 2023-11-22 03:36:39,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267000 2023-11-22 03:37:05,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1780073.3333333333, ans=0.05 2023-11-22 03:37:05,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1780073.3333333333, ans=0.0 2023-11-22 03:37:09,372 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2500, loss[loss=0.0628, simple_loss=0.07522, pruned_loss=0.01592, audio_tagging_loss=0.009276, over 15085.00 frames. ], tot_loss[loss=0.0728, simple_loss=0.09499, pruned_loss=0.01567, audio_tagging_loss=0.009633, over 3043730.44 frames. 
], batch size: 56, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:37:09,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1780140.0, ans=0.2 2023-11-22 03:37:10,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1780140.0, ans=0.125 2023-11-22 03:37:21,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1780206.6666666667, ans=0.0 2023-11-22 03:37:34,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.102e+01 8.757e+01 9.446e+01 1.143e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-22 03:37:45,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267050 2023-11-22 03:37:49,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1780340.0, ans=0.125 2023-11-22 03:38:04,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0 2023-11-22 03:38:14,913 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2550, loss[loss=0.06979, simple_loss=0.09228, pruned_loss=0.01476, audio_tagging_loss=0.008883, over 15674.00 frames. ], tot_loss[loss=0.0733, simple_loss=0.09588, pruned_loss=0.01591, audio_tagging_loss=0.009448, over 3049783.60 frames. ], batch size: 59, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:38:39,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1780540.0, ans=0.0 2023-11-22 03:38:45,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1780606.6666666667, ans=0.2 2023-11-22 03:38:50,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267100 2023-11-22 03:38:50,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1780606.6666666667, ans=0.95 2023-11-22 03:39:02,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1780673.3333333333, ans=0.025 2023-11-22 03:39:07,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1780740.0, ans=0.1 2023-11-22 03:39:07,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1780740.0, ans=0.05 2023-11-22 03:39:07,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0
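The scaling.py:213 lines report schedule values (`ans=...`) as a function of batch_count; by this point in training (batch_count around 1.78e6) every schedule is far past its last breakpoint, which is why the dropout_p entries sit at their final 0.1 and the balancer probs at 0.125. A sketch of such a piecewise-linear schedule (the class and its breakpoints are illustrative, not zipformer's ScheduledFloat):

```python
# Sketch (assumption): a float hyperparameter interpolated piecewise-linearly
# over batch_count and clamped to its first/last breakpoints.
import bisect

class PiecewiseSchedule:
    def __init__(self, *points):      # points: sorted (batch_count, value) pairs
        self.x, self.y = zip(*points)

    def value(self, batch_count: float) -> float:
        if batch_count <= self.x[0]:
            return self.y[0]
        if batch_count >= self.x[-1]:
            return self.y[-1]         # past the schedule: constant endpoint
        i = bisect.bisect_right(self.x, batch_count)
        t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
        return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

dropout_p = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))  # breakpoints assumed
print(dropout_p.value(1780740.0))  # 0.1, matching the ans=0.1 entries above
```

2023-11-22 03:39:20,354 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2600, loss[loss=0.0599, simple_loss=0.07624, pruned_loss=0.01312, audio_tagging_loss=0.00866, over 15814.00 frames. ], tot_loss[loss=0.07229, simple_loss=0.09483, pruned_loss=0.01561, audio_tagging_loss=0.009262, over 3047588.29 frames.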
], batch size: 60, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:39:30,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1780806.6666666667, ans=0.125 2023-11-22 03:39:45,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.100e+01 8.795e+01 9.749e+01 1.306e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 03:39:55,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267150 2023-11-22 03:40:25,442 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2650, loss[loss=0.07319, simple_loss=0.1042, pruned_loss=0.01541, audio_tagging_loss=0.00566, over 15644.00 frames. ], tot_loss[loss=0.07296, simple_loss=0.09616, pruned_loss=0.01573, audio_tagging_loss=0.009153, over 3049318.91 frames. ], batch size: 58, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:40:35,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1781140.0, ans=0.2 2023-11-22 03:40:53,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0 2023-11-22 03:40:55,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-22 03:40:59,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1781273.3333333333, ans=0.5 2023-11-22 03:40:59,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.93 vs. limit=10.0 2023-11-22 03:41:00,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267200 2023-11-22 03:41:08,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1781340.0, ans=0.0 2023-11-22 03:41:30,830 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2700, loss[loss=0.0708, simple_loss=0.09636, pruned_loss=0.01525, audio_tagging_loss=0.007372, over 13943.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09577, pruned_loss=0.01552, audio_tagging_loss=0.009147, over 3050646.90 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:41:31,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1781473.3333333333, ans=0.1 2023-11-22 03:41:45,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1781540.0, ans=0.05 2023-11-22 03:41:49,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.26 vs. 
limit=10.0 2023-11-22 03:41:54,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1781540.0, ans=0.125 2023-11-22 03:41:57,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.172e+01 8.019e+01 8.519e+01 9.203e+01 1.202e+02, threshold=1.704e+02, percent-clipped=0.0 2023-11-22 03:42:00,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1781606.6666666667, ans=0.125 2023-11-22 03:42:06,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267250 2023-11-22 03:42:20,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=22.5 2023-11-22 03:42:36,485 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2750, loss[loss=0.06353, simple_loss=0.08067, pruned_loss=0.01285, audio_tagging_loss=0.01035, over 13994.00 frames. ], tot_loss[loss=0.07276, simple_loss=0.09593, pruned_loss=0.01559, audio_tagging_loss=0.009201, over 3044406.36 frames. ], batch size: 54, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:42:39,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1781806.6666666667, ans=0.1 2023-11-22 03:42:42,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1781806.6666666667, ans=0.2 2023-11-22 03:42:42,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1781806.6666666667, ans=0.1 2023-11-22 03:43:08,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2023-11-22 03:43:11,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267300 2023-11-22 03:43:17,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1782006.6666666667, ans=0.0 2023-11-22 03:43:21,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1782006.6666666667, ans=0.125 2023-11-22 03:43:31,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1782073.3333333333, ans=0.125 2023-11-22 03:43:32,609 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 03:43:36,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1782073.3333333333, ans=0.125 2023-11-22 03:43:41,235 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2800, loss[loss=0.06521, simple_loss=0.07763, pruned_loss=0.01376, audio_tagging_loss=0.01264, over 15274.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.0954, pruned_loss=0.01555, audio_tagging_loss=0.009281, over 3047548.72 frames. 
], batch size: 59, lr: 2.99e-03, grad_scale: 32.0 2023-11-22 03:43:53,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2023-11-22 03:44:07,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.687e+01 8.173e+01 8.676e+01 9.379e+01 1.430e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-22 03:44:09,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1782273.3333333333, ans=0.0 2023-11-22 03:44:13,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1782273.3333333333, ans=0.125 2023-11-22 03:44:16,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267350 2023-11-22 03:44:17,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-11-22 03:44:18,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1782273.3333333333, ans=0.07 2023-11-22 03:44:25,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1782340.0, ans=0.0 2023-11-22 03:44:46,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1782473.3333333333, ans=0.125 2023-11-22 03:44:47,356 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2850, loss[loss=0.05941, simple_loss=0.07669, pruned_loss=0.01016, audio_tagging_loss=0.0109, over 14221.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09366, pruned_loss=0.01532, audio_tagging_loss=0.009356, over 3048589.72 frames. ], batch size: 56, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:45:12,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1782606.6666666667, ans=0.125 2023-11-22 03:45:22,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267400 2023-11-22 03:45:37,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1782673.3333333333, ans=0.025 2023-11-22 03:45:42,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2023-11-22 03:45:43,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1782740.0, ans=0.0 2023-11-22 03:45:45,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1782740.0, ans=0.0 2023-11-22 03:45:48,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1782740.0, ans=0.2 2023-11-22 03:45:52,367 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2900, loss[loss=0.0766, simple_loss=0.1017, pruned_loss=0.0156, audio_tagging_loss=0.01015, over 16229.00 frames. ], tot_loss[loss=0.07112, simple_loss=0.09289, pruned_loss=0.01528, audio_tagging_loss=0.009401, over 3042965.08 frames. 
], batch size: 61, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:46:18,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1782940.0, ans=0.125 2023-11-22 03:46:19,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.173e+01 8.778e+01 9.450e+01 1.328e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-22 03:46:27,402 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267450 2023-11-22 03:46:27,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1782940.0, ans=0.125 2023-11-22 03:46:46,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1783073.3333333333, ans=0.2 2023-11-22 03:46:49,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1783073.3333333333, ans=0.125 2023-11-22 03:46:56,571 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 2950, loss[loss=0.06941, simple_loss=0.0834, pruned_loss=0.01599, audio_tagging_loss=0.01171, over 15341.00 frames. ], tot_loss[loss=0.07163, simple_loss=0.09365, pruned_loss=0.01542, audio_tagging_loss=0.00938, over 3045621.24 frames. ], batch size: 56, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:46:56,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1783140.0, ans=0.0 2023-11-22 03:47:08,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1783206.6666666667, ans=0.1 2023-11-22 03:47:15,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1783206.6666666667, ans=0.125 2023-11-22 03:47:21,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1783273.3333333333, ans=0.0 2023-11-22 03:47:25,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.88 vs. limit=10.0 2023-11-22 03:47:26,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1783273.3333333333, ans=0.125 2023-11-22 03:47:31,600 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267500 2023-11-22 03:47:41,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-22 03:47:43,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-11-22 03:47:55,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1783406.6666666667, ans=0.125 2023-11-22 03:48:01,446 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3000, loss[loss=0.09283, simple_loss=0.134, pruned_loss=0.01933, audio_tagging_loss=0.006481, over 16405.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.09471, pruned_loss=0.0157, audio_tagging_loss=0.009399, over 3052054.96 frames. 
], batch size: 58, lr: 2.99e-03, grad_scale: 16.0 2023-11-22 03:48:01,446 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 03:48:34,243 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8893, 3.7129, 4.8605, 4.3971], device='cuda:1') 2023-11-22 03:48:41,392 INFO [train_asr.py:1253] (1/4) Epoch 23, validation: loss=0.05946, simple_loss=0.05181, pruned_loss=0.005129, audio_tagging_loss=0.02843, over 4681554.00 frames. 2023-11-22 03:48:41,393 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 03:48:48,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1783473.3333333333, ans=0.1 2023-11-22 03:49:07,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.414e+01 8.252e+01 8.878e+01 9.473e+01 1.263e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-22 03:49:10,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1783606.6666666667, ans=0.015 2023-11-22 03:49:15,813 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267550 2023-11-22 03:49:44,971 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3050, loss[loss=0.06421, simple_loss=0.0771, pruned_loss=0.01511, audio_tagging_loss=0.01055, over 14917.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.0944, pruned_loss=0.01568, audio_tagging_loss=0.009532, over 3052436.54 frames. ], batch size: 57, lr: 2.99e-03, grad_scale: 8.0 2023-11-22 03:49:50,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1783806.6666666667, ans=0.125 2023-11-22 03:49:50,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1783806.6666666667, ans=0.0 2023-11-22 03:50:20,641 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267600 2023-11-22 03:50:23,286 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 03:50:26,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2023-11-22 03:50:39,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1784073.3333333333, ans=0.0 2023-11-22 03:50:49,202 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3100, loss[loss=0.05914, simple_loss=0.07956, pruned_loss=0.01201, audio_tagging_loss=0.007352, over 14778.00 frames. ], tot_loss[loss=0.07211, simple_loss=0.09409, pruned_loss=0.01544, audio_tagging_loss=0.009628, over 3049518.65 frames. 
], batch size: 57, lr: 2.99e-03, grad_scale: 8.0 2023-11-22 03:50:56,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1784140.0, ans=0.125 2023-11-22 03:51:05,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1784206.6666666667, ans=0.125 2023-11-22 03:51:17,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.127e+01 8.791e+01 9.439e+01 1.386e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 03:51:22,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1784273.3333333333, ans=0.125 2023-11-22 03:51:23,775 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267650 2023-11-22 03:51:53,225 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3150, loss[loss=0.07782, simple_loss=0.1031, pruned_loss=0.01614, audio_tagging_loss=0.01012, over 16568.00 frames. ], tot_loss[loss=0.07289, simple_loss=0.09535, pruned_loss=0.01555, audio_tagging_loss=0.009661, over 3048215.49 frames. ], batch size: 61, lr: 2.99e-03, grad_scale: 8.0 2023-11-22 03:51:56,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1784473.3333333333, ans=0.0 2023-11-22 03:52:13,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1784540.0, ans=0.2 2023-11-22 03:52:15,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1784540.0, ans=0.125 2023-11-22 03:52:27,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267700 2023-11-22 03:52:35,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1784673.3333333333, ans=0.0 2023-11-22 03:52:48,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1784740.0, ans=0.125 2023-11-22 03:52:57,569 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3200, loss[loss=0.08769, simple_loss=0.1139, pruned_loss=0.02082, audio_tagging_loss=0.009916, over 15419.00 frames. ], tot_loss[loss=0.07232, simple_loss=0.09437, pruned_loss=0.0154, audio_tagging_loss=0.009727, over 3050863.39 frames. 
], batch size: 57, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 03:52:57,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1784806.6666666667, ans=0.125 2023-11-22 03:53:10,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1784873.3333333333, ans=0.0 2023-11-22 03:53:14,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1784873.3333333333, ans=0.2 2023-11-22 03:53:21,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1784940.0, ans=0.125 2023-11-22 03:53:26,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.278e+01 8.640e+01 9.610e+01 1.233e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-22 03:53:33,312 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267750 2023-11-22 03:53:36,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1785006.6666666667, ans=0.2 2023-11-22 03:54:01,748 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3250, loss[loss=0.05904, simple_loss=0.07448, pruned_loss=0.01074, audio_tagging_loss=0.01105, over 15849.00 frames. ], tot_loss[loss=0.07299, simple_loss=0.09558, pruned_loss=0.01554, audio_tagging_loss=0.009661, over 3059906.03 frames. ], batch size: 61, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 03:54:02,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1785140.0, ans=0.1 2023-11-22 03:54:29,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.88 vs. limit=10.0 2023-11-22 03:54:35,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267800 2023-11-22 03:54:38,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1785340.0, ans=0.025 2023-11-22 03:54:55,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-11-22 03:55:06,144 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3300, loss[loss=0.06567, simple_loss=0.08761, pruned_loss=0.01288, audio_tagging_loss=0.008984, over 14673.00 frames. ], tot_loss[loss=0.0725, simple_loss=0.09473, pruned_loss=0.01545, audio_tagging_loss=0.009681, over 3052554.96 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 03:55:34,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.246e+01 8.906e+01 9.569e+01 1.172e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-22 03:55:40,351 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267850 2023-11-22 03:55:45,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1785673.3333333333, ans=0.05 2023-11-22 03:56:07,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1785740.0, ans=0.125 2023-11-22 03:56:09,853 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3350, loss[loss=0.06139, simple_loss=0.07453, pruned_loss=0.01533, audio_tagging_loss=0.008795, over 16012.00 frames. 
], tot_loss[loss=0.0725, simple_loss=0.09484, pruned_loss=0.01557, audio_tagging_loss=0.009514, over 3049376.16 frames. ], batch size: 62, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 03:56:25,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1785873.3333333333, ans=0.07 2023-11-22 03:56:35,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1785940.0, ans=0.2 2023-11-22 03:56:38,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1785940.0, ans=0.125 2023-11-22 03:56:43,202 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:56:44,816 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267900 2023-11-22 03:57:05,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2023-11-22 03:57:13,699 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3400, loss[loss=0.07229, simple_loss=0.09749, pruned_loss=0.01245, audio_tagging_loss=0.01109, over 15359.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09486, pruned_loss=0.0154, audio_tagging_loss=0.009457, over 3052801.45 frames. ], batch size: 57, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 03:57:22,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1786140.0, ans=0.125 2023-11-22 03:57:25,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1786206.6666666667, ans=0.0 2023-11-22 03:57:33,053 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:57:42,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.335e+01 8.871e+01 9.703e+01 1.398e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 03:57:42,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1786273.3333333333, ans=0.0 2023-11-22 03:57:48,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 267950 2023-11-22 03:58:18,127 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3450, loss[loss=0.08143, simple_loss=0.1005, pruned_loss=0.02068, audio_tagging_loss=0.01049, over 13496.00 frames. ], tot_loss[loss=0.07232, simple_loss=0.09465, pruned_loss=0.01558, audio_tagging_loss=0.009412, over 3044973.15 frames. 
], batch size: 54, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 03:58:18,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=1786473.3333333333, ans=0.2 2023-11-22 03:58:20,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1786473.3333333333, ans=0.125 2023-11-22 03:58:23,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1786473.3333333333, ans=0.0 2023-11-22 03:58:23,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1786473.3333333333, ans=0.0 2023-11-22 03:58:30,528 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 03:58:30,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1786540.0, ans=0.125 2023-11-22 03:58:35,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1786540.0, ans=0.0 2023-11-22 03:58:44,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1786606.6666666667, ans=0.1 2023-11-22 03:58:46,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1786606.6666666667, ans=0.0 2023-11-22 03:58:52,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1786606.6666666667, ans=0.0 2023-11-22 03:58:53,198 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268000 2023-11-22 03:59:00,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1786673.3333333333, ans=0.125 2023-11-22 03:59:07,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1786673.3333333333, ans=0.125 2023-11-22 03:59:07,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1786673.3333333333, ans=0.125 2023-11-22 03:59:23,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1786740.0, ans=0.0 2023-11-22 03:59:25,607 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3500, loss[loss=0.05924, simple_loss=0.08112, pruned_loss=0.01042, audio_tagging_loss=0.008264, over 13763.00 frames. ], tot_loss[loss=0.072, simple_loss=0.09409, pruned_loss=0.01558, audio_tagging_loss=0.009374, over 3031041.31 frames. ], batch size: 54, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 03:59:29,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1786806.6666666667, ans=0.1 2023-11-22 03:59:31,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1786806.6666666667, ans=0.2 2023-11-22 03:59:54,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 7.936e+01 8.648e+01 9.422e+01 1.502e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-22 03:59:58,437 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 03:59:59,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268050 2023-11-22 04:00:08,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2023-11-22 04:00:26,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1787073.3333333333, ans=0.0 2023-11-22 04:00:28,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1787140.0, ans=0.0 2023-11-22 04:00:29,142 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3550, loss[loss=0.06704, simple_loss=0.08068, pruned_loss=0.01439, audio_tagging_loss=0.0123, over 15907.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09459, pruned_loss=0.01574, audio_tagging_loss=0.009316, over 3038295.94 frames. ], batch size: 62, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:00:45,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1787206.6666666667, ans=0.125 2023-11-22 04:00:48,193 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 04:00:52,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=22.5 2023-11-22 04:01:03,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268100 2023-11-22 04:01:11,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1787340.0, ans=0.125 2023-11-22 04:01:16,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1787340.0, ans=0.125 2023-11-22 04:01:31,959 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3600, loss[loss=0.05857, simple_loss=0.07544, pruned_loss=0.01203, audio_tagging_loss=0.008819, over 14646.00 frames. ], tot_loss[loss=0.07209, simple_loss=0.09453, pruned_loss=0.01563, audio_tagging_loss=0.009201, over 3044497.82 frames. 
], batch size: 58, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:01:35,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1787473.3333333333, ans=0.02 2023-11-22 04:01:35,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1787473.3333333333, ans=0.025 2023-11-22 04:01:35,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1787473.3333333333, ans=0.2 2023-11-22 04:01:48,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1787540.0, ans=0.0 2023-11-22 04:01:49,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1787540.0, ans=0.0 2023-11-22 04:01:53,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1787540.0, ans=15.0 2023-11-22 04:01:58,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1787606.6666666667, ans=0.125 2023-11-22 04:02:00,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2023-11-22 04:02:01,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.327e+01 8.799e+01 9.625e+01 1.533e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-22 04:02:02,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1787606.6666666667, ans=0.1 2023-11-22 04:02:07,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268150 2023-11-22 04:02:31,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1787740.0, ans=0.04949747468305833 2023-11-22 04:02:37,513 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3650, loss[loss=0.1155, simple_loss=0.162, pruned_loss=0.02962, audio_tagging_loss=0.004903, over 15824.00 frames. ], tot_loss[loss=0.07291, simple_loss=0.09583, pruned_loss=0.01589, audio_tagging_loss=0.009101, over 3048440.42 frames. ], batch size: 52, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:03:05,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1787940.0, ans=0.125 2023-11-22 04:03:10,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268200 2023-11-22 04:03:26,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1788006.6666666667, ans=0.0 2023-11-22 04:03:31,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1788073.3333333333, ans=0.2 2023-11-22 04:03:37,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1788073.3333333333, ans=0.125 2023-11-22 04:03:40,426 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3700, loss[loss=0.07355, simple_loss=0.0946, pruned_loss=0.01687, audio_tagging_loss=0.009379, over 15557.00 frames. ], tot_loss[loss=0.07302, simple_loss=0.09582, pruned_loss=0.01585, audio_tagging_loss=0.009262, over 3049156.98 frames. 
], batch size: 57, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:03:43,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1788140.0, ans=0.125 2023-11-22 04:04:09,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.401e+01 8.835e+01 9.477e+01 1.225e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 04:04:15,646 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268250 2023-11-22 04:04:15,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1788273.3333333333, ans=0.2 2023-11-22 04:04:43,989 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3750, loss[loss=0.06951, simple_loss=0.09098, pruned_loss=0.01565, audio_tagging_loss=0.008374, over 14337.00 frames. ], tot_loss[loss=0.07248, simple_loss=0.09507, pruned_loss=0.01564, audio_tagging_loss=0.009312, over 3054977.16 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:05:00,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1788540.0, ans=0.0 2023-11-22 04:05:13,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1788606.6666666667, ans=0.125 2023-11-22 04:05:18,988 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268300 2023-11-22 04:05:28,634 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 04:05:31,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2023-11-22 04:05:31,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2023-11-22 04:05:35,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1788740.0, ans=0.125 2023-11-22 04:05:38,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1788740.0, ans=0.125 2023-11-22 04:05:43,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1788740.0, ans=0.125 2023-11-22 04:05:48,206 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3800, loss[loss=0.09885, simple_loss=0.1369, pruned_loss=0.02307, audio_tagging_loss=0.00731, over 15065.00 frames. ], tot_loss[loss=0.07273, simple_loss=0.09513, pruned_loss=0.01573, audio_tagging_loss=0.009444, over 3051303.16 frames. 
], batch size: 56, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:06:18,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.152e+01 8.764e+01 9.512e+01 1.350e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-22 04:06:20,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1788940.0, ans=0.0 2023-11-22 04:06:22,411 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268350 2023-11-22 04:06:25,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1789006.6666666667, ans=0.1 2023-11-22 04:06:26,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1789006.6666666667, ans=0.0 2023-11-22 04:06:52,249 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3850, loss[loss=0.07722, simple_loss=0.1077, pruned_loss=0.01473, audio_tagging_loss=0.008656, over 15773.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09377, pruned_loss=0.01541, audio_tagging_loss=0.009541, over 3048681.05 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:07:04,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1789206.6666666667, ans=0.125 2023-11-22 04:07:16,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2023-11-22 04:07:17,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1789273.3333333333, ans=0.0 2023-11-22 04:07:20,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1789273.3333333333, ans=0.125 2023-11-22 04:07:23,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1789273.3333333333, ans=0.125 2023-11-22 04:07:27,375 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268400 2023-11-22 04:07:35,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1789340.0, ans=0.0 2023-11-22 04:07:43,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-22 04:07:45,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1789406.6666666667, ans=0.2 2023-11-22 04:07:57,331 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3900, loss[loss=0.06768, simple_loss=0.09161, pruned_loss=0.01424, audio_tagging_loss=0.007632, over 16691.00 frames. ], tot_loss[loss=0.07153, simple_loss=0.09315, pruned_loss=0.01538, audio_tagging_loss=0.009574, over 3047371.37 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:08:00,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1789473.3333333333, ans=0.125 2023-11-22 04:08:00,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1789473.3333333333, ans=0.1 2023-11-22 04:08:19,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. 
2023-11-22 04:08:19,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0
2023-11-22 04:08:21,016 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 04:08:28,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.625e+01 8.044e+01 8.875e+01 9.539e+01 1.176e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-22 04:08:32,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268450
2023-11-22 04:08:32,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1789606.6666666667, ans=0.2
2023-11-22 04:08:59,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1789740.0, ans=0.125
2023-11-22 04:09:00,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1789806.6666666667, ans=0.1
2023-11-22 04:09:01,974 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 3950, loss[loss=0.06286, simple_loss=0.09035, pruned_loss=0.01174, audio_tagging_loss=0.005954, over 14857.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09347, pruned_loss=0.01529, audio_tagging_loss=0.009632, over 3044946.66 frames. ], batch size: 57, lr: 2.98e-03, grad_scale: 8.0
2023-11-22 04:09:05,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1789806.6666666667, ans=0.1
2023-11-22 04:09:06,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1789806.6666666667, ans=0.125
2023-11-22 04:09:22,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1789873.3333333333, ans=0.0
2023-11-22 04:09:35,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268500
2023-11-22 04:09:37,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0
2023-11-22 04:10:01,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1790073.3333333333, ans=0.2
2023-11-22 04:10:02,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1790073.3333333333, ans=0.95
2023-11-22 04:10:04,741 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4000, loss[loss=0.06541, simple_loss=0.07567, pruned_loss=0.01361, audio_tagging_loss=0.01397, over 14636.00 frames. ], tot_loss[loss=0.07218, simple_loss=0.09431, pruned_loss=0.01542, audio_tagging_loss=0.009604, over 3040861.36 frames. ], batch size: 57, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:10:09,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1790140.0, ans=0.125
2023-11-22 04:10:32,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1790273.3333333333, ans=0.1
2023-11-22 04:10:36,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.416e+01 8.758e+01 9.509e+01 1.185e+02, threshold=1.752e+02, percent-clipped=0.0
2023-11-22 04:10:40,274 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268550
2023-11-22 04:10:40,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1790273.3333333333, ans=0.0
2023-11-22 04:10:42,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=22.5
2023-11-22 04:10:55,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=22.5
2023-11-22 04:11:03,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1790406.6666666667, ans=0.2
2023-11-22 04:11:09,609 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4050, loss[loss=0.0762, simple_loss=0.08951, pruned_loss=0.01893, audio_tagging_loss=0.01251, over 14146.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09432, pruned_loss=0.01543, audio_tagging_loss=0.009757, over 3042251.84 frames. ], batch size: 54, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:11:12,177 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 04:11:28,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1790540.0, ans=0.125
2023-11-22 04:11:43,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268600
2023-11-22 04:11:56,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1790673.3333333333, ans=10.0
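
The Exclude-cut WARNING above fires for one-second AudioSet placeholder cuts: 100 input frames reduce to 23 after the encoder's roughly 4x subsampling, which is fewer than the 24 BPE tokens, so a transducer alignment would be impossible. A sketch of that length check; the subsampling formula is an assumption modelled on zipformer-style convolutional front ends, though it does reproduce the logged 100 -> 23:

    # Assumed zipformer-style subsampled length; matches 100 -> 23 in the log.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    num_frames, num_tokens = 100, 24          # values from the logged WARNING
    t = frames_after_subsampling(num_frames)  # -> 23
    if t < num_tokens:
        print(f"Exclude cut: {t} frames after subsampling < {num_tokens} tokens")
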
2023-11-22 04:12:13,286 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4100, loss[loss=0.06785, simple_loss=0.09053, pruned_loss=0.01376, audio_tagging_loss=0.008826, over 15134.00 frames. ], tot_loss[loss=0.07263, simple_loss=0.09493, pruned_loss=0.01545, audio_tagging_loss=0.009715, over 3047731.33 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:12:34,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1790873.3333333333, ans=0.125
2023-11-22 04:12:43,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1790940.0, ans=0.125
2023-11-22 04:12:44,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.196e+01 8.304e+01 9.033e+01 9.578e+01 1.202e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-22 04:12:47,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268650
2023-11-22 04:13:17,314 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4150, loss[loss=0.07998, simple_loss=0.1074, pruned_loss=0.01524, audio_tagging_loss=0.01104, over 15512.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.09473, pruned_loss=0.01548, audio_tagging_loss=0.009572, over 3047008.71 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:13:33,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1791206.6666666667, ans=0.5
2023-11-22 04:13:45,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1791273.3333333333, ans=0.125
2023-11-22 04:13:49,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5
2023-11-22 04:13:52,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268700
2023-11-22 04:14:03,567 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 04:14:05,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1791340.0, ans=0.125
2023-11-22 04:14:15,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1791406.6666666667, ans=0.125
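
In each Clipping_scale record the reported threshold equals Clipping_scale times the median of the grad-norm statistics, e.g. 2.0 * 9.033e+01 = 1.807e+02 in the entry above. A sketch of how such a report could be produced from a window of recent per-step gradient norms; clipping_report is a hypothetical helper, not the actual optim.py code:

    import torch

    def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: float tensor of recent per-step gradient norms.
        # Min, quartiles and max, matching the five logged statistics.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # clipping_scale x median, per the log
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped
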
2023-11-22 04:14:21,878 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4200, loss[loss=0.07721, simple_loss=0.1061, pruned_loss=0.01802, audio_tagging_loss=0.00615, over 13830.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.09401, pruned_loss=0.01506, audio_tagging_loss=0.009532, over 3048352.33 frames. ], batch size: 57, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:14:37,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1791540.0, ans=0.0
2023-11-22 04:14:45,156 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 04:14:52,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.143e+01 8.798e+01 9.723e+01 1.732e+02, threshold=1.760e+02, percent-clipped=0.0
2023-11-22 04:14:55,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268750
2023-11-22 04:14:58,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1791673.3333333333, ans=0.125
2023-11-22 04:15:21,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1791740.0, ans=0.125
2023-11-22 04:15:25,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1791806.6666666667, ans=0.125
2023-11-22 04:15:25,752 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4250, loss[loss=0.06803, simple_loss=0.08713, pruned_loss=0.01291, audio_tagging_loss=0.01155, over 14458.00 frames. ], tot_loss[loss=0.07189, simple_loss=0.0944, pruned_loss=0.01525, audio_tagging_loss=0.009445, over 3049271.37 frames. ], batch size: 55, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:15:27,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1791806.6666666667, ans=0.2
2023-11-22 04:15:30,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1791806.6666666667, ans=0.0
2023-11-22 04:15:34,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1791806.6666666667, ans=0.025
2023-11-22 04:15:42,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1791873.3333333333, ans=10.0
2023-11-22 04:15:58,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1791940.0, ans=0.125
2023-11-22 04:16:00,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268800
2023-11-22 04:16:05,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=12.0
2023-11-22 04:16:11,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1792006.6666666667, ans=0.125
2023-11-22 04:16:19,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1792073.3333333333, ans=0.125
2023-11-22 04:16:19,706 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 04:16:23,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1792073.3333333333, ans=0.0
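
The Whitening records compare a per-module anisotropy metric against a limit and leave activations alone while metric < limit, as in every entry here (e.g. metric=5.47 vs. limit=12.0 above). The true metric is computed in scaling.py; the proxy below, the largest eigenvalue of the feature covariance over the mean eigenvalue, is only an assumed stand-in that shows the shape of the computation:

    import torch

    def whitening_metric(feats: torch.Tensor) -> float:
        # feats: (num_frames, num_channels). Proxy metric: 1.0 for perfectly
        # white features, growing as variance concentrates in few directions.
        x = feats - feats.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.max() / eigs.mean()).item()
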
2023-11-22 04:16:29,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0
2023-11-22 04:16:30,668 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4300, loss[loss=0.0469, simple_loss=0.05803, pruned_loss=0.008786, audio_tagging_loss=0.009097, over 14883.00 frames. ], tot_loss[loss=0.07192, simple_loss=0.09448, pruned_loss=0.01526, audio_tagging_loss=0.009417, over 3047571.48 frames. ], batch size: 57, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:16:48,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1792206.6666666667, ans=0.125
2023-11-22 04:17:01,433 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.380e+01 9.006e+01 9.846e+01 1.214e+02, threshold=1.801e+02, percent-clipped=0.0
2023-11-22 04:17:05,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268850
2023-11-22 04:17:33,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1792406.6666666667, ans=0.2
2023-11-22 04:17:35,216 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4350, loss[loss=0.07172, simple_loss=0.09552, pruned_loss=0.01349, audio_tagging_loss=0.01046, over 14758.00 frames. ], tot_loss[loss=0.07221, simple_loss=0.09506, pruned_loss=0.01532, audio_tagging_loss=0.009362, over 3047213.91 frames. ], batch size: 55, lr: 2.98e-03, grad_scale: 16.0
2023-11-22 04:18:01,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5
2023-11-22 04:18:10,633 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268900
2023-11-22 04:18:12,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1792606.6666666667, ans=0.1
2023-11-22 04:18:28,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1792740.0, ans=0.2
2023-11-22 04:18:39,823 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4400, loss[loss=0.05278, simple_loss=0.06616, pruned_loss=0.009772, audio_tagging_loss=0.009927, over 15010.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.09524, pruned_loss=0.0155, audio_tagging_loss=0.009286, over 3045271.95 frames. ], batch size: 60, lr: 2.98e-03, grad_scale: 32.0
2023-11-22 04:19:10,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.895e+01 7.984e+01 8.605e+01 9.395e+01 1.319e+02, threshold=1.721e+02, percent-clipped=0.0
2023-11-22 04:19:14,620 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 268950
2023-11-22 04:19:21,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5
2023-11-22 04:19:21,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0
2023-11-22 04:19:32,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0
2023-11-22 04:19:33,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1793073.3333333333, ans=0.1
2023-11-22 04:19:45,067 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4450, loss[loss=0.07049, simple_loss=0.09183, pruned_loss=0.01442, audio_tagging_loss=0.01016, over 15481.00 frames.
], tot_loss[loss=0.07291, simple_loss=0.09591, pruned_loss=0.01577, audio_tagging_loss=0.009193, over 3044506.50 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 32.0 2023-11-22 04:19:46,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1793140.0, ans=0.125 2023-11-22 04:20:12,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1793273.3333333333, ans=0.1 2023-11-22 04:20:19,926 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269000 2023-11-22 04:20:27,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1793340.0, ans=0.0 2023-11-22 04:20:27,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1793340.0, ans=0.0 2023-11-22 04:20:45,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1793406.6666666667, ans=0.125 2023-11-22 04:20:49,443 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4500, loss[loss=0.06247, simple_loss=0.07437, pruned_loss=0.01503, audio_tagging_loss=0.01025, over 14048.00 frames. ], tot_loss[loss=0.07297, simple_loss=0.09614, pruned_loss=0.01572, audio_tagging_loss=0.009184, over 3048323.12 frames. ], batch size: 55, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:20:54,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1793473.3333333333, ans=0.125 2023-11-22 04:20:57,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.13 vs. limit=22.5 2023-11-22 04:21:23,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.138e+01 8.864e+01 9.705e+01 1.342e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-22 04:21:24,576 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269050 2023-11-22 04:21:27,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1793673.3333333333, ans=0.0 2023-11-22 04:21:35,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1793673.3333333333, ans=0.0 2023-11-22 04:21:45,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2023-11-22 04:21:46,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1793740.0, ans=22.5 2023-11-22 04:21:47,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1793740.0, ans=0.1 2023-11-22 04:21:52,985 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4550, loss[loss=0.04555, simple_loss=0.0582, pruned_loss=0.007796, audio_tagging_loss=0.00865, over 14581.00 frames. ], tot_loss[loss=0.07269, simple_loss=0.09589, pruned_loss=0.01556, audio_tagging_loss=0.009184, over 3045899.79 frames. 
], batch size: 56, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:22:02,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1793806.6666666667, ans=0.1 2023-11-22 04:22:12,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1793873.3333333333, ans=0.125 2023-11-22 04:22:28,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269100 2023-11-22 04:22:29,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-11-22 04:22:41,473 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 04:22:57,682 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4600, loss[loss=0.08013, simple_loss=0.109, pruned_loss=0.01894, audio_tagging_loss=0.006702, over 15598.00 frames. ], tot_loss[loss=0.07205, simple_loss=0.095, pruned_loss=0.01531, audio_tagging_loss=0.009235, over 3052520.83 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:23:05,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2023-11-22 04:23:12,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1794206.6666666667, ans=0.0 2023-11-22 04:23:16,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2023-11-22 04:23:22,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1794273.3333333333, ans=0.125 2023-11-22 04:23:27,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1794273.3333333333, ans=0.1 2023-11-22 04:23:28,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2023-11-22 04:23:30,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.306e+01 8.737e+01 9.544e+01 1.284e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 04:23:31,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269150 2023-11-22 04:23:48,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1794406.6666666667, ans=0.0 2023-11-22 04:23:51,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=22.5 2023-11-22 04:24:02,027 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4650, loss[loss=0.07672, simple_loss=0.0977, pruned_loss=0.01742, audio_tagging_loss=0.01045, over 15043.00 frames. 
], tot_loss[loss=0.07209, simple_loss=0.09477, pruned_loss=0.01528, audio_tagging_loss=0.009424, over 3054874.88 frames. ], batch size: 55, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:24:29,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1794606.6666666667, ans=0.0 2023-11-22 04:24:37,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269200 2023-11-22 04:24:40,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1794673.3333333333, ans=0.0 2023-11-22 04:24:42,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1794673.3333333333, ans=0.125 2023-11-22 04:24:43,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=10.0 2023-11-22 04:24:55,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1794740.0, ans=0.125 2023-11-22 04:24:56,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1794740.0, ans=0.125 2023-11-22 04:24:59,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1794740.0, ans=0.0 2023-11-22 04:25:06,250 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4700, loss[loss=0.05037, simple_loss=0.06698, pruned_loss=0.008051, audio_tagging_loss=0.00883, over 14717.00 frames. ], tot_loss[loss=0.07209, simple_loss=0.0945, pruned_loss=0.01533, audio_tagging_loss=0.009503, over 3056131.50 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:25:21,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1794873.3333333333, ans=0.125 2023-11-22 04:25:26,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1794873.3333333333, ans=0.125 2023-11-22 04:25:39,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.243e+01 8.677e+01 9.667e+01 1.211e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-22 04:25:40,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269250 2023-11-22 04:25:59,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1795073.3333333333, ans=0.035 2023-11-22 04:26:11,053 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4750, loss[loss=0.05324, simple_loss=0.06472, pruned_loss=0.009867, audio_tagging_loss=0.01101, over 13832.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.09425, pruned_loss=0.01547, audio_tagging_loss=0.009606, over 3058146.74 frames. ], batch size: 54, lr: 2.98e-03, grad_scale: 8.0 2023-11-22 04:26:12,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1795140.0, ans=0.125 2023-11-22 04:26:14,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.72 vs. 
limit=15.0 2023-11-22 04:26:14,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1795140.0, ans=0.125 2023-11-22 04:26:19,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1795140.0, ans=0.125 2023-11-22 04:26:32,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1795206.6666666667, ans=0.125 2023-11-22 04:26:39,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1795273.3333333333, ans=0.2 2023-11-22 04:26:44,528 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269300 2023-11-22 04:26:53,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1795340.0, ans=0.1 2023-11-22 04:27:14,474 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4800, loss[loss=0.08205, simple_loss=0.1029, pruned_loss=0.01854, audio_tagging_loss=0.01209, over 15309.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.09424, pruned_loss=0.01558, audio_tagging_loss=0.00971, over 3053622.07 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:27:23,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1795473.3333333333, ans=0.2 2023-11-22 04:27:27,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1795540.0, ans=0.05 2023-11-22 04:27:39,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1795606.6666666667, ans=0.125 2023-11-22 04:27:40,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2023-11-22 04:27:43,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1795606.6666666667, ans=0.0 2023-11-22 04:27:47,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.253e+01 9.179e+01 1.004e+02 1.459e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-22 04:27:49,061 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269350 2023-11-22 04:27:50,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1795606.6666666667, ans=0.125 2023-11-22 04:27:55,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1795673.3333333333, ans=0.2 2023-11-22 04:28:05,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1795740.0, ans=0.0 2023-11-22 04:28:17,844 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4850, loss[loss=0.05585, simple_loss=0.06378, pruned_loss=0.0139, audio_tagging_loss=0.01006, over 15708.00 frames. ], tot_loss[loss=0.07286, simple_loss=0.095, pruned_loss=0.0157, audio_tagging_loss=0.009668, over 3051553.37 frames. 
], batch size: 62, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:28:24,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1795806.6666666667, ans=0.125 2023-11-22 04:28:45,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1795940.0, ans=0.5 2023-11-22 04:28:52,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269400 2023-11-22 04:28:52,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1795940.0, ans=0.125 2023-11-22 04:28:52,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1795940.0, ans=0.0 2023-11-22 04:28:55,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1796006.6666666667, ans=0.07 2023-11-22 04:29:07,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1796073.3333333333, ans=0.125 2023-11-22 04:29:11,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-22 04:29:21,634 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4900, loss[loss=0.08222, simple_loss=0.1088, pruned_loss=0.01964, audio_tagging_loss=0.008165, over 15513.00 frames. ], tot_loss[loss=0.07215, simple_loss=0.09426, pruned_loss=0.01537, audio_tagging_loss=0.009647, over 3048808.63 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:29:53,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1796273.3333333333, ans=0.125 2023-11-22 04:29:55,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 7.981e+01 8.746e+01 9.394e+01 1.160e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 04:29:56,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269450 2023-11-22 04:29:56,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1796273.3333333333, ans=0.5 2023-11-22 04:30:00,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1796340.0, ans=0.025 2023-11-22 04:30:26,659 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 4950, loss[loss=0.06344, simple_loss=0.08571, pruned_loss=0.01126, audio_tagging_loss=0.009325, over 16578.00 frames. ], tot_loss[loss=0.07146, simple_loss=0.09361, pruned_loss=0.01516, audio_tagging_loss=0.009501, over 3046495.09 frames. ], batch size: 63, lr: 2.98e-03, grad_scale: 16.0 2023-11-22 04:31:01,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269500 2023-11-22 04:31:30,688 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5000, loss[loss=0.04352, simple_loss=0.05095, pruned_loss=0.008664, audio_tagging_loss=0.009383, over 14175.00 frames. ], tot_loss[loss=0.07141, simple_loss=0.09377, pruned_loss=0.0152, audio_tagging_loss=0.00932, over 3048361.34 frames. 
], batch size: 59, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:31:47,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1796873.3333333333, ans=0.0 2023-11-22 04:32:04,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.014e+01 8.549e+01 9.215e+01 1.274e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-22 04:32:05,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269550 2023-11-22 04:32:17,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1797006.6666666667, ans=0.0 2023-11-22 04:32:35,484 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5050, loss[loss=0.06691, simple_loss=0.09064, pruned_loss=0.01207, audio_tagging_loss=0.009518, over 14699.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.0943, pruned_loss=0.01533, audio_tagging_loss=0.009245, over 3048667.90 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:32:42,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-22 04:32:43,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1797140.0, ans=0.1 2023-11-22 04:32:56,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2023-11-22 04:33:09,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2023-11-22 04:33:10,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269600 2023-11-22 04:33:11,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2023-11-22 04:33:37,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1797406.6666666667, ans=0.1 2023-11-22 04:33:40,609 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5100, loss[loss=0.04605, simple_loss=0.05614, pruned_loss=0.008434, audio_tagging_loss=0.009544, over 15553.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09433, pruned_loss=0.01549, audio_tagging_loss=0.009189, over 3043130.87 frames. 
], batch size: 62, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:33:55,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1797540.0, ans=0.125 2023-11-22 04:34:05,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1797606.6666666667, ans=0.0 2023-11-22 04:34:14,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.728e+01 7.951e+01 8.483e+01 9.214e+01 1.214e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-22 04:34:15,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269650 2023-11-22 04:34:18,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1797673.3333333333, ans=0.125 2023-11-22 04:34:31,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1797740.0, ans=0.1 2023-11-22 04:34:45,955 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5150, loss[loss=0.06171, simple_loss=0.07577, pruned_loss=0.01241, audio_tagging_loss=0.01141, over 14888.00 frames. ], tot_loss[loss=0.07197, simple_loss=0.09441, pruned_loss=0.01554, audio_tagging_loss=0.009224, over 3043589.33 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:35:20,617 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269700 2023-11-22 04:35:21,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1797940.0, ans=0.125 2023-11-22 04:35:38,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-22 04:35:47,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1798073.3333333333, ans=0.125 2023-11-22 04:35:47,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-11-22 04:35:51,081 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5200, loss[loss=0.08754, simple_loss=0.1175, pruned_loss=0.0204, audio_tagging_loss=0.00839, over 14613.00 frames. ], tot_loss[loss=0.07226, simple_loss=0.09495, pruned_loss=0.01558, audio_tagging_loss=0.0092, over 3045984.13 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 32.0 2023-11-22 04:35:53,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1798140.0, ans=0.0 2023-11-22 04:36:09,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1798206.6666666667, ans=0.125 2023-11-22 04:36:09,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1798206.6666666667, ans=0.2 2023-11-22 04:36:20,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1798273.3333333333, ans=0.0 2023-11-22 04:36:20,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.41 vs. 
limit=15.0 2023-11-22 04:36:24,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2023-11-22 04:36:24,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.951e+01 8.234e+01 8.647e+01 9.394e+01 1.220e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-22 04:36:26,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269750 2023-11-22 04:36:26,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1798273.3333333333, ans=0.125 2023-11-22 04:36:55,991 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5250, loss[loss=0.05756, simple_loss=0.07354, pruned_loss=0.01169, audio_tagging_loss=0.009097, over 15322.00 frames. ], tot_loss[loss=0.07275, simple_loss=0.09576, pruned_loss=0.01574, audio_tagging_loss=0.009129, over 3046486.48 frames. ], batch size: 61, lr: 2.97e-03, grad_scale: 32.0 2023-11-22 04:37:01,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1798473.3333333333, ans=0.0 2023-11-22 04:37:30,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269800 2023-11-22 04:38:00,442 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5300, loss[loss=0.06352, simple_loss=0.0853, pruned_loss=0.01141, audio_tagging_loss=0.009457, over 15857.00 frames. ], tot_loss[loss=0.07215, simple_loss=0.09497, pruned_loss=0.01547, audio_tagging_loss=0.009198, over 3041563.86 frames. ], batch size: 59, lr: 2.97e-03, grad_scale: 32.0 2023-11-22 04:38:15,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1798873.3333333333, ans=0.125 2023-11-22 04:38:23,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1798873.3333333333, ans=0.125 2023-11-22 04:38:34,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.362e+01 8.825e+01 9.313e+01 1.444e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-22 04:38:34,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269850 2023-11-22 04:38:52,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1799073.3333333333, ans=0.2 2023-11-22 04:39:04,228 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5350, loss[loss=0.08912, simple_loss=0.1261, pruned_loss=0.01942, audio_tagging_loss=0.006649, over 15032.00 frames. ], tot_loss[loss=0.07319, simple_loss=0.09635, pruned_loss=0.01579, audio_tagging_loss=0.00922, over 3044362.48 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:39:39,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269900 2023-11-22 04:40:09,597 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5400, loss[loss=0.09161, simple_loss=0.1249, pruned_loss=0.02356, audio_tagging_loss=0.005622, over 15636.00 frames. ], tot_loss[loss=0.07325, simple_loss=0.09624, pruned_loss=0.0159, audio_tagging_loss=0.009235, over 3049749.05 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:40:17,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=15.0 2023-11-22 04:40:30,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1799540.0, ans=0.125 2023-11-22 04:40:40,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1799606.6666666667, ans=0.125 2023-11-22 04:40:43,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.520e+01 8.148e+01 8.900e+01 9.649e+01 1.296e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-22 04:40:44,053 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 269950 2023-11-22 04:40:47,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0 2023-11-22 04:40:49,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1799673.3333333333, ans=0.05 2023-11-22 04:40:53,588 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 04:41:00,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1799740.0, ans=0.0 2023-11-22 04:41:05,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1799740.0, ans=0.015 2023-11-22 04:41:13,923 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5450, loss[loss=0.07245, simple_loss=0.09146, pruned_loss=0.01714, audio_tagging_loss=0.00958, over 15187.00 frames. ], tot_loss[loss=0.07357, simple_loss=0.09645, pruned_loss=0.01603, audio_tagging_loss=0.009319, over 3046635.48 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:41:31,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1799873.3333333333, ans=0.05 2023-11-22 04:41:39,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2023-11-22 04:41:46,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1799940.0, ans=0.0 2023-11-22 04:41:49,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270000 2023-11-22 04:41:58,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. limit=10.0 2023-11-22 04:42:01,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1800006.6666666667, ans=0.125 2023-11-22 04:42:19,729 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5500, loss[loss=0.07396, simple_loss=0.09425, pruned_loss=0.0158, audio_tagging_loss=0.01103, over 14199.00 frames. ], tot_loss[loss=0.07349, simple_loss=0.09611, pruned_loss=0.01609, audio_tagging_loss=0.009345, over 3044620.94 frames. 
], batch size: 54, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:42:22,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1800140.0, ans=0.0 2023-11-22 04:42:32,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1800206.6666666667, ans=0.125 2023-11-22 04:42:39,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1800206.6666666667, ans=0.0 2023-11-22 04:42:46,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1800273.3333333333, ans=0.125 2023-11-22 04:42:49,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1800273.3333333333, ans=0.125 2023-11-22 04:42:53,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.241e+01 8.879e+01 9.729e+01 1.307e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-22 04:42:53,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270050 2023-11-22 04:43:17,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1800406.6666666667, ans=0.0 2023-11-22 04:43:24,369 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5550, loss[loss=0.05686, simple_loss=0.07463, pruned_loss=0.009651, audio_tagging_loss=0.00989, over 16734.00 frames. ], tot_loss[loss=0.07374, simple_loss=0.09665, pruned_loss=0.01608, audio_tagging_loss=0.009344, over 3055541.46 frames. ], batch size: 64, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:43:27,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2023-11-22 04:43:29,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2023-11-22 04:43:32,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1800473.3333333333, ans=0.2 2023-11-22 04:43:56,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1800606.6666666667, ans=0.07 2023-11-22 04:43:59,188 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270100 2023-11-22 04:44:20,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1800740.0, ans=0.125 2023-11-22 04:44:28,572 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5600, loss[loss=0.04463, simple_loss=0.05372, pruned_loss=0.007816, audio_tagging_loss=0.009955, over 14803.00 frames. ], tot_loss[loss=0.07352, simple_loss=0.09641, pruned_loss=0.01588, audio_tagging_loss=0.009433, over 3055822.04 frames. 
], batch size: 58, lr: 2.97e-03, grad_scale: 32.0 2023-11-22 04:44:28,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1800806.6666666667, ans=0.125 2023-11-22 04:44:30,085 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 04:45:04,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.061e+01 8.801e+01 9.492e+01 1.565e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-22 04:45:04,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270150 2023-11-22 04:45:15,357 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 04:45:17,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0 2023-11-22 04:45:32,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1801140.0, ans=0.025 2023-11-22 04:45:33,020 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5650, loss[loss=0.06918, simple_loss=0.09491, pruned_loss=0.01132, audio_tagging_loss=0.01041, over 15085.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.09571, pruned_loss=0.0157, audio_tagging_loss=0.00962, over 3059960.88 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:45:33,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1801140.0, ans=0.07 2023-11-22 04:46:07,674 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270200 2023-11-22 04:46:35,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1801406.6666666667, ans=0.125 2023-11-22 04:46:37,707 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5700, loss[loss=0.06495, simple_loss=0.08456, pruned_loss=0.01315, audio_tagging_loss=0.009516, over 13968.00 frames. ], tot_loss[loss=0.07305, simple_loss=0.09553, pruned_loss=0.01576, audio_tagging_loss=0.009528, over 3055263.38 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:46:40,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1801473.3333333333, ans=0.0 2023-11-22 04:47:05,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. 
limit=22.5 2023-11-22 04:47:12,170 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270250 2023-11-22 04:47:13,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.662e+01 8.286e+01 8.958e+01 9.545e+01 1.493e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-22 04:47:31,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1801740.0, ans=0.2 2023-11-22 04:47:32,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-22 04:47:34,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1801740.0, ans=15.0 2023-11-22 04:47:41,459 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5750, loss[loss=0.04559, simple_loss=0.04733, pruned_loss=0.009084, audio_tagging_loss=0.01284, over 13600.00 frames. ], tot_loss[loss=0.07234, simple_loss=0.09461, pruned_loss=0.01562, audio_tagging_loss=0.00941, over 3056871.61 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:47:49,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1801806.6666666667, ans=0.1 2023-11-22 04:47:59,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1801873.3333333333, ans=0.2 2023-11-22 04:48:15,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1801940.0, ans=0.09899494936611666 2023-11-22 04:48:16,423 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270300 2023-11-22 04:48:23,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1802006.6666666667, ans=0.2 2023-11-22 04:48:27,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1802006.6666666667, ans=0.125 2023-11-22 04:48:42,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1802073.3333333333, ans=0.125 2023-11-22 04:48:45,855 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5800, loss[loss=0.06172, simple_loss=0.07828, pruned_loss=0.01313, audio_tagging_loss=0.009457, over 14859.00 frames. ], tot_loss[loss=0.07242, simple_loss=0.09489, pruned_loss=0.0156, audio_tagging_loss=0.009373, over 3054200.33 frames. 
], batch size: 56, lr: 2.97e-03, grad_scale: 8.0 2023-11-22 04:48:49,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1802140.0, ans=0.0 2023-11-22 04:48:51,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1802140.0, ans=0.0 2023-11-22 04:48:53,532 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 04:49:21,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270350 2023-11-22 04:49:23,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.122e+01 8.117e+01 8.783e+01 9.431e+01 1.385e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-22 04:49:29,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1802340.0, ans=0.125 2023-11-22 04:49:48,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2023-11-22 04:49:50,742 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5850, loss[loss=0.05904, simple_loss=0.07822, pruned_loss=0.008607, audio_tagging_loss=0.01132, over 14023.00 frames. ], tot_loss[loss=0.07191, simple_loss=0.09472, pruned_loss=0.0154, audio_tagging_loss=0.009153, over 3050008.45 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 8.0 2023-11-22 04:49:54,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1802473.3333333333, ans=0.125 2023-11-22 04:50:11,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=1802540.0, ans=12.0 2023-11-22 04:50:14,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2023-11-22 04:50:25,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270400 2023-11-22 04:50:28,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1802673.3333333333, ans=0.0 2023-11-22 04:50:53,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1802740.0, ans=0.125 2023-11-22 04:50:55,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-11-22 04:50:55,522 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5900, loss[loss=0.05437, simple_loss=0.05761, pruned_loss=0.01103, audio_tagging_loss=0.01454, over 15551.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.09561, pruned_loss=0.01563, audio_tagging_loss=0.009162, over 3048902.86 frames. 
], batch size: 62, lr: 2.97e-03, grad_scale: 8.0 2023-11-22 04:51:30,451 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270450 2023-11-22 04:51:31,836 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 04:51:32,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.248e+01 8.937e+01 9.600e+01 1.150e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-22 04:52:00,261 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 5950, loss[loss=0.06668, simple_loss=0.08452, pruned_loss=0.01346, audio_tagging_loss=0.01096, over 14850.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09537, pruned_loss=0.01572, audio_tagging_loss=0.009152, over 3050547.79 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 8.0 2023-11-22 04:52:01,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1803140.0, ans=0.125 2023-11-22 04:52:04,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2023-11-22 04:52:24,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1803206.6666666667, ans=0.1 2023-11-22 04:52:30,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=10.0 2023-11-22 04:52:36,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270500 2023-11-22 04:52:44,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1803340.0, ans=0.2 2023-11-22 04:52:47,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1803340.0, ans=0.125 2023-11-22 04:53:03,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1803406.6666666667, ans=0.0 2023-11-22 04:53:05,924 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6000, loss[loss=0.07981, simple_loss=0.1027, pruned_loss=0.01985, audio_tagging_loss=0.008621, over 13715.00 frames. ], tot_loss[loss=0.07266, simple_loss=0.09529, pruned_loss=0.01583, audio_tagging_loss=0.009177, over 3054142.91 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 16.0 2023-11-22 04:53:05,925 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 04:53:45,802 INFO [train_asr.py:1253] (1/4) Epoch 23, validation: loss=0.05955, simple_loss=0.05175, pruned_loss=0.005139, audio_tagging_loss=0.02853, over 4681554.00 frames. 2023-11-22 04:53:45,803 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 04:53:52,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.27 vs. 
2023-11-22 04:53:52,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5
2023-11-22 04:53:58,932 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 04:54:05,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1803540.0, ans=0.05
2023-11-22 04:54:20,426 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270550
2023-11-22 04:54:23,982 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.322e+01 8.050e+01 8.627e+01 9.368e+01 1.674e+02, threshold=1.725e+02, percent-clipped=0.0
2023-11-22 04:54:26,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=22.5
2023-11-22 04:54:29,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0
2023-11-22 04:54:31,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1803673.3333333333, ans=0.0
2023-11-22 04:54:33,796 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 04:54:50,471 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6050, loss[loss=0.05886, simple_loss=0.0715, pruned_loss=0.01357, audio_tagging_loss=0.009535, over 14396.00 frames. ], tot_loss[loss=0.07224, simple_loss=0.09474, pruned_loss=0.01565, audio_tagging_loss=0.009227, over 3053324.99 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 04:55:01,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5
2023-11-22 04:55:10,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1803873.3333333333, ans=0.125
2023-11-22 04:55:24,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270600
2023-11-22 04:55:30,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5
2023-11-22 04:55:34,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1804006.6666666667, ans=0.125
2023-11-22 04:55:37,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1804006.6666666667, ans=0.125
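The WARNING above drops a one-second AudioSet placeholder cut: after the frontend's roughly 4x subsampling, its 100 feature frames shrink to 23, fewer than the 24 BPE tokens of the dummy transcript, so the loss cannot align it. The 100 -> 23 figure matches a two-stage halving of the form below (assumed from the printed numbers, not copied from the recipe):

def frames_after_subsampling(t: int) -> int:
    # Two stride-2 stages with a few frames of context lost at the edges.
    return ((t - 7) // 2 + 1) // 2

assert frames_after_subsampling(100) == 23  # matches the warning above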
2023-11-22 04:55:54,010 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6100, loss[loss=0.06314, simple_loss=0.08261, pruned_loss=0.01198, audio_tagging_loss=0.009853, over 14827.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.094, pruned_loss=0.01559, audio_tagging_loss=0.009176, over 3055623.62 frames. ], batch size: 61, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 04:56:01,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1804140.0, ans=0.0
2023-11-22 04:56:05,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1804140.0, ans=0.125
2023-11-22 04:56:26,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0
2023-11-22 04:56:28,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1804273.3333333333, ans=0.125
2023-11-22 04:56:29,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270650
2023-11-22 04:56:31,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.259e+01 8.763e+01 9.618e+01 1.406e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-22 04:56:36,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1804340.0, ans=0.0
2023-11-22 04:56:41,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0
2023-11-22 04:56:59,491 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6150, loss[loss=0.08546, simple_loss=0.1129, pruned_loss=0.02061, audio_tagging_loss=0.008395, over 14395.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.09408, pruned_loss=0.01554, audio_tagging_loss=0.009197, over 3050665.70 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 04:57:26,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.97 vs. limit=10.0
2023-11-22 04:57:34,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270700
2023-11-22 04:57:50,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1804740.0, ans=0.0
2023-11-22 04:58:03,985 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6200, loss[loss=0.06733, simple_loss=0.08305, pruned_loss=0.01393, audio_tagging_loss=0.01188, over 14956.00 frames. ], tot_loss[loss=0.07187, simple_loss=0.09412, pruned_loss=0.01554, audio_tagging_loss=0.009271, over 3044311.11 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 04:58:36,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1804940.0, ans=0.0
2023-11-22 04:58:38,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270750
2023-11-22 04:58:40,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0
2023-11-22 04:58:40,807 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.162e+01 9.001e+01 9.547e+01 1.182e+02, threshold=1.800e+02, percent-clipped=0.0
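Each optim.py:476 line summarises recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus the active clipping threshold. The printed thresholds track twice the median, consistent with Clipping_scale=2.0: in the 04:56:31 entry above, 2.0 * 8.763e+01 = 1.753e+02, exactly the logged threshold. Schematically (a simplification of whatever the optimizer does internally):

import torch

recent_norms = torch.tensor([72.56, 82.59, 87.63, 96.18, 140.6])  # the quantile summary above
clipping_scale = 2.0
threshold = clipping_scale * recent_norms.median()
print(threshold.item())  # 175.26, i.e. the logged threshold=1.753e+02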
2023-11-22 04:59:07,554 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6250, loss[loss=0.07424, simple_loss=0.102, pruned_loss=0.01342, audio_tagging_loss=0.009844, over 15346.00 frames. ], tot_loss[loss=0.07171, simple_loss=0.09376, pruned_loss=0.01543, audio_tagging_loss=0.0094, over 3048186.64 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 04:59:23,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1805206.6666666667, ans=0.1
2023-11-22 04:59:32,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1805273.3333333333, ans=0.1
2023-11-22 04:59:41,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1805273.3333333333, ans=0.09899494936611666
2023-11-22 04:59:42,726 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270800
2023-11-22 04:59:43,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.98 vs. limit=15.0
2023-11-22 04:59:51,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0
2023-11-22 04:59:56,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0
2023-11-22 05:00:09,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1805406.6666666667, ans=0.0
2023-11-22 05:00:11,869 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6300, loss[loss=0.07066, simple_loss=0.08709, pruned_loss=0.01652, audio_tagging_loss=0.01058, over 14883.00 frames. ], tot_loss[loss=0.07181, simple_loss=0.09381, pruned_loss=0.01546, audio_tagging_loss=0.009447, over 3051611.46 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 05:00:46,750 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270850
2023-11-22 05:00:48,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1805606.6666666667, ans=0.125
2023-11-22 05:00:49,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.509e+01 8.345e+01 8.951e+01 9.677e+01 1.248e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-22 05:00:56,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1805673.3333333333, ans=0.05
2023-11-22 05:01:16,578 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6350, loss[loss=0.07441, simple_loss=0.09836, pruned_loss=0.01403, audio_tagging_loss=0.0112, over 14629.00 frames. ], tot_loss[loss=0.07157, simple_loss=0.09316, pruned_loss=0.01543, audio_tagging_loss=0.00956, over 3043045.48 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 05:01:34,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1805873.3333333333, ans=0.0
2023-11-22 05:01:49,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1805940.0, ans=0.1
2023-11-22 05:01:50,966 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270900
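The balancer entries (prob, min_positive, min_abs) refer to per-channel activation constraints: each balancer watches statistics such as the fraction of positive outputs and, with the scheduled probability, applies a small gradient correction when a channel drifts outside its target range; ans=0.05 above is the scheduled min_positive bound. A sketch of the statistic being constrained (illustrative only; the actual Balancer acts on gradients, not values):

import torch

def positive_fraction(x: torch.Tensor) -> torch.Tensor:
    """Per-channel fraction of positive activations over a batch of frames."""
    return (x > 0).float().mean(dim=0)

x = torch.randn(1000, 8)                 # (frames, channels)
frac = positive_fraction(x)
violations = frac < 0.05                 # channels that would trigger a correction
print(frac, violations)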
2023-11-22 05:02:20,570 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6400, loss[loss=0.06745, simple_loss=0.0891, pruned_loss=0.01236, audio_tagging_loss=0.01055, over 14958.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09278, pruned_loss=0.01525, audio_tagging_loss=0.009562, over 3041152.51 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 32.0
2023-11-22 05:02:28,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1806140.0, ans=0.0
2023-11-22 05:02:34,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1806206.6666666667, ans=0.1
2023-11-22 05:02:47,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1806273.3333333333, ans=0.0
2023-11-22 05:02:48,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1806273.3333333333, ans=0.0
2023-11-22 05:02:51,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1806273.3333333333, ans=0.125
2023-11-22 05:02:54,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 270950
2023-11-22 05:02:56,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1806273.3333333333, ans=0.125
2023-11-22 05:02:57,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.214e+01 8.231e+01 8.800e+01 9.655e+01 1.370e+02, threshold=1.760e+02, percent-clipped=0.0
2023-11-22 05:02:57,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1806340.0, ans=0.125
2023-11-22 05:03:23,823 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6450, loss[loss=0.07938, simple_loss=0.1133, pruned_loss=0.01568, audio_tagging_loss=0.007043, over 15154.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09327, pruned_loss=0.01525, audio_tagging_loss=0.009612, over 3044534.39 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 32.0
2023-11-22 05:03:58,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271000
2023-11-22 05:04:05,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.76 vs. limit=5.0
2023-11-22 05:04:13,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1806673.3333333333, ans=0.125
2023-11-22 05:04:18,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1806740.0, ans=0.125
2023-11-22 05:04:24,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1806740.0, ans=0.125
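grad_scale in the loss lines has stepped 8.0 -> 16.0 -> 32.0 over the last few hundred batches, the signature of fp16 training with a dynamic loss scaler that doubles after a long overflow-free stretch and halves on overflow (it drops back to 16.0 a little further below). PyTorch's stock scaler exposes exactly these dynamics; the values here are illustrative defaults, not the recipe's settings:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the earlier grad_scale: 8.0 lines
    growth_factor=2.0,     # 8 -> 16 -> 32, as observed in the log
    backoff_factor=0.5,    # halve when a gradient overflows
    growth_interval=2000,  # overflow-free steps required before doubling
)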
2023-11-22 05:04:28,864 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6500, loss[loss=0.09739, simple_loss=0.1242, pruned_loss=0.02461, audio_tagging_loss=0.01069, over 14058.00 frames. ], tot_loss[loss=0.07231, simple_loss=0.09455, pruned_loss=0.0154, audio_tagging_loss=0.009635, over 3044989.59 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 32.0
2023-11-22 05:04:35,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1806806.6666666667, ans=0.025
2023-11-22 05:05:02,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271050
2023-11-22 05:05:03,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1806940.0, ans=0.1
2023-11-22 05:05:05,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.344e+01 8.837e+01 9.717e+01 1.139e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-22 05:05:32,574 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6550, loss[loss=0.08013, simple_loss=0.09154, pruned_loss=0.02193, audio_tagging_loss=0.01242, over 14870.00 frames. ], tot_loss[loss=0.07242, simple_loss=0.09494, pruned_loss=0.01549, audio_tagging_loss=0.009466, over 3048089.26 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0
2023-11-22 05:05:54,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1807206.6666666667, ans=0.0
2023-11-22 05:05:59,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1807273.3333333333, ans=0.1
2023-11-22 05:06:03,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1807273.3333333333, ans=0.125
2023-11-22 05:06:07,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271100
2023-11-22 05:06:19,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1807340.0, ans=0.125
2023-11-22 05:06:21,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1807340.0, ans=0.125
2023-11-22 05:06:26,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1807406.6666666667, ans=0.125
2023-11-22 05:06:36,259 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6600, loss[loss=0.0799, simple_loss=0.1101, pruned_loss=0.01743, audio_tagging_loss=0.0074, over 15492.00 frames. ], tot_loss[loss=0.07211, simple_loss=0.09441, pruned_loss=0.01548, audio_tagging_loss=0.009425, over 3054096.80 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 05:06:48,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1807540.0, ans=0.125
2023-11-22 05:06:59,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0
2023-11-22 05:07:02,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1807606.6666666667, ans=0.0
2023-11-22 05:07:11,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271150
2023-11-22 05:07:15,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.298e+01 8.931e+01 9.754e+01 1.164e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-22 05:07:34,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1807740.0, ans=0.0
2023-11-22 05:07:40,500 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6650, loss[loss=0.07036, simple_loss=0.09262, pruned_loss=0.01459, audio_tagging_loss=0.00946, over 15402.00 frames. ], tot_loss[loss=0.0721, simple_loss=0.0944, pruned_loss=0.0155, audio_tagging_loss=0.009393, over 3055275.95 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 05:07:52,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1807873.3333333333, ans=0.0
2023-11-22 05:07:53,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0
2023-11-22 05:07:58,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2023-11-22 05:08:03,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1807873.3333333333, ans=0.1
2023-11-22 05:08:10,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=22.5
2023-11-22 05:08:15,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271200
2023-11-22 05:08:22,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0
2023-11-22 05:08:43,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0
2023-11-22 05:08:45,695 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6700, loss[loss=0.08741, simple_loss=0.1133, pruned_loss=0.02257, audio_tagging_loss=0.008186, over 15225.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09424, pruned_loss=0.0155, audio_tagging_loss=0.009304, over 3056278.70 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 05:08:53,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2023-11-22 05:08:55,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1808140.0, ans=0.125
2023-11-22 05:09:19,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0
2023-11-22 05:09:20,491 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271250
2023-11-22 05:09:22,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1808273.3333333333, ans=0.07
2023-11-22 05:09:24,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.573e+01 8.261e+01 8.925e+01 9.693e+01 1.242e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-22 05:09:50,110 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6750, loss[loss=0.07374, simple_loss=0.09537, pruned_loss=0.01725, audio_tagging_loss=0.008797, over 14608.00 frames. ], tot_loss[loss=0.07248, simple_loss=0.09513, pruned_loss=0.01564, audio_tagging_loss=0.009276, over 3045882.34 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 16.0
2023-11-22 05:09:51,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2023-11-22 05:09:53,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1808473.3333333333, ans=0.125
2023-11-22 05:10:25,954 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271300
2023-11-22 05:10:31,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1808673.3333333333, ans=0.0
2023-11-22 05:10:39,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.42 vs. limit=22.5
2023-11-22 05:10:42,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=12.0
2023-11-22 05:10:45,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1808740.0, ans=0.125
2023-11-22 05:10:47,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1808740.0, ans=0.0
2023-11-22 05:10:53,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1808806.6666666667, ans=0.0
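The Whitening lines compare a measured statistic of a module's output covariance against a whitening_limit (itself sometimes a ScheduledFloat, as the whitening_limit entries show); when the metric crosses the limit, a corrective gradient penalty pushes the features back toward an isotropic covariance. A rough sketch of such a metric, equal to 1.0 for perfectly white features and num_channels for rank-1 ones (assumed to mirror the logged metric in spirit only):

import torch

def whitening_metric(x: torch.Tensor) -> float:
    x = x.reshape(-1, x.shape[-1])          # (frames, channels)
    cov = (x.T @ x) / x.shape[0]            # channel covariance
    c = x.shape[-1]
    return (c * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2).item()

print(whitening_metric(torch.randn(2000, 384)))  # ~1.2, far below e.g. limit=15.0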
2023-11-22 05:10:54,828 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6800, loss[loss=0.07903, simple_loss=0.11, pruned_loss=0.01759, audio_tagging_loss=0.006455, over 15321.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09551, pruned_loss=0.01571, audio_tagging_loss=0.009233, over 3049150.80 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 32.0
2023-11-22 05:10:55,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1808806.6666666667, ans=0.0
2023-11-22 05:11:05,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1808806.6666666667, ans=0.0
2023-11-22 05:11:19,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1808873.3333333333, ans=0.125
2023-11-22 05:11:25,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1808940.0, ans=0.125
2023-11-22 05:11:30,438 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271350
2023-11-22 05:11:34,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.372e+01 8.044e+01 8.828e+01 9.747e+01 1.352e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-22 05:12:00,276 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6850, loss[loss=0.08489, simple_loss=0.113, pruned_loss=0.02123, audio_tagging_loss=0.007141, over 14676.00 frames. ], tot_loss[loss=0.07224, simple_loss=0.09495, pruned_loss=0.01554, audio_tagging_loss=0.009218, over 3044417.42 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0
2023-11-22 05:12:00,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1809140.0, ans=0.125
2023-11-22 05:12:00,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0
2023-11-22 05:12:11,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1809140.0, ans=0.1
2023-11-22 05:12:15,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1809206.6666666667, ans=0.125
2023-11-22 05:12:20,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1809206.6666666667, ans=0.125
2023-11-22 05:12:20,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1809206.6666666667, ans=0.0
2023-11-22 05:12:23,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1809206.6666666667, ans=0.025
2023-11-22 05:12:35,952 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271400
2023-11-22 05:12:45,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1809340.0, ans=0.0
2023-11-22 05:12:52,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1809406.6666666667, ans=0.125
2023-11-22 05:13:03,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1809406.6666666667, ans=0.2
2023-11-22 05:13:06,050 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6900, loss[loss=0.05868, simple_loss=0.07222, pruned_loss=0.01144, audio_tagging_loss=0.01113, over 14965.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09514, pruned_loss=0.01556, audio_tagging_loss=0.009216, over 3040296.12 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:13:09,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1809473.3333333333, ans=0.0
2023-11-22 05:13:16,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0
2023-11-22 05:13:20,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1809540.0, ans=0.09899494936611666
2023-11-22 05:13:24,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1809540.0, ans=0.2
2023-11-22 05:13:36,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1809606.6666666667, ans=0.2
2023-11-22 05:13:40,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271450
2023-11-22 05:13:47,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.045e+01 8.623e+01 9.194e+01 1.221e+02, threshold=1.725e+02, percent-clipped=0.0
2023-11-22 05:13:51,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1809673.3333333333, ans=0.0
2023-11-22 05:13:55,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0
2023-11-22 05:13:55,799 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 05:14:06,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1809740.0, ans=0.0
2023-11-22 05:14:08,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0
2023-11-22 05:14:10,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0
2023-11-22 05:14:11,291 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 6950, loss[loss=0.09499, simple_loss=0.1134, pruned_loss=0.02634, audio_tagging_loss=0.01193, over 16066.00 frames. ], tot_loss[loss=0.07233, simple_loss=0.09529, pruned_loss=0.01554, audio_tagging_loss=0.009149, over 3039551.83 frames. ], batch size: 61, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:14:19,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1809806.6666666667, ans=0.0
2023-11-22 05:14:46,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271500
2023-11-22 05:14:51,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1810006.6666666667, ans=0.0
2023-11-22 05:14:53,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1810006.6666666667, ans=0.05
2023-11-22 05:14:55,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1810006.6666666667, ans=0.1
2023-11-22 05:14:58,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0
2023-11-22 05:15:08,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0
2023-11-22 05:15:15,683 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7000, loss[loss=0.0719, simple_loss=0.09695, pruned_loss=0.01429, audio_tagging_loss=0.009134, over 15772.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09542, pruned_loss=0.01566, audio_tagging_loss=0.009178, over 3035960.03 frames. ], batch size: 58, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:15:39,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0
2023-11-22 05:15:43,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1810273.3333333333, ans=0.125
2023-11-22 05:15:44,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1810273.3333333333, ans=0.1
2023-11-22 05:15:46,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1810273.3333333333, ans=15.0
2023-11-22 05:15:50,523 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271550
2023-11-22 05:15:55,125 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 05:15:55,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.160e+01 8.872e+01 9.665e+01 1.181e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-22 05:16:02,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1810340.0, ans=0.125
2023-11-22 05:16:15,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1810406.6666666667, ans=0.0
2023-11-22 05:16:20,962 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7050, loss[loss=0.08197, simple_loss=0.1113, pruned_loss=0.01803, audio_tagging_loss=0.008297, over 15536.00 frames. ], tot_loss[loss=0.07277, simple_loss=0.09576, pruned_loss=0.01567, audio_tagging_loss=0.009222, over 3045456.53 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:16:26,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1810473.3333333333, ans=0.0
2023-11-22 05:16:37,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1810540.0, ans=0.0
2023-11-22 05:16:46,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1810606.6666666667, ans=0.125
2023-11-22 05:16:54,690 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 05:16:55,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271600
2023-11-22 05:17:26,014 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7100, loss[loss=0.07555, simple_loss=0.09819, pruned_loss=0.01658, audio_tagging_loss=0.009883, over 14784.00 frames. ], tot_loss[loss=0.07218, simple_loss=0.09451, pruned_loss=0.01551, audio_tagging_loss=0.009415, over 3052602.08 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:17:31,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1810806.6666666667, ans=0.0
2023-11-22 05:17:37,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1810873.3333333333, ans=0.125
2023-11-22 05:17:57,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1810940.0, ans=0.125
2023-11-22 05:17:59,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271650
2023-11-22 05:18:05,326 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.851e+01 8.205e+01 9.069e+01 1.012e+02 2.750e+02, threshold=1.814e+02, percent-clipped=1.0
2023-11-22 05:18:10,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1811006.6666666667, ans=0.2
2023-11-22 05:18:30,084 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7150, loss[loss=0.06244, simple_loss=0.08423, pruned_loss=0.01063, audio_tagging_loss=0.009699, over 16341.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.09492, pruned_loss=0.01553, audio_tagging_loss=0.009468, over 3057039.07 frames. ], batch size: 62, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:18:35,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1811140.0, ans=0.09899494936611666
2023-11-22 05:18:45,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1811206.6666666667, ans=0.0
2023-11-22 05:19:05,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271700
2023-11-22 05:19:23,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1811406.6666666667, ans=0.125
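In the 05:18:05 entry above percent-clipped finally moves off zero: the largest recent gradient norm (2.750e+02) exceeded the 1.814e+02 threshold, so roughly one percent of recent batches were clipped. The effect on one such batch is the standard rescaling, as with torch's built-in helper (shown for illustration; the recipe's optimizer applies its clipping internally):

import torch

model = torch.nn.Linear(16, 16)
model(torch.randn(4, 16)).pow(2).sum().backward()
# Rescales gradients by max_norm/total_norm when total_norm exceeds max_norm,
# which is exactly the event the percent-clipped statistic counts.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=181.4)
print(total_norm)  # the pre-clipping norm, like the quantiles logged above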
2023-11-22 05:19:34,522 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7200, loss[loss=0.08023, simple_loss=0.1043, pruned_loss=0.0174, audio_tagging_loss=0.01066, over 15710.00 frames. ], tot_loss[loss=0.07267, simple_loss=0.09498, pruned_loss=0.01559, audio_tagging_loss=0.009594, over 3056309.20 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 32.0
2023-11-22 05:19:50,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1811540.0, ans=0.0
2023-11-22 05:19:54,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1811540.0, ans=0.2
2023-11-22 05:20:09,903 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271750
2023-11-22 05:20:14,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.124e+01 9.002e+01 9.849e+01 1.230e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-22 05:20:26,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1811740.0, ans=0.09899494936611666
2023-11-22 05:20:40,015 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7250, loss[loss=0.06925, simple_loss=0.08616, pruned_loss=0.01544, audio_tagging_loss=0.01072, over 14831.00 frames. ], tot_loss[loss=0.07318, simple_loss=0.09557, pruned_loss=0.01577, audio_tagging_loss=0.009614, over 3053503.85 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0
2023-11-22 05:20:40,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1811806.6666666667, ans=0.1
2023-11-22 05:20:42,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1811806.6666666667, ans=0.125
2023-11-22 05:20:56,903 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 05:21:00,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1811873.3333333333, ans=0.0
2023-11-22 05:21:05,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1811940.0, ans=0.0
2023-11-22 05:21:13,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1811940.0, ans=0.125
2023-11-22 05:21:14,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271800
2023-11-22 05:21:26,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1812006.6666666667, ans=0.125
2023-11-22 05:21:29,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1812006.6666666667, ans=0.125
2023-11-22 05:21:30,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1812073.3333333333, ans=0.0
2023-11-22 05:21:44,760 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7300, loss[loss=0.0788, simple_loss=0.1108, pruned_loss=0.01557, audio_tagging_loss=0.007847, over 15067.00 frames. ], tot_loss[loss=0.07297, simple_loss=0.09548, pruned_loss=0.01574, audio_tagging_loss=0.009496, over 3050594.30 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:21:53,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1812140.0, ans=0.125
2023-11-22 05:21:59,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1812206.6666666667, ans=0.0
2023-11-22 05:21:59,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0
2023-11-22 05:22:08,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-11-22 05:22:19,983 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271850
2023-11-22 05:22:21,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1812273.3333333333, ans=0.125
2023-11-22 05:22:25,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1812340.0, ans=0.125
2023-11-22 05:22:25,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.604e+01 8.310e+01 9.010e+01 9.721e+01 2.858e+02, threshold=1.802e+02, percent-clipped=1.0
2023-11-22 05:22:28,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2023-11-22 05:22:49,434 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7350, loss[loss=0.06791, simple_loss=0.1011, pruned_loss=0.01224, audio_tagging_loss=0.005102, over 14782.00 frames. ], tot_loss[loss=0.07222, simple_loss=0.09484, pruned_loss=0.01552, audio_tagging_loss=0.009278, over 3042325.32 frames. ], batch size: 53, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:22:50,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=15.0
2023-11-22 05:23:11,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0
2023-11-22 05:23:17,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1812606.6666666667, ans=0.125
2023-11-22 05:23:21,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1812606.6666666667, ans=0.0
2023-11-22 05:23:25,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271900
2023-11-22 05:23:31,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1812673.3333333333, ans=0.125
2023-11-22 05:23:54,477 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7400, loss[loss=0.0539, simple_loss=0.07173, pruned_loss=0.009491, audio_tagging_loss=0.008542, over 15964.00 frames. ], tot_loss[loss=0.07214, simple_loss=0.09469, pruned_loss=0.01549, audio_tagging_loss=0.0093, over 3042423.47 frames. ], batch size: 59, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:24:11,137 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 05:24:13,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0
2023-11-22 05:24:14,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1812873.3333333333, ans=0.125
2023-11-22 05:24:15,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2023-11-22 05:24:22,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1812940.0, ans=0.0
2023-11-22 05:24:30,097 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 271950
2023-11-22 05:24:36,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.784e+01 7.965e+01 8.642e+01 9.211e+01 1.272e+02, threshold=1.728e+02, percent-clipped=0.0
2023-11-22 05:24:36,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1813006.6666666667, ans=0.125
2023-11-22 05:24:58,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1813140.0, ans=0.125
2023-11-22 05:25:00,072 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7450, loss[loss=0.06003, simple_loss=0.07924, pruned_loss=0.0106, audio_tagging_loss=0.0098, over 15538.00 frames. ], tot_loss[loss=0.07174, simple_loss=0.09431, pruned_loss=0.01535, audio_tagging_loss=0.00924, over 3047976.85 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:25:11,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1813206.6666666667, ans=0.125
2023-11-22 05:25:28,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1813273.3333333333, ans=0.125
2023-11-22 05:25:34,686 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272000
2023-11-22 05:26:00,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1813406.6666666667, ans=0.125
2023-11-22 05:26:08,229 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7500, loss[loss=0.05752, simple_loss=0.07542, pruned_loss=0.0117, audio_tagging_loss=0.008113, over 15013.00 frames. ], tot_loss[loss=0.07171, simple_loss=0.09401, pruned_loss=0.01548, audio_tagging_loss=0.009223, over 3053841.87 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:26:08,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1813473.3333333333, ans=0.0
2023-11-22 05:26:18,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1813473.3333333333, ans=0.125
2023-11-22 05:26:26,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1813540.0, ans=0.125
2023-11-22 05:26:43,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272050
2023-11-22 05:26:46,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1813673.3333333333, ans=0.125
2023-11-22 05:26:49,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.448e+01 8.968e+01 9.582e+01 2.186e+02, threshold=1.794e+02, percent-clipped=1.0
2023-11-22 05:27:00,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1813740.0, ans=0.1
2023-11-22 05:27:06,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1813740.0, ans=0.125
2023-11-22 05:27:10,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1813740.0, ans=0.125
2023-11-22 05:27:13,285 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7550, loss[loss=0.06367, simple_loss=0.08286, pruned_loss=0.01183, audio_tagging_loss=0.01041, over 14645.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09321, pruned_loss=0.01543, audio_tagging_loss=0.009347, over 3054652.27 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:27:38,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1813940.0, ans=0.1
2023-11-22 05:27:39,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1813940.0, ans=0.0
2023-11-22 05:27:48,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272100
2023-11-22 05:27:53,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1814006.6666666667, ans=0.125
2023-11-22 05:28:03,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1814073.3333333333, ans=0.0
2023-11-22 05:28:18,019 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7600, loss[loss=0.08554, simple_loss=0.1211, pruned_loss=0.01823, audio_tagging_loss=0.00675, over 15355.00 frames. ], tot_loss[loss=0.07156, simple_loss=0.09369, pruned_loss=0.01538, audio_tagging_loss=0.009343, over 3051833.13 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:28:51,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272150
2023-11-22 05:28:59,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.478e+01 8.103e+01 8.710e+01 9.390e+01 1.351e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-22 05:29:20,739 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7650, loss[loss=0.07821, simple_loss=0.1105, pruned_loss=0.01583, audio_tagging_loss=0.007125, over 15060.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09343, pruned_loss=0.01523, audio_tagging_loss=0.009379, over 3042465.87 frames. ], batch size: 54, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:29:37,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1814540.0, ans=0.1
2023-11-22 05:29:40,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1814540.0, ans=0.0
2023-11-22 05:29:43,163 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 05:29:49,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0
2023-11-22 05:29:56,429 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272200
2023-11-22 05:30:00,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1814673.3333333333, ans=0.125
2023-11-22 05:30:10,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1814673.3333333333, ans=0.125
2023-11-22 05:30:25,864 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7700, loss[loss=0.08611, simple_loss=0.1165, pruned_loss=0.02025, audio_tagging_loss=0.007598, over 15911.00 frames. ], tot_loss[loss=0.07111, simple_loss=0.09317, pruned_loss=0.01512, audio_tagging_loss=0.009406, over 3041849.13 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:30:30,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1814806.6666666667, ans=0.0
2023-11-22 05:30:53,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1814940.0, ans=0.05
2023-11-22 05:31:01,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272250
2023-11-22 05:31:01,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1814940.0, ans=0.1
2023-11-22 05:31:03,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1814940.0, ans=0.1
2023-11-22 05:31:09,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 7.911e+01 8.690e+01 9.336e+01 1.339e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-22 05:31:16,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1815006.6666666667, ans=0.125
2023-11-22 05:31:31,244 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7750, loss[loss=0.08905, simple_loss=0.1141, pruned_loss=0.02279, audio_tagging_loss=0.009235, over 14529.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.09426, pruned_loss=0.01539, audio_tagging_loss=0.009247, over 3042137.64 frames. ], batch size: 52, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:31:55,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1815206.6666666667, ans=0.1
2023-11-22 05:32:06,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272300
2023-11-22 05:32:27,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1815406.6666666667, ans=0.125
2023-11-22 05:32:36,684 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7800, loss[loss=0.08702, simple_loss=0.1132, pruned_loss=0.02087, audio_tagging_loss=0.009564, over 15244.00 frames. ], tot_loss[loss=0.07257, simple_loss=0.09559, pruned_loss=0.01559, audio_tagging_loss=0.009182, over 3045636.96 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:32:42,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1815473.3333333333, ans=0.0
2023-11-22 05:32:44,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1815473.3333333333, ans=0.125
2023-11-22 05:32:47,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1815473.3333333333, ans=0.035
2023-11-22 05:32:48,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1815540.0, ans=0.04949747468305833
2023-11-22 05:33:05,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1815606.6666666667, ans=0.125
2023-11-22 05:33:11,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272350
2023-11-22 05:33:19,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.265e+01 8.937e+01 9.454e+01 1.226e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-22 05:33:26,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1815673.3333333333, ans=0.0
2023-11-22 05:33:38,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0
2023-11-22 05:33:41,803 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7850, loss[loss=0.09443, simple_loss=0.1253, pruned_loss=0.02348, audio_tagging_loss=0.008307, over 14880.00 frames. ], tot_loss[loss=0.07311, simple_loss=0.09609, pruned_loss=0.01579, audio_tagging_loss=0.00927, over 3045486.40 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:33:48,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1815806.6666666667, ans=0.1
2023-11-22 05:33:56,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1815873.3333333333, ans=0.035
2023-11-22 05:33:57,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1815873.3333333333, ans=0.125
2023-11-22 05:33:58,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1815873.3333333333, ans=0.0
2023-11-22 05:34:05,326 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 05:34:17,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272400
2023-11-22 05:34:27,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1816006.6666666667, ans=0.05
2023-11-22 05:34:41,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1816073.3333333333, ans=0.0
2023-11-22 05:34:47,295 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7900, loss[loss=0.06846, simple_loss=0.09072, pruned_loss=0.01439, audio_tagging_loss=0.008717, over 14832.00 frames. ], tot_loss[loss=0.07354, simple_loss=0.09672, pruned_loss=0.01591, audio_tagging_loss=0.009265, over 3048999.02 frames. ], batch size: 53, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:34:52,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1816140.0, ans=0.0
2023-11-22 05:35:16,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1816273.3333333333, ans=0.125
2023-11-22 05:35:22,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272450
2023-11-22 05:35:30,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.475e+01 8.108e+01 8.831e+01 9.576e+01 1.163e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-22 05:35:41,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1816406.6666666667, ans=0.125
2023-11-22 05:35:51,672 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 7950, loss[loss=0.07539, simple_loss=0.1046, pruned_loss=0.01383, audio_tagging_loss=0.009272, over 15361.00 frames. ], tot_loss[loss=0.07313, simple_loss=0.09577, pruned_loss=0.01589, audio_tagging_loss=0.009344, over 3050685.64 frames. ], batch size: 54, lr: 2.96e-03, grad_scale: 16.0
2023-11-22 05:35:56,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1816473.3333333333, ans=0.02
Number of tokens: 24 2023-11-22 05:36:15,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1816540.0, ans=0.125 2023-11-22 05:36:22,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1816606.6666666667, ans=0.125 2023-11-22 05:36:27,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272500 2023-11-22 05:36:42,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-22 05:36:48,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2023-11-22 05:36:52,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1816740.0, ans=0.125 2023-11-22 05:36:57,545 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8000, loss[loss=0.07733, simple_loss=0.1061, pruned_loss=0.01482, audio_tagging_loss=0.009447, over 15023.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09497, pruned_loss=0.01569, audio_tagging_loss=0.00938, over 3046851.40 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 32.0 2023-11-22 05:37:32,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272550 2023-11-22 05:37:40,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.294e+01 8.813e+01 9.754e+01 1.240e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-22 05:38:02,644 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8050, loss[loss=0.06497, simple_loss=0.07176, pruned_loss=0.01289, audio_tagging_loss=0.0162, over 15440.00 frames. ], tot_loss[loss=0.07302, simple_loss=0.09551, pruned_loss=0.0158, audio_tagging_loss=0.009463, over 3046623.91 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 32.0 2023-11-22 05:38:03,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2023-11-22 05:38:16,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1817206.6666666667, ans=0.125 2023-11-22 05:38:16,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1817206.6666666667, ans=0.0 2023-11-22 05:38:25,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1817206.6666666667, ans=0.125 2023-11-22 05:38:37,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272600 2023-11-22 05:38:48,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.14 vs. limit=22.5 2023-11-22 05:39:04,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1817406.6666666667, ans=0.125 2023-11-22 05:39:06,909 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8100, loss[loss=0.07902, simple_loss=0.1109, pruned_loss=0.01632, audio_tagging_loss=0.007244, over 16350.00 frames. ], tot_loss[loss=0.07264, simple_loss=0.09496, pruned_loss=0.01568, audio_tagging_loss=0.009482, over 3045424.78 frames. 
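[annotation] The "Exclude cut" WARNING above drops an AudioSet cut whose transcript is dummy placeholder text: after the roughly 4x front-end subsampling, 100 input frames leave only 23 frames, fewer than the 24 BPE tokens, so the transducer loss would be ill-defined. A sketch of that sanity filter, with illustrative names rather than icefall's exact code:

import logging

def keep_cut(num_frames: int, tokens: list, subsampling_factor: int = 4) -> bool:
    # rough conv front-end shrinkage; reproduces the logged 100 -> 23
    num_frames_sub = (num_frames - 7) // subsampling_factor
    if num_frames_sub < len(tokens):
        logging.warning(
            "Exclude cut: %d frames -> %d after subsampling, but %d tokens",
            num_frames, num_frames_sub, len(tokens),
        )
        return False
    return True

# For the dummy AudioSet cuts above: 100 frames -> 23 after subsampling,
# 24 tokens, so the cut is dropped.
assert keep_cut(100, ["tok"] * 24) is False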
], batch size: 58, lr: 2.96e-03, grad_scale: 16.0 2023-11-22 05:39:17,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1817473.3333333333, ans=0.1 2023-11-22 05:39:20,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1817540.0, ans=0.0 2023-11-22 05:39:26,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1817540.0, ans=0.125 2023-11-22 05:39:30,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5 2023-11-22 05:39:41,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272650 2023-11-22 05:39:44,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-22 05:39:50,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.525e+01 8.265e+01 8.788e+01 9.527e+01 1.186e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 05:39:56,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1817673.3333333333, ans=15.0 2023-11-22 05:39:58,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1817740.0, ans=0.125 2023-11-22 05:40:10,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-22 05:40:11,848 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8150, loss[loss=0.0593, simple_loss=0.07712, pruned_loss=0.01401, audio_tagging_loss=0.006731, over 14671.00 frames. ], tot_loss[loss=0.07266, simple_loss=0.09542, pruned_loss=0.01559, audio_tagging_loss=0.009351, over 3049085.21 frames. ], batch size: 54, lr: 2.96e-03, grad_scale: 16.0 2023-11-22 05:40:19,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0 2023-11-22 05:40:45,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2023-11-22 05:40:46,682 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272700 2023-11-22 05:41:16,698 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8200, loss[loss=0.07083, simple_loss=0.09339, pruned_loss=0.01662, audio_tagging_loss=0.007519, over 14604.00 frames. ], tot_loss[loss=0.07333, simple_loss=0.09631, pruned_loss=0.01584, audio_tagging_loss=0.009331, over 3046865.53 frames. ], batch size: 59, lr: 2.96e-03, grad_scale: 16.0 2023-11-22 05:41:16,763 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 05:41:29,789 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 05:41:50,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272750 2023-11-22 05:41:57,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-22 05:42:00,276 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.465e+01 8.322e+01 8.892e+01 9.603e+01 1.403e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 05:42:21,091 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8250, loss[loss=0.07053, simple_loss=0.08799, pruned_loss=0.01646, audio_tagging_loss=0.01008, over 14884.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.09627, pruned_loss=0.0158, audio_tagging_loss=0.009337, over 3053678.80 frames. ], batch size: 60, lr: 2.96e-03, grad_scale: 16.0 2023-11-22 05:42:21,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1818473.3333333333, ans=0.2 2023-11-22 05:42:48,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1818606.6666666667, ans=0.0 2023-11-22 05:42:49,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1818606.6666666667, ans=0.0 2023-11-22 05:42:56,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272800 2023-11-22 05:43:12,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1818740.0, ans=10.0 2023-11-22 05:43:25,813 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8300, loss[loss=0.06307, simple_loss=0.08512, pruned_loss=0.01195, audio_tagging_loss=0.008565, over 15806.00 frames. ], tot_loss[loss=0.07233, simple_loss=0.09498, pruned_loss=0.01554, audio_tagging_loss=0.009303, over 3053140.43 frames. ], batch size: 59, lr: 2.96e-03, grad_scale: 16.0 2023-11-22 05:43:56,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1818940.0, ans=0.0 2023-11-22 05:43:57,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2023-11-22 05:44:01,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272850 2023-11-22 05:44:09,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.565e+01 8.221e+01 8.844e+01 9.406e+01 1.147e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 05:44:31,029 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8350, loss[loss=0.07729, simple_loss=0.1014, pruned_loss=0.01796, audio_tagging_loss=0.008651, over 15007.00 frames. ], tot_loss[loss=0.0724, simple_loss=0.09525, pruned_loss=0.01559, audio_tagging_loss=0.009181, over 3056901.24 frames. 
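[annotation] The optim.py lines report the quartiles (min, 25%, median, 75%, max) of recently observed gradient norms and clip at Clipping_scale times the median; in the most recent line above, 2.0 x 8.844e+01 gives the threshold of 1.769e+02, and percent-clipped=0.0 means no recent step exceeded it. A simplified stand-in for that bookkeeping (a sketch, not ScaledAdam itself):

import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = []          # recent total gradient norms
        self.history = history

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        self.norms = self.norms[-self.history:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 x median
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold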
], batch size: 56, lr: 2.96e-03, grad_scale: 16.0 2023-11-22 05:44:43,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1819206.6666666667, ans=0.125 2023-11-22 05:45:04,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272900 2023-11-22 05:45:22,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1819406.6666666667, ans=0.0 2023-11-22 05:45:34,869 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8400, loss[loss=0.06398, simple_loss=0.08826, pruned_loss=0.01106, audio_tagging_loss=0.008783, over 14941.00 frames. ], tot_loss[loss=0.07225, simple_loss=0.0953, pruned_loss=0.01549, audio_tagging_loss=0.009118, over 3063997.94 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 32.0 2023-11-22 05:45:49,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1819540.0, ans=0.125 2023-11-22 05:45:52,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1819540.0, ans=0.0 2023-11-22 05:45:53,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1819540.0, ans=0.1 2023-11-22 05:46:09,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 272950 2023-11-22 05:46:17,487 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.034e+01 8.663e+01 9.140e+01 1.267e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-22 05:46:37,833 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8450, loss[loss=0.08219, simple_loss=0.1048, pruned_loss=0.01989, audio_tagging_loss=0.009908, over 14737.00 frames. ], tot_loss[loss=0.07233, simple_loss=0.0953, pruned_loss=0.01552, audio_tagging_loss=0.009165, over 3059747.57 frames. ], batch size: 54, lr: 2.96e-03, grad_scale: 32.0 2023-11-22 05:46:47,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1819806.6666666667, ans=0.1 2023-11-22 05:47:06,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2023-11-22 05:47:12,785 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273000 2023-11-22 05:47:41,849 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8500, loss[loss=0.07715, simple_loss=0.1, pruned_loss=0.01868, audio_tagging_loss=0.008452, over 15206.00 frames. ], tot_loss[loss=0.07326, simple_loss=0.0967, pruned_loss=0.01575, audio_tagging_loss=0.009154, over 3058999.52 frames. 
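[annotation] grad_scale in the loss lines flips between 16.0 and 32.0 (it moves to 32.0 at batch 8400 above) because use_fp16 training runs under a dynamic loss scaler: the scale doubles after a long enough stretch of overflow-free steps and halves when fp16 gradients overflow. A self-contained sketch of that behaviour with PyTorch's stock GradScaler (icefall wires the same idea through its own training loop):

import torch

model = torch.nn.Linear(80, 500).cuda()       # requires a CUDA build, as in this run
opt = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler()          # doubles the scale after enough
                                              # overflow-free steps, halves on overflow
for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    opt.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(opt)   # skips the update if inf/nan gradients were found
    scaler.update()    # adjusts the scale, which is what the log reports
    if step % 50 == 0:
        print("grad_scale:", scaler.get_scale())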
], batch size: 56, lr: 2.96e-03, grad_scale: 32.0 2023-11-22 05:48:00,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1820206.6666666667, ans=0.2 2023-11-22 05:48:09,355 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 05:48:16,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273050 2023-11-22 05:48:25,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.253e+01 8.845e+01 9.555e+01 1.261e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 05:48:36,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1820406.6666666667, ans=0.04949747468305833 2023-11-22 05:48:46,826 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8550, loss[loss=0.07484, simple_loss=0.0997, pruned_loss=0.01523, audio_tagging_loss=0.009761, over 15095.00 frames. ], tot_loss[loss=0.07298, simple_loss=0.09595, pruned_loss=0.01575, audio_tagging_loss=0.009256, over 3062337.41 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 32.0 2023-11-22 05:49:01,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1820540.0, ans=0.0 2023-11-22 05:49:03,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1820540.0, ans=0.1 2023-11-22 05:49:21,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273100 2023-11-22 05:49:29,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1820673.3333333333, ans=0.05 2023-11-22 05:49:29,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1820673.3333333333, ans=0.0 2023-11-22 05:49:49,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1820806.6666666667, ans=0.0 2023-11-22 05:49:49,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.90 vs. limit=22.5 2023-11-22 05:49:50,448 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8600, loss[loss=0.07893, simple_loss=0.1141, pruned_loss=0.01292, audio_tagging_loss=0.008974, over 14325.00 frames. ], tot_loss[loss=0.07231, simple_loss=0.09495, pruned_loss=0.01553, audio_tagging_loss=0.009311, over 3062961.54 frames. ], batch size: 53, lr: 2.96e-03, grad_scale: 32.0 2023-11-22 05:49:56,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1820806.6666666667, ans=0.0 2023-11-22 05:50:25,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273150 2023-11-22 05:50:35,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.265e+01 8.649e+01 9.295e+01 1.205e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-22 05:50:35,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1821006.6666666667, ans=0.05 2023-11-22 05:50:50,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. 
limit=15.0 2023-11-22 05:50:51,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2023-11-22 05:50:54,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-11-22 05:50:54,628 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8650, loss[loss=0.07807, simple_loss=0.1088, pruned_loss=0.01609, audio_tagging_loss=0.007575, over 15556.00 frames. ], tot_loss[loss=0.07286, simple_loss=0.09606, pruned_loss=0.0155, audio_tagging_loss=0.009337, over 3064372.11 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 05:50:58,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0 2023-11-22 05:51:01,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1821140.0, ans=0.125 2023-11-22 05:51:08,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1821206.6666666667, ans=0.0 2023-11-22 05:51:29,184 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273200 2023-11-22 05:51:35,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1821340.0, ans=0.0 2023-11-22 05:51:48,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1821406.6666666667, ans=0.125 2023-11-22 05:51:59,113 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8700, loss[loss=0.09005, simple_loss=0.1243, pruned_loss=0.02021, audio_tagging_loss=0.007698, over 15152.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09553, pruned_loss=0.01535, audio_tagging_loss=0.009412, over 3064252.98 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 05:52:04,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1821473.3333333333, ans=0.1 2023-11-22 05:52:24,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1821606.6666666667, ans=0.0 2023-11-22 05:52:33,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273250 2023-11-22 05:52:39,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1821673.3333333333, ans=0.07 2023-11-22 05:52:43,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1821673.3333333333, ans=0.0 2023-11-22 05:52:43,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.468e+01 7.986e+01 8.723e+01 9.547e+01 1.201e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-22 05:52:50,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1821740.0, ans=0.125 2023-11-22 05:53:03,597 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8750, loss[loss=0.08785, simple_loss=0.1107, pruned_loss=0.02172, audio_tagging_loss=0.0108, over 15638.00 frames. ], tot_loss[loss=0.07254, simple_loss=0.09527, pruned_loss=0.01539, audio_tagging_loss=0.009514, over 3062526.24 frames. 
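[annotation] The learning rate ticks down from 2.96e-03 to 2.95e-03 at batch 8650 above. That pace is consistent with an Eden-style schedule in which base_lr (0.045 in this run) is damped by batch- and epoch-dependent factors with lr_batches=7500 and lr_epochs=3.5. A reconstruction of that formula (hedged; icefall's optim.py is the authoritative source):

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# ~2.9e-03 around this point in training, close to the logged lr
print(f"{eden_lr(0.045, batch=273100, epoch=23.4):.1e}")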
], batch size: 57, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 05:53:03,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1821806.6666666667, ans=0.125 2023-11-22 05:53:16,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1821873.3333333333, ans=0.2 2023-11-22 05:53:38,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273300 2023-11-22 05:53:47,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2023-11-22 05:54:07,858 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8800, loss[loss=0.07246, simple_loss=0.09155, pruned_loss=0.01667, audio_tagging_loss=0.01001, over 14712.00 frames. ], tot_loss[loss=0.07256, simple_loss=0.09495, pruned_loss=0.01546, audio_tagging_loss=0.009627, over 3056055.91 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 05:54:09,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1822140.0, ans=0.0 2023-11-22 05:54:11,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1822140.0, ans=0.0 2023-11-22 05:54:11,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1822140.0, ans=0.0 2023-11-22 05:54:12,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.45 vs. limit=22.5 2023-11-22 05:54:13,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1822140.0, ans=0.0 2023-11-22 05:54:13,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1822140.0, ans=0.125 2023-11-22 05:54:21,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1822206.6666666667, ans=0.125 2023-11-22 05:54:42,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273350 2023-11-22 05:54:52,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.080e+01 8.934e+01 9.573e+01 1.295e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-22 05:55:11,557 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8850, loss[loss=0.06188, simple_loss=0.07702, pruned_loss=0.0151, audio_tagging_loss=0.008266, over 15088.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09516, pruned_loss=0.01545, audio_tagging_loss=0.009667, over 3054399.89 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 05:55:23,831 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 05:55:45,320 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 05:55:46,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273400 2023-11-22 05:55:54,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1822673.3333333333, ans=0.125 2023-11-22 05:56:14,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0 2023-11-22 05:56:14,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1822740.0, ans=0.025 2023-11-22 05:56:16,870 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8900, loss[loss=0.06379, simple_loss=0.08495, pruned_loss=0.01138, audio_tagging_loss=0.009926, over 15153.00 frames. ], tot_loss[loss=0.0725, simple_loss=0.09507, pruned_loss=0.01546, audio_tagging_loss=0.009509, over 3049605.18 frames. ], batch size: 57, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 05:56:20,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1822806.6666666667, ans=0.125 2023-11-22 05:56:51,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273450 2023-11-22 05:56:52,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1822940.0, ans=0.1 2023-11-22 05:56:59,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1823006.6666666667, ans=0.125 2023-11-22 05:56:59,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1823006.6666666667, ans=0.0 2023-11-22 05:57:03,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.200e+01 8.739e+01 9.306e+01 1.295e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 05:57:20,948 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 8950, loss[loss=0.07356, simple_loss=0.1015, pruned_loss=0.01436, audio_tagging_loss=0.00846, over 15303.00 frames. ], tot_loss[loss=0.07227, simple_loss=0.0952, pruned_loss=0.01537, audio_tagging_loss=0.009298, over 3050635.45 frames. 
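[annotation] The recurring model.py line "Freeze_encoder: False; Current batch idx: N" is emitted every 50 batches and tracks whether the encoder weights are currently frozen; this run never freezes them (freeze_encoder=False, freeze_encoder_steps=-1). A sketch of the switch it reflects, with illustrative names:

import torch

def set_encoder_frozen(encoder: torch.nn.Module, batch_idx: int,
                       freeze_encoder_steps: int = -1) -> bool:
    # freeze only during the first freeze_encoder_steps batches, if any
    frozen = 0 <= batch_idx < freeze_encoder_steps
    for p in encoder.parameters():
        p.requires_grad = not frozen
    return frozen

encoder = torch.nn.Linear(80, 192)
print(set_encoder_frozen(encoder, batch_idx=273450))  # False, as in the log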
], batch size: 58, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 05:57:29,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1823140.0, ans=0.2 2023-11-22 05:57:36,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1823206.6666666667, ans=0.125 2023-11-22 05:57:36,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1823206.6666666667, ans=0.1 2023-11-22 05:57:44,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1823206.6666666667, ans=0.025 2023-11-22 05:57:51,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1823273.3333333333, ans=0.1 2023-11-22 05:57:56,319 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273500 2023-11-22 05:58:11,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2023-11-22 05:58:25,536 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9000, loss[loss=0.06532, simple_loss=0.08911, pruned_loss=0.01214, audio_tagging_loss=0.008631, over 14666.00 frames. ], tot_loss[loss=0.07243, simple_loss=0.09543, pruned_loss=0.01547, audio_tagging_loss=0.009248, over 3051963.46 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 05:58:25,538 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 05:59:06,000 INFO [train_asr.py:1253] (1/4) Epoch 23, validation: loss=0.06035, simple_loss=0.05169, pruned_loss=0.005137, audio_tagging_loss=0.02937, over 4681554.00 frames. 2023-11-22 05:59:06,001 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 05:59:11,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1823473.3333333333, ans=0.125 2023-11-22 05:59:14,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1823473.3333333333, ans=0.2 2023-11-22 05:59:34,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1823606.6666666667, ans=0.125 2023-11-22 05:59:41,212 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273550 2023-11-22 05:59:41,473 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 05:59:52,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.165e+01 8.843e+01 9.497e+01 1.751e+02, threshold=1.769e+02, percent-clipped=1.0 2023-11-22 06:00:01,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1823740.0, ans=0.125 2023-11-22 06:00:08,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1823740.0, ans=0.0 2023-11-22 06:00:10,310 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9050, loss[loss=0.06475, simple_loss=0.08467, pruned_loss=0.01183, audio_tagging_loss=0.01058, over 14647.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.09535, pruned_loss=0.01554, audio_tagging_loss=0.009233, over 3050333.57 frames. 
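[annotation] Every valid_interval batches (here at batch 9000, a multiple of 3000) the loop above pauses to compute a validation loss over the whole dev set (~4.68M frames) and then reports the peak GPU memory seen so far. A hedged sketch of that hook; compute_validation_loss stands in for the real dev-set loop:

import torch

def compute_validation_loss() -> float:
    # stand-in for iterating the dev dataloader under torch.no_grad()
    return 0.06035

def maybe_validate(batch_idx: int, valid_interval: int = 3000) -> None:
    if batch_idx % valid_interval != 0:
        return
    loss = compute_validation_loss()
    print(f"validation: loss={loss:.5f}")
    # peak memory since process start; requires a CUDA build, as in this run
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")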
], batch size: 55, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:00:10,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1823806.6666666667, ans=0.0 2023-11-22 06:00:13,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=12.0 2023-11-22 06:00:27,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1823873.3333333333, ans=0.2 2023-11-22 06:00:45,214 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273600 2023-11-22 06:01:12,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0 2023-11-22 06:01:14,747 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9100, loss[loss=0.06225, simple_loss=0.0731, pruned_loss=0.0127, audio_tagging_loss=0.013, over 15519.00 frames. ], tot_loss[loss=0.07266, simple_loss=0.09597, pruned_loss=0.01552, audio_tagging_loss=0.009156, over 3052079.97 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:01:15,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1824140.0, ans=0.2 2023-11-22 06:01:17,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1824140.0, ans=0.1 2023-11-22 06:01:24,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1824140.0, ans=0.1 2023-11-22 06:01:33,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1824206.6666666667, ans=10.0 2023-11-22 06:01:45,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1824273.3333333333, ans=0.125 2023-11-22 06:01:46,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1824273.3333333333, ans=0.0 2023-11-22 06:01:49,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273650 2023-11-22 06:01:57,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2023-11-22 06:02:00,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1824340.0, ans=0.125 2023-11-22 06:02:01,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.073e+01 8.633e+01 9.456e+01 1.130e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-22 06:02:19,623 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9150, loss[loss=0.06254, simple_loss=0.08357, pruned_loss=0.01077, audio_tagging_loss=0.009988, over 15511.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.09707, pruned_loss=0.0157, audio_tagging_loss=0.009126, over 3048554.08 frames. 
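[annotation] The composite losses above are consistent with loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss, using this run's simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 (CTC is disabled). Checking the batch 9150 tot_loss just above:

simple, pruned, tagging = 0.09707, 0.0157, 0.009126
loss = 0.5 * simple + pruned + 1.0 * tagging
print(round(loss, 5))  # 0.07336, matching the logged loss=0.07337 up to rounding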
], batch size: 58, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:02:34,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1824540.0, ans=0.2 2023-11-22 06:02:38,307 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 06:02:47,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1824606.6666666667, ans=0.125 2023-11-22 06:02:54,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273700 2023-11-22 06:03:00,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1824673.3333333333, ans=0.2 2023-11-22 06:03:02,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0 2023-11-22 06:03:20,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1824740.0, ans=0.125 2023-11-22 06:03:20,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1824740.0, ans=10.0 2023-11-22 06:03:22,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-22 06:03:24,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1824806.6666666667, ans=0.2 2023-11-22 06:03:25,051 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9200, loss[loss=0.06001, simple_loss=0.08694, pruned_loss=0.009449, audio_tagging_loss=0.007085, over 15943.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.09634, pruned_loss=0.01561, audio_tagging_loss=0.009166, over 3053284.97 frames. ], batch size: 57, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:03:45,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1824873.3333333333, ans=0.0 2023-11-22 06:03:52,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0 2023-11-22 06:03:56,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1824940.0, ans=0.1 2023-11-22 06:03:59,918 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273750 2023-11-22 06:04:02,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1825006.6666666667, ans=0.1 2023-11-22 06:04:12,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 8.119e+01 8.781e+01 9.554e+01 1.352e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-22 06:04:14,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1825006.6666666667, ans=0.0 2023-11-22 06:04:18,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1825073.3333333333, ans=0.04949747468305833 2023-11-22 06:04:29,574 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9250, loss[loss=0.08467, simple_loss=0.104, pruned_loss=0.02381, audio_tagging_loss=0.008881, over 16233.00 frames. 
], tot_loss[loss=0.07215, simple_loss=0.09539, pruned_loss=0.01534, audio_tagging_loss=0.00912, over 3061258.05 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:04:36,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-22 06:04:45,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1825206.6666666667, ans=0.125 2023-11-22 06:05:05,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273800 2023-11-22 06:05:11,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-11-22 06:05:35,258 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9300, loss[loss=0.0725, simple_loss=0.09796, pruned_loss=0.0166, audio_tagging_loss=0.006916, over 15609.00 frames. ], tot_loss[loss=0.07233, simple_loss=0.09543, pruned_loss=0.01545, audio_tagging_loss=0.009174, over 3054235.90 frames. ], batch size: 59, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:05:40,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1825473.3333333333, ans=0.1 2023-11-22 06:05:54,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1825540.0, ans=0.125 2023-11-22 06:06:11,139 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273850 2023-11-22 06:06:19,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1825673.3333333333, ans=0.05 2023-11-22 06:06:24,304 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.188e+01 8.737e+01 9.420e+01 1.693e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 06:06:40,816 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9350, loss[loss=0.08786, simple_loss=0.1115, pruned_loss=0.02108, audio_tagging_loss=0.01104, over 15121.00 frames. ], tot_loss[loss=0.07279, simple_loss=0.09616, pruned_loss=0.0155, audio_tagging_loss=0.009208, over 3051105.57 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:06:44,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1825806.6666666667, ans=0.0 2023-11-22 06:07:15,998 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273900 2023-11-22 06:07:37,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1826073.3333333333, ans=0.125 2023-11-22 06:07:45,837 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9400, loss[loss=0.06113, simple_loss=0.08354, pruned_loss=0.009879, audio_tagging_loss=0.009481, over 15932.00 frames. ], tot_loss[loss=0.07219, simple_loss=0.095, pruned_loss=0.01526, audio_tagging_loss=0.009421, over 3049945.72 frames. 
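[annotation] tot_loss in these lines is a frame-weighted running average over recent batches rather than a single-batch figure. With reset_interval=200 and roughly 15k frames per batch, an exponential average with decay 1 - 1/200 settles at about 15k x 200 = 3.0M frames, matching the "over ~3.05e6 frames" totals logged above. A sketch of that bookkeeping (hedged; icefall's metrics tracking differs in detail):

class RunningLoss:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.weighted_loss / self.frames   # the reported tot_loss

avg = RunningLoss()
for loss, frames in [(0.073, 15244.0), (0.071, 14880.0)]:
    tot = avg.update(loss, frames)
print(f"tot_loss={tot:.5f}, frames={avg.frames:.0f}")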
], batch size: 60, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:08:13,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1826273.3333333333, ans=0.125 2023-11-22 06:08:20,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 273950 2023-11-22 06:08:34,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.127e+01 8.861e+01 9.548e+01 1.246e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-22 06:08:48,571 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 06:08:50,910 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9450, loss[loss=0.05976, simple_loss=0.0794, pruned_loss=0.01062, audio_tagging_loss=0.009442, over 15155.00 frames. ], tot_loss[loss=0.07262, simple_loss=0.09529, pruned_loss=0.01554, audio_tagging_loss=0.009435, over 3050094.84 frames. ], batch size: 57, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:09:19,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1826606.6666666667, ans=0.0 2023-11-22 06:09:23,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1826606.6666666667, ans=0.1 2023-11-22 06:09:25,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274000 2023-11-22 06:09:44,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1826740.0, ans=0.0 2023-11-22 06:09:46,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1826740.0, ans=0.125 2023-11-22 06:09:52,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1826740.0, ans=0.0 2023-11-22 06:09:55,599 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9500, loss[loss=0.07264, simple_loss=0.0944, pruned_loss=0.016, audio_tagging_loss=0.009437, over 15090.00 frames. ], tot_loss[loss=0.07236, simple_loss=0.09475, pruned_loss=0.01544, audio_tagging_loss=0.009539, over 3052744.51 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:09:58,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1826806.6666666667, ans=0.0 2023-11-22 06:10:12,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1826873.3333333333, ans=0.09899494936611666 2023-11-22 06:10:31,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274050 2023-11-22 06:10:43,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.476e+01 8.082e+01 8.870e+01 9.586e+01 1.219e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 06:11:01,484 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9550, loss[loss=0.07203, simple_loss=0.09496, pruned_loss=0.01679, audio_tagging_loss=0.007765, over 14970.00 frames. 
], tot_loss[loss=0.07228, simple_loss=0.09444, pruned_loss=0.01542, audio_tagging_loss=0.009638, over 3051548.60 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:11:04,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1827140.0, ans=0.2 2023-11-22 06:11:19,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1827206.6666666667, ans=0.125 2023-11-22 06:11:24,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1827206.6666666667, ans=0.0 2023-11-22 06:11:26,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1827273.3333333333, ans=0.125 2023-11-22 06:11:36,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274100 2023-11-22 06:11:36,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-11-22 06:11:50,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=22.5 2023-11-22 06:11:54,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1827406.6666666667, ans=0.125 2023-11-22 06:12:06,018 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9600, loss[loss=0.07019, simple_loss=0.08849, pruned_loss=0.01442, audio_tagging_loss=0.01153, over 14296.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.09484, pruned_loss=0.01542, audio_tagging_loss=0.009616, over 3040467.83 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:12:39,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1827606.6666666667, ans=0.1 2023-11-22 06:12:40,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274150 2023-11-22 06:12:46,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2023-11-22 06:12:53,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.292e+01 9.125e+01 9.909e+01 1.275e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-22 06:13:07,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.25 vs. limit=22.5 2023-11-22 06:13:09,801 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9650, loss[loss=0.08689, simple_loss=0.1159, pruned_loss=0.02112, audio_tagging_loss=0.007843, over 14823.00 frames. ], tot_loss[loss=0.07195, simple_loss=0.09435, pruned_loss=0.01516, audio_tagging_loss=0.009618, over 3035629.34 frames. 
], batch size: 54, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:13:13,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1827806.6666666667, ans=0.05 2023-11-22 06:13:13,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1827806.6666666667, ans=0.1 2023-11-22 06:13:32,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1827873.3333333333, ans=0.0 2023-11-22 06:13:33,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1827873.3333333333, ans=0.125 2023-11-22 06:13:43,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1827940.0, ans=0.0 2023-11-22 06:13:45,023 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274200 2023-11-22 06:13:45,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1827940.0, ans=0.0 2023-11-22 06:14:08,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-22 06:14:09,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1828073.3333333333, ans=0.2 2023-11-22 06:14:09,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1828073.3333333333, ans=0.125 2023-11-22 06:14:09,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0 2023-11-22 06:14:14,372 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9700, loss[loss=0.08758, simple_loss=0.1216, pruned_loss=0.02093, audio_tagging_loss=0.005857, over 16358.00 frames. ], tot_loss[loss=0.07222, simple_loss=0.09506, pruned_loss=0.01534, audio_tagging_loss=0.009353, over 3031603.35 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:14:17,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1828140.0, ans=0.125 2023-11-22 06:14:24,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1828140.0, ans=0.0 2023-11-22 06:14:36,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1828206.6666666667, ans=0.125 2023-11-22 06:14:49,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274250 2023-11-22 06:15:01,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.673e+01 8.325e+01 8.777e+01 9.483e+01 1.567e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-22 06:15:16,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1828406.6666666667, ans=0.2 2023-11-22 06:15:19,160 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9750, loss[loss=0.07352, simple_loss=0.09883, pruned_loss=0.01682, audio_tagging_loss=0.007285, over 16227.00 frames. ], tot_loss[loss=0.07204, simple_loss=0.09485, pruned_loss=0.01532, audio_tagging_loss=0.009289, over 3039927.14 frames. 
], batch size: 61, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:15:52,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1828606.6666666667, ans=0.0 2023-11-22 06:15:53,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274300 2023-11-22 06:16:06,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1828673.3333333333, ans=0.025 2023-11-22 06:16:12,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1828740.0, ans=0.1 2023-11-22 06:16:23,204 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9800, loss[loss=0.08768, simple_loss=0.124, pruned_loss=0.01726, audio_tagging_loss=0.008431, over 16028.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.09493, pruned_loss=0.01537, audio_tagging_loss=0.009256, over 3039536.51 frames. ], batch size: 57, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:16:40,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-11-22 06:16:58,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274350 2023-11-22 06:17:10,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.125e+01 8.803e+01 9.670e+01 1.768e+02, threshold=1.761e+02, percent-clipped=1.0 2023-11-22 06:17:19,770 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 06:17:22,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1829073.3333333333, ans=0.125 2023-11-22 06:17:27,055 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9850, loss[loss=0.07175, simple_loss=0.09405, pruned_loss=0.01465, audio_tagging_loss=0.01007, over 15337.00 frames. ], tot_loss[loss=0.07284, simple_loss=0.09628, pruned_loss=0.0156, audio_tagging_loss=0.009105, over 3041005.05 frames. ], batch size: 57, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:18:01,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274400 2023-11-22 06:18:31,228 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9900, loss[loss=0.05985, simple_loss=0.07688, pruned_loss=0.01135, audio_tagging_loss=0.01006, over 15942.00 frames. ], tot_loss[loss=0.07273, simple_loss=0.09638, pruned_loss=0.01547, audio_tagging_loss=0.009064, over 3045454.74 frames. 
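[annotation] The Whitening lines above compare a whiteness statistic of intermediate activations against a limit; when the metric exceeds the limit, the Whiten module applies a corrective gradient that pushes the feature covariance back toward a multiple of the identity. One plausible proxy for the logged metric (a sketch, not scaling.py verbatim): the mean squared eigenvalue of the per-group covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as a few directions dominate.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into num_groups groups
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames                  # per-group covariance
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()            # trace/d = mean eigenvalue
    mean_eig_sq = ((cov ** 2).sum(dim=(1, 2)) / d).mean()     # trace(cov^2)/d
    return (mean_eig_sq / mean_eig ** 2).item()

x = torch.randn(2000, 256)       # nearly white features
print(whitening_metric(x))       # close to 1; correlated features score much higher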
], batch size: 62, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:18:34,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1829473.3333333333, ans=0.125 2023-11-22 06:18:44,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1829540.0, ans=0.2 2023-11-22 06:18:49,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1829540.0, ans=0.125 2023-11-22 06:18:50,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1829540.0, ans=0.2 2023-11-22 06:19:05,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274450 2023-11-22 06:19:12,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1829673.3333333333, ans=0.125 2023-11-22 06:19:17,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 8.069e+01 8.830e+01 9.400e+01 1.279e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-22 06:19:25,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1829740.0, ans=0.125 2023-11-22 06:19:35,149 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 9950, loss[loss=0.06515, simple_loss=0.08332, pruned_loss=0.01161, audio_tagging_loss=0.01189, over 16666.00 frames. ], tot_loss[loss=0.07287, simple_loss=0.09652, pruned_loss=0.01552, audio_tagging_loss=0.009089, over 3048764.30 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:19:51,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1829873.3333333333, ans=0.2 2023-11-22 06:20:10,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274500 2023-11-22 06:20:27,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1830073.3333333333, ans=0.125 2023-11-22 06:20:34,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1830073.3333333333, ans=0.2 2023-11-22 06:20:39,507 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10000, loss[loss=0.06308, simple_loss=0.08323, pruned_loss=0.01127, audio_tagging_loss=0.0102, over 15806.00 frames. ], tot_loss[loss=0.07163, simple_loss=0.09481, pruned_loss=0.01512, audio_tagging_loss=0.009105, over 3042631.69 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:20:41,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1830140.0, ans=0.125 2023-11-22 06:20:47,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1830140.0, ans=0.07 2023-11-22 06:20:51,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=22.5 2023-11-22 06:21:03,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-22 06:21:14,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.56 vs. 
limit=15.0 2023-11-22 06:21:14,528 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274550 2023-11-22 06:21:18,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1830340.0, ans=0.125 2023-11-22 06:21:23,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1830340.0, ans=0.0 2023-11-22 06:21:28,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.149e+01 8.699e+01 9.689e+01 1.229e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 06:21:43,801 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10050, loss[loss=0.07877, simple_loss=0.107, pruned_loss=0.01702, audio_tagging_loss=0.008264, over 15527.00 frames. ], tot_loss[loss=0.07198, simple_loss=0.09491, pruned_loss=0.01539, audio_tagging_loss=0.009133, over 3043329.55 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:21:48,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1830473.3333333333, ans=0.125 2023-11-22 06:22:18,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274600 2023-11-22 06:22:37,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1830740.0, ans=0.125 2023-11-22 06:22:39,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1830740.0, ans=0.0 2023-11-22 06:22:45,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1830740.0, ans=0.1 2023-11-22 06:22:48,263 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10100, loss[loss=0.08477, simple_loss=0.1136, pruned_loss=0.01847, audio_tagging_loss=0.009511, over 15021.00 frames. ], tot_loss[loss=0.07249, simple_loss=0.09551, pruned_loss=0.01547, audio_tagging_loss=0.009266, over 3046297.45 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 16.0 2023-11-22 06:22:54,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1830806.6666666667, ans=0.0 2023-11-22 06:23:22,245 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274650 2023-11-22 06:23:22,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2023-11-22 06:23:35,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0 2023-11-22 06:23:36,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.350e+01 8.114e+01 8.687e+01 9.390e+01 1.135e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-22 06:23:39,257 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-22 06:23:52,056 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10150, loss[loss=0.07652, simple_loss=0.1049, pruned_loss=0.01615, audio_tagging_loss=0.00791, over 16510.00 frames. ], tot_loss[loss=0.07267, simple_loss=0.09569, pruned_loss=0.01548, audio_tagging_loss=0.009344, over 3047333.42 frames. ], batch size: 62, lr: 2.95e-03, grad_scale: 16.0
2023-11-22 06:24:16,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1831273.3333333333, ans=0.125
2023-11-22 06:24:21,552 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 06:24:23,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1831273.3333333333, ans=0.1
2023-11-22 06:24:26,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274700
2023-11-22 06:24:33,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1831340.0, ans=0.0
2023-11-22 06:24:37,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1831340.0, ans=0.1
2023-11-22 06:24:43,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1831406.6666666667, ans=0.125
2023-11-22 06:24:46,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0
2023-11-22 06:24:49,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.10 vs. limit=10.0
2023-11-22 06:24:56,413 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10200, loss[loss=0.08084, simple_loss=0.1167, pruned_loss=0.01483, audio_tagging_loss=0.007682, over 15671.00 frames. ], tot_loss[loss=0.07231, simple_loss=0.0949, pruned_loss=0.01541, audio_tagging_loss=0.009452, over 3046822.93 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 16.0
2023-11-22 06:24:59,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1831473.3333333333, ans=0.2
2023-11-22 06:24:59,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0
2023-11-22 06:25:07,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1831473.3333333333, ans=0.0
2023-11-22 06:25:14,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1831540.0, ans=0.125
2023-11-22 06:25:19,844 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
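The recurring WARNING records come from a length filter: these AudioSet clips are one second long (100 feature frames, 23 after 4x subsampling), while the dummy transcript tokenizes to 24 BPE pieces, so a transducer loss cannot align them and the cut is dropped. A sketch of such a filter, with a hypothetical frame-count estimate that happens to reproduce the 100 -> 23 figures above (the real check sits around train_asr.py:1462):

import logging
import sentencepiece as spm

def keep_cut(cut, sp: spm.SentencePieceProcessor, subsampling_factor: int = 4) -> bool:
    """Drop cuts too short for their token sequence (sketch, assumptions noted)."""
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    num_frames = cut.num_frames                        # before subsampling
    # assumed estimate of the conv front-end's output length; (100 - 7) // 4 == 23
    num_frames_sub = (num_frames - 7) // subsampling_factor
    if num_frames_sub < len(tokens):                   # transducer needs T >= U
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {num_frames_sub}. "
            f"Tokens: {tokens}. Number of tokens: {len(tokens)}"
        )
        return False
    return True

Only the AudioSet ("unbalanced/...") cuts trip this filter here, since the placeholder text is longer, in tokens, than a one-second clip is in subsampled frames.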
2023-11-22 06:25:29,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1831606.6666666667, ans=0.125
2023-11-22 06:25:30,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0
2023-11-22 06:25:31,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274750
2023-11-22 06:25:45,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.234e+01 8.918e+01 9.479e+01 1.480e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-22 06:25:57,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1831740.0, ans=0.1
2023-11-22 06:26:00,403 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10250, loss[loss=0.08819, simple_loss=0.1127, pruned_loss=0.02341, audio_tagging_loss=0.00844, over 13513.00 frames. ], tot_loss[loss=0.07256, simple_loss=0.09504, pruned_loss=0.01557, audio_tagging_loss=0.009474, over 3048039.77 frames. ], batch size: 52, lr: 2.95e-03, grad_scale: 16.0
2023-11-22 06:26:18,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1831873.3333333333, ans=0.0
2023-11-22 06:26:19,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1831873.3333333333, ans=0.0
2023-11-22 06:26:23,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5
2023-11-22 06:26:26,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1831940.0, ans=15.0
2023-11-22 06:26:35,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274800
2023-11-22 06:27:05,160 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10300, loss[loss=0.04883, simple_loss=0.06165, pruned_loss=0.00699, audio_tagging_loss=0.01102, over 14912.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09496, pruned_loss=0.01554, audio_tagging_loss=0.009503, over 3057189.27 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 16.0
2023-11-22 06:27:12,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1832140.0, ans=0.125
2023-11-22 06:27:20,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=22.5
2023-11-22 06:27:39,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274850
2023-11-22 06:27:48,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1832340.0, ans=0.0
2023-11-22 06:27:53,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.276e+01 9.014e+01 9.790e+01 1.332e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-22 06:28:07,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1832406.6666666667, ans=0.125
2023-11-22 06:28:09,399 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10350, loss[loss=0.06211, simple_loss=0.0778, pruned_loss=0.008139, audio_tagging_loss=0.01507, over 16218.00 frames. ], tot_loss[loss=0.07271, simple_loss=0.09521, pruned_loss=0.01551, audio_tagging_loss=0.009598, over 3053670.23 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 16.0
2023-11-22 06:28:10,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1832473.3333333333, ans=0.125
2023-11-22 06:28:23,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1832540.0, ans=0.1
2023-11-22 06:28:32,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1832540.0, ans=0.2
2023-11-22 06:28:37,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1832606.6666666667, ans=0.1
2023-11-22 06:28:44,118 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274900
2023-11-22 06:28:50,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1832673.3333333333, ans=0.125
2023-11-22 06:29:09,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1832740.0, ans=0.0
2023-11-22 06:29:10,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1832740.0, ans=0.95
2023-11-22 06:29:13,113 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10400, loss[loss=0.06564, simple_loss=0.08684, pruned_loss=0.01276, audio_tagging_loss=0.009453, over 14763.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.09467, pruned_loss=0.01536, audio_tagging_loss=0.009715, over 3050526.67 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 32.0
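Each train_asr.py:1221 record decomposes the objective into its parts. The logged numbers are consistent with a weighted sum, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, using the simple-loss and audio-tagging scales this run was launched with (icefall also ramps the simple-loss weight during warm-up, which is long past at this point). Checking the batch-10400 record just above:

simple_loss_scale = 0.5        # this run's configured scale
audio_tagging_loss_scale = 1.0

def total_loss(simple_loss, pruned_loss, audio_tagging_loss):
    # weighted sum that the logged records appear to follow
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# batch 10400: loss=0.06564, simple=0.08684, pruned=0.01276, at=0.009453
print(total_loss(0.08684, 0.01276, 0.009453))  # 0.065633, matching 0.06564

The loss[...] block is the current batch; tot_loss[...] is the running average over the epoch so far, which is why its frame count climbs past three million.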
2023-11-22 06:29:23,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1832806.6666666667, ans=0.125
2023-11-22 06:29:44,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1832940.0, ans=0.125
2023-11-22 06:29:47,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 274950
2023-11-22 06:29:50,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1833006.6666666667, ans=0.125
2023-11-22 06:30:01,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.321e+01 8.235e+01 8.763e+01 9.443e+01 1.759e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-22 06:30:03,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1833073.3333333333, ans=15.0
2023-11-22 06:30:10,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1833073.3333333333, ans=0.0
2023-11-22 06:30:17,052 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10450, loss[loss=0.05538, simple_loss=0.07663, pruned_loss=0.009176, audio_tagging_loss=0.00789, over 15119.00 frames. ], tot_loss[loss=0.07284, simple_loss=0.09546, pruned_loss=0.01552, audio_tagging_loss=0.009595, over 3048315.47 frames. ], batch size: 57, lr: 2.95e-03, grad_scale: 32.0
2023-11-22 06:30:18,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1833140.0, ans=0.0
2023-11-22 06:30:24,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1833140.0, ans=0.1
2023-11-22 06:30:26,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1833140.0, ans=0.0
2023-11-22 06:30:28,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1833206.6666666667, ans=0.1
2023-11-22 06:30:51,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275000
2023-11-22 06:31:21,967 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10500, loss[loss=0.06686, simple_loss=0.08839, pruned_loss=0.01324, audio_tagging_loss=0.009423, over 13927.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.09508, pruned_loss=0.01544, audio_tagging_loss=0.009462, over 3041870.69 frames.
], batch size: 53, lr: 2.95e-03, grad_scale: 32.0 2023-11-22 06:31:24,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1833473.3333333333, ans=0.2 2023-11-22 06:31:28,693 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 06:31:29,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1833473.3333333333, ans=0.125 2023-11-22 06:31:45,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1833540.0, ans=0.0 2023-11-22 06:31:56,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275050 2023-11-22 06:32:07,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1833673.3333333333, ans=0.125 2023-11-22 06:32:10,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.531e+01 8.246e+01 8.781e+01 9.587e+01 1.309e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-22 06:32:25,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2023-11-22 06:32:26,263 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10550, loss[loss=0.05792, simple_loss=0.0737, pruned_loss=0.01341, audio_tagging_loss=0.00766, over 15397.00 frames. ], tot_loss[loss=0.07213, simple_loss=0.09454, pruned_loss=0.0155, audio_tagging_loss=0.009362, over 3047338.27 frames. ], batch size: 60, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:32:30,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2023-11-22 06:33:00,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=22.5 2023-11-22 06:33:00,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275100 2023-11-22 06:33:07,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1834006.6666666667, ans=0.125 2023-11-22 06:33:29,420 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10600, loss[loss=0.07545, simple_loss=0.1097, pruned_loss=0.01385, audio_tagging_loss=0.006773, over 15408.00 frames. ], tot_loss[loss=0.07205, simple_loss=0.09459, pruned_loss=0.01544, audio_tagging_loss=0.009311, over 3047046.04 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:33:35,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1834140.0, ans=0.125 2023-11-22 06:34:05,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.87 vs. 
limit=12.0 2023-11-22 06:34:05,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275150 2023-11-22 06:34:11,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1834340.0, ans=0.0 2023-11-22 06:34:13,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1834340.0, ans=0.2 2023-11-22 06:34:19,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.081e+01 8.805e+01 9.218e+01 1.246e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-22 06:34:34,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1834473.3333333333, ans=0.1 2023-11-22 06:34:36,099 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10650, loss[loss=0.07031, simple_loss=0.08225, pruned_loss=0.01348, audio_tagging_loss=0.0157, over 14400.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09373, pruned_loss=0.01531, audio_tagging_loss=0.009314, over 3037928.74 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:34:42,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5 2023-11-22 06:34:59,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1834540.0, ans=0.0 2023-11-22 06:35:07,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=22.5 2023-11-22 06:35:10,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275200 2023-11-22 06:35:10,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1834606.6666666667, ans=0.125 2023-11-22 06:35:12,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1834606.6666666667, ans=0.1 2023-11-22 06:35:15,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1834673.3333333333, ans=0.2 2023-11-22 06:35:23,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1834673.3333333333, ans=0.0 2023-11-22 06:35:41,440 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10700, loss[loss=0.07486, simple_loss=0.1112, pruned_loss=0.01252, audio_tagging_loss=0.006761, over 14747.00 frames. ], tot_loss[loss=0.07139, simple_loss=0.09386, pruned_loss=0.01521, audio_tagging_loss=0.009253, over 3036716.81 frames. 
], batch size: 54, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:35:57,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1834873.3333333333, ans=0.125 2023-11-22 06:36:15,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275250 2023-11-22 06:36:30,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.298e+01 8.136e+01 8.812e+01 9.444e+01 1.567e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-22 06:36:32,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1835073.3333333333, ans=0.125 2023-11-22 06:36:35,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1835073.3333333333, ans=0.125 2023-11-22 06:36:35,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1835073.3333333333, ans=0.0 2023-11-22 06:36:38,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1835073.3333333333, ans=0.2 2023-11-22 06:36:38,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1835073.3333333333, ans=15.0 2023-11-22 06:36:45,215 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10750, loss[loss=0.05367, simple_loss=0.06423, pruned_loss=0.01246, audio_tagging_loss=0.009091, over 13879.00 frames. ], tot_loss[loss=0.07141, simple_loss=0.09381, pruned_loss=0.01525, audio_tagging_loss=0.009257, over 3044125.15 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:36:50,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1835140.0, ans=0.125 2023-11-22 06:36:57,002 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 06:36:57,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1835206.6666666667, ans=0.95 2023-11-22 06:37:21,026 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275300 2023-11-22 06:37:22,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1835273.3333333333, ans=0.0 2023-11-22 06:37:31,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1835340.0, ans=0.0 2023-11-22 06:37:33,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1835340.0, ans=0.0 2023-11-22 06:37:43,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1835406.6666666667, ans=0.125 2023-11-22 06:37:49,831 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10800, loss[loss=0.09266, simple_loss=0.1211, pruned_loss=0.025, audio_tagging_loss=0.007131, over 16123.00 frames. ], tot_loss[loss=0.07185, simple_loss=0.09441, pruned_loss=0.01538, audio_tagging_loss=0.009265, over 3048312.32 frames. 
], batch size: 59, lr: 2.94e-03, grad_scale: 32.0
2023-11-22 06:37:51,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1835473.3333333333, ans=0.1
2023-11-22 06:37:55,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1835473.3333333333, ans=0.1
2023-11-22 06:38:09,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1835540.0, ans=0.0
2023-11-22 06:38:25,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275350
2023-11-22 06:38:28,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0
2023-11-22 06:38:41,118 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.004e+01 8.229e+01 8.855e+01 9.329e+01 1.142e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-22 06:38:51,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0
2023-11-22 06:38:56,693 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10850, loss[loss=0.0689, simple_loss=0.08725, pruned_loss=0.0177, audio_tagging_loss=0.007577, over 15293.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09436, pruned_loss=0.01538, audio_tagging_loss=0.009283, over 3052905.26 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 32.0
2023-11-22 06:39:12,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0
2023-11-22 06:39:15,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1835873.3333333333, ans=0.1
2023-11-22 06:39:27,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1835940.0, ans=0.1
2023-11-22 06:39:31,196 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275400
2023-11-22 06:39:36,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1836006.6666666667, ans=0.0
2023-11-22 06:39:56,792 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 06:40:01,530 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10900, loss[loss=0.06993, simple_loss=0.09113, pruned_loss=0.01369, audio_tagging_loss=0.01067, over 14855.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.09429, pruned_loss=0.01518, audio_tagging_loss=0.00927, over 3045554.42 frames. ], batch size: 59, lr: 2.94e-03, grad_scale: 32.0
2023-11-22 06:40:04,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0
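The scaling.py:1022 "Whitening" records compare a per-module anisotropy metric against that module's whitening_limit; when the metric exceeds the limit, a corrective term nudges the activations back toward an isotropic covariance. One plausible formulation of such a metric (an illustration only, not necessarily the exact formula in scaling.py): it equals 1.0 for perfectly white features and grows as the covariance concentrates in a few directions.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]       # channel covariance
    d = cov.shape[0]
    # d * trace(cov^2) / trace(cov)^2 >= 1, with equality iff cov = c * I
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

x_white = torch.randn(1000, 384)                      # near-isotropic channels
x_low_rank = torch.randn(1000, 4) @ torch.randn(4, 384)  # rank-4 covariance
print(whitening_metric(x_white))     # close to 1 (plus sampling noise)
print(whitening_metric(x_low_rank))  # around 384/4 = 96, far past limit=15.0

On this reading, the logged "metric=4.04 vs. limit=15.0" entries simply confirm the modules are comfortably inside their limits.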
2023-11-22 06:40:25,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1836206.6666666667, ans=0.125
2023-11-22 06:40:35,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1836273.3333333333, ans=0.1
2023-11-22 06:40:37,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275450
2023-11-22 06:40:50,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1836340.0, ans=0.0
2023-11-22 06:40:51,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.279e+01 8.746e+01 9.747e+01 1.923e+02, threshold=1.749e+02, percent-clipped=1.0
2023-11-22 06:41:03,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1836406.6666666667, ans=0.125
2023-11-22 06:41:03,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1836406.6666666667, ans=0.125
2023-11-22 06:41:05,878 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 10950, loss[loss=0.07105, simple_loss=0.09209, pruned_loss=0.01704, audio_tagging_loss=0.007969, over 15317.00 frames. ], tot_loss[loss=0.07134, simple_loss=0.09397, pruned_loss=0.01505, audio_tagging_loss=0.009314, over 3041747.04 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 32.0
2023-11-22 06:41:31,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1836606.6666666667, ans=0.2
2023-11-22 06:41:40,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275500
2023-11-22 06:41:56,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1836740.0, ans=0.125
2023-11-22 06:42:09,630 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11000, loss[loss=0.04789, simple_loss=0.05871, pruned_loss=0.007348, audio_tagging_loss=0.01119, over 15145.00 frames. ], tot_loss[loss=0.07131, simple_loss=0.0938, pruned_loss=0.01498, audio_tagging_loss=0.009431, over 3039709.50 frames. ], batch size: 59, lr: 2.94e-03, grad_scale: 16.0
2023-11-22 06:42:20,679 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 06:42:23,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1836873.3333333333, ans=0.0
2023-11-22 06:42:31,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1836873.3333333333, ans=0.09899494936611666
2023-11-22 06:42:35,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs.
limit=22.5 2023-11-22 06:42:44,590 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275550 2023-11-22 06:43:00,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 8.356e+01 8.870e+01 9.515e+01 1.517e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 06:43:06,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=12.0 2023-11-22 06:43:11,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1837073.3333333333, ans=0.1 2023-11-22 06:43:12,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1837073.3333333333, ans=0.125 2023-11-22 06:43:14,641 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11050, loss[loss=0.07293, simple_loss=0.09329, pruned_loss=0.01864, audio_tagging_loss=0.007644, over 15274.00 frames. ], tot_loss[loss=0.07214, simple_loss=0.09469, pruned_loss=0.01527, audio_tagging_loss=0.009526, over 3038277.89 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:43:42,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-22 06:43:49,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275600 2023-11-22 06:43:53,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1837340.0, ans=0.125 2023-11-22 06:43:56,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1837340.0, ans=0.1 2023-11-22 06:44:09,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2023-11-22 06:44:18,976 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11100, loss[loss=0.05013, simple_loss=0.0554, pruned_loss=0.01276, audio_tagging_loss=0.009662, over 13866.00 frames. ], tot_loss[loss=0.07215, simple_loss=0.09456, pruned_loss=0.01533, audio_tagging_loss=0.009533, over 3037815.91 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:44:21,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1837473.3333333333, ans=0.0 2023-11-22 06:44:27,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1837473.3333333333, ans=0.125 2023-11-22 06:44:52,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1837606.6666666667, ans=0.125 2023-11-22 06:44:53,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275650 2023-11-22 06:45:11,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 8.196e+01 8.803e+01 9.419e+01 1.192e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-22 06:45:23,233 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11150, loss[loss=0.06858, simple_loss=0.09634, pruned_loss=0.01217, audio_tagging_loss=0.008246, over 15469.00 frames. ], tot_loss[loss=0.07201, simple_loss=0.09424, pruned_loss=0.0153, audio_tagging_loss=0.0096, over 3044716.71 frames. 
], batch size: 58, lr: 2.94e-03, grad_scale: 8.0 2023-11-22 06:45:35,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-11-22 06:45:39,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1837873.3333333333, ans=0.2 2023-11-22 06:45:41,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1837873.3333333333, ans=0.04949747468305833 2023-11-22 06:45:50,345 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 06:45:52,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1837940.0, ans=0.1 2023-11-22 06:45:54,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1837940.0, ans=0.125 2023-11-22 06:45:58,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275700 2023-11-22 06:46:12,665 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 06:46:28,387 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11200, loss[loss=0.0822, simple_loss=0.1085, pruned_loss=0.01908, audio_tagging_loss=0.008852, over 14769.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09446, pruned_loss=0.0154, audio_tagging_loss=0.009747, over 3041684.84 frames. ], batch size: 54, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:46:37,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1838140.0, ans=0.125 2023-11-22 06:46:43,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-11-22 06:46:44,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1838206.6666666667, ans=0.125 2023-11-22 06:46:51,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1838206.6666666667, ans=0.04949747468305833 2023-11-22 06:47:02,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275750 2023-11-22 06:47:21,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.197e+01 8.985e+01 9.663e+01 1.096e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-22 06:47:32,684 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11250, loss[loss=0.05419, simple_loss=0.07022, pruned_loss=0.00781, audio_tagging_loss=0.01127, over 14986.00 frames. ], tot_loss[loss=0.07192, simple_loss=0.09374, pruned_loss=0.01534, audio_tagging_loss=0.009705, over 3046334.94 frames. 
], batch size: 56, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:47:40,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1838473.3333333333, ans=0.125 2023-11-22 06:48:07,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275800 2023-11-22 06:48:18,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1838673.3333333333, ans=0.1 2023-11-22 06:48:26,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2023-11-22 06:48:29,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2023-11-22 06:48:38,279 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11300, loss[loss=0.06814, simple_loss=0.08805, pruned_loss=0.01396, audio_tagging_loss=0.01016, over 14746.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.09308, pruned_loss=0.01515, audio_tagging_loss=0.009574, over 3045206.90 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:48:54,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1838873.3333333333, ans=0.1 2023-11-22 06:49:03,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1838940.0, ans=0.125 2023-11-22 06:49:05,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1838940.0, ans=0.125 2023-11-22 06:49:07,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1838940.0, ans=0.125 2023-11-22 06:49:13,708 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275850 2023-11-22 06:49:25,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1839006.6666666667, ans=0.0 2023-11-22 06:49:31,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.536e+01 7.924e+01 8.561e+01 9.574e+01 1.338e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-22 06:49:40,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1839073.3333333333, ans=0.0 2023-11-22 06:49:42,806 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11350, loss[loss=0.06478, simple_loss=0.08973, pruned_loss=0.01095, audio_tagging_loss=0.008967, over 15507.00 frames. ], tot_loss[loss=0.07174, simple_loss=0.09378, pruned_loss=0.01539, audio_tagging_loss=0.009458, over 3042199.03 frames. 
], batch size: 58, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:49:52,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1839140.0, ans=0.07 2023-11-22 06:50:02,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1839206.6666666667, ans=0.125 2023-11-22 06:50:17,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275900 2023-11-22 06:50:39,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1839406.6666666667, ans=0.0 2023-11-22 06:50:47,663 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11400, loss[loss=0.071, simple_loss=0.09814, pruned_loss=0.01532, audio_tagging_loss=0.006609, over 14941.00 frames. ], tot_loss[loss=0.07238, simple_loss=0.09497, pruned_loss=0.01557, audio_tagging_loss=0.009328, over 3037187.11 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:50:54,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1839473.3333333333, ans=0.0 2023-11-22 06:50:57,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1839473.3333333333, ans=0.0 2023-11-22 06:51:04,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1839540.0, ans=0.125 2023-11-22 06:51:05,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1839540.0, ans=0.05 2023-11-22 06:51:19,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1839606.6666666667, ans=0.1 2023-11-22 06:51:22,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 275950 2023-11-22 06:51:33,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1839673.3333333333, ans=0.125 2023-11-22 06:51:40,304 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.167e+01 8.873e+01 9.595e+01 1.109e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-22 06:51:51,932 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11450, loss[loss=0.07185, simple_loss=0.09659, pruned_loss=0.01614, audio_tagging_loss=0.007414, over 16110.00 frames. ], tot_loss[loss=0.07242, simple_loss=0.0953, pruned_loss=0.01551, audio_tagging_loss=0.009256, over 3042055.52 frames. 
], batch size: 59, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:51:55,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1839806.6666666667, ans=0.2 2023-11-22 06:52:05,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1839873.3333333333, ans=0.1 2023-11-22 06:52:06,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1839873.3333333333, ans=0.125 2023-11-22 06:52:27,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276000 2023-11-22 06:52:40,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1840006.6666666667, ans=0.125 2023-11-22 06:52:47,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1840073.3333333333, ans=0.0 2023-11-22 06:52:51,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2023-11-22 06:52:59,770 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11500, loss[loss=0.08048, simple_loss=0.1054, pruned_loss=0.02, audio_tagging_loss=0.007795, over 15334.00 frames. ], tot_loss[loss=0.07297, simple_loss=0.09613, pruned_loss=0.01576, audio_tagging_loss=0.009148, over 3042089.50 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:53:18,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1840206.6666666667, ans=0.04949747468305833 2023-11-22 06:53:32,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2023-11-22 06:53:34,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276050 2023-11-22 06:53:35,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-22 06:53:45,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1840340.0, ans=0.0 2023-11-22 06:53:52,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.162e+01 8.710e+01 9.515e+01 1.226e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-22 06:53:53,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1840406.6666666667, ans=0.125 2023-11-22 06:54:03,967 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11550, loss[loss=0.05224, simple_loss=0.06316, pruned_loss=0.008597, audio_tagging_loss=0.01207, over 16101.00 frames. ], tot_loss[loss=0.07301, simple_loss=0.09608, pruned_loss=0.01579, audio_tagging_loss=0.009182, over 3045914.07 frames. ], batch size: 61, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 06:54:22,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. 
limit=15.0 2023-11-22 06:54:23,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1840540.0, ans=0.125 2023-11-22 06:54:36,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1840606.6666666667, ans=0.1 2023-11-22 06:54:39,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276100 2023-11-22 06:54:42,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1840673.3333333333, ans=0.0 2023-11-22 06:54:44,941 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 06:54:55,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1840740.0, ans=0.125 2023-11-22 06:55:06,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1840740.0, ans=0.125 2023-11-22 06:55:09,737 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11600, loss[loss=0.05766, simple_loss=0.06969, pruned_loss=0.01199, audio_tagging_loss=0.01083, over 15142.00 frames. ], tot_loss[loss=0.07261, simple_loss=0.0955, pruned_loss=0.0157, audio_tagging_loss=0.009155, over 3049581.73 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:55:10,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1840806.6666666667, ans=0.0 2023-11-22 06:55:16,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1840806.6666666667, ans=0.04949747468305833 2023-11-22 06:55:23,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2023-11-22 06:55:39,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-11-22 06:55:44,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276150 2023-11-22 06:55:45,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1840940.0, ans=0.2 2023-11-22 06:55:52,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2023-11-22 06:55:54,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1841006.6666666667, ans=0.1 2023-11-22 06:56:03,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.370e+01 8.963e+01 9.598e+01 1.244e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-22 06:56:04,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.09 vs. 
limit=15.0 2023-11-22 06:56:08,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1841073.3333333333, ans=0.125 2023-11-22 06:56:10,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1841073.3333333333, ans=0.035 2023-11-22 06:56:15,239 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11650, loss[loss=0.07264, simple_loss=0.107, pruned_loss=0.01179, audio_tagging_loss=0.00734, over 16035.00 frames. ], tot_loss[loss=0.07279, simple_loss=0.09562, pruned_loss=0.01578, audio_tagging_loss=0.009204, over 3044295.90 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:56:26,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1841206.6666666667, ans=0.125 2023-11-22 06:56:50,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276200 2023-11-22 06:57:06,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1841406.6666666667, ans=0.125 2023-11-22 06:57:13,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1841406.6666666667, ans=0.0 2023-11-22 06:57:19,040 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11700, loss[loss=0.05617, simple_loss=0.06626, pruned_loss=0.009636, audio_tagging_loss=0.01341, over 15450.00 frames. ], tot_loss[loss=0.07241, simple_loss=0.09512, pruned_loss=0.01556, audio_tagging_loss=0.009285, over 3048209.31 frames. ], batch size: 60, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:57:20,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1841473.3333333333, ans=0.1 2023-11-22 06:57:21,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1841473.3333333333, ans=0.2 2023-11-22 06:57:40,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1841540.0, ans=0.07 2023-11-22 06:57:54,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276250 2023-11-22 06:57:58,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1841673.3333333333, ans=0.035 2023-11-22 06:58:11,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.571e+01 8.445e+01 9.108e+01 1.005e+02 1.343e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-22 06:58:16,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1841740.0, ans=0.1 2023-11-22 06:58:16,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2023-11-22 06:58:24,066 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11750, loss[loss=0.05212, simple_loss=0.06823, pruned_loss=0.007226, audio_tagging_loss=0.01078, over 14220.00 frames. ], tot_loss[loss=0.07277, simple_loss=0.09557, pruned_loss=0.01575, audio_tagging_loss=0.009241, over 3045695.68 frames. 
], batch size: 56, lr: 2.94e-03, grad_scale: 32.0 2023-11-22 06:58:48,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1841940.0, ans=0.125 2023-11-22 06:58:49,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1841940.0, ans=0.125 2023-11-22 06:58:58,288 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276300 2023-11-22 06:59:20,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-22 06:59:28,429 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11800, loss[loss=0.05953, simple_loss=0.07914, pruned_loss=0.01222, audio_tagging_loss=0.007739, over 14903.00 frames. ], tot_loss[loss=0.0723, simple_loss=0.09503, pruned_loss=0.01559, audio_tagging_loss=0.009188, over 3041407.55 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 07:00:03,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2023-11-22 07:00:03,985 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276350 2023-11-22 07:00:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1842340.0, ans=0.125 2023-11-22 07:00:11,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=22.5 2023-11-22 07:00:24,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.151e+01 8.673e+01 9.167e+01 1.144e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-22 07:00:31,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1842406.6666666667, ans=0.125 2023-11-22 07:00:34,079 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11850, loss[loss=0.07305, simple_loss=0.1042, pruned_loss=0.01256, audio_tagging_loss=0.008415, over 15384.00 frames. ], tot_loss[loss=0.07315, simple_loss=0.09625, pruned_loss=0.01584, audio_tagging_loss=0.009187, over 3038632.66 frames. 
], batch size: 58, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 07:00:35,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1842473.3333333333, ans=0.125 2023-11-22 07:00:49,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1842540.0, ans=0.0 2023-11-22 07:00:52,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1842540.0, ans=0.035 2023-11-22 07:01:04,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1842606.6666666667, ans=0.0 2023-11-22 07:01:09,655 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276400 2023-11-22 07:01:26,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1842740.0, ans=6.0 2023-11-22 07:01:38,740 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11900, loss[loss=0.06849, simple_loss=0.09166, pruned_loss=0.01422, audio_tagging_loss=0.008442, over 14537.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.09586, pruned_loss=0.01586, audio_tagging_loss=0.009384, over 3041216.65 frames. ], batch size: 53, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 07:01:55,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1842873.3333333333, ans=0.1 2023-11-22 07:02:10,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1842940.0, ans=0.125 2023-11-22 07:02:10,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2023-11-22 07:02:14,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276450 2023-11-22 07:02:33,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.503e+01 7.969e+01 8.417e+01 9.242e+01 1.141e+02, threshold=1.683e+02, percent-clipped=0.0 2023-11-22 07:02:44,008 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 11950, loss[loss=0.08452, simple_loss=0.1207, pruned_loss=0.01748, audio_tagging_loss=0.006674, over 15253.00 frames. ], tot_loss[loss=0.07238, simple_loss=0.09459, pruned_loss=0.01557, audio_tagging_loss=0.009519, over 3038407.91 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 16.0 2023-11-22 07:03:18,552 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276500 2023-11-22 07:03:22,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1843340.0, ans=0.0 2023-11-22 07:03:29,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1843340.0, ans=0.125 2023-11-22 07:03:33,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1843406.6666666667, ans=0.5 2023-11-22 07:03:41,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-22 07:03:46,895 INFO [train_asr.py:1221] (1/4) Epoch 23, batch 12000, loss[loss=0.07243, simple_loss=0.09782, pruned_loss=0.01218, audio_tagging_loss=0.01134, over 15762.00 frames. 
], tot_loss[loss=0.07198, simple_loss=0.09396, pruned_loss=0.01536, audio_tagging_loss=0.009641, over 3041074.56 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 32.0
2023-11-22 07:03:46,895 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 07:04:08,088 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0988, 3.5636, 4.0095, 3.6669], device='cuda:1')
2023-11-22 07:04:27,925 INFO [train_asr.py:1253] (1/4) Epoch 23, validation: loss=0.05966, simple_loss=0.05174, pruned_loss=0.005186, audio_tagging_loss=0.02861, over 4681554.00 frames.
2023-11-22 07:04:27,926 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-22 07:04:28,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1843473.3333333333, ans=0.1
2023-11-22 07:04:42,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1843540.0, ans=0.1
2023-11-22 07:04:53,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1843606.6666666667, ans=0.2
2023-11-22 07:04:54,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1843606.6666666667, ans=10.0
2023-11-22 07:04:54,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0
2023-11-22 07:05:34,207 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 0, loss[loss=0.07027, simple_loss=0.08266, pruned_loss=0.008595, audio_tagging_loss=0.02035, over 14359.00 frames. ], tot_loss[loss=0.07027, simple_loss=0.08266, pruned_loss=0.008595, audio_tagging_loss=0.02035, over 14359.00 frames. ], batch size: 55, lr: 2.87e-03, grad_scale: 32.0
2023-11-22 07:05:34,208 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 07:06:09,674 INFO [train_asr.py:1253] (1/4) Epoch 24, validation: loss=0.05907, simple_loss=0.05179, pruned_loss=0.005258, audio_tagging_loss=0.02792, over 4681554.00 frames.
2023-11-22 07:06:09,675 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-22 07:06:13,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276550
2023-11-22 07:06:32,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 8.350e+01 8.889e+01 9.652e+01 1.254e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-22 07:06:41,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1843766.6666666667, ans=0.125
2023-11-22 07:06:43,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0
2023-11-22 07:07:02,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1843900.0, ans=0.0
2023-11-22 07:07:13,668 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 50, loss[loss=0.07574, simple_loss=0.07631, pruned_loss=0.01639, audio_tagging_loss=0.0212, over 15762.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.09353, pruned_loss=0.01483, audio_tagging_loss=0.01805, over 690278.36 frames. ], batch size: 60, lr: 2.87e-03, grad_scale: 32.0
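The block above is the epoch boundary bookkeeping: a validation pass (train_asr.py:1244/1253), an attention-entropy diagnostic from zipformer.py:1873, the peak CUDA memory, and the first batches of epoch 24. Note that at Epoch 24, batch 0 the tot_loss equals the batch loss, since the running average restarts, which also explains the temporarily high audio_tagging_loss over the first few hundred batches. A simplified sketch of the validation step, with a hypothetical model/batch interface (the real compute_validation_loss in train_asr.py differs in detail):

import torch

def compute_validation_loss(model, valid_dl, device):
    """Average each loss component over the dev set; report peak memory."""
    model.eval()
    totals, tot_frames = {}, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            # hypothetical signature: per-batch loss dict plus frame count
            losses, num_frames = model(batch)
            tot_frames += num_frames
            for name, value in losses.items():
                totals[name] = totals.get(name, 0.0) + value.item() * num_frames
    model.train()
    stats = ", ".join(f"{k}={v / tot_frames:.4g}" for k, v in totals.items())
    print(f"validation: {stats}, over {tot_frames:.2f} frames.")
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mem_mb}MB")

The validation audio_tagging_loss (0.02861, then 0.02792) sits well above its training value because the dev set here is LibriSpeech-style speech without audio-tagging supervision of the same kind; the trend between the two passes, not the absolute value, is what matters.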
], batch size: 60, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:07:16,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1843966.6666666667, ans=0.0 2023-11-22 07:07:16,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1843966.6666666667, ans=0.125 2023-11-22 07:07:17,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276600 2023-11-22 07:07:27,608 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:07:38,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0 2023-11-22 07:08:17,326 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 100, loss[loss=0.07477, simple_loss=0.09344, pruned_loss=0.01403, audio_tagging_loss=0.01402, over 14492.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.0949, pruned_loss=0.0149, audio_tagging_loss=0.01722, over 1215399.83 frames. ], batch size: 52, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:08:21,150 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276650 2023-11-22 07:08:24,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2023-11-22 07:08:35,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1844366.6666666667, ans=0.0 2023-11-22 07:08:40,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.891e+01 9.359e+01 9.974e+01 1.363e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-22 07:09:01,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-11-22 07:09:06,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2023-11-22 07:09:22,238 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 150, loss[loss=0.06373, simple_loss=0.08136, pruned_loss=0.01184, audio_tagging_loss=0.01122, over 13598.00 frames. ], tot_loss[loss=0.07751, simple_loss=0.09416, pruned_loss=0.01488, audio_tagging_loss=0.01555, over 1618740.56 frames. ], batch size: 52, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:09:26,033 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276700 2023-11-22 07:09:39,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1844700.0, ans=0.0 2023-11-22 07:09:47,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1844766.6666666667, ans=0.125 2023-11-22 07:09:59,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1844833.3333333333, ans=0.1 2023-11-22 07:10:22,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-11-22 07:10:27,711 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 200, loss[loss=0.07399, simple_loss=0.09985, pruned_loss=0.01428, audio_tagging_loss=0.009788, over 15522.00 frames. 
], tot_loss[loss=0.07583, simple_loss=0.09394, pruned_loss=0.01491, audio_tagging_loss=0.01395, over 1937554.54 frames. ], batch size: 58, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:10:28,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2023-11-22 07:10:31,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276750 2023-11-22 07:10:41,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2023-11-22 07:10:51,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.378e+01 9.056e+01 9.976e+01 1.254e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-22 07:10:58,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1845100.0, ans=0.125 2023-11-22 07:11:19,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1845233.3333333333, ans=0.125 2023-11-22 07:11:31,350 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 250, loss[loss=0.08666, simple_loss=0.1173, pruned_loss=0.01843, audio_tagging_loss=0.009561, over 15296.00 frames. ], tot_loss[loss=0.07558, simple_loss=0.09566, pruned_loss=0.01511, audio_tagging_loss=0.01264, over 2174916.04 frames. ], batch size: 57, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:11:31,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1845300.0, ans=0.0 2023-11-22 07:11:35,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276800 2023-11-22 07:11:41,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2023-11-22 07:11:43,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1845366.6666666667, ans=0.125 2023-11-22 07:12:00,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1845433.3333333333, ans=0.025 2023-11-22 07:12:00,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=15.0 2023-11-22 07:12:29,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1845566.6666666667, ans=0.0 2023-11-22 07:12:33,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1845566.6666666667, ans=0.2 2023-11-22 07:12:36,690 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 300, loss[loss=0.06295, simple_loss=0.07838, pruned_loss=0.01499, audio_tagging_loss=0.008768, over 14430.00 frames. ], tot_loss[loss=0.07451, simple_loss=0.09547, pruned_loss=0.01511, audio_tagging_loss=0.01167, over 2372736.13 frames. 
], batch size: 57, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:12:39,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1845633.3333333333, ans=0.2 2023-11-22 07:12:40,503 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276850 2023-11-22 07:12:40,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1845633.3333333333, ans=0.125 2023-11-22 07:12:43,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1845633.3333333333, ans=0.125 2023-11-22 07:12:47,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1845700.0, ans=0.1 2023-11-22 07:12:49,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1845700.0, ans=0.125 2023-11-22 07:12:51,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1845700.0, ans=0.0 2023-11-22 07:12:58,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1845700.0, ans=0.125 2023-11-22 07:13:00,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.553e+01 8.200e+01 8.925e+01 9.618e+01 1.197e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-22 07:13:03,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1845766.6666666667, ans=0.125 2023-11-22 07:13:05,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-22 07:13:06,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1845766.6666666667, ans=0.125 2023-11-22 07:13:08,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1845766.6666666667, ans=0.0 2023-11-22 07:13:20,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1845833.3333333333, ans=0.0 2023-11-22 07:13:29,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5 2023-11-22 07:13:32,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1845900.0, ans=0.125 2023-11-22 07:13:39,905 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 350, loss[loss=0.07493, simple_loss=0.09494, pruned_loss=0.01739, audio_tagging_loss=0.01008, over 15884.00 frames. ], tot_loss[loss=0.07367, simple_loss=0.09518, pruned_loss=0.01508, audio_tagging_loss=0.011, over 2525304.10 frames. 
], batch size: 60, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:13:43,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1845966.6666666667, ans=0.1 2023-11-22 07:13:44,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276900 2023-11-22 07:13:55,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2023-11-22 07:13:57,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1846033.3333333333, ans=0.125 2023-11-22 07:14:14,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1846100.0, ans=0.125 2023-11-22 07:14:31,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1846233.3333333333, ans=0.125 2023-11-22 07:14:37,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1846233.3333333333, ans=0.2 2023-11-22 07:14:44,725 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 400, loss[loss=0.0905, simple_loss=0.1279, pruned_loss=0.01692, audio_tagging_loss=0.009616, over 15031.00 frames. ], tot_loss[loss=0.07304, simple_loss=0.09487, pruned_loss=0.01507, audio_tagging_loss=0.01054, over 2651115.12 frames. ], batch size: 56, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:14:48,639 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 276950 2023-11-22 07:15:07,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1846366.6666666667, ans=0.1 2023-11-22 07:15:09,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 7.931e+01 8.582e+01 9.377e+01 1.207e+02, threshold=1.716e+02, percent-clipped=0.0 2023-11-22 07:15:14,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-22 07:15:39,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1846566.6666666667, ans=0.125 2023-11-22 07:15:42,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2023-11-22 07:15:49,038 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 450, loss[loss=0.05664, simple_loss=0.07506, pruned_loss=0.008302, audio_tagging_loss=0.01081, over 15521.00 frames. ], tot_loss[loss=0.07219, simple_loss=0.09392, pruned_loss=0.01495, audio_tagging_loss=0.01029, over 2741738.42 frames. ], batch size: 59, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:15:53,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277000 2023-11-22 07:16:28,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1846833.3333333333, ans=0.0 2023-11-22 07:16:35,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. 
limit=22.5 2023-11-22 07:16:53,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0 2023-11-22 07:16:53,471 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 500, loss[loss=0.09143, simple_loss=0.1271, pruned_loss=0.0235, audio_tagging_loss=0.004348, over 16544.00 frames. ], tot_loss[loss=0.0723, simple_loss=0.09447, pruned_loss=0.01508, audio_tagging_loss=0.009982, over 2811123.23 frames. ], batch size: 59, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:16:57,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277050 2023-11-22 07:17:08,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1847033.3333333333, ans=0.125 2023-11-22 07:17:18,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.246e+01 8.888e+01 9.960e+01 1.317e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 07:17:29,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1847100.0, ans=0.0 2023-11-22 07:17:44,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1847233.3333333333, ans=0.0 2023-11-22 07:17:58,807 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 550, loss[loss=0.0728, simple_loss=0.09253, pruned_loss=0.01584, audio_tagging_loss=0.0107, over 13358.00 frames. ], tot_loss[loss=0.07132, simple_loss=0.09286, pruned_loss=0.01489, audio_tagging_loss=0.01, over 2861298.67 frames. ], batch size: 52, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:18:01,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1847300.0, ans=0.125 2023-11-22 07:18:02,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277100 2023-11-22 07:18:17,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1847366.6666666667, ans=0.0 2023-11-22 07:18:35,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1847433.3333333333, ans=0.0 2023-11-22 07:18:40,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1847500.0, ans=0.125 2023-11-22 07:18:54,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2023-11-22 07:19:00,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1847566.6666666667, ans=0.125 2023-11-22 07:19:00,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2023-11-22 07:19:02,829 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 600, loss[loss=0.0918, simple_loss=0.1293, pruned_loss=0.01963, audio_tagging_loss=0.007498, over 15584.00 frames. ], tot_loss[loss=0.07203, simple_loss=0.09393, pruned_loss=0.01517, audio_tagging_loss=0.009896, over 2901372.27 frames. 
], batch size: 56, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:19:07,272 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277150 2023-11-22 07:19:12,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1847633.3333333333, ans=0.125 2023-11-22 07:19:19,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1847700.0, ans=0.125 2023-11-22 07:19:25,120 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:19:27,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.641e+01 8.220e+01 8.869e+01 9.857e+01 1.255e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 07:19:28,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1847766.6666666667, ans=0.0 2023-11-22 07:19:47,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.32 vs. limit=10.0 2023-11-22 07:19:56,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1847900.0, ans=0.0 2023-11-22 07:20:07,398 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 650, loss[loss=0.06043, simple_loss=0.07404, pruned_loss=0.01155, audio_tagging_loss=0.01186, over 14860.00 frames. ], tot_loss[loss=0.07276, simple_loss=0.09511, pruned_loss=0.01551, audio_tagging_loss=0.009694, over 2945521.95 frames. ], batch size: 57, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:20:11,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277200 2023-11-22 07:20:27,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1848033.3333333333, ans=0.2 2023-11-22 07:20:36,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1848100.0, ans=0.125 2023-11-22 07:20:58,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1848233.3333333333, ans=0.025 2023-11-22 07:21:11,145 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 700, loss[loss=0.06696, simple_loss=0.0853, pruned_loss=0.01127, audio_tagging_loss=0.01304, over 16071.00 frames. ], tot_loss[loss=0.07169, simple_loss=0.09391, pruned_loss=0.01511, audio_tagging_loss=0.009632, over 2968767.63 frames. ], batch size: 59, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:21:15,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277250 2023-11-22 07:21:17,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.84 vs. 
limit=15.0 2023-11-22 07:21:23,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1848366.6666666667, ans=0.125 2023-11-22 07:21:37,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.366e+01 8.189e+01 8.629e+01 9.381e+01 1.193e+02, threshold=1.726e+02, percent-clipped=0.0 2023-11-22 07:21:39,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1848433.3333333333, ans=0.0 2023-11-22 07:21:54,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1848500.0, ans=0.125 2023-11-22 07:22:06,777 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:22:16,024 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 750, loss[loss=0.07996, simple_loss=0.09269, pruned_loss=0.02197, audio_tagging_loss=0.01165, over 14705.00 frames. ], tot_loss[loss=0.0726, simple_loss=0.09507, pruned_loss=0.0154, audio_tagging_loss=0.009659, over 2992481.11 frames. ], batch size: 57, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:22:20,466 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277300 2023-11-22 07:22:34,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1848700.0, ans=0.1 2023-11-22 07:22:40,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1848700.0, ans=0.1 2023-11-22 07:22:42,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1848766.6666666667, ans=0.2 2023-11-22 07:22:45,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2023-11-22 07:22:49,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1848766.6666666667, ans=0.1 2023-11-22 07:23:10,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2023-11-22 07:23:15,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2023-11-22 07:23:21,086 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 800, loss[loss=0.07282, simple_loss=0.09991, pruned_loss=0.0154, audio_tagging_loss=0.007461, over 15466.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.09468, pruned_loss=0.01544, audio_tagging_loss=0.009669, over 3000582.13 frames. 
], batch size: 57, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:23:24,847 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277350 2023-11-22 07:23:31,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1848966.6666666667, ans=0.125 2023-11-22 07:23:46,080 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:23:46,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.163e+01 8.225e+01 8.846e+01 9.560e+01 1.170e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 07:23:50,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1849100.0, ans=0.2 2023-11-22 07:23:54,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1849100.0, ans=0.125 2023-11-22 07:23:55,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1849100.0, ans=0.0 2023-11-22 07:24:01,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1849166.6666666667, ans=0.125 2023-11-22 07:24:05,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-22 07:24:07,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1849166.6666666667, ans=0.1 2023-11-22 07:24:12,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1849233.3333333333, ans=0.125 2023-11-22 07:24:24,999 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 850, loss[loss=0.06675, simple_loss=0.08465, pruned_loss=0.0109, audio_tagging_loss=0.01352, over 15616.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.09436, pruned_loss=0.01546, audio_tagging_loss=0.00971, over 3008194.12 frames. ], batch size: 58, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:24:28,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277400 2023-11-22 07:24:28,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1849300.0, ans=0.0 2023-11-22 07:25:01,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1849433.3333333333, ans=0.0 2023-11-22 07:25:13,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2023-11-22 07:25:17,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1849566.6666666667, ans=0.125 2023-11-22 07:25:27,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2023-11-22 07:25:28,878 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 900, loss[loss=0.04807, simple_loss=0.05641, pruned_loss=0.009064, audio_tagging_loss=0.0108, over 15236.00 frames. ], tot_loss[loss=0.07176, simple_loss=0.09349, pruned_loss=0.01522, audio_tagging_loss=0.009801, over 3020931.34 frames. 
], batch size: 60, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:25:33,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277450 2023-11-22 07:25:33,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1849633.3333333333, ans=0.0 2023-11-22 07:25:56,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.215e+01 8.871e+01 9.558e+01 1.650e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 07:26:33,274 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 950, loss[loss=0.07077, simple_loss=0.1006, pruned_loss=0.01371, audio_tagging_loss=0.006741, over 14927.00 frames. ], tot_loss[loss=0.07176, simple_loss=0.09378, pruned_loss=0.01518, audio_tagging_loss=0.009689, over 3031506.87 frames. ], batch size: 56, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:26:37,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277500 2023-11-22 07:26:38,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1849966.6666666667, ans=0.2 2023-11-22 07:26:45,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1850033.3333333333, ans=0.2 2023-11-22 07:27:36,851 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1000, loss[loss=0.069, simple_loss=0.08943, pruned_loss=0.01743, audio_tagging_loss=0.006855, over 14567.00 frames. ], tot_loss[loss=0.07159, simple_loss=0.09395, pruned_loss=0.01519, audio_tagging_loss=0.00943, over 3034705.44 frames. ], batch size: 56, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:27:40,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277550 2023-11-22 07:27:43,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1850300.0, ans=0.1 2023-11-22 07:27:58,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1850366.6666666667, ans=0.1 2023-11-22 07:28:04,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.008e+01 8.714e+01 9.298e+01 1.223e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-22 07:28:04,460 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 07:28:24,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1850500.0, ans=0.125 2023-11-22 07:28:26,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1850500.0, ans=0.125 2023-11-22 07:28:29,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1850566.6666666667, ans=0.125 2023-11-22 07:28:40,591 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1050, loss[loss=0.06572, simple_loss=0.08187, pruned_loss=0.01416, audio_tagging_loss=0.01062, over 14925.00 frames. 
], tot_loss[loss=0.07178, simple_loss=0.09431, pruned_loss=0.0153, audio_tagging_loss=0.00932, over 3034813.42 frames. ], batch size: 58, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:28:40,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1850633.3333333333, ans=0.0 2023-11-22 07:28:42,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1850633.3333333333, ans=0.0 2023-11-22 07:28:44,404 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277600 2023-11-22 07:29:19,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1850833.3333333333, ans=0.0 2023-11-22 07:29:20,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1850833.3333333333, ans=0.125 2023-11-22 07:29:36,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-22 07:29:46,038 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1100, loss[loss=0.08482, simple_loss=0.1147, pruned_loss=0.01924, audio_tagging_loss=0.008241, over 15254.00 frames. ], tot_loss[loss=0.07155, simple_loss=0.094, pruned_loss=0.01527, audio_tagging_loss=0.009282, over 3043556.47 frames. ], batch size: 58, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:29:48,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1850966.6666666667, ans=0.0 2023-11-22 07:29:49,649 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 07:29:49,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277650 2023-11-22 07:29:58,930 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:30:11,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.373e+01 8.181e+01 8.784e+01 9.508e+01 1.137e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-22 07:30:40,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1851233.3333333333, ans=0.125 2023-11-22 07:30:50,559 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1150, loss[loss=0.06196, simple_loss=0.08526, pruned_loss=0.01016, audio_tagging_loss=0.009174, over 16090.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.09406, pruned_loss=0.0153, audio_tagging_loss=0.009365, over 3044356.32 frames. ], batch size: 61, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:30:54,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277700 2023-11-22 07:30:56,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.02 vs. 
limit=22.5 2023-11-22 07:30:58,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1851300.0, ans=0.125 2023-11-22 07:30:58,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1851300.0, ans=0.125 2023-11-22 07:31:01,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1851366.6666666667, ans=0.125 2023-11-22 07:31:06,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1851366.6666666667, ans=0.0 2023-11-22 07:31:54,985 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1200, loss[loss=0.09065, simple_loss=0.1231, pruned_loss=0.02246, audio_tagging_loss=0.006622, over 15921.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09442, pruned_loss=0.01539, audio_tagging_loss=0.009239, over 3044307.97 frames. ], batch size: 57, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:31:55,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1851633.3333333333, ans=0.0 2023-11-22 07:31:58,760 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277750 2023-11-22 07:32:03,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1851633.3333333333, ans=0.1 2023-11-22 07:32:23,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.243e+01 8.919e+01 9.746e+01 1.203e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-22 07:32:23,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1851766.6666666667, ans=0.1 2023-11-22 07:32:29,119 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:32:48,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1851900.0, ans=0.125 2023-11-22 07:33:00,925 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1250, loss[loss=0.04572, simple_loss=0.05741, pruned_loss=0.007807, audio_tagging_loss=0.009206, over 14863.00 frames. ], tot_loss[loss=0.07191, simple_loss=0.09449, pruned_loss=0.01541, audio_tagging_loss=0.009258, over 3045614.92 frames. ], batch size: 57, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:33:04,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277800 2023-11-22 07:33:13,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=12.0 2023-11-22 07:33:37,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1852100.0, ans=0.2 2023-11-22 07:34:06,785 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1300, loss[loss=0.07983, simple_loss=0.1075, pruned_loss=0.01842, audio_tagging_loss=0.007678, over 16098.00 frames. ], tot_loss[loss=0.07151, simple_loss=0.09404, pruned_loss=0.01518, audio_tagging_loss=0.009306, over 3046094.85 frames. 
], batch size: 61, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:34:10,639 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277850 2023-11-22 07:34:14,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1852300.0, ans=0.125 2023-11-22 07:34:29,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1852366.6666666667, ans=0.04949747468305833 2023-11-22 07:34:33,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.077e+01 8.549e+01 9.237e+01 1.175e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-22 07:34:45,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1852500.0, ans=0.2 2023-11-22 07:34:55,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1852500.0, ans=0.2 2023-11-22 07:34:56,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1852500.0, ans=0.0 2023-11-22 07:35:02,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1852566.6666666667, ans=0.125 2023-11-22 07:35:07,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1852566.6666666667, ans=0.1 2023-11-22 07:35:11,084 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1350, loss[loss=0.1032, simple_loss=0.1218, pruned_loss=0.03046, audio_tagging_loss=0.01184, over 15396.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09504, pruned_loss=0.01552, audio_tagging_loss=0.009322, over 3048950.98 frames. ], batch size: 56, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:35:14,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277900 2023-11-22 07:35:17,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1852633.3333333333, ans=0.125 2023-11-22 07:35:20,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-11-22 07:35:49,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1852833.3333333333, ans=0.1 2023-11-22 07:35:57,562 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 07:36:01,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2023-11-22 07:36:16,013 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1400, loss[loss=0.08205, simple_loss=0.1049, pruned_loss=0.02039, audio_tagging_loss=0.009232, over 15607.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09445, pruned_loss=0.0153, audio_tagging_loss=0.009396, over 3051584.11 frames. 
], batch size: 58, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:36:19,870 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 277950 2023-11-22 07:36:25,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-11-22 07:36:30,464 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:36:35,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1853033.3333333333, ans=0.0 2023-11-22 07:36:36,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2023-11-22 07:36:42,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 8.253e+01 8.948e+01 9.652e+01 1.188e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-22 07:37:03,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1853166.6666666667, ans=0.125 2023-11-22 07:37:06,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1853233.3333333333, ans=0.125 2023-11-22 07:37:20,255 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1450, loss[loss=0.06331, simple_loss=0.08191, pruned_loss=0.01251, audio_tagging_loss=0.009837, over 15046.00 frames. ], tot_loss[loss=0.07176, simple_loss=0.09421, pruned_loss=0.01524, audio_tagging_loss=0.009416, over 3049620.65 frames. ], batch size: 57, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:37:24,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278000 2023-11-22 07:37:42,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1853366.6666666667, ans=0.125 2023-11-22 07:37:58,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1853500.0, ans=0.0 2023-11-22 07:38:12,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1853566.6666666667, ans=0.125 2023-11-22 07:38:13,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-22 07:38:24,844 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1500, loss[loss=0.07049, simple_loss=0.08329, pruned_loss=0.01456, audio_tagging_loss=0.01428, over 16748.00 frames. ], tot_loss[loss=0.07286, simple_loss=0.0955, pruned_loss=0.01565, audio_tagging_loss=0.009456, over 3048674.73 frames. ], batch size: 62, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:38:28,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278050 2023-11-22 07:38:52,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.167e+01 8.667e+01 9.503e+01 1.230e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-22 07:39:06,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1853833.3333333333, ans=0.0 2023-11-22 07:39:29,698 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1550, loss[loss=0.06096, simple_loss=0.08328, pruned_loss=0.01123, audio_tagging_loss=0.008091, over 15542.00 frames. 
], tot_loss[loss=0.07296, simple_loss=0.09571, pruned_loss=0.01565, audio_tagging_loss=0.009458, over 3047577.11 frames. ], batch size: 59, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:39:34,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278100 2023-11-22 07:39:39,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1853966.6666666667, ans=0.1 2023-11-22 07:39:39,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-22 07:39:59,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1854100.0, ans=0.1 2023-11-22 07:40:19,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2023-11-22 07:40:26,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1854233.3333333333, ans=0.2 2023-11-22 07:40:34,403 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1600, loss[loss=0.06713, simple_loss=0.08264, pruned_loss=0.01362, audio_tagging_loss=0.0122, over 15712.00 frames. ], tot_loss[loss=0.07233, simple_loss=0.09458, pruned_loss=0.01545, audio_tagging_loss=0.009599, over 3050798.28 frames. ], batch size: 60, lr: 2.87e-03, grad_scale: 32.0 2023-11-22 07:40:36,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1854300.0, ans=0.125 2023-11-22 07:40:38,771 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278150 2023-11-22 07:40:43,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1854300.0, ans=0.07 2023-11-22 07:41:00,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1854433.3333333333, ans=0.125 2023-11-22 07:41:04,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.136e+01 8.791e+01 9.603e+01 1.214e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 07:41:12,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1854500.0, ans=0.125 2023-11-22 07:41:16,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-11-22 07:41:29,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1854566.6666666667, ans=15.0 2023-11-22 07:41:39,050 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1650, loss[loss=0.09779, simple_loss=0.1267, pruned_loss=0.0241, audio_tagging_loss=0.01033, over 15580.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.09417, pruned_loss=0.0153, audio_tagging_loss=0.00969, over 3047268.01 frames. ], batch size: 56, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:41:42,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278200 2023-11-22 07:41:59,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. 
limit=6.0 2023-11-22 07:42:29,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2023-11-22 07:42:30,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1854900.0, ans=0.125 2023-11-22 07:42:43,487 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1700, loss[loss=0.05959, simple_loss=0.07985, pruned_loss=0.009998, audio_tagging_loss=0.009663, over 15408.00 frames. ], tot_loss[loss=0.07202, simple_loss=0.09418, pruned_loss=0.01524, audio_tagging_loss=0.009689, over 3046688.81 frames. ], batch size: 58, lr: 2.87e-03, grad_scale: 16.0 2023-11-22 07:42:46,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1854966.6666666667, ans=0.1 2023-11-22 07:42:47,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278250 2023-11-22 07:42:52,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2023-11-22 07:42:52,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1854966.6666666667, ans=0.04949747468305833 2023-11-22 07:43:13,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.170e+01 8.792e+01 9.443e+01 1.155e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 07:43:24,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1855166.6666666667, ans=0.035 2023-11-22 07:43:24,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1855166.6666666667, ans=0.0 2023-11-22 07:43:24,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1855166.6666666667, ans=0.125 2023-11-22 07:43:35,425 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 07:43:43,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1855233.3333333333, ans=0.125 2023-11-22 07:43:48,095 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1750, loss[loss=0.07897, simple_loss=0.104, pruned_loss=0.01641, audio_tagging_loss=0.01059, over 15465.00 frames. ], tot_loss[loss=0.07178, simple_loss=0.09415, pruned_loss=0.01516, audio_tagging_loss=0.00954, over 3053664.83 frames. 
], batch size: 57, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 07:43:52,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278300 2023-11-22 07:43:56,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1855300.0, ans=0.125 2023-11-22 07:44:16,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1855433.3333333333, ans=0.2 2023-11-22 07:44:21,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1855433.3333333333, ans=0.5 2023-11-22 07:44:41,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1855566.6666666667, ans=0.05 2023-11-22 07:44:52,591 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1800, loss[loss=0.06309, simple_loss=0.09002, pruned_loss=0.008332, audio_tagging_loss=0.009743, over 13962.00 frames. ], tot_loss[loss=0.07154, simple_loss=0.09402, pruned_loss=0.01507, audio_tagging_loss=0.009463, over 3043835.21 frames. ], batch size: 53, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 07:44:55,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1855633.3333333333, ans=0.1 2023-11-22 07:44:56,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278350 2023-11-22 07:45:12,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2023-11-22 07:45:13,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1855700.0, ans=10.0 2023-11-22 07:45:22,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.603e+01 7.991e+01 8.555e+01 9.443e+01 1.169e+02, threshold=1.711e+02, percent-clipped=0.0 2023-11-22 07:45:29,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-11-22 07:45:54,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-22 07:45:57,022 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1850, loss[loss=0.04557, simple_loss=0.05472, pruned_loss=0.009099, audio_tagging_loss=0.009112, over 14297.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09372, pruned_loss=0.01506, audio_tagging_loss=0.009411, over 3036860.77 frames. ], batch size: 55, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 07:46:00,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.90 vs. 
limit=12.0 2023-11-22 07:46:01,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278400 2023-11-22 07:46:21,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1856033.3333333333, ans=0.035 2023-11-22 07:46:23,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1856100.0, ans=0.0 2023-11-22 07:46:30,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1856100.0, ans=0.0 2023-11-22 07:46:42,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1856166.6666666667, ans=0.125 2023-11-22 07:46:42,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-11-22 07:46:50,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1856233.3333333333, ans=0.2 2023-11-22 07:46:53,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2023-11-22 07:47:02,767 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1900, loss[loss=0.06894, simple_loss=0.08826, pruned_loss=0.01522, audio_tagging_loss=0.009589, over 16002.00 frames. ], tot_loss[loss=0.07152, simple_loss=0.09411, pruned_loss=0.01515, audio_tagging_loss=0.009312, over 3039344.80 frames. ], batch size: 61, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 07:47:06,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278450 2023-11-22 07:47:07,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1856300.0, ans=0.0 2023-11-22 07:47:10,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-22 07:47:23,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1856366.6666666667, ans=0.2 2023-11-22 07:47:32,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.689e+01 7.925e+01 8.678e+01 9.471e+01 1.078e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-22 07:47:33,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1856433.3333333333, ans=22.5 2023-11-22 07:47:48,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1856500.0, ans=0.04949747468305833 2023-11-22 07:47:48,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=1856500.0, ans=12.0 2023-11-22 07:48:00,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1856566.6666666667, ans=0.125 2023-11-22 07:48:02,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-22 07:48:07,445 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 1950, loss[loss=0.07707, simple_loss=0.1109, pruned_loss=0.01337, audio_tagging_loss=0.00827, over 15434.00 frames. 
], tot_loss[loss=0.07081, simple_loss=0.09302, pruned_loss=0.01489, audio_tagging_loss=0.009406, over 3039275.94 frames. ], batch size: 58, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 07:48:07,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1856633.3333333333, ans=0.09899494936611666 2023-11-22 07:48:11,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278500 2023-11-22 07:48:19,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2023-11-22 07:48:36,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1856766.6666666667, ans=0.125 2023-11-22 07:48:50,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1856833.3333333333, ans=0.0 2023-11-22 07:48:51,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.69 vs. limit=22.5 2023-11-22 07:48:57,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=22.5 2023-11-22 07:49:01,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1856900.0, ans=0.125 2023-11-22 07:49:06,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1856900.0, ans=0.125 2023-11-22 07:49:12,017 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2000, loss[loss=0.06209, simple_loss=0.08512, pruned_loss=0.01208, audio_tagging_loss=0.007449, over 16299.00 frames. ], tot_loss[loss=0.07119, simple_loss=0.09322, pruned_loss=0.01512, audio_tagging_loss=0.009467, over 3032791.35 frames. ], batch size: 62, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:49:14,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1856966.6666666667, ans=0.1 2023-11-22 07:49:16,446 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278550 2023-11-22 07:49:34,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=8.0 2023-11-22 07:49:41,982 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.370e+01 9.075e+01 9.910e+01 1.919e+02, threshold=1.815e+02, percent-clipped=1.0 2023-11-22 07:49:54,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1857166.6666666667, ans=0.0 2023-11-22 07:49:58,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1857166.6666666667, ans=0.125 2023-11-22 07:50:12,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1857233.3333333333, ans=0.125 2023-11-22 07:50:15,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-11-22 07:50:17,539 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2050, loss[loss=0.072, simple_loss=0.1037, pruned_loss=0.01323, audio_tagging_loss=0.006902, over 13723.00 frames. 
], tot_loss[loss=0.07145, simple_loss=0.0939, pruned_loss=0.0152, audio_tagging_loss=0.0093, over 3036425.32 frames. ], batch size: 52, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:50:21,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278600 2023-11-22 07:50:31,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2023-11-22 07:50:43,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2023-11-22 07:50:47,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1857433.3333333333, ans=0.125 2023-11-22 07:50:48,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1857433.3333333333, ans=0.0 2023-11-22 07:51:05,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1857500.0, ans=10.0 2023-11-22 07:51:10,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1857566.6666666667, ans=0.125 2023-11-22 07:51:23,214 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2100, loss[loss=0.06222, simple_loss=0.09446, pruned_loss=0.007288, audio_tagging_loss=0.007706, over 15136.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09415, pruned_loss=0.01517, audio_tagging_loss=0.009261, over 3044794.71 frames. ], batch size: 59, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:51:26,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278650 2023-11-22 07:51:29,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1857633.3333333333, ans=0.125 2023-11-22 07:51:34,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1857700.0, ans=0.125 2023-11-22 07:51:43,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-22 07:51:52,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.224e+01 8.868e+01 9.706e+01 1.360e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 07:51:59,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1857766.6666666667, ans=0.1 2023-11-22 07:52:26,011 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2150, loss[loss=0.07498, simple_loss=0.1025, pruned_loss=0.01529, audio_tagging_loss=0.008433, over 14447.00 frames. ], tot_loss[loss=0.07157, simple_loss=0.09417, pruned_loss=0.01521, audio_tagging_loss=0.009275, over 3039247.31 frames. ], batch size: 52, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:52:29,757 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278700 2023-11-22 07:52:49,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. 
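The Whitening lines compare a per-module statistic against a (scheduled) limit, e.g. metric=12.96 vs. limit=15.0 above. The metric measures how far the channel covariance of a module's activations is from a multiple of the identity: it is 1.0 for perfectly "white" features and grows as a few directions dominate. A simplified proxy computed via traces, not a line-for-line port of scaling.py:

```python
# Sketch of a whiteness metric in the spirit of the Whitening log lines:
# compare the channel covariance of activations to a multiple of identity.
# Plausible proxy only; the exact computation lives in scaling.py.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns ~1.0 for 'white' activations
    (covariance proportional to identity), larger otherwise."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                      # (C, C)
    c = cov.shape[0]
    # mean squared eigenvalue over squared mean eigenvalue, via traces:
    # (tr(cov @ cov) / C) / (tr(cov) / C)^2
    num = torch.einsum("ij,ji->", cov, cov) / c
    den = torch.diagonal(cov).mean() ** 2
    return (num / den).item()

x = torch.randn(1000, 192)                           # near-white -> ~1
print(whitening_metric(x))
print(whitening_metric(x * torch.linspace(0.1, 3.0, 192)))  # much larger
```

In the lines above the metrics sit below their limits, so these records are informational; the `whitening_limit` entries elsewhere in the log show that the limits themselves are ScheduledFloat values.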
limit=6.0 2023-11-22 07:52:50,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1858033.3333333333, ans=0.04949747468305833 2023-11-22 07:52:57,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1858100.0, ans=0.2 2023-11-22 07:53:03,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=22.5 2023-11-22 07:53:06,679 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 07:53:27,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1858233.3333333333, ans=0.035 2023-11-22 07:53:31,932 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2200, loss[loss=0.05166, simple_loss=0.07012, pruned_loss=0.006807, audio_tagging_loss=0.009792, over 15284.00 frames. ], tot_loss[loss=0.07132, simple_loss=0.09365, pruned_loss=0.01516, audio_tagging_loss=0.009329, over 3035946.59 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:53:35,717 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278750 2023-11-22 07:53:46,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1858366.6666666667, ans=0.125 2023-11-22 07:53:57,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.07 vs. limit=15.0 2023-11-22 07:53:58,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1858433.3333333333, ans=0.125 2023-11-22 07:54:00,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.248e+01 8.861e+01 9.671e+01 1.420e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-22 07:54:02,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-11-22 07:54:03,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1858433.3333333333, ans=0.0 2023-11-22 07:54:21,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1858500.0, ans=0.125 2023-11-22 07:54:22,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1858566.6666666667, ans=0.035 2023-11-22 07:54:22,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1858566.6666666667, ans=0.0 2023-11-22 07:54:36,100 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2250, loss[loss=0.08408, simple_loss=0.1089, pruned_loss=0.01974, audio_tagging_loss=0.009914, over 15032.00 frames. ], tot_loss[loss=0.07204, simple_loss=0.09482, pruned_loss=0.01538, audio_tagging_loss=0.009257, over 3040988.78 frames. 
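The WARNING above drops an AudioSet cut from ASR training: the 1-second clip yields 100 feature frames, only 23 frames after the encoder's roughly 4x subsampling, while its placeholder transcript tokenizes to 24 BPE pieces, so no transducer alignment can exist (the label sequence cannot be longer than the frame sequence). A sketch of that validity check, where the exact subsampling formula is an assumption chosen to match the logged 100 -> 23:

```python
# Sketch of the check behind the "Exclude cut" warnings: after
# subsampling there must be at least as many frames as BPE tokens for a
# transducer alignment to exist. The formula below reproduces the logged
# 100 -> 23 but is my assumption, not a quote of the recipe.

def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 2 // 2  # Conv2dSubsampling-style, factor ~4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, matching the warning
print(keep_cut(100, 24))              # False -> excluded from training
```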
], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:54:39,802 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278800 2023-11-22 07:54:52,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1858700.0, ans=0.125 2023-11-22 07:55:05,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2023-11-22 07:55:39,794 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2300, loss[loss=0.08838, simple_loss=0.1144, pruned_loss=0.02137, audio_tagging_loss=0.009814, over 15602.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.0949, pruned_loss=0.01536, audio_tagging_loss=0.009277, over 3042997.52 frames. ], batch size: 62, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:55:43,494 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278850 2023-11-22 07:56:09,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.100e+01 8.737e+01 9.435e+01 1.101e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 07:56:31,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1859233.3333333333, ans=0.125 2023-11-22 07:56:37,681 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 07:56:40,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1859233.3333333333, ans=0.0 2023-11-22 07:56:43,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1859300.0, ans=0.125 2023-11-22 07:56:45,046 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2350, loss[loss=0.07109, simple_loss=0.07856, pruned_loss=0.01715, audio_tagging_loss=0.01466, over 14764.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09455, pruned_loss=0.01525, audio_tagging_loss=0.00937, over 3045745.15 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 07:56:49,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278900 2023-11-22 07:57:18,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1859433.3333333333, ans=0.0 2023-11-22 07:57:36,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1859566.6666666667, ans=0.125 2023-11-22 07:57:42,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2023-11-22 07:57:49,791 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2400, loss[loss=0.07156, simple_loss=0.09199, pruned_loss=0.01583, audio_tagging_loss=0.009735, over 15272.00 frames. ], tot_loss[loss=0.07198, simple_loss=0.09474, pruned_loss=0.0152, audio_tagging_loss=0.009401, over 3047756.54 frames. 
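The optim.py lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus a clipping threshold. With Clipping_scale=2.0 the threshold in every such line equals twice the median, e.g. 2 x 8.737e+01 = 1.747e+02 just above, and percent-clipped is the share of recent steps whose norm exceeded it (1.0% in the earlier line where the max of 1.919e+02 crossed its 1.815e+02 threshold). A small illustrative version of that bookkeeping, not the recipe's optimizer:

```python
# Illustrative grad-norm statistics matching the optim.py report format:
# five quantiles of recent gradient norms, a threshold at
# clipping_scale x median, and the percentage of norms above it.

import torch

def grad_norm_report(norms: list[float], clipping_scale: float = 2.0):
    t = torch.tensor(norms)
    qs = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2].item()
    pct_clipped = (t > threshold).float().mean().item() * 100.0
    return qs.tolist(), threshold, pct_clipped

qs, thr, pct = grad_norm_report([68.4, 81.0, 87.4, 94.4, 110.1])
print(qs, thr, pct)  # threshold ~= 2 * median, percent-clipped in %
```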
], batch size: 57, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:57:52,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2023-11-22 07:57:53,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 278950 2023-11-22 07:58:19,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.282e+01 8.837e+01 9.594e+01 1.331e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 07:58:23,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-22 07:58:30,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-11-22 07:58:35,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1859833.3333333333, ans=0.2 2023-11-22 07:58:53,796 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2450, loss[loss=0.07082, simple_loss=0.09017, pruned_loss=0.01574, audio_tagging_loss=0.009993, over 13731.00 frames. ], tot_loss[loss=0.07221, simple_loss=0.095, pruned_loss=0.01528, audio_tagging_loss=0.009434, over 3040186.04 frames. ], batch size: 52, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 07:58:57,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279000 2023-11-22 07:59:19,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2023-11-22 07:59:25,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2023-11-22 07:59:28,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1860100.0, ans=0.1 2023-11-22 07:59:31,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1860100.0, ans=0.2 2023-11-22 07:59:32,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1860166.6666666667, ans=0.0 2023-11-22 07:59:58,128 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2500, loss[loss=0.07053, simple_loss=0.08848, pruned_loss=0.01774, audio_tagging_loss=0.008554, over 14356.00 frames. ], tot_loss[loss=0.07226, simple_loss=0.09504, pruned_loss=0.01527, audio_tagging_loss=0.009468, over 3033937.10 frames. ], batch size: 53, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:00:02,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279050 2023-11-22 08:00:29,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 8.264e+01 8.898e+01 9.687e+01 1.183e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-22 08:00:45,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2023-11-22 08:00:56,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1860566.6666666667, ans=0.0 2023-11-22 08:01:03,354 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2550, loss[loss=0.06208, simple_loss=0.07872, pruned_loss=0.01199, audio_tagging_loss=0.01074, over 14826.00 frames. 
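The grad_scale field alternating between 16.0 and 32.0 across the batch lines is the footprint of dynamic loss scaling under fp16 training: the scaler doubles the scale after a stretch of overflow-free steps and halves it when gradients overflow. A generic PyTorch AMP step showing the mechanism (the recipe wraps this in its own loop and optimizer; this sketch needs a CUDA device to run):

```python
# Minimal fp16 step with dynamic loss scaling, the mechanism behind the
# fluctuating grad_scale in the log. Generic torch.cuda.amp usage only.

import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

x = torch.randn(8, 80, device="cuda")
y = torch.randint(0, 500, (8,), device="cuda")

with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = torch.nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()   # gradients carry the loss scale
scaler.step(opt)                # unscales, skips the step on inf/nan
scaler.update()                 # grows or shrinks the scale
print(scaler.get_scale())
```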
], tot_loss[loss=0.072, simple_loss=0.09481, pruned_loss=0.0152, audio_tagging_loss=0.009399, over 3038181.47 frames. ], batch size: 55, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:01:07,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279100 2023-11-22 08:01:12,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1860633.3333333333, ans=0.1 2023-11-22 08:01:19,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-22 08:01:21,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1860700.0, ans=0.125 2023-11-22 08:01:42,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2023-11-22 08:02:02,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1860900.0, ans=0.125 2023-11-22 08:02:09,026 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2600, loss[loss=0.07807, simple_loss=0.101, pruned_loss=0.02058, audio_tagging_loss=0.006983, over 15200.00 frames. ], tot_loss[loss=0.07169, simple_loss=0.09446, pruned_loss=0.01514, audio_tagging_loss=0.009321, over 3040152.48 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:02:12,918 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279150 2023-11-22 08:02:12,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1860966.6666666667, ans=0.125 2023-11-22 08:02:35,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1861100.0, ans=0.015 2023-11-22 08:02:39,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.247e+01 8.743e+01 9.488e+01 1.623e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 08:02:42,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-22 08:02:49,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=12.0 2023-11-22 08:02:54,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1861166.6666666667, ans=0.0 2023-11-22 08:03:09,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1861233.3333333333, ans=0.1 2023-11-22 08:03:12,765 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2650, loss[loss=0.08033, simple_loss=0.1182, pruned_loss=0.01432, audio_tagging_loss=0.006914, over 15600.00 frames. ], tot_loss[loss=0.07185, simple_loss=0.09467, pruned_loss=0.01519, audio_tagging_loss=0.009319, over 3036356.23 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:03:17,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279200 2023-11-22 08:03:17,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.99 vs. 
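The periodic "Freeze_encoder: False; Current batch idx: N" lines report a switch that can hold the encoder fixed, for example during the first freeze_encoder_steps of a fine-tuning run; in this run it is permanently off. A hedged sketch of how such a switch is typically applied (names and structure are illustrative):

```python
# Sketch of an encoder-freezing switch like the one reported every 50
# batches. With freeze_encoder=False and freeze_encoder_steps=-1, as in
# this run, the freeze branch never activates.

import torch

def maybe_freeze_encoder(encoder: torch.nn.Module, batch_idx: int,
                         freeze_encoder: bool,
                         freeze_encoder_steps: int) -> bool:
    freeze = freeze_encoder or (0 <= batch_idx < freeze_encoder_steps)
    for p in encoder.parameters():
        p.requires_grad_(not freeze)
    return freeze

enc = torch.nn.Linear(4, 4)
print(maybe_freeze_encoder(enc, batch_idx=278400,
                           freeze_encoder=False, freeze_encoder_steps=-1))
# False -> encoder parameters keep requires_grad=True, as logged
```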
limit=15.0 2023-11-22 08:03:41,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1861433.3333333333, ans=0.125 2023-11-22 08:03:42,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=1861433.3333333333, ans=15.0 2023-11-22 08:03:43,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-22 08:03:56,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2023-11-22 08:04:16,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1861566.6666666667, ans=0.125 2023-11-22 08:04:18,474 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2700, loss[loss=0.06297, simple_loss=0.08329, pruned_loss=0.0121, audio_tagging_loss=0.009226, over 15245.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09429, pruned_loss=0.01532, audio_tagging_loss=0.009254, over 3042530.64 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 08:04:18,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1861633.3333333333, ans=0.125 2023-11-22 08:04:22,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279250 2023-11-22 08:04:24,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1861633.3333333333, ans=0.125 2023-11-22 08:04:26,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2023-11-22 08:04:35,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1861700.0, ans=0.125 2023-11-22 08:04:50,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.791e+01 7.998e+01 8.544e+01 9.286e+01 1.150e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-22 08:04:55,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1861766.6666666667, ans=0.125 2023-11-22 08:04:57,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1861833.3333333333, ans=0.125 2023-11-22 08:05:06,775 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 08:05:12,893 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 08:05:23,838 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2750, loss[loss=0.07893, simple_loss=0.1054, pruned_loss=0.0172, audio_tagging_loss=0.009025, over 14824.00 frames. ], tot_loss[loss=0.072, simple_loss=0.09501, pruned_loss=0.01538, audio_tagging_loss=0.009122, over 3045243.65 frames. 
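The WithLoss records above (loss-sum=0.000e+00 on the self_attn_weights modules) point at auxiliary losses attached to intermediate tensors; a sum of zero presumably means the penalty is currently inactive. The general trick, sketched very loosely here, is an autograd function that passes activations through unchanged in forward but injects an extra gradient in backward:

```python
# Loose sketch of the idea behind the "WithLoss ... loss-sum=" lines:
# attach an auxiliary penalty to an intermediate tensor so its gradient
# is injected during backward while the forward value is untouched.
# This is a generic reconstruction, not the scaling.py implementation.

import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_grad: torch.Tensor):
        ctx.save_for_backward(aux_grad)
        return x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        (aux_grad,) = ctx.saved_tensors
        return grad_out + aux_grad, None

x = torch.randn(3, requires_grad=True)
y = WithLoss.apply(x, 0.01 * torch.sign(x.detach()))
y.sum().backward()
print(x.grad)  # ones plus the injected auxiliary gradient
```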
], batch size: 55, lr: 2.86e-03, grad_scale: 16.0 2023-11-22 08:05:27,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279300 2023-11-22 08:06:11,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1862166.6666666667, ans=0.0 2023-11-22 08:06:20,785 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 08:06:20,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1862233.3333333333, ans=0.125 2023-11-22 08:06:28,049 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2800, loss[loss=0.05946, simple_loss=0.08432, pruned_loss=0.009376, audio_tagging_loss=0.007918, over 14475.00 frames. ], tot_loss[loss=0.07205, simple_loss=0.09494, pruned_loss=0.01544, audio_tagging_loss=0.009139, over 3040819.23 frames. ], batch size: 53, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:06:30,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=12.0 2023-11-22 08:06:32,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279350 2023-11-22 08:06:43,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1862366.6666666667, ans=0.1 2023-11-22 08:06:58,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1862433.3333333333, ans=0.125 2023-11-22 08:07:01,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 7.937e+01 8.597e+01 9.316e+01 1.176e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-22 08:07:25,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1862566.6666666667, ans=0.2 2023-11-22 08:07:33,286 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2850, loss[loss=0.06862, simple_loss=0.08375, pruned_loss=0.01424, audio_tagging_loss=0.01251, over 16746.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09458, pruned_loss=0.01529, audio_tagging_loss=0.009138, over 3042677.84 frames. ], batch size: 65, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:07:37,069 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279400 2023-11-22 08:07:41,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1862633.3333333333, ans=0.125 2023-11-22 08:08:11,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-22 08:08:26,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1862900.0, ans=0.125 2023-11-22 08:08:38,449 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2900, loss[loss=0.05966, simple_loss=0.07637, pruned_loss=0.009535, audio_tagging_loss=0.01194, over 14613.00 frames. 
], tot_loss[loss=0.07201, simple_loss=0.0949, pruned_loss=0.01536, audio_tagging_loss=0.0092, over 3040684.01 frames. ], batch size: 55, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:08:41,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1862966.6666666667, ans=0.09899494936611666 2023-11-22 08:08:42,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279450 2023-11-22 08:08:44,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1862966.6666666667, ans=0.2 2023-11-22 08:08:56,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1863033.3333333333, ans=0.125 2023-11-22 08:09:03,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1863100.0, ans=0.125 2023-11-22 08:09:04,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1863100.0, ans=0.125 2023-11-22 08:09:11,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.692e+01 8.213e+01 8.733e+01 9.343e+01 1.106e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 08:09:22,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5 2023-11-22 08:09:28,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=1863166.6666666667, ans=15.0 2023-11-22 08:09:37,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1863233.3333333333, ans=0.125 2023-11-22 08:09:43,399 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 2950, loss[loss=0.06881, simple_loss=0.0909, pruned_loss=0.01242, audio_tagging_loss=0.01094, over 16063.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.09551, pruned_loss=0.0156, audio_tagging_loss=0.009234, over 3038499.66 frames. ], batch size: 61, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:09:47,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279500 2023-11-22 08:09:48,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1863300.0, ans=0.1 2023-11-22 08:09:57,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1863366.6666666667, ans=0.0 2023-11-22 08:09:57,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2023-11-22 08:10:14,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2023-11-22 08:10:17,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.82 vs. 
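The learning rate printed on every batch line (lr: 2.86e-03 here, dipping to 2.85e-03 later in this section) follows a schedule that decays in both batches and epochs from base_lr=0.045 with lr_batches=7500 and lr_epochs=3.5. Below is a sketch of an Eden-style rule that reproduces the logged value at roughly this point in training; the exact exponents are reproduced from memory of the icefall optimizer and should be treated as an assumption:

```python
# Hedged sketch of an Eden-style LR schedule that would yield ~2.86e-03
# from base_lr=0.045 near batch 278450 of epoch ~23.3. Exponents assumed.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, 278450, 23.3))  # ~2.85e-03, in line with the log
```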
limit=12.0 2023-11-22 08:10:21,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1863500.0, ans=0.0 2023-11-22 08:10:38,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1863566.6666666667, ans=0.125 2023-11-22 08:10:49,470 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3000, loss[loss=0.09938, simple_loss=0.1269, pruned_loss=0.0279, audio_tagging_loss=0.008049, over 15472.00 frames. ], tot_loss[loss=0.07252, simple_loss=0.09521, pruned_loss=0.01556, audio_tagging_loss=0.009359, over 3040960.22 frames. ], batch size: 58, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:10:49,471 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 08:11:18,592 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.0720, 2.9604, 3.3561, 2.9311, 3.7704, 3.7570, 3.3067, 3.1815], device='cuda:1') 2023-11-22 08:11:29,444 INFO [train_asr.py:1253] (1/4) Epoch 24, validation: loss=0.0588, simple_loss=0.05168, pruned_loss=0.005124, audio_tagging_loss=0.02784, over 4681554.00 frames. 2023-11-22 08:11:29,445 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 08:11:33,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279550 2023-11-22 08:11:35,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-22 08:11:36,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1863633.3333333333, ans=0.2 2023-11-22 08:11:48,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-11-22 08:11:57,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1863766.6666666667, ans=0.0 2023-11-22 08:12:02,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.135e+01 8.700e+01 9.423e+01 1.185e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 08:12:12,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1863833.3333333333, ans=0.2 2023-11-22 08:12:34,535 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3050, loss[loss=0.08384, simple_loss=0.1112, pruned_loss=0.01879, audio_tagging_loss=0.009449, over 16496.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.09496, pruned_loss=0.01553, audio_tagging_loss=0.009436, over 3045741.51 frames. ], batch size: 62, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:12:38,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279600 2023-11-22 08:13:14,099 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
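The validation block above (triggered at batch 3000, consistent with the configured valid_interval) reports loss=0.0588 over 4681554 frames, and the same 0.5/1.0 weighting reproduces it: 0.5 x 0.05168 + 0.005124 + 0.02784 = 0.0588. Note the audio_tagging_loss is roughly three times its training-average value on this validation mix. The block also dumps attn_weights_entropy for one layer, eight values matching that stage's eight attention heads; a sketch of such a diagnostic, with illustrative shapes:

```python
# Sketch of the attention-entropy diagnostic printed during validation:
# the entropy of each head's attention distribution, averaged over query
# positions. High values mean diffuse attention; low values mean peaky.

import torch

def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, num_queries, num_keys), rows summing to 1.
    Returns per-head mean entropy in nats."""
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)

w = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_entropy(w))  # one value per head, like the tensor above
```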
Number of tokens: 24 2023-11-22 08:13:23,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1864166.6666666667, ans=0.0 2023-11-22 08:13:30,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1864233.3333333333, ans=0.1 2023-11-22 08:13:37,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1864233.3333333333, ans=0.0 2023-11-22 08:13:41,242 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3100, loss[loss=0.07346, simple_loss=0.1003, pruned_loss=0.01454, audio_tagging_loss=0.008778, over 16245.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09489, pruned_loss=0.01539, audio_tagging_loss=0.009447, over 3048613.15 frames. ], batch size: 59, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:13:42,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1864300.0, ans=0.025 2023-11-22 08:13:45,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279650 2023-11-22 08:13:45,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1864300.0, ans=0.125 2023-11-22 08:13:48,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2023-11-22 08:13:57,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1864366.6666666667, ans=0.07 2023-11-22 08:14:06,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1864433.3333333333, ans=0.125 2023-11-22 08:14:10,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0 2023-11-22 08:14:13,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.146e+01 8.664e+01 9.339e+01 1.098e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-22 08:14:33,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1864566.6666666667, ans=0.0 2023-11-22 08:14:36,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1864566.6666666667, ans=0.2 2023-11-22 08:14:46,744 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3150, loss[loss=0.06158, simple_loss=0.07867, pruned_loss=0.01275, audio_tagging_loss=0.009489, over 14879.00 frames. ], tot_loss[loss=0.07281, simple_loss=0.09556, pruned_loss=0.01566, audio_tagging_loss=0.009377, over 3045617.66 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:14:50,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279700 2023-11-22 08:14:55,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2023-11-22 08:14:59,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. 
limit=15.0 2023-11-22 08:15:10,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-11-22 08:15:19,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1864766.6666666667, ans=0.0 2023-11-22 08:15:19,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1864766.6666666667, ans=0.125 2023-11-22 08:15:28,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1864833.3333333333, ans=0.125 2023-11-22 08:15:36,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1864833.3333333333, ans=0.125 2023-11-22 08:15:42,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1864900.0, ans=0.125 2023-11-22 08:15:50,980 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3200, loss[loss=0.09475, simple_loss=0.1384, pruned_loss=0.02006, audio_tagging_loss=0.005475, over 15985.00 frames. ], tot_loss[loss=0.07271, simple_loss=0.09539, pruned_loss=0.01558, audio_tagging_loss=0.009427, over 3047160.26 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:15:55,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279750 2023-11-22 08:16:01,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2023-11-22 08:16:06,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1865033.3333333333, ans=0.0 2023-11-22 08:16:21,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1865100.0, ans=0.0 2023-11-22 08:16:24,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.310e+01 8.993e+01 9.711e+01 1.257e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 08:16:24,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1865100.0, ans=0.125 2023-11-22 08:16:31,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1865166.6666666667, ans=0.0 2023-11-22 08:16:56,986 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3250, loss[loss=0.06415, simple_loss=0.08267, pruned_loss=0.01335, audio_tagging_loss=0.009469, over 14788.00 frames. ], tot_loss[loss=0.07263, simple_loss=0.09514, pruned_loss=0.01556, audio_tagging_loss=0.009505, over 3050289.16 frames. 
], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:17:00,788 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279800 2023-11-22 08:17:20,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1865366.6666666667, ans=0.0 2023-11-22 08:17:52,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1865566.6666666667, ans=0.125 2023-11-22 08:17:54,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1865566.6666666667, ans=0.0 2023-11-22 08:18:02,035 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3300, loss[loss=0.07545, simple_loss=0.1042, pruned_loss=0.01486, audio_tagging_loss=0.008481, over 15146.00 frames. ], tot_loss[loss=0.07276, simple_loss=0.09521, pruned_loss=0.01558, audio_tagging_loss=0.009575, over 3043814.31 frames. ], batch size: 55, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:18:06,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279850 2023-11-22 08:18:14,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1865700.0, ans=0.0 2023-11-22 08:18:14,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1865700.0, ans=0.125 2023-11-22 08:18:16,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1865700.0, ans=0.125 2023-11-22 08:18:34,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.287e+01 8.648e+01 9.394e+01 1.159e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-22 08:18:45,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-22 08:18:53,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1865900.0, ans=0.125 2023-11-22 08:18:54,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1865900.0, ans=0.125 2023-11-22 08:18:55,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1865900.0, ans=0.125 2023-11-22 08:19:03,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1865900.0, ans=0.125 2023-11-22 08:19:03,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1865900.0, ans=0.125 2023-11-22 08:19:06,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=10.0 2023-11-22 08:19:06,565 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3350, loss[loss=0.06254, simple_loss=0.08011, pruned_loss=0.01108, audio_tagging_loss=0.01141, over 15009.00 frames. ], tot_loss[loss=0.073, simple_loss=0.09561, pruned_loss=0.01568, audio_tagging_loss=0.009515, over 3043609.10 frames. 
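The audio_tagging_loss present in every loss line comes from the audio-tagging branch trained alongside the transducer; the configuration enables AudioSet and points at a BEATs checkpoint as teacher. A hedged sketch of what such a head can look like: pool encoder frames over time, project to the 527 AudioSet classes, and supervise with BCE against teacher posteriors. Dimensions and pooling are illustrative, not the recipe's:

```python
# Illustrative audio-tagging head of the kind that would produce the
# audio_tagging_loss term. The BEATs teacher is stood in for by random
# posteriors; 527 is the AudioSet class count.

import torch

class AudioTaggingHead(torch.nn.Module):
    def __init__(self, encoder_dim: int = 512, num_events: int = 527):
        super().__init__()
        self.proj = torch.nn.Linear(encoder_dim, num_events)

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch, frames, encoder_dim) -> (batch, num_events)
        return self.proj(encoder_out.mean(dim=1))

head = AudioTaggingHead()
enc = torch.randn(4, 230, 512)
teacher = torch.sigmoid(torch.randn(4, 527))      # stand-in for BEATs
loss = torch.nn.functional.binary_cross_entropy_with_logits(
    head(enc), teacher)
print(loss.item())
```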
], batch size: 58, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:19:08,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2023-11-22 08:19:10,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279900 2023-11-22 08:19:36,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-22 08:19:49,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0 2023-11-22 08:20:02,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1866233.3333333333, ans=0.0 2023-11-22 08:20:11,651 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3400, loss[loss=0.06741, simple_loss=0.08841, pruned_loss=0.01366, audio_tagging_loss=0.009541, over 14421.00 frames. ], tot_loss[loss=0.07272, simple_loss=0.09521, pruned_loss=0.01568, audio_tagging_loss=0.009437, over 3047091.79 frames. ], batch size: 55, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:20:15,448 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 279950 2023-11-22 08:20:21,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1866300.0, ans=0.125 2023-11-22 08:20:28,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-22 08:20:34,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1866366.6666666667, ans=0.09899494936611666 2023-11-22 08:20:42,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.350e+01 8.855e+01 9.450e+01 1.235e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-22 08:20:45,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1866433.3333333333, ans=0.125 2023-11-22 08:21:15,128 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3450, loss[loss=0.08509, simple_loss=0.1092, pruned_loss=0.02041, audio_tagging_loss=0.01006, over 16586.00 frames. ], tot_loss[loss=0.07284, simple_loss=0.09561, pruned_loss=0.01567, audio_tagging_loss=0.009359, over 3052815.09 frames. 
], batch size: 62, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:21:16,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1866633.3333333333, ans=0.125 2023-11-22 08:21:17,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1866633.3333333333, ans=0.0 2023-11-22 08:21:19,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280000 2023-11-22 08:21:19,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1866633.3333333333, ans=0.2 2023-11-22 08:22:12,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1866900.0, ans=0.0 2023-11-22 08:22:13,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1866900.0, ans=0.0 2023-11-22 08:22:22,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1866966.6666666667, ans=0.0 2023-11-22 08:22:23,271 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3500, loss[loss=0.07812, simple_loss=0.1043, pruned_loss=0.01908, audio_tagging_loss=0.006902, over 15330.00 frames. ], tot_loss[loss=0.07275, simple_loss=0.09545, pruned_loss=0.01574, audio_tagging_loss=0.009281, over 3053203.29 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:22:26,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1866966.6666666667, ans=0.1 2023-11-22 08:22:26,968 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280050 2023-11-22 08:22:28,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1866966.6666666667, ans=0.2 2023-11-22 08:22:52,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2023-11-22 08:22:55,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2023-11-22 08:22:56,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.244e+01 8.926e+01 9.794e+01 1.238e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-22 08:22:56,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1867100.0, ans=0.1 2023-11-22 08:22:57,660 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 08:23:29,248 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3550, loss[loss=0.06442, simple_loss=0.09058, pruned_loss=0.01033, audio_tagging_loss=0.008799, over 15031.00 frames. ], tot_loss[loss=0.07197, simple_loss=0.09475, pruned_loss=0.01545, audio_tagging_loss=0.009154, over 3055579.87 frames. 
], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:23:30,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1867300.0, ans=0.125 2023-11-22 08:23:33,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280100 2023-11-22 08:23:46,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1867366.6666666667, ans=0.0 2023-11-22 08:23:48,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2023-11-22 08:24:34,297 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3600, loss[loss=0.05998, simple_loss=0.0846, pruned_loss=0.01182, audio_tagging_loss=0.005859, over 15038.00 frames. ], tot_loss[loss=0.07197, simple_loss=0.09491, pruned_loss=0.01536, audio_tagging_loss=0.00916, over 3060548.43 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:24:34,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1867633.3333333333, ans=0.125 2023-11-22 08:24:38,046 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280150 2023-11-22 08:24:57,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1867700.0, ans=0.0 2023-11-22 08:24:58,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1867766.6666666667, ans=0.1 2023-11-22 08:24:58,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1867766.6666666667, ans=0.95 2023-11-22 08:25:07,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 7.927e+01 8.588e+01 9.278e+01 1.263e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-22 08:25:39,376 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3650, loss[loss=0.04077, simple_loss=0.04858, pruned_loss=0.006672, audio_tagging_loss=0.009811, over 16439.00 frames. ], tot_loss[loss=0.07186, simple_loss=0.09506, pruned_loss=0.01529, audio_tagging_loss=0.009043, over 3058914.54 frames. ], batch size: 63, lr: 2.86e-03, grad_scale: 32.0 2023-11-22 08:25:43,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280200 2023-11-22 08:26:03,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1868033.3333333333, ans=0.125 2023-11-22 08:26:11,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1868100.0, ans=0.2 2023-11-22 08:26:22,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=12.0 2023-11-22 08:26:23,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1868166.6666666667, ans=0.125 2023-11-22 08:26:45,332 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3700, loss[loss=0.05958, simple_loss=0.07162, pruned_loss=0.01259, audio_tagging_loss=0.01118, over 14191.00 frames. ], tot_loss[loss=0.07199, simple_loss=0.09509, pruned_loss=0.01532, audio_tagging_loss=0.009125, over 3060013.67 frames. 
], batch size: 58, lr: 2.85e-03, grad_scale: 16.0 2023-11-22 08:26:49,232 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280250 2023-11-22 08:26:49,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1868300.0, ans=0.025 2023-11-22 08:27:01,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1868366.6666666667, ans=0.125 2023-11-22 08:27:18,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0 2023-11-22 08:27:18,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.613e+01 8.045e+01 8.787e+01 9.464e+01 1.178e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-22 08:27:20,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1868433.3333333333, ans=0.125 2023-11-22 08:27:21,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2023-11-22 08:27:26,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1868500.0, ans=0.0 2023-11-22 08:27:33,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1868500.0, ans=0.125 2023-11-22 08:27:36,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1868566.6666666667, ans=0.125 2023-11-22 08:27:42,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1868566.6666666667, ans=6.0 2023-11-22 08:27:50,866 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3750, loss[loss=0.06599, simple_loss=0.08983, pruned_loss=0.01218, audio_tagging_loss=0.008898, over 16358.00 frames. ], tot_loss[loss=0.07168, simple_loss=0.09428, pruned_loss=0.01533, audio_tagging_loss=0.009206, over 3061381.37 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 16.0 2023-11-22 08:27:54,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280300 2023-11-22 08:28:00,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1868633.3333333333, ans=0.2 2023-11-22 08:28:21,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1868766.6666666667, ans=0.125 2023-11-22 08:28:26,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1868766.6666666667, ans=0.0 2023-11-22 08:28:29,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1868833.3333333333, ans=0.125 2023-11-22 08:28:34,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1868833.3333333333, ans=0.125 2023-11-22 08:28:35,743 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 08:28:55,614 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3800, loss[loss=0.07615, simple_loss=0.09394, pruned_loss=0.01923, audio_tagging_loss=0.009949, over 14712.00 frames. ], tot_loss[loss=0.07234, simple_loss=0.09513, pruned_loss=0.01549, audio_tagging_loss=0.00928, over 3054803.28 frames. ], batch size: 53, lr: 2.85e-03, grad_scale: 16.0 2023-11-22 08:28:56,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1868966.6666666667, ans=0.0 2023-11-22 08:28:59,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280350 2023-11-22 08:29:26,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1869100.0, ans=0.0 2023-11-22 08:29:28,704 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.943e+01 8.492e+01 9.128e+01 1.004e+02 1.242e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-22 08:29:40,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2023-11-22 08:29:41,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1869166.6666666667, ans=0.2 2023-11-22 08:29:46,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-11-22 08:29:59,891 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3850, loss[loss=0.06361, simple_loss=0.09064, pruned_loss=0.01024, audio_tagging_loss=0.008055, over 15129.00 frames. ], tot_loss[loss=0.07182, simple_loss=0.09461, pruned_loss=0.01523, audio_tagging_loss=0.009283, over 3051376.50 frames. ], batch size: 58, lr: 2.85e-03, grad_scale: 16.0 2023-11-22 08:30:04,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280400 2023-11-22 08:30:04,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2023-11-22 08:30:30,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1869433.3333333333, ans=0.125 2023-11-22 08:30:48,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1869500.0, ans=0.125 2023-11-22 08:31:04,439 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3900, loss[loss=0.07766, simple_loss=0.0931, pruned_loss=0.01811, audio_tagging_loss=0.013, over 13678.00 frames. ], tot_loss[loss=0.07148, simple_loss=0.09404, pruned_loss=0.01508, audio_tagging_loss=0.009372, over 3051331.00 frames. 
], batch size: 54, lr: 2.85e-03, grad_scale: 16.0 2023-11-22 08:31:08,384 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280450 2023-11-22 08:31:35,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1869766.6666666667, ans=0.125 2023-11-22 08:31:35,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1869766.6666666667, ans=0.125 2023-11-22 08:31:38,009 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.174e+01 8.873e+01 9.506e+01 1.121e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-22 08:31:40,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1869766.6666666667, ans=0.2 2023-11-22 08:31:44,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1869833.3333333333, ans=0.2 2023-11-22 08:31:49,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1869833.3333333333, ans=0.125 2023-11-22 08:31:58,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1869900.0, ans=0.2 2023-11-22 08:32:03,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-22 08:32:10,100 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 3950, loss[loss=0.08076, simple_loss=0.1051, pruned_loss=0.01869, audio_tagging_loss=0.009507, over 15065.00 frames. ], tot_loss[loss=0.07154, simple_loss=0.09385, pruned_loss=0.0151, audio_tagging_loss=0.009518, over 3050425.06 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 16.0 2023-11-22 08:32:13,967 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280500 2023-11-22 08:32:42,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1870100.0, ans=0.2 2023-11-22 08:32:49,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1870166.6666666667, ans=0.2 2023-11-22 08:33:01,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1870233.3333333333, ans=0.0 2023-11-22 08:33:11,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1870233.3333333333, ans=0.0 2023-11-22 08:33:13,410 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4000, loss[loss=0.06705, simple_loss=0.08699, pruned_loss=0.01486, audio_tagging_loss=0.008699, over 15180.00 frames. ], tot_loss[loss=0.0726, simple_loss=0.0953, pruned_loss=0.01546, audio_tagging_loss=0.009494, over 3055820.42 frames. ], batch size: 58, lr: 2.85e-03, grad_scale: 32.0 2023-11-22 08:33:17,817 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280550 2023-11-22 08:33:38,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1870433.3333333333, ans=0.0 2023-11-22 08:33:45,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.75 vs. 
2023-11-22 08:33:48,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.016e+01 8.419e+01 9.002e+01 9.753e+01 1.228e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-22 08:33:55,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0
2023-11-22 08:33:57,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1870500.0, ans=0.0
2023-11-22 08:33:59,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1870500.0, ans=0.125
2023-11-22 08:34:16,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=1870566.6666666667, ans=15.0
2023-11-22 08:34:18,115 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4050, loss[loss=0.06998, simple_loss=0.0948, pruned_loss=0.01415, audio_tagging_loss=0.008436, over 15343.00 frames. ], tot_loss[loss=0.07296, simple_loss=0.09549, pruned_loss=0.01562, audio_tagging_loss=0.009597, over 3057645.55 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:34:21,965 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 08:34:21,994 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280600
2023-11-22 08:34:26,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1870633.3333333333, ans=0.1
2023-11-22 08:34:28,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0
2023-11-22 08:34:44,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1870766.6666666667, ans=0.0
2023-11-22 08:35:05,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1870833.3333333333, ans=0.125
2023-11-22 08:35:10,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1870900.0, ans=0.2
2023-11-22 08:35:22,993 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4100, loss[loss=0.04357, simple_loss=0.05124, pruned_loss=0.00745, audio_tagging_loss=0.0105, over 15974.00 frames. ], tot_loss[loss=0.07265, simple_loss=0.0951, pruned_loss=0.01546, audio_tagging_loss=0.009636, over 3064424.18 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:35:27,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280650
2023-11-22 08:35:27,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5
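The WARNING above shows the length filter doing its job: a 1-second AudioSet clip gives 100 feature frames, which after the encoder's 4x subsampling leaves only 23 frames, fewer than the 24 BPE tokens of the placeholder transcript, so no transducer alignment exists and the cut is dropped. A sketch of that check, assuming an icefall-style frame formula for the convolutional front end (the formula is an assumption, not copied from train_asr.py, though it does reproduce 100 -> 23):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed frames-after-subsampling formula (overall factor ~4);
        # reproduces the logged 100 -> 23.
        t = ((num_frames - 7) // 2 + 1) // 2
        # A transducer needs at least as many encoder frames as tokens.
        return t >= num_tokens

    # The excluded cut above: 100 frames -> 23 frames < 24 tokens -> dropped.
    assert keep_cut(100, 24) is False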
2023-11-22 08:35:53,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1871100.0, ans=0.0
2023-11-22 08:35:56,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1871100.0, ans=0.125
2023-11-22 08:35:58,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.230e+01 8.854e+01 9.579e+01 2.804e+02, threshold=1.771e+02, percent-clipped=1.0
2023-11-22 08:36:00,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1871100.0, ans=6.0
2023-11-22 08:36:01,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0
2023-11-22 08:36:03,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. limit=10.0
2023-11-22 08:36:04,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1871166.6666666667, ans=0.1
2023-11-22 08:36:27,872 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4150, loss[loss=0.06446, simple_loss=0.08283, pruned_loss=0.01347, audio_tagging_loss=0.009577, over 13918.00 frames. ], tot_loss[loss=0.07216, simple_loss=0.09466, pruned_loss=0.01532, audio_tagging_loss=0.009508, over 3071335.44 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:36:31,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280700
2023-11-22 08:37:04,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1871433.3333333333, ans=0.0
2023-11-22 08:37:10,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1871500.0, ans=0.125
2023-11-22 08:37:15,472 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 08:37:24,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1871566.6666666667, ans=0.125
2023-11-22 08:37:32,643 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4200, loss[loss=0.07366, simple_loss=0.09478, pruned_loss=0.01729, audio_tagging_loss=0.008984, over 14848.00 frames. ], tot_loss[loss=0.07127, simple_loss=0.0936, pruned_loss=0.01509, audio_tagging_loss=0.009382, over 3064506.05 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:37:33,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5
2023-11-22 08:37:36,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280750
2023-11-22 08:37:39,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1871633.3333333333, ans=0.5
2023-11-22 08:37:44,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.56 vs. limit=22.5
2023-11-22 08:38:07,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.240e+01 8.912e+01 9.752e+01 1.165e+02, threshold=1.782e+02, percent-clipped=0.0
2023-11-22 08:38:37,285 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4250, loss[loss=0.05091, simple_loss=0.06458, pruned_loss=0.01038, audio_tagging_loss=0.008238, over 15362.00 frames. ], tot_loss[loss=0.07076, simple_loss=0.09304, pruned_loss=0.01492, audio_tagging_loss=0.009319, over 3061659.63 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:38:41,824 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280800
2023-11-22 08:38:44,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1871966.6666666667, ans=0.125
2023-11-22 08:38:55,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1872033.3333333333, ans=0.125
2023-11-22 08:39:10,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1872100.0, ans=0.2
2023-11-22 08:39:43,319 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4300, loss[loss=0.0888, simple_loss=0.1264, pruned_loss=0.01856, audio_tagging_loss=0.007023, over 15627.00 frames. ], tot_loss[loss=0.07164, simple_loss=0.09423, pruned_loss=0.01526, audio_tagging_loss=0.00926, over 3054409.78 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:39:47,927 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280850
2023-11-22 08:39:47,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1872300.0, ans=0.125
2023-11-22 08:40:18,676 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.496e+01 9.037e+01 9.736e+01 1.212e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-22 08:40:23,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1872500.0, ans=0.125
2023-11-22 08:40:30,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1872500.0, ans=0.2
2023-11-22 08:40:34,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1872566.6666666667, ans=0.125
2023-11-22 08:40:37,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0
2023-11-22 08:40:40,962 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 08:40:49,244 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4350, loss[loss=0.08269, simple_loss=0.1039, pruned_loss=0.02253, audio_tagging_loss=0.00823, over 15581.00 frames. ], tot_loss[loss=0.07174, simple_loss=0.09435, pruned_loss=0.01534, audio_tagging_loss=0.009226, over 3050002.82 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 16.0
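The scaling.py:213 lines are periodic dumps of ScheduledFloat values: module hyperparameters such as skip rates, balancer probabilities, dropout p and bypass scale minima that are functions of the global batch count rather than constants, which is why every line reports both batch_count and the current ans. A minimal sketch of a piecewise-linear schedule in that spirit (the breakpoints below are invented for illustration; the real ones are defined in the zipformer code):

    # Illustrative piecewise-linear schedule keyed on batch count,
    # in the spirit of ScheduledFloat. Breakpoints are made up.
    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
        return points[-1][1]

    # e.g. a skip rate that decays 0.5 -> 0.0 over the first 4000 batches;
    # by batch_count ~1.87e6 it would log ans=0.0, as above.
    print(scheduled_float(1869100.0, [(0.0, 0.5), (4000.0, 0.0)]))  # 0.0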
2023-11-22 08:40:53,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280900
2023-11-22 08:41:07,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1872700.0, ans=0.2
2023-11-22 08:41:11,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0
2023-11-22 08:41:53,076 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4400, loss[loss=0.06946, simple_loss=0.09547, pruned_loss=0.01406, audio_tagging_loss=0.007661, over 16020.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09413, pruned_loss=0.01536, audio_tagging_loss=0.009245, over 3052371.36 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 08:41:56,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 280950
2023-11-22 08:42:22,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1873100.0, ans=0.0
2023-11-22 08:42:29,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.700e+01 8.342e+01 8.985e+01 9.628e+01 1.162e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-22 08:42:31,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1873166.6666666667, ans=0.0
2023-11-22 08:42:42,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1873166.6666666667, ans=0.0
2023-11-22 08:42:58,192 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4450, loss[loss=0.06749, simple_loss=0.08267, pruned_loss=0.0171, audio_tagging_loss=0.009052, over 13842.00 frames. ], tot_loss[loss=0.07196, simple_loss=0.09427, pruned_loss=0.01564, audio_tagging_loss=0.009185, over 3046839.48 frames. ], batch size: 53, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 08:43:01,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281000
2023-11-22 08:43:15,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1873366.6666666667, ans=0.1
2023-11-22 08:43:31,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1873433.3333333333, ans=0.1
2023-11-22 08:43:49,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1873566.6666666667, ans=0.0
2023-11-22 08:43:55,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1873566.6666666667, ans=0.2
2023-11-22 08:44:03,783 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4500, loss[loss=0.06574, simple_loss=0.07193, pruned_loss=0.02051, audio_tagging_loss=0.009269, over 13667.00 frames. ], tot_loss[loss=0.07243, simple_loss=0.09525, pruned_loss=0.01564, audio_tagging_loss=0.009157, over 3050897.14 frames. ], batch size: 52, lr: 2.85e-03, grad_scale: 32.0
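The grad_scale field toggling between 16.0 and 32.0 (16.0 at batch 4350, 32.0 at batch 4400 above) is the signature of dynamic fp16 loss scaling: the scaler doubles the scale after a stretch of overflow-free steps and halves it when gradients overflow. A generic usage sketch with torch.cuda.amp (standard AMP boilerplate, not lifted from train_asr.py; init_scale=16.0 is chosen only to mirror the logged value):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()  # backprop with the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grows or shrinks the scale
        return loss.detach()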
2023-11-22 08:44:07,695 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281050
2023-11-22 08:44:37,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.057e+01 8.748e+01 9.801e+01 1.166e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-22 08:45:01,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1873900.0, ans=0.125
2023-11-22 08:45:07,100 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4550, loss[loss=0.05936, simple_loss=0.08152, pruned_loss=0.0109, audio_tagging_loss=0.007703, over 14315.00 frames. ], tot_loss[loss=0.07149, simple_loss=0.09396, pruned_loss=0.01529, audio_tagging_loss=0.009213, over 3047388.20 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 08:45:10,910 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281100
2023-11-22 08:45:11,202 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 08:45:13,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1873966.6666666667, ans=0.125
2023-11-22 08:45:24,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1874033.3333333333, ans=0.125
2023-11-22 08:45:29,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1874033.3333333333, ans=0.125
2023-11-22 08:45:50,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1874166.6666666667, ans=0.125
2023-11-22 08:45:50,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1874166.6666666667, ans=0.2
2023-11-22 08:45:56,305 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 08:46:11,519 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4600, loss[loss=0.07064, simple_loss=0.09396, pruned_loss=0.01273, audio_tagging_loss=0.01094, over 13973.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09397, pruned_loss=0.01523, audio_tagging_loss=0.009239, over 3052526.13 frames. ], batch size: 55, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 08:46:15,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281150
2023-11-22 08:46:23,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1874366.6666666667, ans=0.125
2023-11-22 08:46:43,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1874433.3333333333, ans=0.0
2023-11-22 08:46:46,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 7.994e+01 8.789e+01 9.560e+01 1.302e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-22 08:47:03,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1874566.6666666667, ans=0.5
2023-11-22 08:47:06,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. limit=6.0
2023-11-22 08:47:16,074 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4650, loss[loss=0.04587, simple_loss=0.05806, pruned_loss=0.007186, audio_tagging_loss=0.009649, over 16783.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.0941, pruned_loss=0.01523, audio_tagging_loss=0.009318, over 3046800.55 frames. ], batch size: 67, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 08:47:20,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281200
2023-11-22 08:47:33,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1874700.0, ans=0.125
2023-11-22 08:47:38,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1874700.0, ans=0.1
2023-11-22 08:47:40,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1874700.0, ans=0.1
2023-11-22 08:48:04,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1874833.3333333333, ans=0.0
2023-11-22 08:48:15,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1874900.0, ans=0.125
2023-11-22 08:48:21,835 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4700, loss[loss=0.06382, simple_loss=0.07928, pruned_loss=0.01484, audio_tagging_loss=0.009347, over 15381.00 frames. ], tot_loss[loss=0.07174, simple_loss=0.09412, pruned_loss=0.01527, audio_tagging_loss=0.009412, over 3052881.20 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:48:22,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1874966.6666666667, ans=0.125
2023-11-22 08:48:25,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281250
2023-11-22 08:48:33,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1875033.3333333333, ans=0.1
2023-11-22 08:48:36,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.78 vs. limit=10.0
2023-11-22 08:48:44,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1875033.3333333333, ans=0.125
2023-11-22 08:48:45,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1875100.0, ans=0.125
2023-11-22 08:48:57,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.881e+01 8.141e+01 8.777e+01 9.599e+01 1.409e+02, threshold=1.755e+02, percent-clipped=0.0
2023-11-22 08:49:15,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1875233.3333333333, ans=0.2
2023-11-22 08:49:25,813 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4750, loss[loss=0.08635, simple_loss=0.1175, pruned_loss=0.01992, audio_tagging_loss=0.007688, over 14677.00 frames. ], tot_loss[loss=0.0718, simple_loss=0.09425, pruned_loss=0.01522, audio_tagging_loss=0.00945, over 3044242.55 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:49:29,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281300
2023-11-22 08:49:40,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=22.5
2023-11-22 08:49:51,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1875433.3333333333, ans=0.125
2023-11-22 08:49:59,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1875433.3333333333, ans=0.1
2023-11-22 08:50:13,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1875500.0, ans=0.125
2023-11-22 08:50:23,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1875566.6666666667, ans=0.1
2023-11-22 08:50:29,583 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4800, loss[loss=0.08435, simple_loss=0.1148, pruned_loss=0.01805, audio_tagging_loss=0.008905, over 16133.00 frames. ], tot_loss[loss=0.07141, simple_loss=0.09328, pruned_loss=0.01513, audio_tagging_loss=0.009639, over 3041352.60 frames. ], batch size: 60, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:50:34,379 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281350
2023-11-22 08:51:02,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1875766.6666666667, ans=0.1
2023-11-22 08:51:06,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1875766.6666666667, ans=0.5
2023-11-22 08:51:07,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.056e+01 8.734e+01 9.359e+01 1.284e+02, threshold=1.747e+02, percent-clipped=0.0
2023-11-22 08:51:10,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1875833.3333333333, ans=0.2
2023-11-22 08:51:34,877 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4850, loss[loss=0.07015, simple_loss=0.09695, pruned_loss=0.01293, audio_tagging_loss=0.008744, over 14884.00 frames. ], tot_loss[loss=0.07121, simple_loss=0.09312, pruned_loss=0.0149, audio_tagging_loss=0.009751, over 3035834.82 frames. ], batch size: 55, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:51:38,655 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281400
2023-11-22 08:51:42,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1875966.6666666667, ans=0.125
2023-11-22 08:51:47,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1876033.3333333333, ans=0.125
2023-11-22 08:51:52,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1876033.3333333333, ans=0.2
2023-11-22 08:51:52,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1876033.3333333333, ans=0.125
2023-11-22 08:51:53,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1876033.3333333333, ans=0.125
2023-11-22 08:52:01,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1876100.0, ans=0.2
2023-11-22 08:52:04,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1876100.0, ans=0.0
2023-11-22 08:52:07,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=15.0
2023-11-22 08:52:20,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1876166.6666666667, ans=0.125
2023-11-22 08:52:23,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1876166.6666666667, ans=0.1
2023-11-22 08:52:39,566 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4900, loss[loss=0.06191, simple_loss=0.08675, pruned_loss=0.01275, audio_tagging_loss=0.005791, over 14619.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09303, pruned_loss=0.0149, audio_tagging_loss=0.009776, over 3029150.11 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:52:43,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281450
2023-11-22 08:52:43,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1876300.0, ans=0.125
2023-11-22 08:52:47,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.67 vs. limit=10.0
2023-11-22 08:52:48,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1876300.0, ans=0.125
2023-11-22 08:53:09,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5
2023-11-22 08:53:16,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.163e+01 8.764e+01 9.409e+01 1.191e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-22 08:53:22,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
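The scaling.py:1022 lines come from Whiten modules, which monitor how far the channel covariance of a layer's activations is from a scaled identity and only apply a corrective penalty once the metric crosses the configured limit; most entries sit below their limit, while a line like metric=16.10 vs. limit=15.0 above marks a module that has just tripped it. One scale-invariant metric with this behaviour (an assumption chosen for illustration; the exact formula in scaling.py may differ) is:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Measures distance of the channel
        # covariance C from sigma^2 * I: equals 1.0 for perfectly "white"
        # activations and grows as channels become correlated or unbalanced.
        x = x - x.mean(dim=0)
        c = (x.t() @ x) / x.shape[0]
        d = c.shape[0]
        return (d * (c * c).sum() / torch.trace(c) ** 2).item()

    x = torch.randn(10000, 384)
    print(whitening_metric(x))                          # ~1, nearly white
    print(whitening_metric(x @ torch.randn(384, 384)))  # much larger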
2023-11-22 08:53:43,085 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 4950, loss[loss=0.06072, simple_loss=0.08428, pruned_loss=0.009686, audio_tagging_loss=0.008899, over 15830.00 frames. ], tot_loss[loss=0.07136, simple_loss=0.09346, pruned_loss=0.01505, audio_tagging_loss=0.009578, over 3036086.04 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:53:47,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281500
2023-11-22 08:53:53,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1876633.3333333333, ans=0.125
2023-11-22 08:54:46,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0
2023-11-22 08:54:48,579 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5000, loss[loss=0.08226, simple_loss=0.1159, pruned_loss=0.01512, audio_tagging_loss=0.009172, over 16528.00 frames. ], tot_loss[loss=0.07174, simple_loss=0.09402, pruned_loss=0.01532, audio_tagging_loss=0.009405, over 3036221.50 frames. ], batch size: 60, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:54:50,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1876966.6666666667, ans=0.0
2023-11-22 08:54:52,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281550
2023-11-22 08:55:07,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1877033.3333333333, ans=0.1
2023-11-22 08:55:25,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.261e+01 8.670e+01 9.360e+01 1.108e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-22 08:55:27,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1877166.6666666667, ans=0.1
2023-11-22 08:55:52,730 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5050, loss[loss=0.07065, simple_loss=0.1019, pruned_loss=0.01304, audio_tagging_loss=0.006672, over 15028.00 frames. ], tot_loss[loss=0.07088, simple_loss=0.09304, pruned_loss=0.01501, audio_tagging_loss=0.009349, over 3036882.28 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:55:52,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1877300.0, ans=0.04949747468305833
2023-11-22 08:55:57,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281600
2023-11-22 08:55:57,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0
2023-11-22 08:56:07,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1877366.6666666667, ans=0.1
2023-11-22 08:56:14,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1877366.6666666667, ans=0.0
2023-11-22 08:56:21,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1877433.3333333333, ans=0.125
2023-11-22 08:56:39,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1877500.0, ans=0.0
2023-11-22 08:56:57,105 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5100, loss[loss=0.05836, simple_loss=0.06643, pruned_loss=0.01453, audio_tagging_loss=0.01062, over 15562.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09179, pruned_loss=0.01476, audio_tagging_loss=0.009338, over 3039511.42 frames. ], batch size: 60, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:56:58,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0
2023-11-22 08:57:00,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281650
2023-11-22 08:57:10,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1877700.0, ans=0.125
2023-11-22 08:57:12,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1877700.0, ans=0.1
2023-11-22 08:57:21,807 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 08:57:27,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1877766.6666666667, ans=0.1
2023-11-22 08:57:34,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.920e+01 7.966e+01 8.782e+01 9.481e+01 2.193e+02, threshold=1.756e+02, percent-clipped=1.0
2023-11-22 08:57:35,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0
2023-11-22 08:57:46,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1877833.3333333333, ans=0.125
2023-11-22 08:58:01,525 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5150, loss[loss=0.05966, simple_loss=0.08229, pruned_loss=0.009105, audio_tagging_loss=0.009412, over 15717.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.09142, pruned_loss=0.01458, audio_tagging_loss=0.009331, over 3035980.20 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 08:58:04,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1877966.6666666667, ans=0.025
2023-11-22 08:58:05,204 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281700
2023-11-22 08:58:31,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1878100.0, ans=0.125
2023-11-22 08:58:32,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1878100.0, ans=0.0
2023-11-22 08:58:35,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1878100.0, ans=0.125
2023-11-22 08:58:44,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1878166.6666666667, ans=15.0
2023-11-22 08:58:55,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1878233.3333333333, ans=0.125
2023-11-22 08:59:04,864 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5200, loss[loss=0.06915, simple_loss=0.08722, pruned_loss=0.01432, audio_tagging_loss=0.01122, over 15512.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09229, pruned_loss=0.01481, audio_tagging_loss=0.009385, over 3036779.93 frames. ], batch size: 61, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 08:59:09,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281750
2023-11-22 08:59:30,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1878433.3333333333, ans=0.0
2023-11-22 08:59:35,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1878433.3333333333, ans=0.0
2023-11-22 08:59:42,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.104e+01 8.582e+01 9.336e+01 1.183e+02, threshold=1.716e+02, percent-clipped=0.0
2023-11-22 08:59:58,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1878566.6666666667, ans=0.125
2023-11-22 09:00:00,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1878566.6666666667, ans=0.2
2023-11-22 09:00:09,541 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5250, loss[loss=0.07277, simple_loss=0.0912, pruned_loss=0.01707, audio_tagging_loss=0.0101, over 15022.00 frames. ], tot_loss[loss=0.07008, simple_loss=0.09205, pruned_loss=0.01481, audio_tagging_loss=0.009243, over 3032873.45 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 09:00:13,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281800
2023-11-22 09:00:21,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1878700.0, ans=0.125
2023-11-22 09:00:30,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1878700.0, ans=0.125
2023-11-22 09:00:48,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1878833.3333333333, ans=0.0
2023-11-22 09:00:51,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1878833.3333333333, ans=0.2
2023-11-22 09:01:10,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0
2023-11-22 09:01:14,900 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5300, loss[loss=0.08209, simple_loss=0.1136, pruned_loss=0.01784, audio_tagging_loss=0.007468, over 15476.00 frames. ], tot_loss[loss=0.0711, simple_loss=0.09376, pruned_loss=0.0151, audio_tagging_loss=0.009124, over 3034607.07 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 09:01:18,621 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281850
2023-11-22 09:01:19,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0
2023-11-22 09:01:45,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1879100.0, ans=0.0
2023-11-22 09:01:53,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.133e+01 8.772e+01 9.528e+01 1.202e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-22 09:01:55,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1879166.6666666667, ans=0.125
2023-11-22 09:02:00,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1879166.6666666667, ans=0.1
2023-11-22 09:02:12,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1879233.3333333333, ans=0.1
2023-11-22 09:02:16,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1879233.3333333333, ans=0.0
2023-11-22 09:02:19,277 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5350, loss[loss=0.06332, simple_loss=0.08418, pruned_loss=0.01313, audio_tagging_loss=0.008104, over 14980.00 frames. ], tot_loss[loss=0.0706, simple_loss=0.0927, pruned_loss=0.01501, audio_tagging_loss=0.009242, over 3035021.99 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 09:02:23,090 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281900
2023-11-22 09:03:04,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1879500.0, ans=0.1
2023-11-22 09:03:23,602 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5400, loss[loss=0.05925, simple_loss=0.08162, pruned_loss=0.01166, audio_tagging_loss=0.006781, over 14576.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.09306, pruned_loss=0.01499, audio_tagging_loss=0.009216, over 3036951.94 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 16.0
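The lr column barely moves across thousands of batches (2.85e-03 here, easing to 2.84e-03 a little later in this log), which is what icefall's Eden schedule produces late in training: the learning rate is the base LR damped by slowly-growing batch and epoch factors. A sketch of the Eden formula, written from memory of icefall's scheduler, so treat the details as an assumption; the lr_batches and lr_epochs defaults below are icefall's usual ones:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden: lr = base_lr * f(batch) * g(epoch); both factors
        # decay toward zero very slowly once batch >> lr_batches.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

Deep into epoch 24 both factors change so slowly per batch that the logged lr moves only in its third significant digit.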
2023-11-22 09:03:27,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 281950
2023-11-22 09:03:30,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2023-11-22 09:04:02,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.289e+01 8.769e+01 9.344e+01 1.157e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-22 09:04:16,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1879900.0, ans=0.125
2023-11-22 09:04:29,469 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5450, loss[loss=0.09331, simple_loss=0.1222, pruned_loss=0.02498, audio_tagging_loss=0.007256, over 16106.00 frames. ], tot_loss[loss=0.07168, simple_loss=0.09417, pruned_loss=0.01527, audio_tagging_loss=0.009324, over 3041708.53 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 09:04:33,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282000
2023-11-22 09:04:44,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1880033.3333333333, ans=0.1
2023-11-22 09:04:58,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1880100.0, ans=0.125
2023-11-22 09:05:02,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1880100.0, ans=15.0
2023-11-22 09:05:10,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1880166.6666666667, ans=0.1
2023-11-22 09:05:12,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1880166.6666666667, ans=0.1
2023-11-22 09:05:24,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1880233.3333333333, ans=0.125
2023-11-22 09:05:25,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2023-11-22 09:05:33,760 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5500, loss[loss=0.07976, simple_loss=0.1034, pruned_loss=0.01852, audio_tagging_loss=0.009547, over 15449.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.0936, pruned_loss=0.01519, audio_tagging_loss=0.009431, over 3044594.44 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 09:05:36,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1880300.0, ans=0.125
2023-11-22 09:05:37,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282050
2023-11-22 09:05:48,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1880366.6666666667, ans=0.125
2023-11-22 09:05:55,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1880366.6666666667, ans=0.125
2023-11-22 09:06:12,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.257e+01 8.828e+01 9.557e+01 1.200e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-22 09:06:38,162 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5550, loss[loss=0.07166, simple_loss=0.09177, pruned_loss=0.01638, audio_tagging_loss=0.009398, over 14990.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.0933, pruned_loss=0.01517, audio_tagging_loss=0.009512, over 3042473.53 frames. ], batch size: 58, lr: 2.85e-03, grad_scale: 16.0
2023-11-22 09:06:41,943 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282100
2023-11-22 09:06:51,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1880700.0, ans=0.125
2023-11-22 09:07:18,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1880833.3333333333, ans=0.5
2023-11-22 09:07:22,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1880833.3333333333, ans=0.125
2023-11-22 09:07:31,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1880900.0, ans=0.125
2023-11-22 09:07:43,084 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5600, loss[loss=0.05756, simple_loss=0.07393, pruned_loss=0.01101, audio_tagging_loss=0.009583, over 15458.00 frames. ], tot_loss[loss=0.07106, simple_loss=0.09284, pruned_loss=0.01494, audio_tagging_loss=0.009703, over 3053419.45 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 32.0
2023-11-22 09:07:47,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282150
2023-11-22 09:08:02,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1881033.3333333333, ans=0.1
2023-11-22 09:08:21,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.179e+01 8.745e+01 9.416e+01 1.882e+02, threshold=1.749e+02, percent-clipped=1.0
2023-11-22 09:08:30,612 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 09:08:47,873 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5650, loss[loss=0.05619, simple_loss=0.07163, pruned_loss=0.008656, audio_tagging_loss=0.01172, over 14305.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09333, pruned_loss=0.01508, audio_tagging_loss=0.009761, over 3046148.00 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 32.0
2023-11-22 09:08:51,638 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282200
2023-11-22 09:08:56,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1881300.0, ans=0.125
2023-11-22 09:09:11,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1881366.6666666667, ans=0.125
2023-11-22 09:09:19,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=15.0
2023-11-22 09:09:29,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1881500.0, ans=0.125
2023-11-22 09:09:51,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1881633.3333333333, ans=0.0
2023-11-22 09:09:52,837 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5700, loss[loss=0.07696, simple_loss=0.1038, pruned_loss=0.0165, audio_tagging_loss=0.008574, over 14630.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09396, pruned_loss=0.01511, audio_tagging_loss=0.009744, over 3048592.67 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 32.0
2023-11-22 09:09:56,655 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282250
2023-11-22 09:09:58,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5
2023-11-22 09:10:16,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1881700.0, ans=0.2
2023-11-22 09:10:22,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0
2023-11-22 09:10:23,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1881766.6666666667, ans=0.09899494936611666
2023-11-22 09:10:30,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.915e+01 8.108e+01 8.715e+01 9.329e+01 1.149e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-22 09:10:55,632 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5750, loss[loss=0.07119, simple_loss=0.0864, pruned_loss=0.01636, audio_tagging_loss=0.01163, over 15318.00 frames. ], tot_loss[loss=0.07099, simple_loss=0.09246, pruned_loss=0.01503, audio_tagging_loss=0.009726, over 3043035.56 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 32.0
2023-11-22 09:10:59,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282300
2023-11-22 09:11:07,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1881966.6666666667, ans=0.125
2023-11-22 09:11:35,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1882166.6666666667, ans=0.025
2023-11-22 09:11:36,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5
2023-11-22 09:11:42,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1882166.6666666667, ans=0.05
2023-11-22 09:11:43,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0
2023-11-22 09:11:45,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs. limit=10.0
2023-11-22 09:11:48,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1882233.3333333333, ans=0.2
2023-11-22 09:11:49,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0
2023-11-22 09:12:01,181 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5800, loss[loss=0.07848, simple_loss=0.1087, pruned_loss=0.01633, audio_tagging_loss=0.00778, over 14979.00 frames. ], tot_loss[loss=0.07159, simple_loss=0.09357, pruned_loss=0.01525, audio_tagging_loss=0.009556, over 3046766.83 frames. ], batch size: 55, lr: 2.84e-03, grad_scale: 16.0
2023-11-22 09:12:02,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1882300.0, ans=0.035
2023-11-22 09:12:04,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282350
2023-11-22 09:12:17,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1882366.6666666667, ans=0.125
2023-11-22 09:12:21,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1882366.6666666667, ans=0.125
2023-11-22 09:12:29,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0
2023-11-22 09:12:40,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.350e+01 8.216e+01 8.857e+01 9.622e+01 1.311e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-22 09:12:42,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1882500.0, ans=0.125
2023-11-22 09:12:46,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1882500.0, ans=0.125
2023-11-22 09:12:48,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0
2023-11-22 09:12:49,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1882500.0, ans=0.0
2023-11-22 09:12:51,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1882566.6666666667, ans=0.1
2023-11-22 09:12:57,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1882566.6666666667, ans=0.0
2023-11-22 09:13:05,425 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5850, loss[loss=0.07137, simple_loss=0.09969, pruned_loss=0.01509, audio_tagging_loss=0.00644, over 15848.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.09327, pruned_loss=0.01513, audio_tagging_loss=0.009415, over 3050642.00 frames. ], batch size: 60, lr: 2.84e-03, grad_scale: 16.0
2023-11-22 09:13:05,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1882633.3333333333, ans=0.2
2023-11-22 09:13:09,296 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282400
2023-11-22 09:13:17,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.14 vs. limit=10.0
2023-11-22 09:13:35,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1882766.6666666667, ans=0.0
2023-11-22 09:13:40,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1882766.6666666667, ans=0.125
2023-11-22 09:13:56,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1882900.0, ans=0.125
2023-11-22 09:14:10,096 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5900, loss[loss=0.09848, simple_loss=0.127, pruned_loss=0.02681, audio_tagging_loss=0.008161, over 15012.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09361, pruned_loss=0.01517, audio_tagging_loss=0.0094, over 3047019.50 frames. ], batch size: 55, lr: 2.84e-03, grad_scale: 16.0
2023-11-22 09:14:14,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282450
2023-11-22 09:14:15,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1882966.6666666667, ans=0.125
2023-11-22 09:14:23,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1883033.3333333333, ans=0.2
2023-11-22 09:14:25,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1883033.3333333333, ans=0.125
2023-11-22 09:14:26,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0
2023-11-22 09:14:34,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1883033.3333333333, ans=0.0
2023-11-22 09:14:39,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=12.0
2023-11-22 09:14:40,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1883100.0, ans=0.04949747468305833
2023-11-22 09:14:40,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1883100.0, ans=0.125
2023-11-22 09:14:50,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.669e+01 8.181e+01 9.057e+01 9.624e+01 1.134e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-22 09:14:51,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1883166.6666666667, ans=0.2
2023-11-22 09:15:14,472 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 5950, loss[loss=0.0627, simple_loss=0.07912, pruned_loss=0.01273, audio_tagging_loss=0.01041, over 15524.00 frames. ], tot_loss[loss=0.0721, simple_loss=0.09486, pruned_loss=0.01538, audio_tagging_loss=0.009291, over 3058605.66 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 16.0
2023-11-22 09:15:18,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282500
2023-11-22 09:15:34,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0
2023-11-22 09:15:36,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1883366.6666666667, ans=0.125
2023-11-22 09:16:12,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1883566.6666666667, ans=0.2
2023-11-22 09:16:19,194 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6000, loss[loss=0.08829, simple_loss=0.1263, pruned_loss=0.01835, audio_tagging_loss=0.006805, over 16147.00 frames. ], tot_loss[loss=0.07212, simple_loss=0.09495, pruned_loss=0.0154, audio_tagging_loss=0.009234, over 3050464.01 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 32.0
2023-11-22 09:16:19,195 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 09:17:01,472 INFO [train_asr.py:1253] (1/4) Epoch 24, validation: loss=0.05933, simple_loss=0.05174, pruned_loss=0.005222, audio_tagging_loss=0.02824, over 4681554.00 frames.
2023-11-22 09:17:01,473 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-22 09:17:03,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1883633.3333333333, ans=0.0
2023-11-22 09:17:05,862 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282550
2023-11-22 09:17:08,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1883633.3333333333, ans=0.1
2023-11-22 09:17:21,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1883700.0, ans=0.0
2023-11-22 09:17:26,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1883766.6666666667, ans=0.125
2023-11-22 09:17:41,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1883833.3333333333, ans=0.1
2023-11-22 09:17:42,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.160e+01 8.694e+01 9.357e+01 1.133e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-22 09:17:49,669 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 09:17:55,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1883900.0, ans=0.125
2023-11-22 09:18:06,493 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6050, loss[loss=0.06823, simple_loss=0.08839, pruned_loss=0.01492, audio_tagging_loss=0.009121, over 16033.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09431, pruned_loss=0.01532, audio_tagging_loss=0.009248, over 3054223.10 frames. ], batch size: 60, lr: 2.84e-03, grad_scale: 16.0
], batch size: 60, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:18:10,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282600 2023-11-22 09:18:15,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2023-11-22 09:18:24,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1884033.3333333333, ans=0.125 2023-11-22 09:18:54,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-22 09:18:58,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1884233.3333333333, ans=0.125 2023-11-22 09:19:00,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1884233.3333333333, ans=0.2 2023-11-22 09:19:03,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2023-11-22 09:19:12,043 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6100, loss[loss=0.04203, simple_loss=0.04933, pruned_loss=0.006204, audio_tagging_loss=0.01116, over 14821.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09454, pruned_loss=0.01523, audio_tagging_loss=0.009208, over 3060175.65 frames. ], batch size: 60, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:19:15,871 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282650 2023-11-22 09:19:19,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1884300.0, ans=0.1 2023-11-22 09:19:49,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1884500.0, ans=0.125 2023-11-22 09:19:53,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 8.538e+01 9.030e+01 1.014e+02 1.305e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-22 09:20:01,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1884500.0, ans=0.0 2023-11-22 09:20:16,939 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6150, loss[loss=0.0708, simple_loss=0.09633, pruned_loss=0.01476, audio_tagging_loss=0.00788, over 16460.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09535, pruned_loss=0.0155, audio_tagging_loss=0.009198, over 3056030.99 frames. ], batch size: 61, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:20:17,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1884633.3333333333, ans=0.125 2023-11-22 09:20:20,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282700 2023-11-22 09:20:26,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1884633.3333333333, ans=0.125 2023-11-22 09:20:39,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. 
limit=15.0 2023-11-22 09:20:41,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-22 09:21:06,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1884833.3333333333, ans=0.1 2023-11-22 09:21:10,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0 2023-11-22 09:21:16,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1884900.0, ans=0.2 2023-11-22 09:21:22,162 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6200, loss[loss=0.05402, simple_loss=0.06706, pruned_loss=0.01083, audio_tagging_loss=0.009663, over 14905.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09534, pruned_loss=0.0154, audio_tagging_loss=0.009297, over 3055962.97 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:21:25,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282750 2023-11-22 09:21:26,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1884966.6666666667, ans=0.0 2023-11-22 09:21:54,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1885100.0, ans=0.1 2023-11-22 09:21:58,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1885100.0, ans=0.125 2023-11-22 09:22:03,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.132e+01 8.774e+01 9.442e+01 1.403e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-22 09:22:07,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1885166.6666666667, ans=0.125 2023-11-22 09:22:08,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5 2023-11-22 09:22:16,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1885233.3333333333, ans=0.125 2023-11-22 09:22:22,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1885233.3333333333, ans=0.2 2023-11-22 09:22:26,714 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6250, loss[loss=0.07115, simple_loss=0.09427, pruned_loss=0.01403, audio_tagging_loss=0.009986, over 14249.00 frames. ], tot_loss[loss=0.07187, simple_loss=0.09429, pruned_loss=0.01527, audio_tagging_loss=0.009458, over 3050142.85 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:22:31,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282800 2023-11-22 09:22:37,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1885300.0, ans=0.0 2023-11-22 09:22:41,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1885366.6666666667, ans=0.0 2023-11-22 09:22:56,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. 
limit=10.0 2023-11-22 09:23:02,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1885433.3333333333, ans=0.125 2023-11-22 09:23:13,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1885500.0, ans=0.0 2023-11-22 09:23:30,896 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6300, loss[loss=0.07, simple_loss=0.09248, pruned_loss=0.01417, audio_tagging_loss=0.009589, over 15686.00 frames. ], tot_loss[loss=0.07274, simple_loss=0.09536, pruned_loss=0.01551, audio_tagging_loss=0.009555, over 3053292.20 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:23:33,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2023-11-22 09:23:35,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282850 2023-11-22 09:23:36,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1885633.3333333333, ans=0.125 2023-11-22 09:24:11,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 8.510e+01 9.202e+01 1.035e+02 1.385e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-22 09:24:29,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=22.5 2023-11-22 09:24:34,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2023-11-22 09:24:35,092 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6350, loss[loss=0.08015, simple_loss=0.1104, pruned_loss=0.01839, audio_tagging_loss=0.006543, over 16372.00 frames. ], tot_loss[loss=0.07287, simple_loss=0.09579, pruned_loss=0.01543, audio_tagging_loss=0.009546, over 3056072.91 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:24:38,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282900 2023-11-22 09:25:24,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1886166.6666666667, ans=0.0 2023-11-22 09:25:38,854 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6400, loss[loss=0.05273, simple_loss=0.0621, pruned_loss=0.009188, audio_tagging_loss=0.0125, over 15161.00 frames. ], tot_loss[loss=0.07243, simple_loss=0.09529, pruned_loss=0.01525, audio_tagging_loss=0.009539, over 3046836.28 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:25:43,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 282950 2023-11-22 09:25:51,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1886366.6666666667, ans=0.0 2023-11-22 09:25:51,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-11-22 09:25:58,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1886366.6666666667, ans=0.2 2023-11-22 09:26:03,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. 
limit=15.0 2023-11-22 09:26:07,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1886433.3333333333, ans=0.125 2023-11-22 09:26:10,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-11-22 09:26:18,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2023-11-22 09:26:20,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.176e+01 8.732e+01 9.554e+01 1.218e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-22 09:26:24,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1886500.0, ans=0.1 2023-11-22 09:26:43,529 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6450, loss[loss=0.06242, simple_loss=0.08956, pruned_loss=0.01326, audio_tagging_loss=0.004388, over 14822.00 frames. ], tot_loss[loss=0.07204, simple_loss=0.09442, pruned_loss=0.01522, audio_tagging_loss=0.009611, over 3038639.92 frames. ], batch size: 56, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:26:45,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2023-11-22 09:26:47,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283000 2023-11-22 09:26:51,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1886633.3333333333, ans=0.125 2023-11-22 09:26:51,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1886633.3333333333, ans=0.125 2023-11-22 09:26:58,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2023-11-22 09:27:07,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1886700.0, ans=0.09899494936611666 2023-11-22 09:27:10,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1886766.6666666667, ans=0.0 2023-11-22 09:27:21,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1886833.3333333333, ans=0.1 2023-11-22 09:27:38,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1886900.0, ans=0.125 2023-11-22 09:27:48,670 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6500, loss[loss=0.06173, simple_loss=0.08208, pruned_loss=0.01076, audio_tagging_loss=0.009925, over 15224.00 frames. ], tot_loss[loss=0.07188, simple_loss=0.09387, pruned_loss=0.01524, audio_tagging_loss=0.009703, over 3040750.62 frames. 
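
Each scaling.py:213 line reports a ScheduledFloat: a scalar module attribute (a dropout p, a skip rate, a balancer prob, or a minimum bypass scale) whose value is piecewise-linear in batch_count. By batch_count ~1.89e6 most schedules sit on their final flat segment, which is why the same constants (0.0, 0.1, 0.125, 0.2) recur entry after entry. A simplified sketch of the idea (icefall's scaling.py layers defaults, arithmetic operators, and tensor support on top of this; the knot values below are illustrative):

    # Minimal sketch of a ScheduledFloat-style schedule: a value that is
    # piecewise-linear in batch_count, defined by (batch_count, value) knots.
    from bisect import bisect_right

    class PiecewiseLinear:
        def __init__(self, *knots):            # e.g. (0.0, 0.5), (4000.0, 0.0)
            self.x, self.y = zip(*knots)

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect_right(self.x, batch_count)
            t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
            return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

    # By batch_count ~1.89e6 a schedule like this is long past its last
    # knot, hence the constant logged values such as skip rates of 0.0:
    skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    assert skip_rate(1_888_966.0) == 0.0
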
], batch size: 58, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:27:50,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1886966.6666666667, ans=0.125 2023-11-22 09:27:52,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283050 2023-11-22 09:27:52,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0 2023-11-22 09:28:00,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-11-22 09:28:17,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2023-11-22 09:28:22,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1887100.0, ans=0.0 2023-11-22 09:28:30,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.224e+01 8.620e+01 9.457e+01 1.249e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-22 09:28:34,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1887166.6666666667, ans=0.125 2023-11-22 09:28:35,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1887166.6666666667, ans=0.0 2023-11-22 09:28:36,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1887166.6666666667, ans=0.125 2023-11-22 09:28:37,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1887166.6666666667, ans=0.2 2023-11-22 09:28:38,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1887233.3333333333, ans=0.0 2023-11-22 09:28:52,082 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6550, loss[loss=0.08184, simple_loss=0.107, pruned_loss=0.01873, audio_tagging_loss=0.009625, over 14910.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09391, pruned_loss=0.01528, audio_tagging_loss=0.009486, over 3042334.27 frames. ], batch size: 55, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:28:55,870 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283100 2023-11-22 09:29:00,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1887300.0, ans=0.125 2023-11-22 09:29:11,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1887366.6666666667, ans=0.125 2023-11-22 09:29:35,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0 2023-11-22 09:29:41,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=12.0 2023-11-22 09:29:56,336 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6600, loss[loss=0.05065, simple_loss=0.05802, pruned_loss=0.01326, audio_tagging_loss=0.008384, over 14468.00 frames. 
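
The lr: 2.84e-03 in each summary line is consistent with the Eden schedule and the base_lr=0.045, lr_batches=7500, lr_epochs=3.5 recorded at startup; the slow drift to 2.83e-03 later in this section is the batch factor decaying. A worked check (whether the scheduler sees epoch 23 or 24 at this point, and the exact step count, are assumptions of this illustration):

    # Sketch of the Eden learning-rate formula (icefall's optim.py),
    # using the hyperparameters logged at startup.
    def eden_lr(step: float, epoch: float,
                base_lr: float = 0.045,
                lr_batches: float = 7500.0,
                lr_epochs: float = 3.5) -> float:
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(282_450, 23):.2e}")   # ~2.84e-03, matching the log
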
], tot_loss[loss=0.072, simple_loss=0.09457, pruned_loss=0.01535, audio_tagging_loss=0.009362, over 3037943.72 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:29:59,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283150 2023-11-22 09:30:38,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.912e+01 8.247e+01 8.842e+01 9.573e+01 1.466e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-22 09:30:43,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.60 vs. limit=15.0 2023-11-22 09:30:45,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=22.5 2023-11-22 09:30:52,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1887900.0, ans=0.125 2023-11-22 09:30:52,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1887900.0, ans=0.1 2023-11-22 09:30:56,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1887900.0, ans=0.0 2023-11-22 09:31:00,483 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6650, loss[loss=0.05644, simple_loss=0.07574, pruned_loss=0.01128, audio_tagging_loss=0.007281, over 14568.00 frames. ], tot_loss[loss=0.07198, simple_loss=0.09469, pruned_loss=0.01534, audio_tagging_loss=0.009289, over 3034481.78 frames. ], batch size: 56, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:31:04,251 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283200 2023-11-22 09:31:07,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-22 09:31:10,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1887966.6666666667, ans=0.125 2023-11-22 09:31:38,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1888166.6666666667, ans=0.125 2023-11-22 09:31:39,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1888166.6666666667, ans=0.125 2023-11-22 09:31:42,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1888166.6666666667, ans=0.125 2023-11-22 09:31:56,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1888233.3333333333, ans=0.125 2023-11-22 09:32:04,415 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6700, loss[loss=0.05153, simple_loss=0.05946, pruned_loss=0.01281, audio_tagging_loss=0.008981, over 14674.00 frames. ], tot_loss[loss=0.0714, simple_loss=0.09398, pruned_loss=0.01517, audio_tagging_loss=0.009246, over 3032897.38 frames. 
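
The Whitening lines compare a per-module statistic against a limit; when the metric exceeds its limit, the Whiten module backpropagates a corrective gradient that pushes the activations back toward a flat covariance spectrum. A rough reconstruction of the statistic, an anisotropy measure that equals 1.0 for perfectly "white" features and grows as the spectrum concentrates (this approximates scaling.py's metric rather than copying it):

    # Rough reconstruction of the "Whitening: metric=... vs. limit=..."
    # statistic: mean squared eigenvalue of the per-group covariance
    # divided by the squared mean eigenvalue. Approximate, not verbatim.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        n, c = x.shape                       # (num_frames, num_channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        cov = torch.matmul(x.transpose(1, 2), x) / n   # per-group covariance
        d = c // num_groups
        num = (cov * cov).sum(dim=(1, 2)) / d
        den = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / d) ** 2
        return (num / den).mean().item()

    # Decorrelated unit-variance features give metric ~= 1.0, well under
    # limits like the 15.0 seen above:
    torch.manual_seed(0)
    print(whitening_metric(torch.randn(10_000, 144)))   # ~1.0
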
], batch size: 55, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:32:08,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283250 2023-11-22 09:32:28,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1888433.3333333333, ans=0.0 2023-11-22 09:32:39,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2023-11-22 09:32:47,395 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.099e+01 8.643e+01 9.373e+01 1.139e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-22 09:32:57,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-11-22 09:33:08,702 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6750, loss[loss=0.08225, simple_loss=0.1099, pruned_loss=0.01894, audio_tagging_loss=0.008352, over 15829.00 frames. ], tot_loss[loss=0.07236, simple_loss=0.09509, pruned_loss=0.01562, audio_tagging_loss=0.009199, over 3032637.76 frames. ], batch size: 56, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:33:12,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283300 2023-11-22 09:33:15,025 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 09:33:19,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1888700.0, ans=0.125 2023-11-22 09:33:48,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-22 09:33:52,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1888833.3333333333, ans=0.125 2023-11-22 09:33:52,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2023-11-22 09:33:58,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1888900.0, ans=0.125 2023-11-22 09:34:04,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1888900.0, ans=0.125 2023-11-22 09:34:12,851 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6800, loss[loss=0.06998, simple_loss=0.09211, pruned_loss=0.01309, audio_tagging_loss=0.01083, over 16422.00 frames. ], tot_loss[loss=0.07153, simple_loss=0.09406, pruned_loss=0.01525, audio_tagging_loss=0.009243, over 3037391.87 frames. ], batch size: 63, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:34:17,155 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283350 2023-11-22 09:34:20,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2023-11-22 09:34:21,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1888966.6666666667, ans=0.0 2023-11-22 09:34:26,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.06 vs. 
limit=22.5 2023-11-22 09:34:32,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1889033.3333333333, ans=0.125 2023-11-22 09:34:33,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1889033.3333333333, ans=0.125 2023-11-22 09:34:37,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1889100.0, ans=0.0 2023-11-22 09:34:41,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1889100.0, ans=0.025 2023-11-22 09:34:50,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1889166.6666666667, ans=0.0 2023-11-22 09:34:54,674 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 7.933e+01 8.700e+01 9.598e+01 1.312e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 09:34:57,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1889166.6666666667, ans=0.125 2023-11-22 09:35:16,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1889300.0, ans=0.0 2023-11-22 09:35:17,330 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6850, loss[loss=0.05707, simple_loss=0.07961, pruned_loss=0.01033, audio_tagging_loss=0.006932, over 16151.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09382, pruned_loss=0.01513, audio_tagging_loss=0.009166, over 3044241.64 frames. ], batch size: 61, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:35:21,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283400 2023-11-22 09:35:22,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1889300.0, ans=0.0 2023-11-22 09:35:24,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1889300.0, ans=0.5 2023-11-22 09:36:02,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-11-22 09:36:07,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=22.5 2023-11-22 09:36:21,956 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6900, loss[loss=0.0987, simple_loss=0.1363, pruned_loss=0.0229, audio_tagging_loss=0.007649, over 15428.00 frames. ], tot_loss[loss=0.07152, simple_loss=0.09442, pruned_loss=0.01522, audio_tagging_loss=0.009092, over 3043648.76 frames. ], batch size: 56, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:36:25,814 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283450 2023-11-22 09:36:31,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1889633.3333333333, ans=0.0 2023-11-22 09:36:47,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1889766.6666666667, ans=0.2 2023-11-22 09:36:57,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. 
limit=15.0 2023-11-22 09:37:03,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.035e+01 8.885e+01 9.571e+01 1.179e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 09:37:13,103 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 09:37:14,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1889900.0, ans=0.125 2023-11-22 09:37:14,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5 2023-11-22 09:37:23,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1889900.0, ans=0.125 2023-11-22 09:37:25,870 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 6950, loss[loss=0.06376, simple_loss=0.08377, pruned_loss=0.01203, audio_tagging_loss=0.009854, over 14016.00 frames. ], tot_loss[loss=0.07148, simple_loss=0.09428, pruned_loss=0.01523, audio_tagging_loss=0.009116, over 3048885.47 frames. ], batch size: 53, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:37:30,259 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283500 2023-11-22 09:37:45,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1890033.3333333333, ans=10.0 2023-11-22 09:37:46,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1890033.3333333333, ans=0.1 2023-11-22 09:38:09,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1890166.6666666667, ans=0.125 2023-11-22 09:38:19,746 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 09:38:22,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1890233.3333333333, ans=0.1 2023-11-22 09:38:31,030 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7000, loss[loss=0.06716, simple_loss=0.08336, pruned_loss=0.01466, audio_tagging_loss=0.01083, over 15325.00 frames. ], tot_loss[loss=0.07216, simple_loss=0.0949, pruned_loss=0.01547, audio_tagging_loss=0.009234, over 3048764.73 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:38:33,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1890300.0, ans=0.125 2023-11-22 09:38:33,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1890300.0, ans=0.1 2023-11-22 09:38:34,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283550 2023-11-22 09:38:46,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.59 vs. 
limit=10.0 2023-11-22 09:39:13,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.396e+01 8.305e+01 8.768e+01 9.472e+01 1.582e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-22 09:39:20,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1890500.0, ans=0.1 2023-11-22 09:39:27,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1890566.6666666667, ans=0.2 2023-11-22 09:39:35,611 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7050, loss[loss=0.06523, simple_loss=0.0936, pruned_loss=0.01211, audio_tagging_loss=0.006319, over 15338.00 frames. ], tot_loss[loss=0.07165, simple_loss=0.09404, pruned_loss=0.01527, audio_tagging_loss=0.009363, over 3046600.73 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:39:38,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1890633.3333333333, ans=0.1 2023-11-22 09:39:39,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283600 2023-11-22 09:39:45,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1890633.3333333333, ans=0.0 2023-11-22 09:39:51,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.00 vs. limit=15.0 2023-11-22 09:39:53,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1890700.0, ans=0.125 2023-11-22 09:40:17,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1890833.3333333333, ans=0.05 2023-11-22 09:40:19,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1890833.3333333333, ans=0.125 2023-11-22 09:40:26,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=12.0 2023-11-22 09:40:38,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1890966.6666666667, ans=0.125 2023-11-22 09:40:39,973 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7100, loss[loss=0.06892, simple_loss=0.09691, pruned_loss=0.0126, audio_tagging_loss=0.007864, over 15474.00 frames. ], tot_loss[loss=0.07154, simple_loss=0.09386, pruned_loss=0.01512, audio_tagging_loss=0.009487, over 3055898.44 frames. 
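
The optim.py:476 lines expose the clipping rule directly: with Clipping_scale=2.0 the threshold is twice the running median of recent gradient norms, e.g. median 8.768e+01 -> threshold 1.754e+02 in the entry above, and percent-clipped=0.0 says no norm in the window exceeded it. A simplified stand-in for that bookkeeping (the window size and update details below are assumptions, not ScaledAdam's actual internals):

    # Sketch of the quartile-based clipping logged by optim.py: the
    # threshold tracks clipping_scale times the median of recently
    # observed gradient norms; percent-clipped counts threshold hits.
    from collections import deque
    import statistics

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)

        def threshold(self) -> float:
            return self.scale * statistics.median(self.norms)

        def observe(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            t = self.threshold()
            # factor by which this step's gradients would be rescaled
            return min(1.0, t / max(grad_norm, 1e-20))
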
], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:40:43,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1890966.6666666667, ans=0.2 2023-11-22 09:40:44,391 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283650 2023-11-22 09:40:51,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1890966.6666666667, ans=0.95 2023-11-22 09:41:16,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1891100.0, ans=0.0 2023-11-22 09:41:23,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.301e+01 8.958e+01 9.633e+01 1.183e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-22 09:41:30,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1891233.3333333333, ans=0.1 2023-11-22 09:41:32,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1891233.3333333333, ans=0.1 2023-11-22 09:41:45,096 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7150, loss[loss=0.09337, simple_loss=0.1128, pruned_loss=0.02762, audio_tagging_loss=0.009346, over 14483.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09435, pruned_loss=0.01528, audio_tagging_loss=0.009477, over 3055574.03 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:41:48,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283700 2023-11-22 09:42:18,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1891433.3333333333, ans=0.2 2023-11-22 09:42:19,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1891433.3333333333, ans=0.1 2023-11-22 09:42:49,831 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7200, loss[loss=0.07453, simple_loss=0.09163, pruned_loss=0.01699, audio_tagging_loss=0.01172, over 14504.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09344, pruned_loss=0.01509, audio_tagging_loss=0.009638, over 3049240.08 frames. ], batch size: 53, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:42:54,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283750 2023-11-22 09:43:00,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-22 09:43:32,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.224e+01 8.056e+01 8.758e+01 9.674e+01 1.275e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-22 09:43:33,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2023-11-22 09:43:35,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1891833.3333333333, ans=10.0 2023-11-22 09:43:49,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. 
limit=15.0 2023-11-22 09:43:54,009 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7250, loss[loss=0.06277, simple_loss=0.08666, pruned_loss=0.01021, audio_tagging_loss=0.009232, over 14532.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.09432, pruned_loss=0.01525, audio_tagging_loss=0.009669, over 3045656.45 frames. ], batch size: 56, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:43:57,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283800 2023-11-22 09:43:58,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2023-11-22 09:44:02,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1891966.6666666667, ans=0.05 2023-11-22 09:44:11,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1892033.3333333333, ans=0.0 2023-11-22 09:44:25,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1892100.0, ans=0.2 2023-11-22 09:44:29,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1892100.0, ans=0.125 2023-11-22 09:44:36,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1892166.6666666667, ans=0.125 2023-11-22 09:44:44,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1892233.3333333333, ans=0.2 2023-11-22 09:44:47,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1892233.3333333333, ans=0.125 2023-11-22 09:44:59,075 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7300, loss[loss=0.04863, simple_loss=0.05834, pruned_loss=0.008139, audio_tagging_loss=0.01133, over 14278.00 frames. ], tot_loss[loss=0.07232, simple_loss=0.09507, pruned_loss=0.01529, audio_tagging_loss=0.009491, over 3044743.16 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:45:02,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283850 2023-11-22 09:45:18,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1892366.6666666667, ans=0.02 2023-11-22 09:45:20,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-22 09:45:21,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.09 vs. limit=22.5 2023-11-22 09:45:24,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2023-11-22 09:45:39,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-11-22 09:45:41,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.095e+01 8.850e+01 9.580e+01 1.188e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-22 09:45:48,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.70 vs. 
limit=15.0 2023-11-22 09:45:49,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1892566.6666666667, ans=0.125 2023-11-22 09:45:57,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1892566.6666666667, ans=0.125 2023-11-22 09:46:01,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1892633.3333333333, ans=0.125 2023-11-22 09:46:02,632 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7350, loss[loss=0.07156, simple_loss=0.08058, pruned_loss=0.02039, audio_tagging_loss=0.01089, over 16141.00 frames. ], tot_loss[loss=0.0721, simple_loss=0.09491, pruned_loss=0.01531, audio_tagging_loss=0.009341, over 3044685.65 frames. ], batch size: 62, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:46:02,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1892633.3333333333, ans=0.125 2023-11-22 09:46:03,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1892633.3333333333, ans=0.125 2023-11-22 09:46:04,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1892633.3333333333, ans=0.2 2023-11-22 09:46:06,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283900 2023-11-22 09:46:10,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1892633.3333333333, ans=0.125 2023-11-22 09:46:14,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1892700.0, ans=0.1 2023-11-22 09:46:15,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1892700.0, ans=0.1 2023-11-22 09:46:44,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2023-11-22 09:46:59,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1892900.0, ans=0.0 2023-11-22 09:47:07,023 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7400, loss[loss=0.06053, simple_loss=0.07561, pruned_loss=0.01477, audio_tagging_loss=0.007956, over 13992.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09453, pruned_loss=0.01525, audio_tagging_loss=0.009198, over 3040124.75 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:47:08,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-11-22 09:47:10,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 283950 2023-11-22 09:47:20,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1893033.3333333333, ans=0.125 2023-11-22 09:47:35,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.05 vs. 
limit=15.0 2023-11-22 09:47:46,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1893166.6666666667, ans=0.0 2023-11-22 09:47:51,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.493e+01 8.994e+01 9.861e+01 1.121e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 09:47:51,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1893166.6666666667, ans=0.125 2023-11-22 09:47:56,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1893166.6666666667, ans=0.0 2023-11-22 09:48:12,159 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7450, loss[loss=0.07086, simple_loss=0.08632, pruned_loss=0.01743, audio_tagging_loss=0.01027, over 14763.00 frames. ], tot_loss[loss=0.07144, simple_loss=0.09412, pruned_loss=0.01514, audio_tagging_loss=0.009239, over 3038358.85 frames. ], batch size: 55, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:48:12,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1893300.0, ans=0.125 2023-11-22 09:48:15,867 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284000 2023-11-22 09:48:25,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1893300.0, ans=0.1 2023-11-22 09:48:30,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1893366.6666666667, ans=0.0 2023-11-22 09:48:32,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1893366.6666666667, ans=0.125 2023-11-22 09:48:33,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1893366.6666666667, ans=0.1 2023-11-22 09:48:33,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1893366.6666666667, ans=0.0 2023-11-22 09:48:53,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2023-11-22 09:49:17,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2023-11-22 09:49:19,184 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7500, loss[loss=0.08026, simple_loss=0.1142, pruned_loss=0.01636, audio_tagging_loss=0.006817, over 15282.00 frames. ], tot_loss[loss=0.07056, simple_loss=0.09308, pruned_loss=0.01481, audio_tagging_loss=0.009215, over 3030163.51 frames. ], batch size: 56, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:49:22,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284050 2023-11-22 09:49:24,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1893633.3333333333, ans=0.0 2023-11-22 09:49:27,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1893633.3333333333, ans=0.1 2023-11-22 09:49:33,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.37 vs. 
limit=22.5 2023-11-22 09:49:49,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1893766.6666666667, ans=0.0 2023-11-22 09:50:02,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.170e+01 8.839e+01 9.515e+01 1.142e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-22 09:50:15,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1893900.0, ans=0.1 2023-11-22 09:50:15,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1893900.0, ans=0.1 2023-11-22 09:50:23,607 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7550, loss[loss=0.07649, simple_loss=0.1013, pruned_loss=0.01674, audio_tagging_loss=0.009094, over 16123.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.09281, pruned_loss=0.01476, audio_tagging_loss=0.009263, over 3032901.02 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 16.0 2023-11-22 09:50:27,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284100 2023-11-22 09:50:27,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1893966.6666666667, ans=0.125 2023-11-22 09:50:29,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1893966.6666666667, ans=0.0 2023-11-22 09:50:34,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1893966.6666666667, ans=0.0 2023-11-22 09:50:44,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1894033.3333333333, ans=0.125 2023-11-22 09:50:55,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1894100.0, ans=0.035 2023-11-22 09:50:58,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1894100.0, ans=0.125 2023-11-22 09:51:28,602 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7600, loss[loss=0.06719, simple_loss=0.08013, pruned_loss=0.01664, audio_tagging_loss=0.01048, over 14932.00 frames. ], tot_loss[loss=0.07006, simple_loss=0.09203, pruned_loss=0.01476, audio_tagging_loss=0.009283, over 3037035.93 frames. 
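
grad_scale alternating between 16.0 and 32.0 across these summaries is fp16 dynamic loss scaling at work: the scaler doubles the scale after a streak of overflow-free steps and halves it back when a non-finite gradient appears. A sketch using PyTorch's stock GradScaler (the constructor arguments are illustrative; the scaler settings train_asr.py actually uses are not shown in this log):

    # Sketch of the dynamic loss scaling behind grad_scale=16.0 / 32.0:
    # scale gradients up for fp16 backward, then grow or back off the
    # scale based on whether the unscaled gradients were finite.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                       growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=1000)

    def training_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()   # backward at scale 16/32/...
        scaler.step(optimizer)          # unscales; skips step on inf/nan
        scaler.update()                 # doubles or halves the scale
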
], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2023-11-22 09:51:31,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1894300.0, ans=0.0 2023-11-22 09:51:31,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1894300.0, ans=0.125 2023-11-22 09:51:32,359 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284150 2023-11-22 09:51:39,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1894366.6666666667, ans=0.125 2023-11-22 09:51:43,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1894366.6666666667, ans=0.2 2023-11-22 09:52:07,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1894500.0, ans=0.125 2023-11-22 09:52:12,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.245e+01 8.957e+01 9.580e+01 1.201e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-22 09:52:14,859 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 09:52:15,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2023-11-22 09:52:32,245 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7650, loss[loss=0.05897, simple_loss=0.07904, pruned_loss=0.01336, audio_tagging_loss=0.006086, over 14878.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.0921, pruned_loss=0.01484, audio_tagging_loss=0.009323, over 3042173.24 frames. ], batch size: 56, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 09:52:36,139 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284200 2023-11-22 09:53:03,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1894766.6666666667, ans=0.125 2023-11-22 09:53:23,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1894900.0, ans=0.07 2023-11-22 09:53:32,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1894900.0, ans=0.125 2023-11-22 09:53:36,919 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7700, loss[loss=0.06098, simple_loss=0.07434, pruned_loss=0.01303, audio_tagging_loss=0.01078, over 15825.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09176, pruned_loss=0.01481, audio_tagging_loss=0.009421, over 3038913.43 frames. ], batch size: 59, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 09:53:41,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284250 2023-11-22 09:54:10,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1895100.0, ans=0.125 2023-11-22 09:54:20,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 7.998e+01 8.702e+01 9.312e+01 1.429e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 09:54:25,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.73 vs. 
limit=22.5 2023-11-22 09:54:25,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1895166.6666666667, ans=0.125 2023-11-22 09:54:41,580 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7750, loss[loss=0.05927, simple_loss=0.06939, pruned_loss=0.01505, audio_tagging_loss=0.00953, over 14653.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.09267, pruned_loss=0.01502, audio_tagging_loss=0.009453, over 3039551.71 frames. ], batch size: 56, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 09:54:45,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284300 2023-11-22 09:54:47,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2023-11-22 09:54:59,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1895366.6666666667, ans=0.0 2023-11-22 09:55:05,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1895433.3333333333, ans=0.0 2023-11-22 09:55:09,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1895433.3333333333, ans=0.0 2023-11-22 09:55:19,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2023-11-22 09:55:20,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1895500.0, ans=0.125 2023-11-22 09:55:23,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1895500.0, ans=0.125 2023-11-22 09:55:27,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1895500.0, ans=0.1 2023-11-22 09:55:38,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1895566.6666666667, ans=0.1 2023-11-22 09:55:42,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1895566.6666666667, ans=0.125 2023-11-22 09:55:45,640 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7800, loss[loss=0.06903, simple_loss=0.08739, pruned_loss=0.0147, audio_tagging_loss=0.01064, over 16854.00 frames. ], tot_loss[loss=0.07083, simple_loss=0.0927, pruned_loss=0.0149, audio_tagging_loss=0.009573, over 3045990.40 frames. ], batch size: 65, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 09:55:49,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284350 2023-11-22 09:55:52,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0 2023-11-22 09:56:15,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1895766.6666666667, ans=0.1 2023-11-22 09:56:15,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1895766.6666666667, ans=0.0 2023-11-22 09:56:22,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.50 vs. 
limit=15.0 2023-11-22 09:56:29,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.687e+01 8.112e+01 8.791e+01 9.537e+01 1.111e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 09:56:38,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1895900.0, ans=0.1 2023-11-22 09:56:49,583 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7850, loss[loss=0.08226, simple_loss=0.1087, pruned_loss=0.01914, audio_tagging_loss=0.008789, over 15077.00 frames. ], tot_loss[loss=0.07112, simple_loss=0.09298, pruned_loss=0.01505, audio_tagging_loss=0.009583, over 3044459.35 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 09:56:53,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284400 2023-11-22 09:57:20,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1896100.0, ans=0.125 2023-11-22 09:57:21,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1896100.0, ans=15.0 2023-11-22 09:57:32,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1896166.6666666667, ans=0.125 2023-11-22 09:57:39,268 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 09:57:40,472 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 09:57:49,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1896233.3333333333, ans=0.0 2023-11-22 09:57:52,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1896233.3333333333, ans=0.07 2023-11-22 09:57:55,042 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7900, loss[loss=0.06216, simple_loss=0.08067, pruned_loss=0.01111, audio_tagging_loss=0.01072, over 14456.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.09245, pruned_loss=0.01501, audio_tagging_loss=0.00961, over 3046512.58 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 09:57:55,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1896300.0, ans=0.2 2023-11-22 09:57:56,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1896300.0, ans=0.125 2023-11-22 09:57:59,503 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284450 2023-11-22 09:58:05,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1896300.0, ans=0.0 2023-11-22 09:58:09,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. 
limit=15.0 2023-11-22 09:58:15,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1896366.6666666667, ans=0.125 2023-11-22 09:58:37,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.863e+01 8.160e+01 8.848e+01 9.602e+01 1.825e+02, threshold=1.770e+02, percent-clipped=1.0 2023-11-22 09:58:56,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1896566.6666666667, ans=0.1 2023-11-22 09:58:58,637 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 7950, loss[loss=0.05813, simple_loss=0.07517, pruned_loss=0.009474, audio_tagging_loss=0.01107, over 14977.00 frames. ], tot_loss[loss=0.07105, simple_loss=0.09266, pruned_loss=0.01506, audio_tagging_loss=0.00966, over 3052617.43 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 09:59:02,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284500 2023-11-22 09:59:03,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2023-11-22 09:59:04,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2023-11-22 09:59:06,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1896633.3333333333, ans=0.125 2023-11-22 09:59:15,222 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 09:59:35,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1896766.6666666667, ans=0.0 2023-11-22 09:59:40,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1896833.3333333333, ans=0.0 2023-11-22 09:59:49,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1896900.0, ans=0.125 2023-11-22 09:59:54,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-22 09:59:55,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1896900.0, ans=0.2 2023-11-22 10:00:02,904 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8000, loss[loss=0.06106, simple_loss=0.07291, pruned_loss=0.01384, audio_tagging_loss=0.01077, over 16058.00 frames. ], tot_loss[loss=0.07128, simple_loss=0.09291, pruned_loss=0.01519, audio_tagging_loss=0.009629, over 3048380.52 frames. ], batch size: 60, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:00:03,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. 
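The WARNING above is train_asr.py's transducer feasibility check: these AudioSet cuts are 1.000 s long (100 feature frames), which shrinks to 23 frames after the roughly 4x convolutional subsampling, fewer than the 24 BPE tokens of the dummy transcript, so no monotonic alignment exists and the cut is excluded. A sketch of the rule, assuming the usual icefall subsampling arithmetic:

def frames_after_subsampling(num_frames):
    # Conv2dSubsampling-style reduction, roughly T // 4 with edge
    # effects; reproduces the logged 100 -> 23 mapping.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    # A transducer cannot emit more tokens than there are encoder frames.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded dummy-text cut above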
limit=15.0 2023-11-22 10:00:06,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284550 2023-11-22 10:00:09,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1896966.6666666667, ans=0.125 2023-11-22 10:00:22,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1897033.3333333333, ans=0.125 2023-11-22 10:00:45,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1897166.6666666667, ans=0.07 2023-11-22 10:00:47,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.187e+01 8.649e+01 9.156e+01 1.196e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-22 10:01:06,838 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8050, loss[loss=0.08702, simple_loss=0.1224, pruned_loss=0.01737, audio_tagging_loss=0.008432, over 15701.00 frames. ], tot_loss[loss=0.07087, simple_loss=0.09217, pruned_loss=0.0151, audio_tagging_loss=0.009683, over 3053213.65 frames. ], batch size: 54, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:01:11,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284600 2023-11-22 10:01:50,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-11-22 10:01:52,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1897500.0, ans=0.1 2023-11-22 10:02:12,572 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8100, loss[loss=0.06718, simple_loss=0.08088, pruned_loss=0.01715, audio_tagging_loss=0.009591, over 14180.00 frames. ], tot_loss[loss=0.07073, simple_loss=0.09186, pruned_loss=0.01516, audio_tagging_loss=0.00964, over 3041478.35 frames. ], batch size: 54, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:02:16,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284650 2023-11-22 10:02:19,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1897633.3333333333, ans=0.1 2023-11-22 10:02:20,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1897633.3333333333, ans=0.0 2023-11-22 10:02:31,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1897700.0, ans=0.0 2023-11-22 10:02:59,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.863e+01 8.417e+01 8.877e+01 9.599e+01 1.117e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-22 10:03:13,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=22.5 2023-11-22 10:03:16,295 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8150, loss[loss=0.03843, simple_loss=0.0503, pruned_loss=0.003711, audio_tagging_loss=0.009572, over 14237.00 frames. ], tot_loss[loss=0.07121, simple_loss=0.09308, pruned_loss=0.01518, audio_tagging_loss=0.009486, over 3043188.96 frames. 
], batch size: 56, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:03:20,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284700 2023-11-22 10:03:51,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1898100.0, ans=0.125 2023-11-22 10:04:20,963 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8200, loss[loss=0.06684, simple_loss=0.08453, pruned_loss=0.01432, audio_tagging_loss=0.01026, over 16672.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.09335, pruned_loss=0.01516, audio_tagging_loss=0.009341, over 3046570.00 frames. ], batch size: 65, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:04:22,280 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 10:04:25,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284750 2023-11-22 10:04:37,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1898366.6666666667, ans=0.0 2023-11-22 10:04:54,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1898433.3333333333, ans=0.0 2023-11-22 10:05:02,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1898500.0, ans=0.125 2023-11-22 10:05:06,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 8.106e+01 8.680e+01 9.269e+01 1.329e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-22 10:05:25,263 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8250, loss[loss=0.05664, simple_loss=0.07862, pruned_loss=0.01021, audio_tagging_loss=0.007122, over 14719.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.09431, pruned_loss=0.01528, audio_tagging_loss=0.009266, over 3049179.76 frames. ], batch size: 54, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:05:29,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284800 2023-11-22 10:05:32,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1898633.3333333333, ans=0.125 2023-11-22 10:05:39,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1898700.0, ans=0.2 2023-11-22 10:05:46,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1898700.0, ans=0.2 2023-11-22 10:06:01,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1898766.6666666667, ans=0.125 2023-11-22 10:06:21,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. 
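Each tot_loss bracket decomposes the objective into simple, pruned, and audio-tagging terms. Across this whole section the total is consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (scale factors read off the logged numbers, not taken from the code); for batch 8250 above, 0.5 x 0.09431 + 0.01528 + 0.009266 = 0.0717. A worked check:

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, audio_tagging_scale=1.0):
    # Weighting inferred from the logged components; the actual scales
    # are hyperparameters of the training script.
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# tot_loss for batch 8250: loss=0.0717, simple_loss=0.09431,
# pruned_loss=0.01528, audio_tagging_loss=0.009266
assert abs(combined_loss(0.09431, 0.01528, 0.009266) - 0.0717) < 5e-5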
limit=6.0 2023-11-22 10:06:22,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1898900.0, ans=0.0 2023-11-22 10:06:29,557 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8300, loss[loss=0.06759, simple_loss=0.08686, pruned_loss=0.01421, audio_tagging_loss=0.009953, over 14671.00 frames. ], tot_loss[loss=0.07079, simple_loss=0.09313, pruned_loss=0.01495, audio_tagging_loss=0.009279, over 3048546.73 frames. ], batch size: 57, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:06:33,253 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284850 2023-11-22 10:06:35,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.43 vs. limit=10.0 2023-11-22 10:06:42,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1899033.3333333333, ans=0.125 2023-11-22 10:06:48,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1899033.3333333333, ans=0.125 2023-11-22 10:06:54,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1899100.0, ans=0.125 2023-11-22 10:06:58,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1899100.0, ans=0.125 2023-11-22 10:07:12,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1899166.6666666667, ans=0.125 2023-11-22 10:07:13,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0 2023-11-22 10:07:15,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 7.988e+01 8.571e+01 9.748e+01 1.296e+02, threshold=1.714e+02, percent-clipped=0.0 2023-11-22 10:07:17,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1899166.6666666667, ans=0.0 2023-11-22 10:07:33,337 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8350, loss[loss=0.0648, simple_loss=0.08346, pruned_loss=0.01414, audio_tagging_loss=0.008922, over 14346.00 frames. ], tot_loss[loss=0.07101, simple_loss=0.09317, pruned_loss=0.01507, audio_tagging_loss=0.009352, over 3049405.63 frames. ], batch size: 56, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:07:37,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284900 2023-11-22 10:07:45,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1899366.6666666667, ans=0.125 2023-11-22 10:07:47,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2023-11-22 10:08:38,240 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8400, loss[loss=0.06459, simple_loss=0.08138, pruned_loss=0.01451, audio_tagging_loss=0.009394, over 14372.00 frames. ], tot_loss[loss=0.07088, simple_loss=0.09309, pruned_loss=0.01499, audio_tagging_loss=0.009344, over 3048519.31 frames. 
], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:08:40,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1899633.3333333333, ans=0.125 2023-11-22 10:08:41,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 284950 2023-11-22 10:08:42,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1899633.3333333333, ans=0.125 2023-11-22 10:08:49,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1899700.0, ans=0.0 2023-11-22 10:09:05,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1899766.6666666667, ans=0.125 2023-11-22 10:09:20,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1899833.3333333333, ans=0.125 2023-11-22 10:09:24,807 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 8.199e+01 8.877e+01 9.641e+01 1.238e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-22 10:09:33,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1899900.0, ans=0.2 2023-11-22 10:09:42,467 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8450, loss[loss=0.07431, simple_loss=0.08681, pruned_loss=0.01681, audio_tagging_loss=0.0141, over 15219.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.09375, pruned_loss=0.0153, audio_tagging_loss=0.009246, over 3049829.00 frames. ], batch size: 56, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:09:46,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285000 2023-11-22 10:09:55,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2023-11-22 10:09:58,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1900033.3333333333, ans=0.125 2023-11-22 10:10:18,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1900100.0, ans=0.1 2023-11-22 10:10:23,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1900166.6666666667, ans=0.0 2023-11-22 10:10:28,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1900166.6666666667, ans=0.125 2023-11-22 10:10:31,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1900166.6666666667, ans=0.125 2023-11-22 10:10:47,772 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8500, loss[loss=0.07913, simple_loss=0.09875, pruned_loss=0.01728, audio_tagging_loss=0.01247, over 14810.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.09422, pruned_loss=0.01532, audio_tagging_loss=0.009272, over 3053775.25 frames. ], batch size: 57, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:10:48,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.17 vs. 
limit=22.5 2023-11-22 10:10:49,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1900300.0, ans=0.1 2023-11-22 10:10:49,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1900300.0, ans=0.0 2023-11-22 10:10:51,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285050 2023-11-22 10:11:07,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2023-11-22 10:11:19,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1900433.3333333333, ans=0.125 2023-11-22 10:11:29,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-22 10:11:33,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.621e+01 8.277e+01 9.051e+01 9.526e+01 1.493e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-22 10:11:39,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0 2023-11-22 10:11:52,682 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8550, loss[loss=0.07107, simple_loss=0.09046, pruned_loss=0.01327, audio_tagging_loss=0.01257, over 16502.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.094, pruned_loss=0.01515, audio_tagging_loss=0.009322, over 3054148.57 frames. ], batch size: 64, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:11:56,403 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285100 2023-11-22 10:12:03,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1900700.0, ans=0.125 2023-11-22 10:12:31,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2023-11-22 10:12:31,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1900833.3333333333, ans=0.0 2023-11-22 10:12:38,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1900833.3333333333, ans=0.125 2023-11-22 10:12:41,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1900833.3333333333, ans=0.125 2023-11-22 10:12:55,957 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8600, loss[loss=0.08242, simple_loss=0.1104, pruned_loss=0.01885, audio_tagging_loss=0.008359, over 14342.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.09435, pruned_loss=0.01523, audio_tagging_loss=0.009321, over 3046955.36 frames. 
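The Whitening lines from scaling.py track how far each module's output covariance is from isotropic; each entry prints the current metric against the limit above which the whitening penalty would activate, hence the "metric=X vs. limit=Y" form. As a loose illustration only, one standard whiteness measure (not necessarily the exact formula in scaling.py) is the eigenvalue-spread ratio below, which is 1.0 for perfectly white features and grows as a few directions dominate:

import torch

def whiteness_metric(x):
    # x: (num_frames, num_channels). E[lam^2] / E[lam]^2 over the
    # eigenvalues of the channel covariance; equals 1.0 iff all
    # eigenvalues are equal, i.e. the features are fully whitened.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    lam = torch.linalg.eigvalsh(cov)
    return ((lam ** 2).mean() / lam.mean() ** 2).item()

torch.manual_seed(0)
assert whiteness_metric(torch.randn(10000, 384)) < 1.2  # near 1.0 when white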
], batch size: 57, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:12:59,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285150 2023-11-22 10:13:00,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1900966.6666666667, ans=0.125 2023-11-22 10:13:07,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1901033.3333333333, ans=0.125 2023-11-22 10:13:41,801 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.761e+01 8.269e+01 8.846e+01 9.645e+01 1.157e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 10:13:57,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1901233.3333333333, ans=0.0 2023-11-22 10:13:59,986 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8650, loss[loss=0.06399, simple_loss=0.08629, pruned_loss=0.01326, audio_tagging_loss=0.007589, over 14525.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09467, pruned_loss=0.0153, audio_tagging_loss=0.009271, over 3044930.85 frames. ], batch size: 57, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:14:03,862 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285200 2023-11-22 10:14:05,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1901300.0, ans=0.0 2023-11-22 10:14:12,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2023-11-22 10:14:51,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1901566.6666666667, ans=0.125 2023-11-22 10:15:05,246 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8700, loss[loss=0.05899, simple_loss=0.06859, pruned_loss=0.01355, audio_tagging_loss=0.01114, over 14015.00 frames. ], tot_loss[loss=0.07262, simple_loss=0.09541, pruned_loss=0.01559, audio_tagging_loss=0.009324, over 3043538.44 frames. ], batch size: 54, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:15:09,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285250 2023-11-22 10:15:16,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. 
limit=15.0 2023-11-22 10:15:25,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1901700.0, ans=0.125 2023-11-22 10:15:34,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1901766.6666666667, ans=0.125 2023-11-22 10:15:51,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.447e+01 8.934e+01 9.629e+01 1.298e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-22 10:15:53,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1901833.3333333333, ans=0.0 2023-11-22 10:16:00,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1901900.0, ans=0.125 2023-11-22 10:16:00,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1901900.0, ans=0.125 2023-11-22 10:16:03,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1901900.0, ans=0.0 2023-11-22 10:16:09,455 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8750, loss[loss=0.06973, simple_loss=0.09454, pruned_loss=0.01272, audio_tagging_loss=0.009736, over 15828.00 frames. ], tot_loss[loss=0.07351, simple_loss=0.09657, pruned_loss=0.01584, audio_tagging_loss=0.009389, over 3048997.24 frames. ], batch size: 57, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:16:13,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285300 2023-11-22 10:16:19,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1901966.6666666667, ans=0.125 2023-11-22 10:16:29,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1902033.3333333333, ans=0.0 2023-11-22 10:16:38,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2023-11-22 10:16:56,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1902166.6666666667, ans=0.125 2023-11-22 10:16:59,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1902233.3333333333, ans=0.125 2023-11-22 10:17:08,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1902233.3333333333, ans=0.125 2023-11-22 10:17:13,271 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8800, loss[loss=0.05308, simple_loss=0.05948, pruned_loss=0.01229, audio_tagging_loss=0.01105, over 15772.00 frames. ], tot_loss[loss=0.0735, simple_loss=0.09653, pruned_loss=0.01585, audio_tagging_loss=0.009389, over 3049751.99 frames. ], batch size: 62, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:17:16,988 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285350 2023-11-22 10:17:39,825 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:17:43,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. 
limit=12.0 2023-11-22 10:17:54,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1902500.0, ans=0.125 2023-11-22 10:17:59,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.363e+01 9.001e+01 9.837e+01 1.230e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-22 10:18:11,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1902566.6666666667, ans=0.2 2023-11-22 10:18:17,776 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8850, loss[loss=0.06788, simple_loss=0.07973, pruned_loss=0.01766, audio_tagging_loss=0.01035, over 15451.00 frames. ], tot_loss[loss=0.07288, simple_loss=0.09552, pruned_loss=0.01557, audio_tagging_loss=0.009558, over 3049454.29 frames. ], batch size: 62, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:18:21,565 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285400 2023-11-22 10:18:28,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1902633.3333333333, ans=0.125 2023-11-22 10:18:32,174 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 10:18:33,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2023-11-22 10:18:35,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1902700.0, ans=0.125 2023-11-22 10:18:44,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1902766.6666666667, ans=0.125 2023-11-22 10:18:48,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1902766.6666666667, ans=0.025 2023-11-22 10:18:50,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1902766.6666666667, ans=0.125 2023-11-22 10:18:51,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1902766.6666666667, ans=0.035 2023-11-22 10:19:06,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1902833.3333333333, ans=0.1 2023-11-22 10:19:18,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-11-22 10:19:21,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1902966.6666666667, ans=0.1 2023-11-22 10:19:22,928 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8900, loss[loss=0.09768, simple_loss=0.1262, pruned_loss=0.02531, audio_tagging_loss=0.009291, over 15662.00 frames. 
], tot_loss[loss=0.07304, simple_loss=0.09608, pruned_loss=0.01561, audio_tagging_loss=0.009392, over 3047100.43 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:19:25,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1902966.6666666667, ans=0.2 2023-11-22 10:19:25,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2023-11-22 10:19:26,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285450 2023-11-22 10:19:29,200 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:19:33,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1902966.6666666667, ans=0.125 2023-11-22 10:20:10,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.554e+01 8.009e+01 8.748e+01 9.224e+01 1.095e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-22 10:20:12,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1903166.6666666667, ans=0.1 2023-11-22 10:20:22,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1903233.3333333333, ans=0.09899494936611666 2023-11-22 10:20:26,869 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 8950, loss[loss=0.08703, simple_loss=0.119, pruned_loss=0.02033, audio_tagging_loss=0.007225, over 14639.00 frames. ], tot_loss[loss=0.07268, simple_loss=0.09583, pruned_loss=0.01552, audio_tagging_loss=0.009248, over 3045462.46 frames. ], batch size: 53, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:20:28,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1903300.0, ans=0.0 2023-11-22 10:20:29,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1903300.0, ans=0.2 2023-11-22 10:20:30,530 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285500 2023-11-22 10:20:38,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1903366.6666666667, ans=0.0 2023-11-22 10:20:53,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1903433.3333333333, ans=0.0 2023-11-22 10:20:54,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1903433.3333333333, ans=0.125 2023-11-22 10:21:06,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1903500.0, ans=0.125 2023-11-22 10:21:30,144 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9000, loss[loss=0.06714, simple_loss=0.08413, pruned_loss=0.01359, audio_tagging_loss=0.01149, over 14590.00 frames. ], tot_loss[loss=0.07272, simple_loss=0.09592, pruned_loss=0.01557, audio_tagging_loss=0.009186, over 3050820.30 frames. 
], batch size: 56, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:21:30,145 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 10:21:53,746 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3530, 5.0131, 4.6745, 5.2310], device='cuda:1') 2023-11-22 10:22:12,111 INFO [train_asr.py:1253] (1/4) Epoch 24, validation: loss=0.06037, simple_loss=0.05165, pruned_loss=0.00517, audio_tagging_loss=0.02938, over 4681554.00 frames. 2023-11-22 10:22:12,111 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 10:22:15,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285550 2023-11-22 10:22:31,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1903700.0, ans=0.125 2023-11-22 10:22:40,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2023-11-22 10:22:59,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.344e+01 8.279e+01 8.807e+01 9.755e+01 1.198e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-22 10:23:04,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-11-22 10:23:15,767 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9050, loss[loss=0.06587, simple_loss=0.09237, pruned_loss=0.01233, audio_tagging_loss=0.007355, over 14597.00 frames. ], tot_loss[loss=0.07272, simple_loss=0.09616, pruned_loss=0.01549, audio_tagging_loss=0.009147, over 3047795.57 frames. ], batch size: 56, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:23:17,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2023-11-22 10:23:19,498 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285600 2023-11-22 10:23:31,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1904033.3333333333, ans=0.0 2023-11-22 10:23:45,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1904100.0, ans=0.125 2023-11-22 10:23:51,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-22 10:23:53,458 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:24:19,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1904300.0, ans=0.1 2023-11-22 10:24:20,401 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9100, loss[loss=0.08548, simple_loss=0.1109, pruned_loss=0.02448, audio_tagging_loss=0.005564, over 13638.00 frames. ], tot_loss[loss=0.0724, simple_loss=0.09553, pruned_loss=0.0155, audio_tagging_loss=0.009132, over 3050268.79 frames. 
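At batch 9000 the loop pauses to compute validation loss; note how the component mix shifts there (pruned_loss 0.00517 but audio_tagging_loss 0.02938), and the log also records the CUDA peak-allocator high-water mark. The memory line corresponds to the following query (the exact message formatting is an assumption):

import torch

def log_peak_memory(device=torch.device("cuda:1")):
    # High-water mark of CUDA allocator usage on this rank's device;
    # 25607MB in the log is this statistic at validation time.
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")

Calling torch.cuda.reset_peak_memory_stats(device) would start a fresh high-water mark if per-phase peaks were wanted instead.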
], batch size: 53, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:24:24,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285650 2023-11-22 10:24:37,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1904366.6666666667, ans=0.2 2023-11-22 10:24:41,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2023-11-22 10:24:48,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1904433.3333333333, ans=0.0 2023-11-22 10:25:07,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.735e+01 8.029e+01 8.849e+01 9.580e+01 1.187e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-22 10:25:08,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1904500.0, ans=0.1 2023-11-22 10:25:12,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1904566.6666666667, ans=0.125 2023-11-22 10:25:14,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1904566.6666666667, ans=0.125 2023-11-22 10:25:15,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-22 10:25:21,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1904566.6666666667, ans=0.1 2023-11-22 10:25:24,628 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9150, loss[loss=0.06416, simple_loss=0.07865, pruned_loss=0.01619, audio_tagging_loss=0.00864, over 14170.00 frames. ], tot_loss[loss=0.07202, simple_loss=0.09509, pruned_loss=0.01538, audio_tagging_loss=0.009098, over 3047903.09 frames. ], batch size: 54, lr: 2.83e-03, grad_scale: 16.0 2023-11-22 10:25:26,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1904633.3333333333, ans=0.0 2023-11-22 10:25:28,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285700 2023-11-22 10:26:08,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1904833.3333333333, ans=0.2 2023-11-22 10:26:11,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1904833.3333333333, ans=0.0 2023-11-22 10:26:14,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1904900.0, ans=0.0 2023-11-22 10:26:28,547 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9200, loss[loss=0.07131, simple_loss=0.09736, pruned_loss=0.01399, audio_tagging_loss=0.008644, over 15667.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.09569, pruned_loss=0.0155, audio_tagging_loss=0.009102, over 3047769.84 frames. 
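Each batch line pairs a single-batch loss[... over ~15k frames] with a running tot_loss[... over ~3.0M frames]. The running figures behave like an exponential moving average of frame-weighted loss sums: with a decay of 1 - 1/200 (an assumed reset interval) and roughly 15k frames per batch, the effective window saturates near 200 x 15k = 3M frames, matching the totals printed here. A minimal sketch (icefall's own tracker is richer than this):

class EmaLossTracker:
    # EMA over (loss_sum, frame_count) pairs, in the spirit of the
    # running tot_loss figures; the 1 - 1/200 decay is an assumption.
    def __init__(self, decay=1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # old batches fade geometrically, so `frames` saturates near
        # batch_frames / (1 - decay)
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    def average(self):
        return self.loss_sum / self.frames

tracker = EmaLossTracker()
for _ in range(2000):
    tracker.update(0.072, 15000.0)
assert abs(tracker.frames - 3.0e6) < 1e4  # ~ "over 3.0M frames" in the log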
], batch size: 58, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:26:32,206 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285750 2023-11-22 10:26:48,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1905033.3333333333, ans=0.125 2023-11-22 10:26:49,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2023-11-22 10:27:07,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-22 10:27:15,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.192e+01 8.757e+01 9.519e+01 1.252e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-22 10:27:22,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0 2023-11-22 10:27:28,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1905233.3333333333, ans=0.125 2023-11-22 10:27:32,495 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9250, loss[loss=0.06677, simple_loss=0.09495, pruned_loss=0.009405, audio_tagging_loss=0.009897, over 14961.00 frames. ], tot_loss[loss=0.07209, simple_loss=0.09536, pruned_loss=0.01535, audio_tagging_loss=0.009052, over 3050472.06 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:27:35,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1905300.0, ans=0.2 2023-11-22 10:27:36,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285800 2023-11-22 10:27:52,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1905366.6666666667, ans=0.1 2023-11-22 10:27:55,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1905366.6666666667, ans=0.025 2023-11-22 10:27:58,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1905433.3333333333, ans=0.125 2023-11-22 10:27:58,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-22 10:28:25,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1905566.6666666667, ans=0.0 2023-11-22 10:28:26,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-11-22 10:28:37,330 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9300, loss[loss=0.1104, simple_loss=0.1356, pruned_loss=0.03393, audio_tagging_loss=0.008676, over 15176.00 frames. ], tot_loss[loss=0.07229, simple_loss=0.09582, pruned_loss=0.01534, audio_tagging_loss=0.009043, over 3050721.66 frames. 
], batch size: 53, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:28:41,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285850 2023-11-22 10:28:41,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2023-11-22 10:28:48,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1905700.0, ans=0.1 2023-11-22 10:28:49,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1905700.0, ans=0.2 2023-11-22 10:29:04,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2023-11-22 10:29:17,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1905833.3333333333, ans=0.0 2023-11-22 10:29:17,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1905833.3333333333, ans=0.125 2023-11-22 10:29:25,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.603e+01 8.386e+01 8.892e+01 9.597e+01 1.135e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 10:29:35,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1905900.0, ans=0.125 2023-11-22 10:29:41,929 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9350, loss[loss=0.0746, simple_loss=0.09487, pruned_loss=0.01774, audio_tagging_loss=0.009421, over 14486.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.0959, pruned_loss=0.01542, audio_tagging_loss=0.009084, over 3046158.79 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:29:42,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1905966.6666666667, ans=0.1 2023-11-22 10:29:45,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285900 2023-11-22 10:29:45,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1905966.6666666667, ans=0.125 2023-11-22 10:29:45,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1905966.6666666667, ans=0.125 2023-11-22 10:29:47,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1905966.6666666667, ans=10.0 2023-11-22 10:30:10,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1906100.0, ans=0.0 2023-11-22 10:30:45,398 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9400, loss[loss=0.06534, simple_loss=0.07776, pruned_loss=0.01633, audio_tagging_loss=0.01013, over 13811.00 frames. ], tot_loss[loss=0.07179, simple_loss=0.09452, pruned_loss=0.01532, audio_tagging_loss=0.009217, over 3045995.22 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:30:49,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 285950 2023-11-22 10:30:59,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. 
limit=15.0 2023-11-22 10:31:32,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 8.349e+01 8.920e+01 9.542e+01 1.196e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-22 10:31:48,505 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 10:31:49,691 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9450, loss[loss=0.06044, simple_loss=0.07498, pruned_loss=0.01155, audio_tagging_loss=0.0114, over 14334.00 frames. ], tot_loss[loss=0.07209, simple_loss=0.09455, pruned_loss=0.01542, audio_tagging_loss=0.009402, over 3058454.89 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:31:51,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1906633.3333333333, ans=0.0 2023-11-22 10:31:54,008 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286000 2023-11-22 10:32:10,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1906700.0, ans=0.125 2023-11-22 10:32:13,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1906700.0, ans=0.1 2023-11-22 10:32:46,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1906900.0, ans=0.1 2023-11-22 10:32:54,481 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9500, loss[loss=0.06133, simple_loss=0.08366, pruned_loss=0.01057, audio_tagging_loss=0.008924, over 15208.00 frames. ], tot_loss[loss=0.07233, simple_loss=0.09481, pruned_loss=0.01549, audio_tagging_loss=0.009434, over 3053475.00 frames. ], batch size: 57, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:32:54,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1906966.6666666667, ans=0.125 2023-11-22 10:32:58,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286050 2023-11-22 10:33:14,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.05 vs. 
limit=22.5 2023-11-22 10:33:19,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1907100.0, ans=0.125 2023-11-22 10:33:32,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1907166.6666666667, ans=0.0 2023-11-22 10:33:36,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1907166.6666666667, ans=0.125 2023-11-22 10:33:39,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1907166.6666666667, ans=0.125 2023-11-22 10:33:42,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.223e+01 8.226e+01 8.759e+01 9.603e+01 1.524e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-22 10:33:55,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1907233.3333333333, ans=0.125 2023-11-22 10:33:56,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1907233.3333333333, ans=0.1 2023-11-22 10:33:58,987 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9550, loss[loss=0.08956, simple_loss=0.1142, pruned_loss=0.02372, audio_tagging_loss=0.008765, over 15667.00 frames. ], tot_loss[loss=0.0731, simple_loss=0.09568, pruned_loss=0.01574, audio_tagging_loss=0.009519, over 3053940.05 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:34:01,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1907300.0, ans=0.125 2023-11-22 10:34:02,711 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286100 2023-11-22 10:34:02,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1907300.0, ans=0.125 2023-11-22 10:34:05,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2023-11-22 10:34:27,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1907433.3333333333, ans=0.0 2023-11-22 10:34:28,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1907433.3333333333, ans=0.125 2023-11-22 10:35:04,197 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9600, loss[loss=0.06911, simple_loss=0.09814, pruned_loss=0.01237, audio_tagging_loss=0.007669, over 14115.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.09571, pruned_loss=0.01582, audio_tagging_loss=0.009545, over 3053124.88 frames. 
], batch size: 55, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:35:07,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286150 2023-11-22 10:35:23,304 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:35:29,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1907766.6666666667, ans=0.05 2023-11-22 10:35:50,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1907833.3333333333, ans=0.125 2023-11-22 10:35:51,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.074e+01 8.745e+01 9.476e+01 1.191e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 10:35:56,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1907900.0, ans=0.125 2023-11-22 10:36:01,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1907900.0, ans=0.125 2023-11-22 10:36:07,695 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9650, loss[loss=0.04346, simple_loss=0.05095, pruned_loss=0.005644, audio_tagging_loss=0.01234, over 16150.00 frames. ], tot_loss[loss=0.07245, simple_loss=0.09475, pruned_loss=0.01559, audio_tagging_loss=0.009481, over 3047580.90 frames. ], batch size: 63, lr: 2.83e-03, grad_scale: 32.0 2023-11-22 10:36:11,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-11-22 10:36:12,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286200 2023-11-22 10:36:20,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1908033.3333333333, ans=0.125 2023-11-22 10:36:29,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1908033.3333333333, ans=0.125 2023-11-22 10:36:30,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1908033.3333333333, ans=0.2 2023-11-22 10:36:55,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5 2023-11-22 10:37:02,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1908233.3333333333, ans=0.125 2023-11-22 10:37:12,263 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9700, loss[loss=0.07628, simple_loss=0.1056, pruned_loss=0.01449, audio_tagging_loss=0.008983, over 15606.00 frames. ], tot_loss[loss=0.07212, simple_loss=0.09446, pruned_loss=0.01546, audio_tagging_loss=0.009433, over 3044967.18 frames. 
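The grad_scale field flips between 32.0 and 16.0 across the batch lines in this stretch, the signature of dynamic loss scaling in an fp16 run: the scale halves when a step produces inf/nan gradients and grows back after a run of clean steps. The policy matches torch's GradScaler; the constructor values below are illustrative, not the recipe's actual settings:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,     # the scale seen through most of this window
    backoff_factor=0.5,  # 32.0 -> 16.0 after an overflowing step
    growth_factor=2.0,   # 16.0 -> 32.0 again after enough good steps
    growth_interval=2000,
)

def fp16_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if grads overflowed
    scaler.update()         # adjusts the scale per the policy above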
], batch size: 55, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:37:13,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1908300.0, ans=0.0 2023-11-22 10:37:16,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286250 2023-11-22 10:37:25,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1908366.6666666667, ans=0.1 2023-11-22 10:37:52,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1908500.0, ans=0.05 2023-11-22 10:37:54,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1908500.0, ans=0.0 2023-11-22 10:37:59,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 7.968e+01 8.645e+01 9.511e+01 1.252e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-22 10:38:10,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-11-22 10:38:16,141 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9750, loss[loss=0.07425, simple_loss=0.09952, pruned_loss=0.0165, audio_tagging_loss=0.007989, over 16111.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09399, pruned_loss=0.0154, audio_tagging_loss=0.00932, over 3046936.12 frames. ], batch size: 59, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:38:20,563 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286300 2023-11-22 10:38:23,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1908633.3333333333, ans=0.125 2023-11-22 10:38:25,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1908633.3333333333, ans=0.0 2023-11-22 10:38:33,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1908700.0, ans=0.2 2023-11-22 10:38:41,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1908766.6666666667, ans=0.125 2023-11-22 10:38:48,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1908766.6666666667, ans=0.0 2023-11-22 10:38:59,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2023-11-22 10:39:00,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1908833.3333333333, ans=0.0 2023-11-22 10:39:12,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1908900.0, ans=0.0 2023-11-22 10:39:20,669 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9800, loss[loss=0.07463, simple_loss=0.1031, pruned_loss=0.01396, audio_tagging_loss=0.009141, over 15748.00 frames. ], tot_loss[loss=0.07136, simple_loss=0.09366, pruned_loss=0.01533, audio_tagging_loss=0.009208, over 3040762.62 frames. 
], batch size: 58, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:39:22,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1908966.6666666667, ans=0.125 2023-11-22 10:39:22,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2023-11-22 10:39:24,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286350 2023-11-22 10:39:30,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1908966.6666666667, ans=0.125 2023-11-22 10:39:34,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1909033.3333333333, ans=0.0 2023-11-22 10:39:57,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1909100.0, ans=0.1 2023-11-22 10:40:08,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.345e+01 8.979e+01 9.748e+01 1.178e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-22 10:40:18,745 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 10:40:25,475 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9850, loss[loss=0.057, simple_loss=0.07387, pruned_loss=0.01162, audio_tagging_loss=0.008443, over 14358.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.09447, pruned_loss=0.01545, audio_tagging_loss=0.00909, over 3043504.63 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:40:29,208 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286400 2023-11-22 10:40:37,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1909366.6666666667, ans=0.125 2023-11-22 10:40:38,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2023-11-22 10:40:39,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2023-11-22 10:41:10,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1909500.0, ans=0.1 2023-11-22 10:41:16,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1909566.6666666667, ans=0.2 2023-11-22 10:41:30,057 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9900, loss[loss=0.07062, simple_loss=0.08868, pruned_loss=0.0162, audio_tagging_loss=0.01007, over 14331.00 frames. ], tot_loss[loss=0.07189, simple_loss=0.09475, pruned_loss=0.01541, audio_tagging_loss=0.009107, over 3039064.88 frames. 
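The WARNING above shows the length filter at `train_asr.py:1462` in action: the AudioSet cut carries only a dummy transcript, and its 100 input frames subsample to 23 encoder frames, fewer than its 24 BPE tokens, so no transducer alignment exists and the cut is dropped. A sketch of the check, assuming the subsampling arithmetic icefall uses for subsampling_factor=4 (the exact guard in train_asr.py may differ in details):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed T_out of the 4x convolutional frontend: ((T - 7) // 2 + 1) // 2.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A (pruned) transducer alignment needs at least one encoder frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, matching the warning
print(keep_cut(100, 24))              # False -> "Exclude cut ... from training"
```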
], batch size: 57, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:41:34,475 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286450 2023-11-22 10:41:37,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1909633.3333333333, ans=0.1 2023-11-22 10:41:38,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1909633.3333333333, ans=0.0 2023-11-22 10:41:49,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1909700.0, ans=0.1 2023-11-22 10:42:04,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1909766.6666666667, ans=0.2 2023-11-22 10:42:15,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1909833.3333333333, ans=0.0 2023-11-22 10:42:19,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.212e+01 8.837e+01 9.450e+01 1.215e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 10:42:34,516 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 9950, loss[loss=0.05968, simple_loss=0.07678, pruned_loss=0.01116, audio_tagging_loss=0.01013, over 15663.00 frames. ], tot_loss[loss=0.07163, simple_loss=0.09427, pruned_loss=0.01523, audio_tagging_loss=0.009265, over 3039006.45 frames. ], batch size: 59, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 10:42:34,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1909966.6666666667, ans=0.125 2023-11-22 10:42:37,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1909966.6666666667, ans=0.0 2023-11-22 10:42:38,296 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286500 2023-11-22 10:42:42,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1909966.6666666667, ans=0.125 2023-11-22 10:42:53,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1910033.3333333333, ans=0.1 2023-11-22 10:42:59,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-22 10:43:07,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1910100.0, ans=0.125 2023-11-22 10:43:17,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1910166.6666666667, ans=0.125 2023-11-22 10:43:30,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.46 vs. limit=22.5 2023-11-22 10:43:39,009 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10000, loss[loss=0.06267, simple_loss=0.0938, pruned_loss=0.008461, audio_tagging_loss=0.007307, over 14716.00 frames. ], tot_loss[loss=0.07146, simple_loss=0.09412, pruned_loss=0.01516, audio_tagging_loss=0.009245, over 3044549.66 frames. 
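The `optim.py:476` lines summarize ScaledAdam's gradient-norm statistics: five quantiles (min, 25%, median, 75%, max) of recently observed norms, the clipping threshold, and how often it fired. The logged numbers are consistent with threshold = Clipping_scale x median (for the record above, 2.0 x 8.837e+01 ~= 1.767e+02). A hedged sketch of that bookkeeping; how ScaledAdam actually accumulates and decays the norms is more involved:

```python
import torch

def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quantiles of recent grad norms, clip threshold, and percent clipped."""
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                        # scale * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

norms = torch.tensor([72.9, 80.7, 88.4, 94.8, 119.1])       # illustrative values
q, thr, pct = clip_stats(norms)
print(q.tolist(), float(thr), float(pct))                    # threshold ~176.8, 0.0% clipped
```

With norms this far below twice their median, percent-clipped stays at 0.0, as in almost every record in this stretch of the log.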
], batch size: 57, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:43:42,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286550 2023-11-22 10:43:56,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1910366.6666666667, ans=0.125 2023-11-22 10:44:01,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1910366.6666666667, ans=0.07 2023-11-22 10:44:20,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1910500.0, ans=0.05 2023-11-22 10:44:27,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.515e+01 8.076e+01 8.734e+01 9.444e+01 1.191e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 10:44:29,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1910566.6666666667, ans=0.125 2023-11-22 10:44:35,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1910566.6666666667, ans=0.0 2023-11-22 10:44:43,776 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10050, loss[loss=0.06468, simple_loss=0.0817, pruned_loss=0.01215, audio_tagging_loss=0.01168, over 16297.00 frames. ], tot_loss[loss=0.0718, simple_loss=0.09426, pruned_loss=0.01535, audio_tagging_loss=0.00931, over 3044446.69 frames. ], batch size: 66, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:44:46,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1910633.3333333333, ans=0.1 2023-11-22 10:44:47,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286600 2023-11-22 10:44:58,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1910700.0, ans=0.125 2023-11-22 10:45:02,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1910700.0, ans=0.1 2023-11-22 10:45:09,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1910766.6666666667, ans=0.125 2023-11-22 10:45:18,343 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:45:48,197 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10100, loss[loss=0.06154, simple_loss=0.08808, pruned_loss=0.009503, audio_tagging_loss=0.007998, over 16106.00 frames. ], tot_loss[loss=0.07165, simple_loss=0.09431, pruned_loss=0.0152, audio_tagging_loss=0.009294, over 3047286.00 frames. ], batch size: 58, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 10:45:52,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286650 2023-11-22 10:45:52,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.53 vs. 
limit=10.0 2023-11-22 10:46:30,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1911166.6666666667, ans=0.125 2023-11-22 10:46:34,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1911166.6666666667, ans=0.07 2023-11-22 10:46:37,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1911166.6666666667, ans=0.0 2023-11-22 10:46:38,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.221e+01 8.968e+01 9.727e+01 1.162e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-22 10:46:40,869 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 10:46:52,449 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10150, loss[loss=0.07162, simple_loss=0.08294, pruned_loss=0.01522, audio_tagging_loss=0.01493, over 15866.00 frames. ], tot_loss[loss=0.07167, simple_loss=0.09411, pruned_loss=0.01518, audio_tagging_loss=0.009437, over 3048174.92 frames. ], batch size: 61, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 10:46:55,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1911300.0, ans=0.125 2023-11-22 10:46:56,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286700 2023-11-22 10:47:00,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1911300.0, ans=0.0 2023-11-22 10:47:02,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1911300.0, ans=0.125 2023-11-22 10:47:23,807 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 10:47:23,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1911433.3333333333, ans=0.125 2023-11-22 10:47:50,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-22 10:47:56,794 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10200, loss[loss=0.06493, simple_loss=0.08122, pruned_loss=0.01396, audio_tagging_loss=0.01035, over 15255.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09458, pruned_loss=0.0152, audio_tagging_loss=0.00944, over 3052379.77 frames. 
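The `scaling.py:1022` Whitening lines report how far a module's output covariance is from isotropic. A plausible reconstruction of the metric, hedged since it is written from memory of zipformer's scaling.py: the ratio of the covariance spectrum's second moment to its squared first moment, which is exactly 1.0 for "white" features and grows as the spectrum becomes lopsided. The Whiten module only applies its corrective gradient while the metric exceeds the logged limit, so "metric=7.47 vs. limit=15.0" means no intervention.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """mean(eig^2) / mean(eig)^2 of the feature covariance; 1.0 when white."""
    cov = (x.t() @ x) / x.shape[0]                  # (C, C) covariance, one group
    mean_eig = torch.diagonal(cov).mean()           # trace(C) / C
    mean_eig_sq = torch.diagonal(cov @ cov).mean()  # trace(C^2) / C
    return float(mean_eig_sq / (mean_eig ** 2 + 1e-20))

white = torch.randn(10000, 256)
print(whitening_metric(white))                                   # ~1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 256)))   # far above 1.0
```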
], batch size: 60, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 10:48:00,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286750 2023-11-22 10:48:06,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1911633.3333333333, ans=0.1 2023-11-22 10:48:20,847 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 10:48:21,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1911766.6666666667, ans=0.2 2023-11-22 10:48:46,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.248e+01 8.751e+01 9.441e+01 1.217e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-22 10:49:00,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.08 vs. limit=6.0 2023-11-22 10:49:01,122 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10250, loss[loss=0.08065, simple_loss=0.1087, pruned_loss=0.0183, audio_tagging_loss=0.008013, over 14840.00 frames. ], tot_loss[loss=0.07169, simple_loss=0.09422, pruned_loss=0.01511, audio_tagging_loss=0.009473, over 3047587.40 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 10:49:04,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286800 2023-11-22 10:49:27,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2023-11-22 10:49:38,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1912100.0, ans=0.07 2023-11-22 10:49:40,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1912166.6666666667, ans=0.0 2023-11-22 10:49:46,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1912166.6666666667, ans=0.125 2023-11-22 10:49:48,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2023-11-22 10:49:52,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=22.5 2023-11-22 10:49:58,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1912233.3333333333, ans=0.0 2023-11-22 10:50:05,644 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10300, loss[loss=0.05643, simple_loss=0.07032, pruned_loss=0.01237, audio_tagging_loss=0.008896, over 15968.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09446, pruned_loss=0.01516, audio_tagging_loss=0.009515, over 3049494.99 frames. 
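`grad_scale` in the per-batch records has been bouncing between 32.0 and 16.0 (it drops at batch 9950 above and recovers later). With fp16 training this is the signature of a dynamic loss scaler: halve the scale as soon as a step overflows in half precision, then double it back after a long enough run of clean steps. A sketch of that policy, modeled on torch.cuda.amp.GradScaler's defaults rather than on icefall's exact constants:

```python
class DynamicGradScale:
    """Loss-scale controller: backoff on overflow, grow after clean runs."""
    def __init__(self, scale=32.0, growth=2.0, backoff=0.5, growth_interval=2000):
        self.scale = scale
        self.growth = growth
        self.backoff = backoff
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale *= self.backoff      # overflow: shrink immediately
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth   # sustained clean run: grow again
                self._good_steps = 0
        return self.scale

s = DynamicGradScale()
print(s.update(found_inf=True))                    # 16.0, as around batch 9950
print([s.update(False) for _ in range(2000)][-1])  # 32.0 again
```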
], batch size: 61, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 10:50:09,987 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286850 2023-11-22 10:50:26,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0 2023-11-22 10:50:56,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.119e+01 8.625e+01 9.295e+01 1.228e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-22 10:51:07,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1912566.6666666667, ans=0.2 2023-11-22 10:51:08,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2023-11-22 10:51:09,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1912633.3333333333, ans=0.125 2023-11-22 10:51:09,891 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10350, loss[loss=0.08479, simple_loss=0.09786, pruned_loss=0.0237, audio_tagging_loss=0.01216, over 15081.00 frames. ], tot_loss[loss=0.07204, simple_loss=0.09475, pruned_loss=0.01502, audio_tagging_loss=0.009642, over 3059065.67 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 10:51:14,275 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286900 2023-11-22 10:51:23,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1912700.0, ans=0.0 2023-11-22 10:51:31,989 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:51:49,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1912833.3333333333, ans=0.1 2023-11-22 10:51:54,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2023-11-22 10:51:56,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1912833.3333333333, ans=0.125 2023-11-22 10:52:05,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1912900.0, ans=0.125 2023-11-22 10:52:15,704 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10400, loss[loss=0.06077, simple_loss=0.07652, pruned_loss=0.01071, audio_tagging_loss=0.0118, over 16358.00 frames. ], tot_loss[loss=0.07192, simple_loss=0.09426, pruned_loss=0.01501, audio_tagging_loss=0.00978, over 3050519.92 frames. ], batch size: 63, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:52:19,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 286950 2023-11-22 10:52:24,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.08 vs. 
limit=12.0 2023-11-22 10:52:36,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1913033.3333333333, ans=0.125 2023-11-22 10:53:05,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.131e+01 8.865e+01 9.715e+01 1.299e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-22 10:53:10,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1913233.3333333333, ans=0.125 2023-11-22 10:53:13,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1913233.3333333333, ans=0.0 2023-11-22 10:53:18,883 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10450, loss[loss=0.05655, simple_loss=0.07566, pruned_loss=0.005552, audio_tagging_loss=0.01317, over 15026.00 frames. ], tot_loss[loss=0.07148, simple_loss=0.09363, pruned_loss=0.01493, audio_tagging_loss=0.009732, over 3049629.31 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:53:19,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1913300.0, ans=0.0 2023-11-22 10:53:23,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287000 2023-11-22 10:53:56,579 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:54:02,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1913500.0, ans=0.125 2023-11-22 10:54:22,303 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10500, loss[loss=0.06687, simple_loss=0.0902, pruned_loss=0.01074, audio_tagging_loss=0.01104, over 15091.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09321, pruned_loss=0.01488, audio_tagging_loss=0.009676, over 3047457.60 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:54:26,100 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287050 2023-11-22 10:54:26,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1913633.3333333333, ans=0.0 2023-11-22 10:54:45,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1913700.0, ans=0.025 2023-11-22 10:55:04,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1913833.3333333333, ans=0.1 2023-11-22 10:55:06,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2023-11-22 10:55:07,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1913833.3333333333, ans=0.025 2023-11-22 10:55:12,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1913833.3333333333, ans=0.125 2023-11-22 10:55:13,008 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.563e+01 8.123e+01 8.688e+01 9.437e+01 1.274e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-22 10:55:28,209 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10550, loss[loss=0.07038, simple_loss=0.09359, pruned_loss=0.0127, audio_tagging_loss=0.01089, over 15760.00 frames. 
], tot_loss[loss=0.0705, simple_loss=0.09272, pruned_loss=0.01465, audio_tagging_loss=0.009486, over 3045374.12 frames. ], batch size: 63, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:55:31,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1913966.6666666667, ans=0.125 2023-11-22 10:55:32,751 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287100 2023-11-22 10:55:39,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1913966.6666666667, ans=0.1 2023-11-22 10:55:39,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1913966.6666666667, ans=0.0 2023-11-22 10:55:42,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1914033.3333333333, ans=0.125 2023-11-22 10:55:53,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-11-22 10:56:22,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1914233.3333333333, ans=0.1 2023-11-22 10:56:33,342 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10600, loss[loss=0.0675, simple_loss=0.08787, pruned_loss=0.01425, audio_tagging_loss=0.00931, over 14909.00 frames. ], tot_loss[loss=0.07064, simple_loss=0.09284, pruned_loss=0.01479, audio_tagging_loss=0.009424, over 3050506.75 frames. ], batch size: 58, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:56:37,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287150 2023-11-22 10:56:38,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-11-22 10:56:40,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1914300.0, ans=10.0 2023-11-22 10:56:42,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1914300.0, ans=0.2 2023-11-22 10:56:56,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2023-11-22 10:57:02,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-11-22 10:57:05,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1914433.3333333333, ans=0.125 2023-11-22 10:57:10,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1914500.0, ans=0.1 2023-11-22 10:57:23,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.829e+01 8.242e+01 9.024e+01 9.598e+01 1.340e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-22 10:57:26,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1914566.6666666667, ans=0.125 2023-11-22 10:57:37,610 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10650, loss[loss=0.08034, simple_loss=0.1137, pruned_loss=0.01574, audio_tagging_loss=0.007759, over 15083.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09366, pruned_loss=0.01479, audio_tagging_loss=0.009279, over 3052298.40 frames. ], batch size: 58, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:57:41,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287200 2023-11-22 10:57:51,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1914700.0, ans=0.125 2023-11-22 10:57:58,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1914700.0, ans=0.125 2023-11-22 10:58:02,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1914700.0, ans=0.125 2023-11-22 10:58:14,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1914766.6666666667, ans=0.125 2023-11-22 10:58:33,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-22 10:58:37,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1914900.0, ans=0.0 2023-11-22 10:58:41,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1914966.6666666667, ans=0.0 2023-11-22 10:58:42,746 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10700, loss[loss=0.07699, simple_loss=0.09919, pruned_loss=0.01869, audio_tagging_loss=0.008701, over 15358.00 frames. ], tot_loss[loss=0.07101, simple_loss=0.09365, pruned_loss=0.01489, audio_tagging_loss=0.009287, over 3045701.87 frames. 
], batch size: 56, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:58:47,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287250 2023-11-22 10:58:52,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1914966.6666666667, ans=15.0 2023-11-22 10:59:19,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1915100.0, ans=0.125 2023-11-22 10:59:33,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 8.175e+01 8.915e+01 9.385e+01 1.445e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-22 10:59:36,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1915233.3333333333, ans=0.1 2023-11-22 10:59:46,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1915300.0, ans=0.125 2023-11-22 10:59:47,522 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10750, loss[loss=0.09149, simple_loss=0.1257, pruned_loss=0.01913, audio_tagging_loss=0.009515, over 15148.00 frames. ], tot_loss[loss=0.07091, simple_loss=0.09369, pruned_loss=0.01489, audio_tagging_loss=0.009175, over 3052718.97 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 10:59:49,098 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 10:59:51,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287300 2023-11-22 10:59:53,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1915300.0, ans=0.125 2023-11-22 11:00:00,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1915366.6666666667, ans=0.125 2023-11-22 11:00:11,591 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:00:26,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1915500.0, ans=0.2 2023-11-22 11:00:30,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1915500.0, ans=0.125 2023-11-22 11:00:40,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1915566.6666666667, ans=0.0 2023-11-22 11:00:41,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1915566.6666666667, ans=0.0 2023-11-22 11:00:42,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1915566.6666666667, ans=0.07 2023-11-22 11:00:51,829 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10800, loss[loss=0.09295, simple_loss=0.1261, pruned_loss=0.02249, audio_tagging_loss=0.007399, over 15441.00 frames. ], tot_loss[loss=0.07036, simple_loss=0.09272, pruned_loss=0.01478, audio_tagging_loss=0.009218, over 3047650.90 frames. 
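The fractional batch_count values (1914966.6666666667 and so on) are much larger than the optimizer's batch index (~287300 here) because the module schedules advance in duration-normalized units. Assuming icefall's get_adjusted_batch_count with the run's max_duration=1000, world_size=4 and ref_duration=600, every real batch counts as 1000 x 4 / 600 ~= 6.67 scheduled batches, which reproduces the logged values:

```python
def adjusted_batch_count(batch_idx_train: int,
                         max_duration: float = 1000.0,
                         world_size: int = 4,
                         ref_duration: float = 600.0) -> float:
    """Batch index rescaled by audio seen per step relative to ref_duration."""
    return batch_idx_train * (max_duration * world_size) / ref_duration

# A few entries below this point the log shows batch_count=1915966.6666666667
# alongside "Current batch idx: 287400"; the two agree to within a few batches:
print(adjusted_batch_count(287395))  # 1915966.666...
```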
], batch size: 55, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 11:00:55,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287350 2023-11-22 11:00:58,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2023-11-22 11:01:11,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2023-11-22 11:01:19,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1915766.6666666667, ans=0.2 2023-11-22 11:01:42,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.103e+01 8.562e+01 9.437e+01 1.159e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-22 11:01:56,456 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10850, loss[loss=0.07128, simple_loss=0.09914, pruned_loss=0.01392, audio_tagging_loss=0.00779, over 14896.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09263, pruned_loss=0.0147, audio_tagging_loss=0.009243, over 3045688.42 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:02:00,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287400 2023-11-22 11:02:07,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1915966.6666666667, ans=0.125 2023-11-22 11:02:16,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1916033.3333333333, ans=0.125 2023-11-22 11:02:16,168 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:02:33,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=22.5 2023-11-22 11:02:48,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2023-11-22 11:02:57,555 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 11:03:01,198 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10900, loss[loss=0.05683, simple_loss=0.07157, pruned_loss=0.008481, audio_tagging_loss=0.01257, over 15977.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.09232, pruned_loss=0.01467, audio_tagging_loss=0.009297, over 3044669.60 frames. ], batch size: 61, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:03:01,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. 
limit=15.0 2023-11-22 11:03:04,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287450 2023-11-22 11:03:12,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1916300.0, ans=0.125 2023-11-22 11:03:12,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0 2023-11-22 11:03:25,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1916366.6666666667, ans=0.125 2023-11-22 11:03:46,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-22 11:03:51,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1916566.6666666667, ans=0.125 2023-11-22 11:03:52,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.229e+01 8.892e+01 9.728e+01 1.443e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 11:03:57,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1916566.6666666667, ans=0.125 2023-11-22 11:04:06,222 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 10950, loss[loss=0.09424, simple_loss=0.1148, pruned_loss=0.02503, audio_tagging_loss=0.01182, over 14662.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.09241, pruned_loss=0.01479, audio_tagging_loss=0.0094, over 3048339.24 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:04:10,100 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287500 2023-11-22 11:04:18,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=15.0 2023-11-22 11:04:59,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1916900.0, ans=0.2 2023-11-22 11:05:04,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1916900.0, ans=0.0 2023-11-22 11:05:10,064 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11000, loss[loss=0.07839, simple_loss=0.1067, pruned_loss=0.01812, audio_tagging_loss=0.006919, over 15351.00 frames. ], tot_loss[loss=0.07015, simple_loss=0.09176, pruned_loss=0.01477, audio_tagging_loss=0.009498, over 3048927.11 frames. ], batch size: 57, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:05:13,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287550 2023-11-22 11:05:15,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1916966.6666666667, ans=0.1 2023-11-22 11:05:21,789 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 11:05:38,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1917100.0, ans=0.125 2023-11-22 11:05:40,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1917100.0, ans=0.1 2023-11-22 11:05:47,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1917166.6666666667, ans=0.125 2023-11-22 11:05:54,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1917166.6666666667, ans=0.125 2023-11-22 11:06:00,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1917233.3333333333, ans=0.0 2023-11-22 11:06:01,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.398e+01 8.039e+01 8.665e+01 9.378e+01 1.568e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-22 11:06:10,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=15.0 2023-11-22 11:06:14,096 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11050, loss[loss=0.06152, simple_loss=0.07788, pruned_loss=0.01006, audio_tagging_loss=0.01252, over 14213.00 frames. ], tot_loss[loss=0.07036, simple_loss=0.09197, pruned_loss=0.01485, audio_tagging_loss=0.009528, over 3048896.64 frames. ], batch size: 54, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:06:17,911 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287600 2023-11-22 11:06:21,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-22 11:06:21,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1917300.0, ans=0.0 2023-11-22 11:06:26,346 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:06:27,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1917366.6666666667, ans=0.04949747468305833 2023-11-22 11:06:34,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=22.5 2023-11-22 11:06:58,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2023-11-22 11:07:04,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1917566.6666666667, ans=0.125 2023-11-22 11:07:12,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-22 11:07:18,029 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11100, loss[loss=0.06162, simple_loss=0.0754, pruned_loss=0.01259, audio_tagging_loss=0.01133, over 15150.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09177, pruned_loss=0.01473, audio_tagging_loss=0.009611, over 3049373.74 frames. 
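The lr column has crept from 2.83e-03 to 2.82e-03 over these few hundred batches, consistent with the Eden schedule in icefall's optim.py, where the base learning rate decays with both the step count and the epoch. A hedged check, taking base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the run configuration; the exact epoch value the scheduler sees at this point is an assumption, and Eden's warm-up factor is omitted since it is 1.0 this deep into training:

```python
def eden_lr(step: float, epoch: float,
            base_lr: float = 0.045,
            lr_batches: float = 7500.0,
            lr_epochs: float = 3.5) -> float:
    """Eden schedule: base_lr shrunk by a batch factor and an epoch factor."""
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(step=287650, epoch=23.0))  # ~2.82e-03, matching the records here
```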
], batch size: 58, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:07:21,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1917633.3333333333, ans=0.125 2023-11-22 11:07:22,403 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287650 2023-11-22 11:07:22,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1917633.3333333333, ans=0.125 2023-11-22 11:07:27,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1917633.3333333333, ans=0.125 2023-11-22 11:07:46,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1917766.6666666667, ans=0.125 2023-11-22 11:08:08,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1917900.0, ans=0.125 2023-11-22 11:08:09,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.222e+01 9.043e+01 9.663e+01 1.186e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-22 11:08:11,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1917900.0, ans=0.1 2023-11-22 11:08:11,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-22 11:08:13,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1917900.0, ans=0.125 2023-11-22 11:08:18,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-11-22 11:08:22,595 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11150, loss[loss=0.08894, simple_loss=0.1176, pruned_loss=0.02231, audio_tagging_loss=0.007813, over 16218.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.09242, pruned_loss=0.01487, audio_tagging_loss=0.009662, over 3059705.87 frames. ], batch size: 59, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:08:26,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287700 2023-11-22 11:08:28,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1917966.6666666667, ans=0.0 2023-11-22 11:09:07,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1918166.6666666667, ans=0.125 2023-11-22 11:09:18,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1918233.3333333333, ans=0.1 2023-11-22 11:09:22,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-22 11:09:26,693 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11200, loss[loss=0.08211, simple_loss=0.1035, pruned_loss=0.01809, audio_tagging_loss=0.01225, over 14865.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.09258, pruned_loss=0.01496, audio_tagging_loss=0.009773, over 3054490.91 frames. 
], batch size: 56, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 11:09:29,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1918300.0, ans=0.125 2023-11-22 11:09:30,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287750 2023-11-22 11:09:44,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1918366.6666666667, ans=0.1 2023-11-22 11:10:14,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1918500.0, ans=0.125 2023-11-22 11:10:17,758 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.119e+01 8.627e+01 9.513e+01 1.631e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-22 11:10:21,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1918566.6666666667, ans=0.0 2023-11-22 11:10:21,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.07 vs. limit=22.5 2023-11-22 11:10:30,705 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11250, loss[loss=0.0835, simple_loss=0.1093, pruned_loss=0.01947, audio_tagging_loss=0.009374, over 15468.00 frames. ], tot_loss[loss=0.07097, simple_loss=0.09271, pruned_loss=0.01498, audio_tagging_loss=0.009639, over 3059159.43 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 11:10:34,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287800 2023-11-22 11:10:46,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1918700.0, ans=0.09899494936611666 2023-11-22 11:11:06,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1918766.6666666667, ans=0.1 2023-11-22 11:11:11,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=22.5 2023-11-22 11:11:25,464 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:11:35,604 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11300, loss[loss=0.06988, simple_loss=0.09647, pruned_loss=0.01421, audio_tagging_loss=0.007435, over 15287.00 frames. ], tot_loss[loss=0.07032, simple_loss=0.09198, pruned_loss=0.01482, audio_tagging_loss=0.009514, over 3051844.03 frames. 
], batch size: 57, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 11:11:37,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1918966.6666666667, ans=0.0 2023-11-22 11:11:39,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287850 2023-11-22 11:12:28,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.163e+01 8.307e+01 9.060e+01 9.651e+01 1.397e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-22 11:12:31,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1919233.3333333333, ans=0.0 2023-11-22 11:12:35,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1919233.3333333333, ans=0.1 2023-11-22 11:12:38,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2023-11-22 11:12:40,199 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11350, loss[loss=0.05471, simple_loss=0.07032, pruned_loss=0.009168, audio_tagging_loss=0.01038, over 14925.00 frames. ], tot_loss[loss=0.0711, simple_loss=0.09326, pruned_loss=0.01513, audio_tagging_loss=0.00934, over 3054102.45 frames. ], batch size: 60, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:12:44,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287900 2023-11-22 11:12:54,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1919366.6666666667, ans=0.04949747468305833 2023-11-22 11:12:57,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1919366.6666666667, ans=0.0 2023-11-22 11:13:00,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1919366.6666666667, ans=0.1 2023-11-22 11:13:06,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1919433.3333333333, ans=0.0 2023-11-22 11:13:29,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1919500.0, ans=0.2 2023-11-22 11:13:31,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1919566.6666666667, ans=0.0 2023-11-22 11:13:37,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1919566.6666666667, ans=0.0 2023-11-22 11:13:42,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=1919633.3333333333, ans=0.02 2023-11-22 11:13:43,678 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11400, loss[loss=0.03805, simple_loss=0.04542, pruned_loss=0.003569, audio_tagging_loss=0.01177, over 14645.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.09379, pruned_loss=0.01506, audio_tagging_loss=0.009219, over 3055681.95 frames. 
], batch size: 61, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:13:45,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1919633.3333333333, ans=0.125 2023-11-22 11:13:47,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 287950 2023-11-22 11:13:54,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1919633.3333333333, ans=0.0 2023-11-22 11:14:11,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=22.5 2023-11-22 11:14:17,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-22 11:14:21,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1919833.3333333333, ans=0.125 2023-11-22 11:14:29,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-22 11:14:32,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1919833.3333333333, ans=0.0 2023-11-22 11:14:35,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1919900.0, ans=0.0 2023-11-22 11:14:36,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.311e+01 8.086e+01 8.647e+01 9.538e+01 1.200e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-22 11:14:47,051 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11450, loss[loss=0.06761, simple_loss=0.07846, pruned_loss=0.014, audio_tagging_loss=0.01438, over 15136.00 frames. ], tot_loss[loss=0.07093, simple_loss=0.09322, pruned_loss=0.01513, audio_tagging_loss=0.009182, over 3052423.76 frames. ], batch size: 60, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:14:50,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288000 2023-11-22 11:15:04,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2023-11-22 11:15:14,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2023-11-22 11:15:28,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1920166.6666666667, ans=0.1 2023-11-22 11:15:50,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1920233.3333333333, ans=0.0 2023-11-22 11:15:55,780 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11500, loss[loss=0.06699, simple_loss=0.09309, pruned_loss=0.01116, audio_tagging_loss=0.009278, over 16872.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.09412, pruned_loss=0.0153, audio_tagging_loss=0.009244, over 3058198.82 frames. 
], batch size: 64, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:15:59,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288050 2023-11-22 11:16:02,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1920300.0, ans=0.0 2023-11-22 11:16:09,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1920366.6666666667, ans=0.04949747468305833 2023-11-22 11:16:12,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1920366.6666666667, ans=0.125 2023-11-22 11:16:37,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1920500.0, ans=0.1 2023-11-22 11:16:47,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.254e+01 8.955e+01 9.456e+01 2.559e+02, threshold=1.791e+02, percent-clipped=1.0 2023-11-22 11:16:50,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1920566.6666666667, ans=0.0 2023-11-22 11:16:56,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1920566.6666666667, ans=0.125 2023-11-22 11:16:59,001 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11550, loss[loss=0.1035, simple_loss=0.1388, pruned_loss=0.02482, audio_tagging_loss=0.009217, over 15743.00 frames. ], tot_loss[loss=0.07162, simple_loss=0.09437, pruned_loss=0.01532, audio_tagging_loss=0.009109, over 3058868.16 frames. ], batch size: 59, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:17:02,682 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288100 2023-11-22 11:17:04,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1920633.3333333333, ans=0.125 2023-11-22 11:17:05,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1920633.3333333333, ans=0.125 2023-11-22 11:17:30,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-11-22 11:17:36,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1920833.3333333333, ans=0.2 2023-11-22 11:17:37,678 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 11:17:46,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1920833.3333333333, ans=0.125 2023-11-22 11:18:02,506 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11600, loss[loss=0.08079, simple_loss=0.1105, pruned_loss=0.01892, audio_tagging_loss=0.006601, over 15876.00 frames. ], tot_loss[loss=0.07162, simple_loss=0.0942, pruned_loss=0.01534, audio_tagging_loss=0.009185, over 3056850.92 frames. 
], batch size: 58, lr: 2.82e-03, grad_scale: 32.0 2023-11-22 11:18:06,170 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288150 2023-11-22 11:18:20,493 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:18:47,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-22 11:18:56,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.284e+01 8.414e+01 8.937e+01 9.626e+01 1.842e+02, threshold=1.787e+02, percent-clipped=1.0 2023-11-22 11:19:02,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-22 11:19:03,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1921233.3333333333, ans=0.125 2023-11-22 11:19:07,314 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11650, loss[loss=0.05049, simple_loss=0.0689, pruned_loss=0.006696, audio_tagging_loss=0.00934, over 14391.00 frames. ], tot_loss[loss=0.07191, simple_loss=0.09474, pruned_loss=0.0154, audio_tagging_loss=0.009137, over 3056545.50 frames. ], batch size: 54, lr: 2.82e-03, grad_scale: 16.0 2023-11-22 11:19:11,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288200 2023-11-22 11:19:28,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1921366.6666666667, ans=0.125 2023-11-22 11:19:40,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.99 vs. limit=15.0 2023-11-22 11:19:56,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2023-11-22 11:20:11,335 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11700, loss[loss=0.07673, simple_loss=0.09603, pruned_loss=0.01937, audio_tagging_loss=0.009342, over 15620.00 frames. ], tot_loss[loss=0.07132, simple_loss=0.09374, pruned_loss=0.01515, audio_tagging_loss=0.009305, over 3063877.06 frames. ], batch size: 57, lr: 2.81e-03, grad_scale: 16.0 2023-11-22 11:20:13,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1921633.3333333333, ans=0.0 2023-11-22 11:20:15,000 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288250 2023-11-22 11:20:30,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1921700.0, ans=0.1 2023-11-22 11:20:43,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1921766.6666666667, ans=0.125 2023-11-22 11:20:46,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. 
limit=15.0 2023-11-22 11:20:56,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1921833.3333333333, ans=0.125 2023-11-22 11:21:05,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.380e+01 8.961e+01 9.495e+01 1.206e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-22 11:21:14,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1921966.6666666667, ans=0.2 2023-11-22 11:21:15,671 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11750, loss[loss=0.07153, simple_loss=0.08707, pruned_loss=0.01906, audio_tagging_loss=0.00893, over 15115.00 frames. ], tot_loss[loss=0.072, simple_loss=0.09467, pruned_loss=0.01538, audio_tagging_loss=0.009286, over 3064571.10 frames. ], batch size: 58, lr: 2.81e-03, grad_scale: 16.0 2023-11-22 11:21:18,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1921966.6666666667, ans=0.125 2023-11-22 11:21:19,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288300 2023-11-22 11:21:22,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1921966.6666666667, ans=0.125 2023-11-22 11:21:23,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1921966.6666666667, ans=0.125 2023-11-22 11:21:27,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2023-11-22 11:21:56,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1922166.6666666667, ans=0.125 2023-11-22 11:22:01,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1922166.6666666667, ans=0.2 2023-11-22 11:22:02,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1922166.6666666667, ans=0.125 2023-11-22 11:22:15,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1922233.3333333333, ans=0.0 2023-11-22 11:22:17,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1922233.3333333333, ans=0.125 2023-11-22 11:22:20,065 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11800, loss[loss=0.08335, simple_loss=0.1169, pruned_loss=0.01568, audio_tagging_loss=0.009207, over 15448.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09502, pruned_loss=0.01556, audio_tagging_loss=0.009299, over 3060303.44 frames. 
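In every "grad-norm quartiles" record, the reported threshold is Clipping_scale times the median (middle) quantile, e.g. 2.0 x 8.961e+01 = 1.792e+02 in the record above; the optimizer evidently clips against a scaled median of recent gradient norms. A hedged sketch (clipping_stats is a hypothetical name):

```python
import numpy as np

def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    # five quantiles (min, 25%, median, 75%, max), as printed in the log
    q = np.quantile(recent_grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]  # clip against 2x the median norm
    percent_clipped = 100.0 * np.mean(
        np.asarray(recent_grad_norms) > threshold
    )
    return q, threshold, percent_clipped
```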
], batch size: 55, lr: 2.81e-03, grad_scale: 16.0 2023-11-22 11:22:24,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288350 2023-11-22 11:22:35,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1922366.6666666667, ans=0.0 2023-11-22 11:22:38,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1922366.6666666667, ans=0.125 2023-11-22 11:22:45,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-22 11:23:04,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1922500.0, ans=0.05 2023-11-22 11:23:11,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1922566.6666666667, ans=0.125 2023-11-22 11:23:14,675 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.024e+01 8.163e+01 8.779e+01 9.476e+01 1.167e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-22 11:23:21,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1922566.6666666667, ans=0.2 2023-11-22 11:23:24,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1922633.3333333333, ans=0.125 2023-11-22 11:23:25,267 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11850, loss[loss=0.05825, simple_loss=0.06847, pruned_loss=0.01434, audio_tagging_loss=0.009672, over 13852.00 frames. ], tot_loss[loss=0.07144, simple_loss=0.09352, pruned_loss=0.01518, audio_tagging_loss=0.009489, over 3047669.16 frames. ], batch size: 52, lr: 2.81e-03, grad_scale: 16.0 2023-11-22 11:23:25,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1922633.3333333333, ans=0.0 2023-11-22 11:23:29,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288400 2023-11-22 11:23:57,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1922766.6666666667, ans=0.0 2023-11-22 11:24:00,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1922766.6666666667, ans=0.0 2023-11-22 11:24:29,687 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11900, loss[loss=0.1134, simple_loss=0.1559, pruned_loss=0.02896, audio_tagging_loss=0.006494, over 14388.00 frames. ], tot_loss[loss=0.07185, simple_loss=0.09403, pruned_loss=0.0153, audio_tagging_loss=0.009531, over 3050672.05 frames. ], batch size: 53, lr: 2.81e-03, grad_scale: 16.0 2023-11-22 11:24:33,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288450 2023-11-22 11:24:33,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1922966.6666666667, ans=0.2 2023-11-22 11:24:41,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.33 vs. 
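The "Whitening: ... metric=M vs. limit=L" records measure how far an activation's covariance is from isotropic; a penalty is presumably applied only while the metric exceeds the limit, which is why these lines are worth logging. One plausible formulation of such a metric is E[lambda^2] / E[lambda]^2 over the covariance eigenvalues (>= 1, with equality iff the features are white); the sketch below is an assumption, not the module's actual code:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into groups,
    # mirroring the num_groups/num_channels fields in the log records
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0)
    cov = torch.einsum("tgc,tgd->gcd", x, x) / num_frames
    c = cov.shape[-1]
    sum_eig_sq = (cov ** 2).sum(dim=(1, 2))  # trace(cov @ cov) = sum(lambda^2)
    mean_eig_sq = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / c) ** 2
    return (sum_eig_sq / c / mean_eig_sq).mean()  # >= 1, = 1 iff white

x = torch.randn(10_000, 512)
print(whitening_metric(x))  # ~1.05 for nearly-white Gaussian features
```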
limit=22.5 2023-11-22 11:25:10,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1923166.6666666667, ans=0.04949747468305833 2023-11-22 11:25:23,677 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.115e+01 8.834e+01 9.649e+01 1.264e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 11:25:29,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0 2023-11-22 11:25:30,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2023-11-22 11:25:34,086 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 11950, loss[loss=0.04576, simple_loss=0.05337, pruned_loss=0.007235, audio_tagging_loss=0.01184, over 14524.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.0934, pruned_loss=0.01519, audio_tagging_loss=0.009711, over 3053472.63 frames. ], batch size: 55, lr: 2.81e-03, grad_scale: 16.0 2023-11-22 11:25:37,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288500 2023-11-22 11:25:42,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1923300.0, ans=0.1 2023-11-22 11:25:53,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1923366.6666666667, ans=0.125 2023-11-22 11:25:56,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1923366.6666666667, ans=0.0 2023-11-22 11:25:57,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1923366.6666666667, ans=0.04949747468305833 2023-11-22 11:26:00,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1923433.3333333333, ans=0.2 2023-11-22 11:26:02,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.48 vs. limit=8.0 2023-11-22 11:26:14,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1923500.0, ans=0.07 2023-11-22 11:26:15,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2023-11-22 11:26:19,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1923500.0, ans=0.125 2023-11-22 11:26:28,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1923566.6666666667, ans=0.2 2023-11-22 11:26:35,550 INFO [train_asr.py:1221] (1/4) Epoch 24, batch 12000, loss[loss=0.07232, simple_loss=0.09566, pruned_loss=0.01407, audio_tagging_loss=0.01042, over 15206.00 frames. ], tot_loss[loss=0.07167, simple_loss=0.09351, pruned_loss=0.01515, audio_tagging_loss=0.009763, over 3050126.13 frames. 
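The ScheduledFloat records that dominate this log are hyperparameters scheduled against batch_count; by batch_count ~ 1.92e6 every schedule shown here has long since settled at its final value (skip rates at 0.0, balancer probs at 0.125, and so on). A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are illustrative, not taken from the recipe:

```python
def scheduled_float(batch_count: float, points) -> float:
    # points: sorted (batch_count, value) breakpoints; clamp outside the range
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate decaying 0.2 -> 0.05 -> 0.0 over early training:
print(scheduled_float(1_919_633.33,
                      [(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)]))
# -> 0.0, matching "attention_skip_rate ... ans=0.0" above
```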
], batch size: 56, lr: 2.81e-03, grad_scale: 32.0 2023-11-22 11:26:35,551 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 11:27:17,208 INFO [train_asr.py:1253] (1/4) Epoch 24, validation: loss=0.05896, simple_loss=0.05166, pruned_loss=0.00516, audio_tagging_loss=0.02797, over 4681554.00 frames. 2023-11-22 11:27:17,209 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 11:27:20,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288550 2023-11-22 11:28:19,673 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 0, loss[loss=0.07337, simple_loss=0.08047, pruned_loss=0.009942, audio_tagging_loss=0.0232, over 14053.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.08047, pruned_loss=0.009942, audio_tagging_loss=0.0232, over 14053.00 frames. ], batch size: 53, lr: 2.76e-03, grad_scale: 32.0 2023-11-22 11:28:19,674 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 11:28:55,801 INFO [train_asr.py:1253] (1/4) Epoch 25, validation: loss=0.05903, simple_loss=0.05164, pruned_loss=0.005146, audio_tagging_loss=0.02807, over 4681554.00 frames. 2023-11-22 11:28:55,803 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 11:29:18,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.611e+01 9.501e+01 1.042e+02 1.380e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-22 11:29:19,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1923860.0, ans=0.0 2023-11-22 11:29:24,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1923926.6666666667, ans=0.125 2023-11-22 11:29:29,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1923926.6666666667, ans=0.0 2023-11-22 11:29:34,040 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288600 2023-11-22 11:29:49,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1924060.0, ans=0.125 2023-11-22 11:30:00,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-22 11:30:00,632 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 50, loss[loss=0.09646, simple_loss=0.1161, pruned_loss=0.02114, audio_tagging_loss=0.01725, over 16047.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.09383, pruned_loss=0.01506, audio_tagging_loss=0.01844, over 676777.94 frames. ], batch size: 58, lr: 2.76e-03, grad_scale: 32.0 2023-11-22 11:30:03,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1924126.6666666667, ans=0.5 2023-11-22 11:30:14,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1924193.3333333333, ans=0.125 2023-11-22 11:30:38,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288650 2023-11-22 11:31:04,551 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 100, loss[loss=0.06287, simple_loss=0.07912, pruned_loss=0.009772, audio_tagging_loss=0.01354, over 14284.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09272, pruned_loss=0.01467, audio_tagging_loss=0.01731, over 1195414.16 frames. 
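The lr column drops from 2.82e-03 to 2.76e-03 exactly at the epoch 24 -> 25 boundary above. Both values are reproduced by an Eden-style schedule with the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, provided the epoch factor is stepped with epoch - 1; this is a hedged reconstruction, not the scheduler's actual code:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 288_000, 23):.2e}")  # ~2.82e-03 (epoch 24 above)
print(f"{eden_lr(0.045, 288_550, 24):.2e}")  # ~2.76e-03 (epoch 25, batch 0)
```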
], batch size: 54, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:31:23,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1924526.6666666667, ans=0.02 2023-11-22 11:31:28,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.752e+01 9.299e+01 1.007e+02 1.330e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-22 11:31:42,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288700 2023-11-22 11:31:46,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-22 11:31:58,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2023-11-22 11:32:10,038 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 150, loss[loss=0.06525, simple_loss=0.08965, pruned_loss=0.01065, audio_tagging_loss=0.009772, over 15782.00 frames. ], tot_loss[loss=0.07629, simple_loss=0.09239, pruned_loss=0.01463, audio_tagging_loss=0.01546, over 1611016.94 frames. ], batch size: 60, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:32:15,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1924793.3333333333, ans=10.0 2023-11-22 11:32:47,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288750 2023-11-22 11:32:53,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1924993.3333333333, ans=0.125 2023-11-22 11:33:14,437 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 200, loss[loss=0.07686, simple_loss=0.09571, pruned_loss=0.01504, audio_tagging_loss=0.01397, over 14529.00 frames. ], tot_loss[loss=0.07456, simple_loss=0.09223, pruned_loss=0.01465, audio_tagging_loss=0.0138, over 1932003.73 frames. 
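The grad_scale column (16.0 <-> 32.0 in the records above) is fp16 dynamic loss scaling: the scale doubles after a sustained run of overflow-free steps and halves as soon as a step produces inf/NaN gradients, which is why it falls back to 16.0 shortly after reaching 32.0. A sketch using PyTorch's stock scaler; the growth settings are illustrative and the recipe may manage the scale differently:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,      # matches the grad_scale seen for most of this log
    growth_factor=2.0,    # 16.0 -> 32.0 after growth_interval clean steps
    backoff_factor=0.5,   # 32.0 -> 16.0 on the first inf/NaN step
    growth_interval=2000,
)
# typical step:
#   scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()
```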
], batch size: 57, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:33:23,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1925126.6666666667, ans=0.125 2023-11-22 11:33:27,536 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:33:32,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1925193.3333333333, ans=0.02 2023-11-22 11:33:37,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.935e+01 8.185e+01 8.728e+01 9.456e+01 1.257e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-22 11:33:38,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1925193.3333333333, ans=0.2 2023-11-22 11:33:39,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1925260.0, ans=0.2 2023-11-22 11:33:43,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1925260.0, ans=0.0 2023-11-22 11:33:48,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1925260.0, ans=0.125 2023-11-22 11:33:51,621 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288800 2023-11-22 11:33:53,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1925326.6666666667, ans=0.125 2023-11-22 11:33:53,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-22 11:34:02,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1925326.6666666667, ans=0.5 2023-11-22 11:34:18,344 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 250, loss[loss=0.06218, simple_loss=0.08006, pruned_loss=0.0129, audio_tagging_loss=0.009248, over 15002.00 frames. ], tot_loss[loss=0.07354, simple_loss=0.09275, pruned_loss=0.01472, audio_tagging_loss=0.01244, over 2183710.40 frames. ], batch size: 57, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:34:25,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-22 11:34:29,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-22 11:34:53,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1925593.3333333333, ans=0.125 2023-11-22 11:34:55,550 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288850 2023-11-22 11:35:22,892 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 300, loss[loss=0.06377, simple_loss=0.08456, pruned_loss=0.01445, audio_tagging_loss=0.007035, over 14498.00 frames. ], tot_loss[loss=0.07307, simple_loss=0.09319, pruned_loss=0.01493, audio_tagging_loss=0.01155, over 2378116.72 frames. 
], batch size: 56, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:35:25,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1925793.3333333333, ans=0.0 2023-11-22 11:35:33,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2023-11-22 11:35:45,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.207e+01 8.755e+01 9.551e+01 1.256e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-22 11:35:59,624 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288900 2023-11-22 11:36:00,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.35 vs. limit=10.0 2023-11-22 11:36:14,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1926060.0, ans=0.125 2023-11-22 11:36:27,372 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 350, loss[loss=0.07773, simple_loss=0.1047, pruned_loss=0.01746, audio_tagging_loss=0.007927, over 14683.00 frames. ], tot_loss[loss=0.07328, simple_loss=0.09468, pruned_loss=0.01508, audio_tagging_loss=0.01086, over 2529596.32 frames. ], batch size: 57, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:37:05,096 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 288950 2023-11-22 11:37:32,207 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 400, loss[loss=0.07691, simple_loss=0.1031, pruned_loss=0.01314, audio_tagging_loss=0.01223, over 16737.00 frames. ], tot_loss[loss=0.07282, simple_loss=0.09456, pruned_loss=0.0151, audio_tagging_loss=0.01043, over 2658450.17 frames. ], batch size: 60, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:37:43,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1926460.0, ans=0.125 2023-11-22 11:37:52,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1926526.6666666667, ans=0.0 2023-11-22 11:37:55,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.456e+01 8.000e+01 8.654e+01 9.629e+01 1.287e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-22 11:38:00,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. 
limit=10.0 2023-11-22 11:38:02,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1926593.3333333333, ans=0.125 2023-11-22 11:38:05,253 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:38:09,841 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289000 2023-11-22 11:38:19,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1926660.0, ans=0.125 2023-11-22 11:38:21,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1926660.0, ans=0.125 2023-11-22 11:38:28,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1926726.6666666667, ans=0.125 2023-11-22 11:38:37,039 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 450, loss[loss=0.07847, simple_loss=0.1062, pruned_loss=0.01684, audio_tagging_loss=0.008548, over 15067.00 frames. ], tot_loss[loss=0.07237, simple_loss=0.09427, pruned_loss=0.01507, audio_tagging_loss=0.01016, over 2747909.26 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:39:02,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1926926.6666666667, ans=0.07 2023-11-22 11:39:04,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1926926.6666666667, ans=0.125 2023-11-22 11:39:05,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1926926.6666666667, ans=0.125 2023-11-22 11:39:14,806 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289050 2023-11-22 11:39:33,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1927060.0, ans=0.04949747468305833 2023-11-22 11:39:42,478 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 500, loss[loss=0.0795, simple_loss=0.1086, pruned_loss=0.01677, audio_tagging_loss=0.008428, over 14456.00 frames. ], tot_loss[loss=0.07264, simple_loss=0.09468, pruned_loss=0.01528, audio_tagging_loss=0.01001, over 2821807.30 frames. ], batch size: 55, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:39:53,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1927193.3333333333, ans=0.2 2023-11-22 11:39:58,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2023-11-22 11:40:05,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 8.293e+01 8.940e+01 9.677e+01 1.305e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-22 11:40:19,933 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289100 2023-11-22 11:40:29,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1927326.6666666667, ans=0.0 2023-11-22 11:40:47,556 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 550, loss[loss=0.07395, simple_loss=0.1023, pruned_loss=0.01539, audio_tagging_loss=0.007439, over 14246.00 frames. ], tot_loss[loss=0.07163, simple_loss=0.09358, pruned_loss=0.01503, audio_tagging_loss=0.009814, over 2870675.17 frames. 
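Each "tot_loss[...] over N frames" figure is a frame-weighted running average: per-batch loss sums are accumulated together with their frame counts, and the printed value is sum / frames, so long utterances weigh more than short ones. A minimal sketch of that bookkeeping (RunningLoss and its numbers are illustrative):

```python
class RunningLoss:
    # accumulate frame-weighted loss sums; report sum / frames
    def __init__(self):
        self.sums = {}
        self.frames = 0.0

    def update(self, frames, **losses):
        self.frames += frames
        for name, value in losses.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * frames

    def averages(self):
        return {k: v / self.frames for k, v in self.sums.items()}

tracker = RunningLoss()
tracker.update(16_500, loss=0.07395)  # illustrative per-batch figures
tracker.update(15_200, loss=0.08780)
print(tracker.averages())             # frame-weighted, not batch-weighted
```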
], batch size: 55, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:41:09,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1927526.6666666667, ans=0.125 2023-11-22 11:41:24,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289150 2023-11-22 11:41:29,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-22 11:41:38,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1927726.6666666667, ans=0.0 2023-11-22 11:41:51,756 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 600, loss[loss=0.0878, simple_loss=0.1171, pruned_loss=0.02172, audio_tagging_loss=0.007537, over 15657.00 frames. ], tot_loss[loss=0.07181, simple_loss=0.09413, pruned_loss=0.01509, audio_tagging_loss=0.009658, over 2912858.70 frames. ], batch size: 59, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:42:00,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-22 11:42:16,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 7.918e+01 8.734e+01 9.306e+01 1.583e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 11:42:29,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289200 2023-11-22 11:42:45,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1928060.0, ans=0.125 2023-11-22 11:42:51,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1928060.0, ans=0.125 2023-11-22 11:42:52,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.65 vs. limit=10.0 2023-11-22 11:42:57,718 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 650, loss[loss=0.07157, simple_loss=0.09534, pruned_loss=0.01306, audio_tagging_loss=0.01084, over 14964.00 frames. ], tot_loss[loss=0.07232, simple_loss=0.09483, pruned_loss=0.01532, audio_tagging_loss=0.009579, over 2943915.78 frames. ], batch size: 57, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:42:59,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1928126.6666666667, ans=0.125 2023-11-22 11:43:06,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1928126.6666666667, ans=0.125 2023-11-22 11:43:11,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1928193.3333333333, ans=0.125 2023-11-22 11:43:15,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1928193.3333333333, ans=0.125 2023-11-22 11:43:34,944 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289250 2023-11-22 11:43:39,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.73 vs. 
limit=22.5 2023-11-22 11:43:43,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1928326.6666666667, ans=0.2 2023-11-22 11:43:46,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1928326.6666666667, ans=0.0 2023-11-22 11:43:50,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1928393.3333333333, ans=15.0 2023-11-22 11:44:01,467 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 700, loss[loss=0.09402, simple_loss=0.1249, pruned_loss=0.02356, audio_tagging_loss=0.00799, over 16003.00 frames. ], tot_loss[loss=0.07229, simple_loss=0.09485, pruned_loss=0.01544, audio_tagging_loss=0.009422, over 2972210.26 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:44:26,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.565e+01 8.213e+01 8.759e+01 9.279e+01 1.127e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-22 11:44:26,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1928593.3333333333, ans=0.125 2023-11-22 11:44:28,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1928593.3333333333, ans=0.0 2023-11-22 11:44:39,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289300 2023-11-22 11:44:42,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1928660.0, ans=0.0 2023-11-22 11:44:50,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1928660.0, ans=0.0 2023-11-22 11:45:02,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1928726.6666666667, ans=0.125 2023-11-22 11:45:03,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1928726.6666666667, ans=0.0 2023-11-22 11:45:05,548 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 750, loss[loss=0.07609, simple_loss=0.1029, pruned_loss=0.01652, audio_tagging_loss=0.008111, over 16120.00 frames. ], tot_loss[loss=0.07282, simple_loss=0.0956, pruned_loss=0.01561, audio_tagging_loss=0.009411, over 2994430.85 frames. 
], batch size: 61, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:45:07,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1928793.3333333333, ans=0.0 2023-11-22 11:45:17,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1928860.0, ans=0.1 2023-11-22 11:45:21,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1928860.0, ans=0.125 2023-11-22 11:45:32,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1928926.6666666667, ans=0.0 2023-11-22 11:45:39,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1928926.6666666667, ans=0.2 2023-11-22 11:45:39,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1928926.6666666667, ans=0.125 2023-11-22 11:45:43,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289350 2023-11-22 11:46:10,232 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 800, loss[loss=0.06249, simple_loss=0.08803, pruned_loss=0.01195, audio_tagging_loss=0.006523, over 16578.00 frames. ], tot_loss[loss=0.07247, simple_loss=0.09515, pruned_loss=0.0154, audio_tagging_loss=0.009494, over 3007490.78 frames. ], batch size: 63, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:46:13,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1929126.6666666667, ans=0.0 2023-11-22 11:46:15,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1929126.6666666667, ans=0.0 2023-11-22 11:46:16,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=22.5 2023-11-22 11:46:29,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.24 vs. 
limit=22.5 2023-11-22 11:46:36,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.155e+01 8.807e+01 9.499e+01 1.198e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-22 11:46:46,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1929260.0, ans=0.1 2023-11-22 11:46:48,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289400 2023-11-22 11:46:55,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1929326.6666666667, ans=0.1 2023-11-22 11:47:08,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1929393.3333333333, ans=0.125 2023-11-22 11:47:10,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1929393.3333333333, ans=0.0 2023-11-22 11:47:10,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1929393.3333333333, ans=0.125 2023-11-22 11:47:16,259 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 850, loss[loss=0.0497, simple_loss=0.05904, pruned_loss=0.005597, audio_tagging_loss=0.01458, over 16653.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09444, pruned_loss=0.01512, audio_tagging_loss=0.009556, over 3020555.62 frames. ], batch size: 63, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:47:16,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1929460.0, ans=0.0 2023-11-22 11:47:30,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1929526.6666666667, ans=0.125 2023-11-22 11:47:30,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1929526.6666666667, ans=0.125 2023-11-22 11:47:36,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0 2023-11-22 11:47:42,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1929593.3333333333, ans=0.0 2023-11-22 11:47:54,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289450 2023-11-22 11:47:59,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1929660.0, ans=0.125 2023-11-22 11:48:21,275 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 900, loss[loss=0.05433, simple_loss=0.06469, pruned_loss=0.009274, audio_tagging_loss=0.01271, over 16212.00 frames. ], tot_loss[loss=0.07225, simple_loss=0.09465, pruned_loss=0.01521, audio_tagging_loss=0.009718, over 3024625.93 frames. 
], batch size: 63, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:48:39,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1929860.0, ans=0.125 2023-11-22 11:48:46,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1929926.6666666667, ans=0.125 2023-11-22 11:48:46,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.150e+01 8.654e+01 9.309e+01 1.377e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-22 11:48:58,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289500 2023-11-22 11:48:59,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1929993.3333333333, ans=0.125 2023-11-22 11:49:02,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1929993.3333333333, ans=0.125 2023-11-22 11:49:02,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1929993.3333333333, ans=0.0 2023-11-22 11:49:25,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1930126.6666666667, ans=0.0 2023-11-22 11:49:26,016 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 950, loss[loss=0.08708, simple_loss=0.1229, pruned_loss=0.01967, audio_tagging_loss=0.005965, over 14835.00 frames. ], tot_loss[loss=0.07226, simple_loss=0.09504, pruned_loss=0.01517, audio_tagging_loss=0.009566, over 3032269.53 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:49:27,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1930126.6666666667, ans=0.125 2023-11-22 11:49:36,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1930126.6666666667, ans=0.0 2023-11-22 11:50:01,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1930260.0, ans=0.0 2023-11-22 11:50:02,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1930260.0, ans=0.125 2023-11-22 11:50:04,408 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289550 2023-11-22 11:50:07,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1930326.6666666667, ans=0.125 2023-11-22 11:50:19,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1930393.3333333333, ans=0.1 2023-11-22 11:50:23,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1930393.3333333333, ans=0.125 2023-11-22 11:50:23,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-11-22 11:50:29,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1930460.0, ans=0.125 2023-11-22 11:50:30,845 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1000, loss[loss=0.07437, simple_loss=0.0857, pruned_loss=0.019, audio_tagging_loss=0.01252, over 14876.00 frames. 
], tot_loss[loss=0.07208, simple_loss=0.09482, pruned_loss=0.01525, audio_tagging_loss=0.009423, over 3030194.48 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:50:32,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.71 vs. limit=10.0 2023-11-22 11:50:35,471 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:50:39,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1930460.0, ans=0.1 2023-11-22 11:50:47,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1930526.6666666667, ans=0.1 2023-11-22 11:50:48,409 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:50:51,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=22.5 2023-11-22 11:50:57,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.182e+01 8.192e+01 8.953e+01 9.947e+01 1.294e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-22 11:50:58,864 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 11:51:08,598 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289600 2023-11-22 11:51:09,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2023-11-22 11:51:34,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1930726.6666666667, ans=22.5 2023-11-22 11:51:36,487 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1050, loss[loss=0.06059, simple_loss=0.07287, pruned_loss=0.01216, audio_tagging_loss=0.01199, over 14898.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.09578, pruned_loss=0.01543, audio_tagging_loss=0.00928, over 3043244.98 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:51:42,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=22.5 2023-11-22 11:52:13,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289650 2023-11-22 11:52:40,616 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1100, loss[loss=0.0666, simple_loss=0.08695, pruned_loss=0.01431, audio_tagging_loss=0.008813, over 15761.00 frames. ], tot_loss[loss=0.07276, simple_loss=0.09595, pruned_loss=0.01556, audio_tagging_loss=0.009223, over 3042722.52 frames. 
], batch size: 58, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:52:40,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1931126.6666666667, ans=0.125 2023-11-22 11:52:44,283 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 11:53:06,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.014e+01 8.517e+01 9.464e+01 1.257e+02, threshold=1.703e+02, percent-clipped=0.0 2023-11-22 11:53:12,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1931260.0, ans=0.0 2023-11-22 11:53:13,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1931260.0, ans=0.125 2023-11-22 11:53:18,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289700 2023-11-22 11:53:45,304 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1150, loss[loss=0.07753, simple_loss=0.1066, pruned_loss=0.01522, audio_tagging_loss=0.008979, over 14690.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.09511, pruned_loss=0.01545, audio_tagging_loss=0.0092, over 3044217.90 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:53:52,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1931460.0, ans=0.0 2023-11-22 11:53:59,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1931526.6666666667, ans=0.125 2023-11-22 11:54:03,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1931526.6666666667, ans=0.1 2023-11-22 11:54:18,100 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:54:19,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. 
limit=15.0 2023-11-22 11:54:21,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1931593.3333333333, ans=0.125 2023-11-22 11:54:22,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289750 2023-11-22 11:54:25,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1931660.0, ans=0.125 2023-11-22 11:54:41,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1931726.6666666667, ans=0.0 2023-11-22 11:54:45,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1931726.6666666667, ans=0.1 2023-11-22 11:54:48,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1931726.6666666667, ans=0.125 2023-11-22 11:54:50,874 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1200, loss[loss=0.0928, simple_loss=0.1205, pruned_loss=0.02473, audio_tagging_loss=0.007798, over 15498.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09529, pruned_loss=0.01553, audio_tagging_loss=0.009103, over 3034137.82 frames. ], batch size: 57, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:55:06,643 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:55:16,755 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.526e+01 8.025e+01 8.580e+01 9.215e+01 1.708e+02, threshold=1.716e+02, percent-clipped=1.0 2023-11-22 11:55:17,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1931926.6666666667, ans=0.09899494936611666 2023-11-22 11:55:19,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1931926.6666666667, ans=0.0 2023-11-22 11:55:28,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289800 2023-11-22 11:55:32,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1931993.3333333333, ans=0.0 2023-11-22 11:55:41,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1931993.3333333333, ans=0.125 2023-11-22 11:55:41,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1931993.3333333333, ans=0.2 2023-11-22 11:55:54,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1932060.0, ans=0.1 2023-11-22 11:55:55,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1932126.6666666667, ans=0.125 2023-11-22 11:55:56,229 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1250, loss[loss=0.05211, simple_loss=0.06304, pruned_loss=0.008492, audio_tagging_loss=0.0121, over 14458.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09532, pruned_loss=0.01556, audio_tagging_loss=0.009056, over 3033351.46 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:56:08,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.88 vs. 
limit=15.0 2023-11-22 11:56:08,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5 2023-11-22 11:56:18,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1932193.3333333333, ans=0.125 2023-11-22 11:56:33,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289850 2023-11-22 11:56:45,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1932326.6666666667, ans=0.0 2023-11-22 11:57:00,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2023-11-22 11:57:00,762 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1300, loss[loss=0.08699, simple_loss=0.1167, pruned_loss=0.02018, audio_tagging_loss=0.008476, over 16068.00 frames. ], tot_loss[loss=0.07125, simple_loss=0.09385, pruned_loss=0.01516, audio_tagging_loss=0.009161, over 3034497.48 frames. ], batch size: 57, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:57:04,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1932460.0, ans=0.0 2023-11-22 11:57:26,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.056e+01 8.885e+01 9.522e+01 1.296e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 11:57:26,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1932593.3333333333, ans=0.1 2023-11-22 11:57:29,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1932593.3333333333, ans=0.125 2023-11-22 11:57:38,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289900 2023-11-22 11:57:39,887 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:57:54,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1932726.6666666667, ans=0.125 2023-11-22 11:57:59,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0 2023-11-22 11:58:05,421 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1350, loss[loss=0.06749, simple_loss=0.08709, pruned_loss=0.01567, audio_tagging_loss=0.008279, over 14280.00 frames. ], tot_loss[loss=0.07156, simple_loss=0.09419, pruned_loss=0.01524, audio_tagging_loss=0.009221, over 3038720.02 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 11:58:15,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1932793.3333333333, ans=0.125 2023-11-22 11:58:28,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. 
limit=10.0 2023-11-22 11:58:35,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1932926.6666666667, ans=0.0 2023-11-22 11:58:41,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1932926.6666666667, ans=0.0 2023-11-22 11:58:42,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 289950 2023-11-22 11:58:45,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1932993.3333333333, ans=15.0 2023-11-22 11:58:52,188 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 11:59:01,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1933060.0, ans=0.125 2023-11-22 11:59:07,146 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 11:59:10,472 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1400, loss[loss=0.04908, simple_loss=0.05296, pruned_loss=0.006486, audio_tagging_loss=0.01612, over 14401.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09426, pruned_loss=0.01524, audio_tagging_loss=0.009292, over 3040137.88 frames. ], batch size: 57, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 11:59:10,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1933126.6666666667, ans=0.0 2023-11-22 11:59:26,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1933193.3333333333, ans=0.125 2023-11-22 11:59:36,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.260e+01 8.672e+01 9.990e+01 1.427e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-22 11:59:38,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1933260.0, ans=0.1 2023-11-22 11:59:47,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290000 2023-11-22 12:00:09,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1933393.3333333333, ans=0.0 2023-11-22 12:00:09,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-11-22 12:00:15,344 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1450, loss[loss=0.07854, simple_loss=0.1094, pruned_loss=0.01583, audio_tagging_loss=0.008026, over 14587.00 frames. ], tot_loss[loss=0.07185, simple_loss=0.09454, pruned_loss=0.01525, audio_tagging_loss=0.009334, over 3041074.62 frames. 
2023-11-22 12:00:24,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1933460.0, ans=0.125
2023-11-22 12:00:28,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1933526.6666666667, ans=0.125
2023-11-22 12:00:29,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1933526.6666666667, ans=0.125
2023-11-22 12:00:53,448 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290050
2023-11-22 12:01:20,149 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1500, loss[loss=0.07341, simple_loss=0.105, pruned_loss=0.01333, audio_tagging_loss=0.0076, over 14835.00 frames. ], tot_loss[loss=0.07159, simple_loss=0.09398, pruned_loss=0.01522, audio_tagging_loss=0.009388, over 3039590.55 frames. ], batch size: 54, lr: 2.75e-03, grad_scale: 16.0
2023-11-22 12:01:30,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1933793.3333333333, ans=0.125
2023-11-22 12:01:43,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1933860.0, ans=0.125
2023-11-22 12:01:46,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.126e+01 8.732e+01 9.513e+01 1.269e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-22 12:01:49,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1933926.6666666667, ans=0.125
2023-11-22 12:01:57,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290100
2023-11-22 12:02:06,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1933993.3333333333, ans=0.125
2023-11-22 12:02:24,747 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1550, loss[loss=0.07254, simple_loss=0.1045, pruned_loss=0.01406, audio_tagging_loss=0.006251, over 15737.00 frames. ], tot_loss[loss=0.07149, simple_loss=0.09369, pruned_loss=0.01513, audio_tagging_loss=0.009522, over 3043309.70 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 16.0
2023-11-22 12:02:28,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1934126.6666666667, ans=0.125
2023-11-22 12:02:28,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1934126.6666666667, ans=0.125
2023-11-22 12:02:29,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1934126.6666666667, ans=0.2
2023-11-22 12:02:37,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0
2023-11-22 12:02:46,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1934193.3333333333, ans=0.125
2023-11-22 12:02:53,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.53 vs.
limit=22.5 2023-11-22 12:03:02,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290150 2023-11-22 12:03:10,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1934326.6666666667, ans=0.125 2023-11-22 12:03:19,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2023-11-22 12:03:26,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1934393.3333333333, ans=0.125 2023-11-22 12:03:30,909 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1600, loss[loss=0.07027, simple_loss=0.09285, pruned_loss=0.01411, audio_tagging_loss=0.009732, over 15747.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.09435, pruned_loss=0.01537, audio_tagging_loss=0.009661, over 3044738.59 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 32.0 2023-11-22 12:03:41,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1934460.0, ans=0.035 2023-11-22 12:03:47,654 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 12:03:53,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1934526.6666666667, ans=0.125 2023-11-22 12:03:58,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.200e+01 8.897e+01 9.742e+01 1.180e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-22 12:04:08,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290200 2023-11-22 12:04:09,145 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.528e-02 2023-11-22 12:04:18,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1934660.0, ans=0.5 2023-11-22 12:04:26,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1934726.6666666667, ans=0.2 2023-11-22 12:04:26,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1934726.6666666667, ans=15.0 2023-11-22 12:04:27,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1934726.6666666667, ans=0.04949747468305833 2023-11-22 12:04:31,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1934726.6666666667, ans=0.2 2023-11-22 12:04:35,534 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1650, loss[loss=0.06002, simple_loss=0.08243, pruned_loss=0.01196, audio_tagging_loss=0.006849, over 15655.00 frames. ], tot_loss[loss=0.07159, simple_loss=0.09364, pruned_loss=0.01514, audio_tagging_loss=0.009634, over 3046066.29 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 16.0 2023-11-22 12:04:42,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1934793.3333333333, ans=0.125 2023-11-22 12:04:48,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.66 vs. 
limit=10.0 2023-11-22 12:05:11,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1934926.6666666667, ans=0.1 2023-11-22 12:05:13,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1934926.6666666667, ans=0.125 2023-11-22 12:05:13,977 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290250 2023-11-22 12:05:15,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1934993.3333333333, ans=0.0 2023-11-22 12:05:19,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1934993.3333333333, ans=0.1 2023-11-22 12:05:20,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1934993.3333333333, ans=0.125 2023-11-22 12:05:25,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1934993.3333333333, ans=0.1 2023-11-22 12:05:40,391 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1700, loss[loss=0.09281, simple_loss=0.1424, pruned_loss=0.01686, audio_tagging_loss=0.004748, over 15761.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09395, pruned_loss=0.01523, audio_tagging_loss=0.009728, over 3048188.84 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 8.0 2023-11-22 12:05:55,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2023-11-22 12:06:10,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.333e+01 9.072e+01 9.807e+01 1.367e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-22 12:06:10,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1935260.0, ans=0.2 2023-11-22 12:06:15,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2023-11-22 12:06:18,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290300 2023-11-22 12:06:45,833 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1750, loss[loss=0.07096, simple_loss=0.09569, pruned_loss=0.01622, audio_tagging_loss=0.006893, over 15130.00 frames. ], tot_loss[loss=0.07146, simple_loss=0.09359, pruned_loss=0.01506, audio_tagging_loss=0.009608, over 3048345.49 frames. 
], batch size: 55, lr: 2.75e-03, grad_scale: 8.0 2023-11-22 12:07:00,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1935526.6666666667, ans=0.125 2023-11-22 12:07:05,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1935526.6666666667, ans=0.125 2023-11-22 12:07:22,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1935593.3333333333, ans=0.125 2023-11-22 12:07:23,611 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290350 2023-11-22 12:07:23,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1935660.0, ans=0.0 2023-11-22 12:07:43,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1935726.6666666667, ans=0.0 2023-11-22 12:07:45,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-22 12:07:46,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1935726.6666666667, ans=0.0 2023-11-22 12:07:50,639 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1800, loss[loss=0.06955, simple_loss=0.08577, pruned_loss=0.01371, audio_tagging_loss=0.01295, over 14960.00 frames. ], tot_loss[loss=0.07123, simple_loss=0.09344, pruned_loss=0.01507, audio_tagging_loss=0.009442, over 3047959.89 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 8.0 2023-11-22 12:07:52,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=12.0 2023-11-22 12:08:19,854 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.690e+01 8.156e+01 8.799e+01 9.735e+01 1.149e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-22 12:08:20,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2023-11-22 12:08:28,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290400 2023-11-22 12:08:55,120 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1850, loss[loss=0.05989, simple_loss=0.0827, pruned_loss=0.01019, audio_tagging_loss=0.008361, over 14771.00 frames. ], tot_loss[loss=0.07156, simple_loss=0.09407, pruned_loss=0.01518, audio_tagging_loss=0.009349, over 3053870.51 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 8.0 2023-11-22 12:09:33,448 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290450 2023-11-22 12:09:33,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1936326.6666666667, ans=0.2 2023-11-22 12:09:58,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1936393.3333333333, ans=22.5 2023-11-22 12:09:59,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1936460.0, ans=0.125 2023-11-22 12:09:59,888 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1900, loss[loss=0.07737, simple_loss=0.1013, pruned_loss=0.01539, audio_tagging_loss=0.01133, over 14091.00 frames. 
], tot_loss[loss=0.07145, simple_loss=0.09451, pruned_loss=0.01502, audio_tagging_loss=0.009173, over 3056729.27 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 8.0 2023-11-22 12:10:13,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1936526.6666666667, ans=0.1 2023-11-22 12:10:16,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1936526.6666666667, ans=0.125 2023-11-22 12:10:19,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1936526.6666666667, ans=0.125 2023-11-22 12:10:23,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1936526.6666666667, ans=0.125 2023-11-22 12:10:23,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1936526.6666666667, ans=0.125 2023-11-22 12:10:29,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.157e+01 8.748e+01 9.500e+01 1.354e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-22 12:10:37,255 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290500 2023-11-22 12:11:04,301 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 1950, loss[loss=0.07411, simple_loss=0.09621, pruned_loss=0.01595, audio_tagging_loss=0.01006, over 15122.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09521, pruned_loss=0.01514, audio_tagging_loss=0.009186, over 3055296.26 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 8.0 2023-11-22 12:11:21,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-11-22 12:11:32,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1936926.6666666667, ans=0.125 2023-11-22 12:11:41,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290550 2023-11-22 12:12:08,551 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2000, loss[loss=0.07628, simple_loss=0.1042, pruned_loss=0.0169, audio_tagging_loss=0.007298, over 15705.00 frames. ], tot_loss[loss=0.07125, simple_loss=0.09429, pruned_loss=0.01491, audio_tagging_loss=0.009199, over 3053571.25 frames. 
], batch size: 57, lr: 2.75e-03, grad_scale: 16.0
2023-11-22 12:12:19,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1937126.6666666667, ans=0.125
2023-11-22 12:12:24,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1937193.3333333333, ans=0.0
2023-11-22 12:12:37,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1937260.0, ans=0.1
2023-11-22 12:12:38,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.037e+01 8.579e+01 9.384e+01 1.253e+02, threshold=1.716e+02, percent-clipped=0.0
2023-11-22 12:12:42,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1937260.0, ans=0.125
2023-11-22 12:12:45,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290600
2023-11-22 12:12:47,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1937326.6666666667, ans=0.0
2023-11-22 12:12:54,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1937326.6666666667, ans=0.125
2023-11-22 12:13:07,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0
2023-11-22 12:13:13,304 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2050, loss[loss=0.06885, simple_loss=0.09683, pruned_loss=0.0141, audio_tagging_loss=0.006327, over 15465.00 frames. ], tot_loss[loss=0.0714, simple_loss=0.09437, pruned_loss=0.01497, audio_tagging_loss=0.009252, over 3046299.28 frames. ], batch size: 57, lr: 2.75e-03, grad_scale: 16.0
2023-11-22 12:13:20,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0
2023-11-22 12:13:21,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1937460.0, ans=0.125
2023-11-22 12:13:28,595 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 12:13:31,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0
2023-11-22 12:13:38,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1937593.3333333333, ans=0.0
2023-11-22 12:13:42,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1937593.3333333333, ans=0.2
2023-11-22 12:13:44,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1937593.3333333333, ans=0.0
2023-11-22 12:13:50,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290650
2023-11-22 12:14:07,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1937726.6666666667, ans=0.125
2023-11-22 12:14:14,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0
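
The scaling.py:213 ScheduledFloat entries report the value (ans) that a named knob, a dropout rate, balancer probability, bypass scale or skip rate, takes at the current batch_count. A minimal sketch of a batch-count-keyed piecewise-linear schedule with that behavior; the class is illustrative and the breakpoints below are invented, not values from this run.

class PiecewiseLinearFloat:
    # A float-valued schedule: linear interpolation between
    # (batch_count, value) breakpoints, clamped at both ends.
    def __init__(self, *points):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# An invented conv_skip_rate-style schedule that decays to 0 and stays there:
skip_rate = PiecewiseLinearFloat((0.0, 0.2), (20000.0, 0.05), (50000.0, 0.0))
print(skip_rate.value(1937193.0))   # far past the last breakpoint -> 0.0

That pattern is consistent with the skip-rate entries above printing ans=0.0 at batch counts near 1.94M, long past any plausible final breakpoint.
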
2023-11-22 12:14:18,215 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2100, loss[loss=0.07698, simple_loss=0.1072, pruned_loss=0.01537, audio_tagging_loss=0.008004, over 15439.00 frames. ], tot_loss[loss=0.07093, simple_loss=0.09393, pruned_loss=0.01484, audio_tagging_loss=0.00912, over 3048710.53 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 16.0
2023-11-22 12:14:27,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5
2023-11-22 12:14:34,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1937860.0, ans=0.125
2023-11-22 12:14:36,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1937860.0, ans=0.0
2023-11-22 12:14:47,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.164e+01 8.808e+01 9.351e+01 1.143e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-22 12:14:51,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1937926.6666666667, ans=0.125
2023-11-22 12:14:52,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1937926.6666666667, ans=0.1
2023-11-22 12:14:54,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290700
2023-11-22 12:15:02,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0
2023-11-22 12:15:05,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1937993.3333333333, ans=0.125
2023-11-22 12:15:13,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1938060.0, ans=0.0
2023-11-22 12:15:14,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1938060.0, ans=0.04949747468305833
2023-11-22 12:15:22,337 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2150, loss[loss=0.05928, simple_loss=0.07639, pruned_loss=0.01201, audio_tagging_loss=0.009076, over 13984.00 frames. ], tot_loss[loss=0.07106, simple_loss=0.09407, pruned_loss=0.01489, audio_tagging_loss=0.009136, over 3048026.29 frames. ], batch size: 52, lr: 2.75e-03, grad_scale: 16.0
2023-11-22 12:15:35,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1938193.3333333333, ans=0.07
2023-11-22 12:15:55,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1938260.0, ans=0.2
2023-11-22 12:15:56,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1938260.0, ans=0.125
2023-11-22 12:15:59,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290750
2023-11-22 12:16:01,970 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
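
The WARNING above drops a one-second AudioSet placeholder cut: its 100 feature frames shrink to 23 under the frontend's roughly 4x subsampling, fewer than the 24 BPE tokens of the dummy transcript, and a transducer cannot emit more symbols than it has encoder frames. A minimal sketch of such a filter; the subsampling formula, helper names, and threshold are assumptions for illustration, not the actual train_asr.py check.

import logging

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed stand-in for the convolutional frontend's arithmetic; it
    # reproduces the logged 100 -> 23 but may differ at other lengths.
    return (num_frames - 7) // 4

def keep_cut(cut, sp) -> bool:
    # Mirror the "Exclude cut with ID ..." warnings: drop cuts whose
    # post-subsampling frame count is smaller than their token count.
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    t = frames_after_subsampling(cut.num_frames)
    if t < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Frames (before subsampling): {cut.num_frames}. "
            f"Frames (after subsampling): {t}. Number of tokens: {len(tokens)}"
        )
        return False
    return True

# Typical lhotse usage: cuts = cuts.filter(lambda c: keep_cut(c, sp))
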
2023-11-22 12:16:21,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1938393.3333333333, ans=0.0
2023-11-22 12:16:26,548 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2200, loss[loss=0.07697, simple_loss=0.09052, pruned_loss=0.01708, audio_tagging_loss=0.01463, over 15181.00 frames. ], tot_loss[loss=0.07135, simple_loss=0.09412, pruned_loss=0.01505, audio_tagging_loss=0.00924, over 3045862.26 frames. ], batch size: 58, lr: 2.74e-03, grad_scale: 16.0
2023-11-22 12:16:28,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1938460.0, ans=0.1
2023-11-22 12:16:35,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1938460.0, ans=0.125
2023-11-22 12:16:56,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.385e+01 8.783e+01 9.663e+01 1.140e+02, threshold=1.757e+02, percent-clipped=0.0
2023-11-22 12:17:00,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1938593.3333333333, ans=0.125
2023-11-22 12:17:01,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1938593.3333333333, ans=0.125
2023-11-22 12:17:03,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290800
2023-11-22 12:17:30,925 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2250, loss[loss=0.05872, simple_loss=0.0697, pruned_loss=0.01318, audio_tagging_loss=0.01069, over 14184.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.09424, pruned_loss=0.01512, audio_tagging_loss=0.009181, over 3042537.99 frames. ], batch size: 54, lr: 2.74e-03, grad_scale: 16.0
2023-11-22 12:18:08,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290850
2023-11-22 12:18:30,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1939060.0, ans=0.1
2023-11-22 12:18:36,068 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2300, loss[loss=0.06505, simple_loss=0.08209, pruned_loss=0.01406, audio_tagging_loss=0.00995, over 15814.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.09365, pruned_loss=0.01497, audio_tagging_loss=0.00928, over 3040685.93 frames. ], batch size: 60, lr: 2.74e-03, grad_scale: 8.0
2023-11-22 12:18:55,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1939193.3333333333, ans=0.0
2023-11-22 12:18:57,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.94 vs.
limit=15.0 2023-11-22 12:18:57,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1939193.3333333333, ans=0.0 2023-11-22 12:19:01,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1939260.0, ans=0.125 2023-11-22 12:19:04,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1939260.0, ans=0.1 2023-11-22 12:19:06,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.285e+01 8.843e+01 9.474e+01 1.631e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 12:19:07,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1939260.0, ans=0.0 2023-11-22 12:19:12,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2023-11-22 12:19:12,976 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290900 2023-11-22 12:19:24,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1939326.6666666667, ans=0.125 2023-11-22 12:19:27,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1939393.3333333333, ans=0.125 2023-11-22 12:19:31,958 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 12:19:32,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1939393.3333333333, ans=0.0 2023-11-22 12:19:39,816 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2350, loss[loss=0.09192, simple_loss=0.1161, pruned_loss=0.02279, audio_tagging_loss=0.01105, over 14207.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.09356, pruned_loss=0.01497, audio_tagging_loss=0.009515, over 3037808.34 frames. 
], batch size: 56, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:19:56,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1939526.6666666667, ans=0.2 2023-11-22 12:19:57,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1939526.6666666667, ans=0.1 2023-11-22 12:19:59,317 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 12:20:05,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1939593.3333333333, ans=0.2 2023-11-22 12:20:15,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1939593.3333333333, ans=0.2 2023-11-22 12:20:17,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 290950 2023-11-22 12:20:30,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2023-11-22 12:20:44,666 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2400, loss[loss=0.0705, simple_loss=0.0874, pruned_loss=0.01315, audio_tagging_loss=0.01364, over 15768.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.09386, pruned_loss=0.01521, audio_tagging_loss=0.009596, over 3045440.54 frames. ], batch size: 57, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:20:54,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1939793.3333333333, ans=0.07 2023-11-22 12:21:15,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.394e+01 8.930e+01 9.736e+01 4.109e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-22 12:21:21,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291000 2023-11-22 12:21:34,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1939993.3333333333, ans=0.125 2023-11-22 12:21:50,319 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2450, loss[loss=0.07927, simple_loss=0.1012, pruned_loss=0.01849, audio_tagging_loss=0.01017, over 15594.00 frames. ], tot_loss[loss=0.07146, simple_loss=0.09338, pruned_loss=0.01508, audio_tagging_loss=0.009688, over 3042093.63 frames. ], batch size: 57, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:22:08,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1940193.3333333333, ans=0.1 2023-11-22 12:22:13,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. 
limit=15.0 2023-11-22 12:22:18,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1940260.0, ans=0.125 2023-11-22 12:22:22,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1940260.0, ans=0.035 2023-11-22 12:22:26,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1940260.0, ans=0.125 2023-11-22 12:22:28,641 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291050 2023-11-22 12:22:42,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1940393.3333333333, ans=0.0 2023-11-22 12:22:56,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-22 12:22:56,580 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2500, loss[loss=0.07592, simple_loss=0.09821, pruned_loss=0.01595, audio_tagging_loss=0.01087, over 15831.00 frames. ], tot_loss[loss=0.071, simple_loss=0.09247, pruned_loss=0.01502, audio_tagging_loss=0.009736, over 3036278.71 frames. ], batch size: 59, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:23:09,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5 2023-11-22 12:23:15,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1940526.6666666667, ans=0.2 2023-11-22 12:23:18,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1940526.6666666667, ans=0.2 2023-11-22 12:23:26,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.685e+01 8.038e+01 8.739e+01 9.351e+01 1.232e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 12:23:29,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1940593.3333333333, ans=0.0 2023-11-22 12:23:29,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1940593.3333333333, ans=0.0 2023-11-22 12:23:31,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1940593.3333333333, ans=0.2 2023-11-22 12:23:33,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291100 2023-11-22 12:23:38,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1940660.0, ans=0.125 2023-11-22 12:23:42,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1940660.0, ans=0.0 2023-11-22 12:23:49,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1940726.6666666667, ans=0.125 2023-11-22 12:23:57,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1940726.6666666667, ans=0.0 2023-11-22 12:24:01,986 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2550, loss[loss=0.08182, simple_loss=0.106, pruned_loss=0.02033, audio_tagging_loss=0.008516, over 16467.00 frames. 
], tot_loss[loss=0.07061, simple_loss=0.09198, pruned_loss=0.01489, audio_tagging_loss=0.009731, over 3040304.95 frames. ], batch size: 60, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:24:12,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1940793.3333333333, ans=0.125 2023-11-22 12:24:16,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-22 12:24:29,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2023-11-22 12:24:40,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291150 2023-11-22 12:24:53,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1941060.0, ans=0.0 2023-11-22 12:25:06,893 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2600, loss[loss=0.04903, simple_loss=0.06082, pruned_loss=0.008274, audio_tagging_loss=0.01035, over 16257.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09158, pruned_loss=0.01484, audio_tagging_loss=0.009583, over 3036067.73 frames. ], batch size: 62, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:25:19,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1941193.3333333333, ans=0.1 2023-11-22 12:25:37,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1941260.0, ans=0.0 2023-11-22 12:25:38,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1941260.0, ans=0.0 2023-11-22 12:25:39,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.159e+01 9.048e+01 9.727e+01 1.323e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-22 12:25:42,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1941260.0, ans=0.0 2023-11-22 12:25:46,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291200 2023-11-22 12:25:50,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1941326.6666666667, ans=0.0 2023-11-22 12:26:12,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1941460.0, ans=0.0 2023-11-22 12:26:14,640 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2650, loss[loss=0.08317, simple_loss=0.1074, pruned_loss=0.01917, audio_tagging_loss=0.0103, over 16197.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09162, pruned_loss=0.0148, audio_tagging_loss=0.009626, over 3037517.16 frames. 
2023-11-22 12:26:16,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1941460.0, ans=0.125
2023-11-22 12:26:21,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1941460.0, ans=0.1
2023-11-22 12:26:23,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1941460.0, ans=0.0
2023-11-22 12:26:25,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1941460.0, ans=0.0
2023-11-22 12:26:27,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1941526.6666666667, ans=0.0
2023-11-22 12:26:28,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1941526.6666666667, ans=0.0
2023-11-22 12:26:33,851 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 12:26:41,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0
2023-11-22 12:26:53,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291250
2023-11-22 12:26:57,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. limit=6.0
2023-11-22 12:27:01,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1941660.0, ans=0.125
2023-11-22 12:27:08,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0
2023-11-22 12:27:20,085 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2700, loss[loss=0.0993, simple_loss=0.1326, pruned_loss=0.02453, audio_tagging_loss=0.00846, over 15858.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09174, pruned_loss=0.01489, audio_tagging_loss=0.009387, over 3045143.08 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 16.0
2023-11-22 12:27:36,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0
2023-11-22 12:27:50,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 7.955e+01 8.601e+01 9.459e+01 1.243e+02, threshold=1.720e+02, percent-clipped=0.0
2023-11-22 12:27:57,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2023-11-22 12:27:58,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291300
2023-11-22 12:28:25,093 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2750, loss[loss=0.05926, simple_loss=0.07773, pruned_loss=0.01155, audio_tagging_loss=0.008844, over 15738.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09166, pruned_loss=0.01482, audio_tagging_loss=0.009312, over 3038463.56 frames.
], batch size: 59, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:28:33,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2023-11-22 12:28:39,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1942193.3333333333, ans=0.1 2023-11-22 12:28:42,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1942193.3333333333, ans=0.0 2023-11-22 12:28:53,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-11-22 12:29:03,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291350 2023-11-22 12:29:05,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1942326.6666666667, ans=0.0 2023-11-22 12:29:13,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1942326.6666666667, ans=0.125 2023-11-22 12:29:18,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1942393.3333333333, ans=0.2 2023-11-22 12:29:21,444 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 12:29:28,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2023-11-22 12:29:30,140 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2800, loss[loss=0.07527, simple_loss=0.1013, pruned_loss=0.01726, audio_tagging_loss=0.007335, over 15523.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09155, pruned_loss=0.0147, audio_tagging_loss=0.009295, over 3041475.69 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:29:40,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1942460.0, ans=0.0 2023-11-22 12:29:42,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1942526.6666666667, ans=0.125 2023-11-22 12:29:43,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2023-11-22 12:29:47,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. 
limit=15.0 2023-11-22 12:30:01,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.537e+01 8.249e+01 8.938e+01 9.573e+01 1.382e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-22 12:30:07,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291400 2023-11-22 12:30:17,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1942660.0, ans=0.125 2023-11-22 12:30:36,163 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2850, loss[loss=0.05891, simple_loss=0.07835, pruned_loss=0.01207, audio_tagging_loss=0.007664, over 15683.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.09148, pruned_loss=0.01459, audio_tagging_loss=0.009232, over 3035540.83 frames. ], batch size: 61, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:30:50,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1942860.0, ans=0.1 2023-11-22 12:30:54,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1942860.0, ans=0.125 2023-11-22 12:31:04,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2023-11-22 12:31:08,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1942926.6666666667, ans=0.025 2023-11-22 12:31:13,610 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291450 2023-11-22 12:31:14,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1942993.3333333333, ans=0.125 2023-11-22 12:31:18,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1942993.3333333333, ans=0.125 2023-11-22 12:31:23,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1942993.3333333333, ans=0.0 2023-11-22 12:31:25,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2023-11-22 12:31:26,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1943060.0, ans=0.125 2023-11-22 12:31:34,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2023-11-22 12:31:40,807 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2900, loss[loss=0.09249, simple_loss=0.1247, pruned_loss=0.02144, audio_tagging_loss=0.008718, over 15931.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.0925, pruned_loss=0.01489, audio_tagging_loss=0.009234, over 3037353.28 frames. 
], batch size: 58, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:31:58,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1943193.3333333333, ans=0.125 2023-11-22 12:31:59,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1943193.3333333333, ans=0.02 2023-11-22 12:32:10,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2023-11-22 12:32:11,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.290e+01 8.892e+01 9.564e+01 1.220e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 12:32:19,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291500 2023-11-22 12:32:45,619 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 2950, loss[loss=0.09967, simple_loss=0.1316, pruned_loss=0.02611, audio_tagging_loss=0.007762, over 15405.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.09346, pruned_loss=0.01511, audio_tagging_loss=0.009238, over 3044818.14 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:32:52,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1943460.0, ans=0.0 2023-11-22 12:32:53,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-22 12:32:53,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1943460.0, ans=0.0 2023-11-22 12:32:57,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1943526.6666666667, ans=0.125 2023-11-22 12:33:06,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1943526.6666666667, ans=0.125 2023-11-22 12:33:09,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1943526.6666666667, ans=0.125 2023-11-22 12:33:23,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291550 2023-11-22 12:33:30,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2023-11-22 12:33:32,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1943660.0, ans=0.0 2023-11-22 12:33:40,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1943726.6666666667, ans=0.125 2023-11-22 12:33:43,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1943726.6666666667, ans=0.2 2023-11-22 12:33:44,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1943726.6666666667, ans=0.125 2023-11-22 12:33:50,772 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3000, loss[loss=0.0639, simple_loss=0.08469, pruned_loss=0.01189, audio_tagging_loss=0.00967, over 14865.00 frames. ], tot_loss[loss=0.07117, simple_loss=0.09359, pruned_loss=0.01508, audio_tagging_loss=0.009296, over 3038112.79 frames. 
], batch size: 57, lr: 2.74e-03, grad_scale: 32.0
2023-11-22 12:33:50,772 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 12:34:15,420 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1937, 3.0560, 3.3381, 3.0054, 3.7431, 3.8182, 3.2906, 3.1324], device='cuda:1')
2023-11-22 12:34:19,571 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3606, 5.0571, 4.7986, 5.1865], device='cuda:1')
2023-11-22 12:34:29,092 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2656, 3.0151, 3.6952, 3.5201], device='cuda:1')
2023-11-22 12:34:30,473 INFO [train_asr.py:1253] (1/4) Epoch 25, validation: loss=0.05876, simple_loss=0.05157, pruned_loss=0.005103, audio_tagging_loss=0.02788, over 4681554.00 frames.
2023-11-22 12:34:30,473 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-22 12:34:30,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1943793.3333333333, ans=0.125
2023-11-22 12:34:51,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1943860.0, ans=0.5
2023-11-22 12:35:01,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.344e+01 8.945e+01 9.581e+01 1.218e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-22 12:35:08,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291600
2023-11-22 12:35:21,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0
2023-11-22 12:35:33,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1944060.0, ans=0.125
2023-11-22 12:35:35,432 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3050, loss[loss=0.06286, simple_loss=0.07913, pruned_loss=0.01142, audio_tagging_loss=0.01188, over 16720.00 frames. ], tot_loss[loss=0.07144, simple_loss=0.09392, pruned_loss=0.01516, audio_tagging_loss=0.009323, over 3039791.10 frames. ], batch size: 65, lr: 2.74e-03, grad_scale: 32.0
2023-11-22 12:35:38,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1944126.6666666667, ans=0.025
2023-11-22 12:35:54,799 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 12:36:05,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1944260.0, ans=0.125
2023-11-22 12:36:11,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1944260.0, ans=0.0
2023-11-22 12:36:13,309 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
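
During the validation pass above, zipformer.py:1873 prints attn_weights_entropy tensors with what appears to be one value per attention head, a quick collapse check: entropy near zero would mean a head is locked onto a single key, while values near log(num_keys) mean it is nearly uniform. A minimal sketch of such a diagnostic, assuming a (num_heads, num_queries, num_keys) attention-weight layout; not the actual zipformer.py code.

import torch

def attn_weights_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn_weights: (num_heads, num_queries, num_keys), each row summing to 1.
    # Returns per-head entropy in nats, averaged over queries.
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=-1)

w = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(w))   # four values somewhat below ln(50) = 3.91
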
2023-11-22 12:36:13,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291650
2023-11-22 12:36:16,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1944326.6666666667, ans=0.0
2023-11-22 12:36:32,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5
2023-11-22 12:36:35,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1944393.3333333333, ans=0.0
2023-11-22 12:36:39,981 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3100, loss[loss=0.06941, simple_loss=0.0877, pruned_loss=0.01593, audio_tagging_loss=0.009628, over 15022.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.09378, pruned_loss=0.01518, audio_tagging_loss=0.0094, over 3036720.27 frames. ], batch size: 57, lr: 2.74e-03, grad_scale: 32.0
2023-11-22 12:36:40,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0
2023-11-22 12:36:41,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1944460.0, ans=0.0
2023-11-22 12:36:46,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1944460.0, ans=0.0
2023-11-22 12:37:10,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.355e+01 8.957e+01 9.791e+01 1.144e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-22 12:37:17,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291700
2023-11-22 12:37:23,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1944660.0, ans=0.0
2023-11-22 12:37:42,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0
2023-11-22 12:37:44,664 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3150, loss[loss=0.0635, simple_loss=0.08056, pruned_loss=0.01251, audio_tagging_loss=0.01071, over 15472.00 frames. ], tot_loss[loss=0.07206, simple_loss=0.09481, pruned_loss=0.01524, audio_tagging_loss=0.009413, over 3042816.18 frames. ], batch size: 58, lr: 2.74e-03, grad_scale: 32.0
2023-11-22 12:38:10,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1944926.6666666667, ans=0.05
2023-11-22 12:38:11,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1944926.6666666667, ans=0.125
2023-11-22 12:38:21,547 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291750
2023-11-22 12:38:41,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1945060.0, ans=10.0
2023-11-22 12:38:44,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1945060.0, ans=0.2
2023-11-22 12:38:49,290 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3200, loss[loss=0.06558, simple_loss=0.09176, pruned_loss=0.01029, audio_tagging_loss=0.009409, over 14504.00 frames.
], tot_loss[loss=0.07204, simple_loss=0.09444, pruned_loss=0.01524, audio_tagging_loss=0.009574, over 3040161.34 frames. ], batch size: 53, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:39:01,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1945193.3333333333, ans=0.0 2023-11-22 12:39:16,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1945260.0, ans=0.0 2023-11-22 12:39:17,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1945260.0, ans=0.125 2023-11-22 12:39:20,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.187e+01 8.718e+01 9.345e+01 1.176e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-22 12:39:27,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291800 2023-11-22 12:39:33,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1945326.6666666667, ans=0.125 2023-11-22 12:39:35,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1945326.6666666667, ans=0.2 2023-11-22 12:39:37,833 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 12:39:49,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1945393.3333333333, ans=0.1 2023-11-22 12:39:53,966 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3250, loss[loss=0.06914, simple_loss=0.08798, pruned_loss=0.01641, audio_tagging_loss=0.008742, over 14709.00 frames. ], tot_loss[loss=0.0718, simple_loss=0.09405, pruned_loss=0.01517, audio_tagging_loss=0.009601, over 3051582.93 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:40:09,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1945526.6666666667, ans=0.125 2023-11-22 12:40:16,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1945526.6666666667, ans=0.2 2023-11-22 12:40:21,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1945593.3333333333, ans=0.2 2023-11-22 12:40:31,195 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291850 2023-11-22 12:40:32,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1945660.0, ans=0.125 2023-11-22 12:40:58,271 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3300, loss[loss=0.07068, simple_loss=0.08739, pruned_loss=0.01722, audio_tagging_loss=0.009767, over 15468.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09397, pruned_loss=0.0152, audio_tagging_loss=0.009651, over 3052391.35 frames. ], batch size: 60, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:41:01,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.65 vs. 
limit=15.0 2023-11-22 12:41:24,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1945926.6666666667, ans=0.025 2023-11-22 12:41:29,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.169e+01 8.738e+01 9.487e+01 1.174e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 12:41:30,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-22 12:41:34,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1945926.6666666667, ans=0.125 2023-11-22 12:41:35,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291900 2023-11-22 12:41:42,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1945993.3333333333, ans=0.125 2023-11-22 12:42:03,230 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3350, loss[loss=0.06632, simple_loss=0.09059, pruned_loss=0.01344, audio_tagging_loss=0.007582, over 15189.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.0933, pruned_loss=0.01505, audio_tagging_loss=0.009552, over 3045891.42 frames. ], batch size: 57, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:42:15,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1946193.3333333333, ans=0.125 2023-11-22 12:42:16,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2023-11-22 12:42:26,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-11-22 12:42:40,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 291950 2023-11-22 12:42:40,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1946326.6666666667, ans=0.0 2023-11-22 12:42:43,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1946326.6666666667, ans=0.0 2023-11-22 12:42:50,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1946326.6666666667, ans=0.125 2023-11-22 12:43:08,064 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3400, loss[loss=0.06278, simple_loss=0.0842, pruned_loss=0.01405, audio_tagging_loss=0.006637, over 15005.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09405, pruned_loss=0.01513, audio_tagging_loss=0.009295, over 3055627.27 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 32.0 2023-11-22 12:43:09,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0 2023-11-22 12:43:21,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.65 vs. 
limit=22.5 2023-11-22 12:43:32,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1946593.3333333333, ans=0.0 2023-11-22 12:43:38,755 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.227e+01 8.817e+01 9.429e+01 1.318e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-22 12:43:45,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292000 2023-11-22 12:43:57,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1946660.0, ans=0.0 2023-11-22 12:44:09,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2023-11-22 12:44:15,281 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3450, loss[loss=0.06036, simple_loss=0.08268, pruned_loss=0.0112, audio_tagging_loss=0.007812, over 15533.00 frames. ], tot_loss[loss=0.07161, simple_loss=0.09435, pruned_loss=0.01522, audio_tagging_loss=0.009211, over 3054371.88 frames. ], batch size: 58, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:44:16,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1946793.3333333333, ans=0.125 2023-11-22 12:44:16,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1946793.3333333333, ans=0.025 2023-11-22 12:44:24,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-22 12:44:38,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1946860.0, ans=0.125 2023-11-22 12:44:46,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-11-22 12:44:51,967 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292050 2023-11-22 12:45:19,185 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3500, loss[loss=0.06411, simple_loss=0.08524, pruned_loss=0.01311, audio_tagging_loss=0.008382, over 16070.00 frames. ], tot_loss[loss=0.07093, simple_loss=0.09365, pruned_loss=0.01487, audio_tagging_loss=0.009234, over 3051407.98 frames. ], batch size: 58, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:45:36,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=22.5 2023-11-22 12:45:37,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1947193.3333333333, ans=0.125 2023-11-22 12:45:42,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=22.5 2023-11-22 12:45:50,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1947260.0, ans=0.125 2023-11-22 12:45:51,056 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 12:45:52,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.561e+01 7.990e+01 8.840e+01 9.495e+01 1.155e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-22 12:45:55,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292100 2023-11-22 12:46:02,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1947326.6666666667, ans=0.125 2023-11-22 12:46:23,357 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3550, loss[loss=0.08361, simple_loss=0.1169, pruned_loss=0.01758, audio_tagging_loss=0.007564, over 16031.00 frames. ], tot_loss[loss=0.07048, simple_loss=0.09299, pruned_loss=0.01475, audio_tagging_loss=0.009235, over 3052488.19 frames. ], batch size: 59, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:46:28,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1947460.0, ans=0.125 2023-11-22 12:46:37,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1947526.6666666667, ans=0.125 2023-11-22 12:46:48,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1947593.3333333333, ans=0.125 2023-11-22 12:47:00,202 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292150 2023-11-22 12:47:18,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1947726.6666666667, ans=0.125 2023-11-22 12:47:27,281 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3600, loss[loss=0.07212, simple_loss=0.08758, pruned_loss=0.01734, audio_tagging_loss=0.01099, over 15561.00 frames. ], tot_loss[loss=0.07113, simple_loss=0.09387, pruned_loss=0.01505, audio_tagging_loss=0.00915, over 3051214.24 frames. ], batch size: 61, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:47:38,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1947860.0, ans=0.2 2023-11-22 12:47:48,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1947860.0, ans=0.05 2023-11-22 12:47:48,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1947860.0, ans=0.125 2023-11-22 12:47:57,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1947926.6666666667, ans=0.1 2023-11-22 12:48:00,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.791e+01 8.103e+01 8.938e+01 1.008e+02 1.423e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-22 12:48:04,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292200 2023-11-22 12:48:30,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1948126.6666666667, ans=0.0 2023-11-22 12:48:31,933 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3650, loss[loss=0.08451, simple_loss=0.109, pruned_loss=0.02065, audio_tagging_loss=0.009361, over 14843.00 frames. 
], tot_loss[loss=0.07167, simple_loss=0.09462, pruned_loss=0.01525, audio_tagging_loss=0.009113, over 3055333.93 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:48:47,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2023-11-22 12:48:48,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1948193.3333333333, ans=0.0 2023-11-22 12:49:09,586 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292250 2023-11-22 12:49:21,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1948326.6666666667, ans=15.0 2023-11-22 12:49:37,062 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3700, loss[loss=0.08699, simple_loss=0.1251, pruned_loss=0.01888, audio_tagging_loss=0.005558, over 15475.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.09409, pruned_loss=0.01533, audio_tagging_loss=0.009092, over 3051076.71 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:49:37,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1948460.0, ans=0.0 2023-11-22 12:49:41,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0 2023-11-22 12:50:05,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0 2023-11-22 12:50:10,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.131e+01 8.890e+01 9.560e+01 1.123e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 12:50:13,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292300 2023-11-22 12:50:14,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1948660.0, ans=0.2 2023-11-22 12:50:41,534 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3750, loss[loss=0.06395, simple_loss=0.0804, pruned_loss=0.01232, audio_tagging_loss=0.01143, over 15172.00 frames. ], tot_loss[loss=0.07186, simple_loss=0.09445, pruned_loss=0.0154, audio_tagging_loss=0.009232, over 3048453.18 frames. ], batch size: 59, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:50:46,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1948793.3333333333, ans=0.125 2023-11-22 12:50:52,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1948860.0, ans=0.125 2023-11-22 12:51:19,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292350 2023-11-22 12:51:20,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1948993.3333333333, ans=0.1 2023-11-22 12:51:22,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1948993.3333333333, ans=0.125 2023-11-22 12:51:25,164 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 12:51:44,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1949126.6666666667, ans=0.125 2023-11-22 12:51:45,483 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3800, loss[loss=0.0821, simple_loss=0.107, pruned_loss=0.02009, audio_tagging_loss=0.008524, over 15180.00 frames. ], tot_loss[loss=0.07175, simple_loss=0.09447, pruned_loss=0.01522, audio_tagging_loss=0.009289, over 3050235.52 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:51:58,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2023-11-22 12:52:09,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1949193.3333333333, ans=0.1 2023-11-22 12:52:10,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2023-11-22 12:52:18,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1949260.0, ans=0.0 2023-11-22 12:52:21,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.073e+01 8.669e+01 9.455e+01 1.608e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-22 12:52:23,931 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292400 2023-11-22 12:52:49,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2023-11-22 12:52:50,532 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3850, loss[loss=0.07347, simple_loss=0.1017, pruned_loss=0.01313, audio_tagging_loss=0.009479, over 15359.00 frames. ], tot_loss[loss=0.07202, simple_loss=0.09467, pruned_loss=0.01525, audio_tagging_loss=0.009441, over 3053162.59 frames. ], batch size: 57, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:53:15,453 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 12:53:28,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292450 2023-11-22 12:53:35,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1949660.0, ans=0.2 2023-11-22 12:53:46,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1949726.6666666667, ans=0.0 2023-11-22 12:53:55,294 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3900, loss[loss=0.06869, simple_loss=0.08743, pruned_loss=0.01406, audio_tagging_loss=0.01092, over 14482.00 frames. ], tot_loss[loss=0.0721, simple_loss=0.09456, pruned_loss=0.01533, audio_tagging_loss=0.009495, over 3051609.46 frames. ], batch size: 54, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:54:13,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1949860.0, ans=0.025 2023-11-22 12:54:21,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.62 vs. 
limit=15.0 2023-11-22 12:54:28,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1949926.6666666667, ans=10.0 2023-11-22 12:54:30,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.183e+01 8.833e+01 9.463e+01 1.290e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 12:54:31,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1949926.6666666667, ans=0.2 2023-11-22 12:54:33,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292500 2023-11-22 12:54:57,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1950060.0, ans=0.0 2023-11-22 12:55:00,198 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 3950, loss[loss=0.08877, simple_loss=0.12, pruned_loss=0.02098, audio_tagging_loss=0.007806, over 15934.00 frames. ], tot_loss[loss=0.07164, simple_loss=0.09359, pruned_loss=0.01527, audio_tagging_loss=0.009574, over 3052024.16 frames. ], batch size: 58, lr: 2.74e-03, grad_scale: 8.0 2023-11-22 12:55:20,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1950193.3333333333, ans=0.09899494936611666 2023-11-22 12:55:25,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1950260.0, ans=0.1 2023-11-22 12:55:29,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-22 12:55:38,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292550 2023-11-22 12:55:38,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1950326.6666666667, ans=0.5 2023-11-22 12:55:43,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2023-11-22 12:55:54,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1950393.3333333333, ans=0.1 2023-11-22 12:56:04,972 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4000, loss[loss=0.07388, simple_loss=0.09478, pruned_loss=0.01817, audio_tagging_loss=0.008324, over 14424.00 frames. ], tot_loss[loss=0.07248, simple_loss=0.09498, pruned_loss=0.01555, audio_tagging_loss=0.00944, over 3047440.52 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:56:25,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1950526.6666666667, ans=0.0 2023-11-22 12:56:38,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1950593.3333333333, ans=0.0 2023-11-22 12:56:39,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. 
limit=15.0 2023-11-22 12:56:40,611 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.523e+01 9.261e+01 9.950e+01 1.651e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-22 12:56:43,810 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292600 2023-11-22 12:56:48,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1950660.0, ans=0.125 2023-11-22 12:56:51,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1950660.0, ans=0.0 2023-11-22 12:57:09,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1950726.6666666667, ans=0.0 2023-11-22 12:57:11,649 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4050, loss[loss=0.06089, simple_loss=0.07783, pruned_loss=0.01229, audio_tagging_loss=0.009683, over 15040.00 frames. ], tot_loss[loss=0.07236, simple_loss=0.09462, pruned_loss=0.01557, audio_tagging_loss=0.009482, over 3043457.38 frames. ], batch size: 58, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:57:14,238 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 12:57:15,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1950793.3333333333, ans=0.125 2023-11-22 12:57:37,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1950926.6666666667, ans=0.1 2023-11-22 12:57:37,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5 2023-11-22 12:57:43,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1950926.6666666667, ans=0.0 2023-11-22 12:57:47,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=22.5 2023-11-22 12:57:50,055 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292650 2023-11-22 12:57:51,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1950993.3333333333, ans=0.125 2023-11-22 12:57:51,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1950993.3333333333, ans=0.125 2023-11-22 12:58:11,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2023-11-22 12:58:17,013 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4100, loss[loss=0.07096, simple_loss=0.0896, pruned_loss=0.01695, audio_tagging_loss=0.009209, over 16559.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.09484, pruned_loss=0.01555, audio_tagging_loss=0.009496, over 3043606.70 frames. 
], batch size: 63, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:58:25,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1951126.6666666667, ans=0.05 2023-11-22 12:58:31,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1951193.3333333333, ans=0.1 2023-11-22 12:58:41,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1951260.0, ans=0.1 2023-11-22 12:58:52,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.488e+01 8.924e+01 9.565e+01 1.327e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-22 12:58:54,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292700 2023-11-22 12:59:17,419 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 12:59:22,201 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4150, loss[loss=0.09714, simple_loss=0.1361, pruned_loss=0.02436, audio_tagging_loss=0.004732, over 16236.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09462, pruned_loss=0.01533, audio_tagging_loss=0.009287, over 3042193.20 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 12:59:34,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2023-11-22 12:59:36,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1951526.6666666667, ans=0.125 2023-11-22 12:59:47,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1951593.3333333333, ans=0.09899494936611666 2023-11-22 12:59:55,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-22 12:59:58,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1951593.3333333333, ans=0.125 2023-11-22 13:00:00,020 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292750 2023-11-22 13:00:07,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2023-11-22 13:00:09,135 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 13:00:14,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1951726.6666666667, ans=0.125 2023-11-22 13:00:26,775 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4200, loss[loss=0.06955, simple_loss=0.08254, pruned_loss=0.01904, audio_tagging_loss=0.009229, over 15043.00 frames. ], tot_loss[loss=0.07143, simple_loss=0.09401, pruned_loss=0.01523, audio_tagging_loss=0.009187, over 3044848.18 frames. 
], batch size: 59, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 13:00:35,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-11-22 13:00:43,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1951860.0, ans=0.1 2023-11-22 13:01:00,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1951926.6666666667, ans=0.035 2023-11-22 13:01:02,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.208e+01 8.744e+01 9.952e+01 1.323e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 13:01:05,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292800 2023-11-22 13:01:28,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1952060.0, ans=0.125 2023-11-22 13:01:30,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2023-11-22 13:01:33,210 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4250, loss[loss=0.03521, simple_loss=0.0437, pruned_loss=0.002917, audio_tagging_loss=0.01044, over 16408.00 frames. ], tot_loss[loss=0.07186, simple_loss=0.09483, pruned_loss=0.01533, audio_tagging_loss=0.009119, over 3046389.73 frames. ], batch size: 67, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 13:02:07,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1952260.0, ans=0.125 2023-11-22 13:02:09,960 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292850 2023-11-22 13:02:37,480 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4300, loss[loss=0.08186, simple_loss=0.1056, pruned_loss=0.01851, audio_tagging_loss=0.01056, over 14937.00 frames. ], tot_loss[loss=0.07229, simple_loss=0.09553, pruned_loss=0.01549, audio_tagging_loss=0.00904, over 3046596.13 frames. 
], batch size: 57, lr: 2.74e-03, grad_scale: 16.0 2023-11-22 13:02:43,913 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:02:45,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1952460.0, ans=0.05 2023-11-22 13:02:49,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1952526.6666666667, ans=0.125 2023-11-22 13:03:08,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1952593.3333333333, ans=0.125 2023-11-22 13:03:11,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1952593.3333333333, ans=0.1 2023-11-22 13:03:12,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.605e+01 8.363e+01 9.136e+01 9.974e+01 1.277e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-22 13:03:12,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1952593.3333333333, ans=0.125 2023-11-22 13:03:14,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292900 2023-11-22 13:03:41,700 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4350, loss[loss=0.04996, simple_loss=0.06177, pruned_loss=0.009336, audio_tagging_loss=0.009739, over 14262.00 frames. ], tot_loss[loss=0.07151, simple_loss=0.09427, pruned_loss=0.01518, audio_tagging_loss=0.009193, over 3041469.68 frames. ], batch size: 55, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:04:08,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1952926.6666666667, ans=0.02 2023-11-22 13:04:09,292 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:04:09,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1952926.6666666667, ans=0.125 2023-11-22 13:04:17,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1952926.6666666667, ans=0.05 2023-11-22 13:04:19,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 292950 2023-11-22 13:04:19,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1952993.3333333333, ans=0.0 2023-11-22 13:04:30,376 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:04:41,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1953060.0, ans=0.1 2023-11-22 13:04:42,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1953060.0, ans=0.125 2023-11-22 13:04:46,741 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4400, loss[loss=0.08239, simple_loss=0.1143, pruned_loss=0.01877, audio_tagging_loss=0.006445, over 16221.00 frames. ], tot_loss[loss=0.07131, simple_loss=0.09392, pruned_loss=0.01515, audio_tagging_loss=0.009199, over 3047753.32 frames. 
], batch size: 59, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:04:48,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1953126.6666666667, ans=0.1 2023-11-22 13:05:10,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1953193.3333333333, ans=0.2 2023-11-22 13:05:21,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 8.093e+01 8.881e+01 9.679e+01 1.352e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-22 13:05:24,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293000 2023-11-22 13:05:25,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2023-11-22 13:05:30,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1953326.6666666667, ans=0.07 2023-11-22 13:05:41,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1953393.3333333333, ans=0.125 2023-11-22 13:05:51,664 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4450, loss[loss=0.06752, simple_loss=0.0921, pruned_loss=0.01511, audio_tagging_loss=0.006361, over 15148.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09385, pruned_loss=0.01508, audio_tagging_loss=0.009194, over 3048491.46 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:05:52,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1953460.0, ans=0.125 2023-11-22 13:05:55,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1953460.0, ans=0.2 2023-11-22 13:06:10,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1953526.6666666667, ans=0.125 2023-11-22 13:06:23,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1953593.3333333333, ans=0.125 2023-11-22 13:06:28,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293050 2023-11-22 13:06:41,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1953726.6666666667, ans=0.1 2023-11-22 13:06:46,714 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.274e-02 2023-11-22 13:06:53,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1953726.6666666667, ans=0.0 2023-11-22 13:06:55,426 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4500, loss[loss=0.07458, simple_loss=0.08941, pruned_loss=0.01968, audio_tagging_loss=0.0102, over 13847.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09462, pruned_loss=0.01515, audio_tagging_loss=0.009199, over 3045175.14 frames. ], batch size: 54, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:06:57,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. 
limit=15.0 2023-11-22 13:07:16,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1953860.0, ans=10.0 2023-11-22 13:07:31,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.144e+01 8.843e+01 9.621e+01 1.771e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 13:07:32,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1953926.6666666667, ans=0.125 2023-11-22 13:07:32,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293100 2023-11-22 13:07:35,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1953993.3333333333, ans=0.0 2023-11-22 13:07:39,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1953993.3333333333, ans=0.125 2023-11-22 13:07:46,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1954060.0, ans=0.0 2023-11-22 13:07:47,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=15.0 2023-11-22 13:07:51,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1954060.0, ans=0.1 2023-11-22 13:07:58,960 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4550, loss[loss=0.07795, simple_loss=0.1056, pruned_loss=0.01507, audio_tagging_loss=0.01009, over 15342.00 frames. ], tot_loss[loss=0.07174, simple_loss=0.0946, pruned_loss=0.01527, audio_tagging_loss=0.009169, over 3043240.35 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:08:04,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1954126.6666666667, ans=0.125 2023-11-22 13:08:12,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1954193.3333333333, ans=0.035 2023-11-22 13:08:19,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1954193.3333333333, ans=0.2 2023-11-22 13:08:24,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1954260.0, ans=0.0 2023-11-22 13:08:37,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293150 2023-11-22 13:08:37,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.05 vs. limit=22.5 2023-11-22 13:08:45,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1954326.6666666667, ans=0.1 2023-11-22 13:08:48,008 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 13:08:48,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1954326.6666666667, ans=0.0 2023-11-22 13:08:59,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2023-11-22 13:09:03,940 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4600, loss[loss=0.08272, simple_loss=0.1161, pruned_loss=0.0159, audio_tagging_loss=0.008753, over 14914.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09442, pruned_loss=0.01525, audio_tagging_loss=0.009199, over 3047954.94 frames. ], batch size: 56, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:09:15,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1954526.6666666667, ans=0.125 2023-11-22 13:09:15,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1954526.6666666667, ans=0.5 2023-11-22 13:09:28,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1954593.3333333333, ans=0.125 2023-11-22 13:09:40,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.472e+01 8.168e+01 8.767e+01 9.419e+01 1.196e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-22 13:09:41,511 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293200 2023-11-22 13:09:45,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1954660.0, ans=0.1 2023-11-22 13:10:05,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1954726.6666666667, ans=0.125 2023-11-22 13:10:09,644 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4650, loss[loss=0.0707, simple_loss=0.08829, pruned_loss=0.01571, audio_tagging_loss=0.01085, over 15206.00 frames. ], tot_loss[loss=0.07125, simple_loss=0.09379, pruned_loss=0.01502, audio_tagging_loss=0.009334, over 3051507.42 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:10:11,098 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:10:21,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1954860.0, ans=0.1 2023-11-22 13:10:28,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-11-22 13:10:46,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-11-22 13:10:46,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293250 2023-11-22 13:11:13,292 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4700, loss[loss=0.07749, simple_loss=0.1042, pruned_loss=0.01666, audio_tagging_loss=0.008746, over 15941.00 frames. ], tot_loss[loss=0.07132, simple_loss=0.09385, pruned_loss=0.01508, audio_tagging_loss=0.009318, over 3054431.56 frames. 
], batch size: 58, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:11:22,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1955126.6666666667, ans=0.0 2023-11-22 13:11:26,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1955193.3333333333, ans=0.1 2023-11-22 13:11:26,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2023-11-22 13:11:33,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1955193.3333333333, ans=0.1 2023-11-22 13:11:47,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1955260.0, ans=0.125 2023-11-22 13:11:49,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.061e+01 8.708e+01 9.431e+01 1.267e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-22 13:11:50,926 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293300 2023-11-22 13:11:56,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1955326.6666666667, ans=0.0 2023-11-22 13:11:56,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-22 13:12:00,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1955326.6666666667, ans=0.125 2023-11-22 13:12:01,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1955326.6666666667, ans=0.1 2023-11-22 13:12:04,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1955393.3333333333, ans=0.0 2023-11-22 13:12:10,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2023-11-22 13:12:17,193 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4750, loss[loss=0.06796, simple_loss=0.09181, pruned_loss=0.01275, audio_tagging_loss=0.009308, over 14938.00 frames. ], tot_loss[loss=0.07111, simple_loss=0.09357, pruned_loss=0.01495, audio_tagging_loss=0.009377, over 3045666.49 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:12:28,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.26 vs. limit=10.0 2023-11-22 13:12:32,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.58 vs. 
limit=12.0 2023-11-22 13:12:39,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1955526.6666666667, ans=0.05 2023-11-22 13:12:54,966 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293350 2023-11-22 13:13:00,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1955660.0, ans=0.125 2023-11-22 13:13:00,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1955660.0, ans=0.125 2023-11-22 13:13:12,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1955726.6666666667, ans=0.125 2023-11-22 13:13:23,043 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4800, loss[loss=0.05495, simple_loss=0.07571, pruned_loss=0.007888, audio_tagging_loss=0.009204, over 16304.00 frames. ], tot_loss[loss=0.07149, simple_loss=0.09394, pruned_loss=0.0151, audio_tagging_loss=0.009425, over 3044395.49 frames. ], batch size: 61, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:13:26,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1955793.3333333333, ans=0.0 2023-11-22 13:13:26,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1955793.3333333333, ans=0.125 2023-11-22 13:13:43,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1955860.0, ans=0.2 2023-11-22 13:13:46,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1955860.0, ans=0.0 2023-11-22 13:13:58,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.542e+01 8.018e+01 8.726e+01 9.477e+01 1.215e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-22 13:14:00,977 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293400 2023-11-22 13:14:08,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1955993.3333333333, ans=0.125 2023-11-22 13:14:09,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=22.5 2023-11-22 13:14:28,254 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4850, loss[loss=0.08196, simple_loss=0.1067, pruned_loss=0.01672, audio_tagging_loss=0.01191, over 15281.00 frames. ], tot_loss[loss=0.07117, simple_loss=0.09313, pruned_loss=0.01495, audio_tagging_loss=0.009662, over 3044438.58 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:14:32,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1956126.6666666667, ans=0.125 2023-11-22 13:15:06,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293450 2023-11-22 13:15:31,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2023-11-22 13:15:33,151 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4900, loss[loss=0.0669, simple_loss=0.09092, pruned_loss=0.01272, audio_tagging_loss=0.008715, over 16664.00 frames. 
], tot_loss[loss=0.07129, simple_loss=0.09331, pruned_loss=0.01504, audio_tagging_loss=0.0096, over 3035716.12 frames. ], batch size: 63, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:15:37,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1956460.0, ans=0.2 2023-11-22 13:15:38,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1956460.0, ans=0.125 2023-11-22 13:15:41,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1956460.0, ans=0.05 2023-11-22 13:16:01,446 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:16:10,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.107e+01 8.809e+01 9.600e+01 1.365e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-22 13:16:11,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293500 2023-11-22 13:16:17,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1956660.0, ans=0.125 2023-11-22 13:16:20,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1956660.0, ans=0.015 2023-11-22 13:16:32,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1956726.6666666667, ans=0.125 2023-11-22 13:16:38,731 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 4950, loss[loss=0.08446, simple_loss=0.1064, pruned_loss=0.02225, audio_tagging_loss=0.00902, over 15402.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.09434, pruned_loss=0.01518, audio_tagging_loss=0.009422, over 3037180.39 frames. 
], batch size: 56, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:16:39,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1956793.3333333333, ans=0.1 2023-11-22 13:16:42,050 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:16:47,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1956793.3333333333, ans=0.0 2023-11-22 13:17:01,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1956860.0, ans=0.0 2023-11-22 13:17:09,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1956926.6666666667, ans=0.2 2023-11-22 13:17:11,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1956926.6666666667, ans=0.125 2023-11-22 13:17:15,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1956926.6666666667, ans=0.125 2023-11-22 13:17:16,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293550 2023-11-22 13:17:28,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1956993.3333333333, ans=0.125 2023-11-22 13:17:28,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1956993.3333333333, ans=10.0 2023-11-22 13:17:44,349 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5000, loss[loss=0.06053, simple_loss=0.08464, pruned_loss=0.01104, audio_tagging_loss=0.00716, over 14243.00 frames. ], tot_loss[loss=0.07159, simple_loss=0.09433, pruned_loss=0.01511, audio_tagging_loss=0.009305, over 3035525.20 frames. ], batch size: 54, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:17:48,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1957126.6666666667, ans=0.0 2023-11-22 13:18:05,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1957193.3333333333, ans=0.0 2023-11-22 13:18:20,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.208e+01 8.762e+01 9.402e+01 1.160e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-22 13:18:22,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293600 2023-11-22 13:18:49,651 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5050, loss[loss=0.0675, simple_loss=0.08012, pruned_loss=0.01757, audio_tagging_loss=0.009871, over 15908.00 frames. ], tot_loss[loss=0.07157, simple_loss=0.09452, pruned_loss=0.01516, audio_tagging_loss=0.009148, over 3042186.20 frames. ], batch size: 60, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:19:07,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1957526.6666666667, ans=0.125 2023-11-22 13:19:13,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. 
2023-11-22 13:19:27,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293650
2023-11-22 13:19:30,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1957660.0, ans=0.125
2023-11-22 13:19:46,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1957726.6666666667, ans=0.125
2023-11-22 13:19:50,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1957726.6666666667, ans=0.2
2023-11-22 13:19:54,006 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5100, loss[loss=0.06559, simple_loss=0.09009, pruned_loss=0.01012, audio_tagging_loss=0.01042, over 14788.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.09385, pruned_loss=0.01495, audio_tagging_loss=0.009212, over 3045302.05 frames. ], batch size: 55, lr: 2.73e-03, grad_scale: 32.0
2023-11-22 13:19:55,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1957793.3333333333, ans=0.125
2023-11-22 13:19:55,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1957793.3333333333, ans=0.125
2023-11-22 13:20:30,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.490e+01 8.194e+01 8.730e+01 9.544e+01 1.434e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-22 13:20:30,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1957926.6666666667, ans=0.95
2023-11-22 13:20:31,403 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293700
2023-11-22 13:20:46,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1958060.0, ans=0.125
2023-11-22 13:20:58,901 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5150, loss[loss=0.07303, simple_loss=0.0955, pruned_loss=0.0152, audio_tagging_loss=0.01008, over 15844.00 frames. ], tot_loss[loss=0.07071, simple_loss=0.09312, pruned_loss=0.01485, audio_tagging_loss=0.009304, over 3049206.33 frames. ], batch size: 59, lr: 2.73e-03, grad_scale: 32.0
2023-11-22 13:21:12,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0
2023-11-22 13:21:27,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs.
limit=15.0 2023-11-22 13:21:28,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1958260.0, ans=0.2 2023-11-22 13:21:31,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1958260.0, ans=0.0 2023-11-22 13:21:31,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1958260.0, ans=0.0 2023-11-22 13:21:35,871 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293750 2023-11-22 13:21:42,901 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:21:44,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1958326.6666666667, ans=0.2 2023-11-22 13:21:45,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1958326.6666666667, ans=0.1 2023-11-22 13:22:03,623 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5200, loss[loss=0.05952, simple_loss=0.06155, pruned_loss=0.0176, audio_tagging_loss=0.01115, over 16984.00 frames. ], tot_loss[loss=0.07124, simple_loss=0.09383, pruned_loss=0.0151, audio_tagging_loss=0.009227, over 3045965.53 frames. ], batch size: 64, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:22:08,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1958460.0, ans=0.125 2023-11-22 13:22:27,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2023-11-22 13:22:39,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.666e+01 8.256e+01 9.056e+01 9.741e+01 1.453e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-22 13:22:41,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293800 2023-11-22 13:22:41,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-11-22 13:22:53,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1958660.0, ans=0.125 2023-11-22 13:23:08,287 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5250, loss[loss=0.07893, simple_loss=0.1035, pruned_loss=0.01814, audio_tagging_loss=0.009032, over 15109.00 frames. ], tot_loss[loss=0.07124, simple_loss=0.09398, pruned_loss=0.01511, audio_tagging_loss=0.009142, over 3043715.35 frames. 
], batch size: 59, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:23:30,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1958860.0, ans=0.0 2023-11-22 13:23:31,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1958860.0, ans=0.125 2023-11-22 13:23:38,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1958926.6666666667, ans=0.0 2023-11-22 13:23:39,351 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:23:42,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1958926.6666666667, ans=0.125 2023-11-22 13:23:46,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293850 2023-11-22 13:23:52,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1958993.3333333333, ans=0.05 2023-11-22 13:24:13,201 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5300, loss[loss=0.06878, simple_loss=0.09097, pruned_loss=0.01359, audio_tagging_loss=0.009702, over 15191.00 frames. ], tot_loss[loss=0.07129, simple_loss=0.09395, pruned_loss=0.01518, audio_tagging_loss=0.009135, over 3045855.21 frames. ], batch size: 55, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:24:22,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1959126.6666666667, ans=0.125 2023-11-22 13:24:24,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1959126.6666666667, ans=0.125 2023-11-22 13:24:29,042 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:24:45,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1959260.0, ans=0.125 2023-11-22 13:24:51,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.505e+01 9.104e+01 9.788e+01 1.254e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-22 13:24:51,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293900 2023-11-22 13:24:51,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1959326.6666666667, ans=0.125 2023-11-22 13:25:04,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1959393.3333333333, ans=0.1 2023-11-22 13:25:06,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=22.5 2023-11-22 13:25:18,119 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5350, loss[loss=0.06898, simple_loss=0.08685, pruned_loss=0.01185, audio_tagging_loss=0.0137, over 14861.00 frames. ], tot_loss[loss=0.07141, simple_loss=0.0938, pruned_loss=0.0152, audio_tagging_loss=0.009306, over 3044446.47 frames. 
], batch size: 57, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:25:21,607 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:25:31,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1959526.6666666667, ans=0.5 2023-11-22 13:25:47,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2023-11-22 13:25:55,581 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 293950 2023-11-22 13:25:59,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1959660.0, ans=0.025 2023-11-22 13:26:01,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1959660.0, ans=0.1 2023-11-22 13:26:11,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1959726.6666666667, ans=0.125 2023-11-22 13:26:23,246 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5400, loss[loss=0.06409, simple_loss=0.08703, pruned_loss=0.01222, audio_tagging_loss=0.008359, over 16070.00 frames. ], tot_loss[loss=0.07157, simple_loss=0.09422, pruned_loss=0.01523, audio_tagging_loss=0.009231, over 3050990.24 frames. ], batch size: 60, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:26:27,144 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:26:49,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1959926.6666666667, ans=0.1 2023-11-22 13:26:53,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1959926.6666666667, ans=0.125 2023-11-22 13:27:00,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.841e+01 8.190e+01 8.796e+01 9.372e+01 1.218e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 13:27:00,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294000 2023-11-22 13:27:28,010 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5450, loss[loss=0.05884, simple_loss=0.07742, pruned_loss=0.01181, audio_tagging_loss=0.008311, over 14493.00 frames. ], tot_loss[loss=0.07206, simple_loss=0.09499, pruned_loss=0.01536, audio_tagging_loss=0.009205, over 3048374.14 frames. 
], batch size: 56, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:27:35,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1960126.6666666667, ans=0.1 2023-11-22 13:27:36,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1960126.6666666667, ans=0.05 2023-11-22 13:28:01,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1960260.0, ans=0.1 2023-11-22 13:28:03,742 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:28:06,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294050 2023-11-22 13:28:06,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1960326.6666666667, ans=0.0 2023-11-22 13:28:08,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1960326.6666666667, ans=0.125 2023-11-22 13:28:26,440 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:28:31,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1960460.0, ans=0.5 2023-11-22 13:28:32,382 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5500, loss[loss=0.08413, simple_loss=0.1178, pruned_loss=0.01575, audio_tagging_loss=0.00946, over 15206.00 frames. ], tot_loss[loss=0.07236, simple_loss=0.09528, pruned_loss=0.0155, audio_tagging_loss=0.009216, over 3048470.73 frames. ], batch size: 55, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:28:44,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1960526.6666666667, ans=0.0 2023-11-22 13:28:45,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-11-22 13:28:51,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1960526.6666666667, ans=0.0 2023-11-22 13:28:53,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1960526.6666666667, ans=0.125 2023-11-22 13:28:59,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1960593.3333333333, ans=0.125 2023-11-22 13:29:04,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. 
limit=6.0 2023-11-22 13:29:10,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.609e+01 8.387e+01 8.932e+01 9.464e+01 1.203e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-22 13:29:10,234 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294100 2023-11-22 13:29:11,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1960660.0, ans=0.125 2023-11-22 13:29:15,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1960660.0, ans=0.0 2023-11-22 13:29:23,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1960726.6666666667, ans=0.2 2023-11-22 13:29:37,764 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5550, loss[loss=0.08259, simple_loss=0.1048, pruned_loss=0.02182, audio_tagging_loss=0.008371, over 14855.00 frames. ], tot_loss[loss=0.07203, simple_loss=0.09466, pruned_loss=0.01538, audio_tagging_loss=0.009316, over 3041991.75 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:30:05,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1960926.6666666667, ans=0.0 2023-11-22 13:30:14,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294150 2023-11-22 13:30:20,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1960993.3333333333, ans=0.125 2023-11-22 13:30:24,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1960993.3333333333, ans=0.0 2023-11-22 13:30:42,361 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5600, loss[loss=0.07009, simple_loss=0.08922, pruned_loss=0.01573, audio_tagging_loss=0.009752, over 15085.00 frames. ], tot_loss[loss=0.07281, simple_loss=0.09575, pruned_loss=0.01555, audio_tagging_loss=0.009383, over 3034910.69 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:30:45,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1961126.6666666667, ans=0.0 2023-11-22 13:31:01,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1961193.3333333333, ans=0.1 2023-11-22 13:31:01,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1961193.3333333333, ans=0.125 2023-11-22 13:31:19,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294200 2023-11-22 13:31:21,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.047e+01 8.683e+01 9.396e+01 1.171e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-22 13:31:21,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1961326.6666666667, ans=0.125 2023-11-22 13:31:28,713 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24
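This WARNING (and the identical ones later in the log) documents the filter that drops AudioSet placeholder cuts: the 1-second cut has 100 feature frames, the convolutional front-end subsamples that to 23 frames, and the dummy transcript tokenizes to 24 BPE tokens, so a transducer alignment (at least one frame per token) is impossible. A sketch of the check, assuming the usual zipformer subsampling arithmetic T_out = ((T_in - 7) // 2 + 1) // 2, which reproduces the logged 100 -> 23; the function names are illustrative, not from train_asr.py:

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv front-end with overall subsampling factor 4; gives 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # The transducer loss needs at least as many frames as text tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> "Exclude cut ..." is logged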
2023-11-22 13:31:46,607 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5650, loss[loss=0.09005, simple_loss=0.1188, pruned_loss=0.01956, audio_tagging_loss=0.01108, over 15924.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09512, pruned_loss=0.01544, audio_tagging_loss=0.009558, over 3041201.41 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 16.0
2023-11-22 13:32:21,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5
2023-11-22 13:32:24,600 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294250
2023-11-22 13:32:30,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1961660.0, ans=15.0
2023-11-22 13:32:35,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1961660.0, ans=0.07
2023-11-22 13:32:37,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1961726.6666666667, ans=0.0
2023-11-22 13:32:51,458 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5700, loss[loss=0.08077, simple_loss=0.1107, pruned_loss=0.01555, audio_tagging_loss=0.009858, over 16186.00 frames. ], tot_loss[loss=0.07233, simple_loss=0.09489, pruned_loss=0.01542, audio_tagging_loss=0.009462, over 3041160.11 frames. ], batch size: 59, lr: 2.73e-03, grad_scale: 16.0
2023-11-22 13:33:15,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5
2023-11-22 13:33:25,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1961926.6666666667, ans=0.125
2023-11-22 13:33:28,455 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294300
2023-11-22 13:33:29,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.416e+01 9.012e+01 9.956e+01 1.838e+02, threshold=1.802e+02, percent-clipped=1.0
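The optim.py:476 lines summarize adaptive gradient clipping over the preceding logging window: the five numbers are quantiles (min / 25% / median / 75% / max) of the per-batch gradient norms, threshold is the clipping cutoff, and percent-clipped is the share of batches in the window that hit it. Throughout this log the threshold tracks 2.0 x the logged median, matching Clipping_scale=2.0 (e.g. 2.0 x 9.012e+01 = 1.802e+02 in the record just above, where the max norm 1.838e+02 exceeded it, hence percent-clipped=1.0). A hedged sketch of the mechanism; the real ScaledAdam logic differs in detail, and the window size here is an assumption:

    import torch

    def clip_gradients(params, recent_norms: list, clipping_scale: float = 2.0) -> bool:
        # Total gradient norm for this batch.
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        recent_norms.append(norm.item())
        # Threshold = clipping_scale * median of recently observed norms.
        threshold = clipping_scale * torch.tensor(recent_norms[-128:]).median()
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
            return True   # counted toward percent-clipped
        return False

percent-clipped is then the mean of these flags over the logging window, reported as a percentage.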
2023-11-22 13:33:35,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1961993.3333333333, ans=0.04949747468305833
2023-11-22 13:33:40,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1961993.3333333333, ans=0.05
2023-11-22 13:33:44,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1962060.0, ans=0.0
2023-11-22 13:33:54,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1962126.6666666667, ans=0.1
2023-11-22 13:33:55,984 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5750, loss[loss=0.07132, simple_loss=0.09282, pruned_loss=0.01583, audio_tagging_loss=0.009082, over 15677.00 frames. ], tot_loss[loss=0.07101, simple_loss=0.09304, pruned_loss=0.0151, audio_tagging_loss=0.009393, over 3041971.33 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 16.0
2023-11-22 13:34:04,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1962126.6666666667, ans=0.0
2023-11-22 13:34:13,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1962193.3333333333, ans=0.0
2023-11-22 13:34:33,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294350
2023-11-22 13:34:55,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1962393.3333333333, ans=0.0
2023-11-22 13:35:00,428 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5800, loss[loss=0.0606, simple_loss=0.07562, pruned_loss=0.01314, audio_tagging_loss=0.009648, over 14698.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09349, pruned_loss=0.01521, audio_tagging_loss=0.009239, over 3042692.38 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 16.0
2023-11-22 13:35:36,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1962593.3333333333, ans=0.125
2023-11-22 13:35:38,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294400
2023-11-22 13:35:39,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.175e+01 8.400e+01 8.933e+01 9.518e+01 1.972e+02, threshold=1.787e+02, percent-clipped=1.0
2023-11-22 13:35:51,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1962726.6666666667, ans=0.0
2023-11-22 13:36:05,422 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5850, loss[loss=0.04348, simple_loss=0.05325, pruned_loss=0.008528, audio_tagging_loss=0.00833, over 14672.00 frames. ], tot_loss[loss=0.07093, simple_loss=0.09331, pruned_loss=0.01505, audio_tagging_loss=0.009218, over 3040159.53 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 16.0
2023-11-22 13:36:12,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1962793.3333333333, ans=0.0
2023-11-22 13:36:33,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1962926.6666666667, ans=0.0
2023-11-22 13:36:36,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1962926.6666666667, ans=0.2
2023-11-22 13:36:38,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1962926.6666666667, ans=0.1
2023-11-22 13:36:42,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294450
2023-11-22 13:36:43,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1962993.3333333333, ans=0.125
2023-11-22 13:37:10,410 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5900, loss[loss=0.07133, simple_loss=0.09134, pruned_loss=0.0158, audio_tagging_loss=0.009861, over 15788.00 frames. ], tot_loss[loss=0.07099, simple_loss=0.0936, pruned_loss=0.015, audio_tagging_loss=0.009186, over 3042974.93 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 16.0
2023-11-22 13:37:30,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1963193.3333333333, ans=0.0
2023-11-22 13:37:43,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1963260.0, ans=0.125
2023-11-22 13:37:47,399 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294500
2023-11-22 13:37:49,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.759e+01 8.196e+01 8.876e+01 9.676e+01 1.604e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-22 13:37:49,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1963326.6666666667, ans=0.125
2023-11-22 13:37:55,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1963326.6666666667, ans=0.125
2023-11-22 13:38:14,686 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 5950, loss[loss=0.07308, simple_loss=0.09682, pruned_loss=0.01511, audio_tagging_loss=0.009557, over 16094.00 frames. ], tot_loss[loss=0.07165, simple_loss=0.09446, pruned_loss=0.01526, audio_tagging_loss=0.009159, over 3047904.98 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 16.0
2023-11-22 13:38:16,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1963460.0, ans=0.0
2023-11-22 13:38:17,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1963460.0, ans=0.125
2023-11-22 13:38:17,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1963460.0, ans=0.125
2023-11-22 13:38:52,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294550
2023-11-22 13:39:03,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1963660.0, ans=0.05
2023-11-22 13:39:19,053 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6000, loss[loss=0.06082, simple_loss=0.07979, pruned_loss=0.01184, audio_tagging_loss=0.009088, over 15293.00 frames. ], tot_loss[loss=0.07153, simple_loss=0.09411, pruned_loss=0.01522, audio_tagging_loss=0.009254, over 3049433.87 frames. ], batch size: 56, lr: 2.73e-03, grad_scale: 32.0
2023-11-22 13:39:19,054 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 13:39:46,150 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9796, 3.1770, 2.9411, 3.1614, 3.4458, 2.8638, 3.4230, 2.5150], device='cuda:1')
2023-11-22 13:40:01,004 INFO [train_asr.py:1253] (1/4) Epoch 25, validation: loss=0.05896, simple_loss=0.05155, pruned_loss=0.00512, audio_tagging_loss=0.02806, over 4681554.00 frames.
2023-11-22 13:40:01,006 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
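At batch 6000 the loop pauses to compute the validation loss, and as a diagnostic it prints an attention-weights entropy tensor (zipformer.py:1873): eight values, one per head of that layer's self-attention, each measuring how spread-out the head's attention distribution is, in nats (uniform attention over N keys would give log N). Note the validation record still satisfies the combination observed earlier: 0.5 * 0.05155 + 0.00512 + 0.02806 = 0.05896. A minimal sketch of such an entropy diagnostic, assuming post-softmax weights of shape (num_heads, num_queries, num_keys); the shapes are illustrative:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys), each row a softmax distribution.
        # Returns the mean entropy per head, in nats.
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, num_queries)
        return ent.mean(dim=-1)

    attn = torch.softmax(torch.randn(8, 100, 100), dim=-1)
    print(attn_weights_entropy(attn))  # 8 values, cf. tensor([3.9796, 3.1770, ...])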
2023-11-22 13:40:38,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294600
2023-11-22 13:40:39,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.185e+01 8.770e+01 9.296e+01 1.374e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-22 13:40:46,815 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 13:40:53,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1964060.0, ans=0.125
2023-11-22 13:41:04,636 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6050, loss[loss=0.08354, simple_loss=0.1207, pruned_loss=0.01538, audio_tagging_loss=0.00781, over 15315.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.09473, pruned_loss=0.01524, audio_tagging_loss=0.009123, over 3056779.41 frames. ], batch size: 56, lr: 2.73e-03, grad_scale: 32.0
2023-11-22 13:41:06,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1964126.6666666667, ans=0.125
2023-11-22 13:41:16,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1964193.3333333333, ans=0.125
2023-11-22 13:41:34,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1964260.0, ans=0.05
2023-11-22 13:41:42,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=22.5
2023-11-22 13:41:42,866 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294650
2023-11-22 13:42:09,279 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6100, loss[loss=0.08111, simple_loss=0.1132, pruned_loss=0.01866, audio_tagging_loss=0.005871, over 15535.00 frames. ], tot_loss[loss=0.07167, simple_loss=0.09457, pruned_loss=0.0152, audio_tagging_loss=0.009189, over 3057954.35 frames. ], batch size: 59, lr: 2.73e-03, grad_scale: 32.0
2023-11-22 13:42:40,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1964593.3333333333, ans=0.0
2023-11-22 13:42:45,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1964593.3333333333, ans=0.125
2023-11-22 13:42:46,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294700
2023-11-22 13:42:47,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.151e+01 8.939e+01 9.703e+01 1.163e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-22 13:43:00,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1964726.6666666667, ans=0.125
2023-11-22 13:43:13,327 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6150, loss[loss=0.06662, simple_loss=0.09193, pruned_loss=0.01221, audio_tagging_loss=0.008439, over 15028.00 frames. ], tot_loss[loss=0.07151, simple_loss=0.09431, pruned_loss=0.0151, audio_tagging_loss=0.009254, over 3054681.63 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 32.0
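Most of the scaling.py:213 traffic consists of ScheduledFloat values: module hyperparameters (dropout_p, balancer prob limits, bypass scale_min, skip rates, whitening limits) that are annealed as a function of the global batch count rather than fixed; each line reports the current value (ans=...) at the current batch_count. A minimal sketch of the idea, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the breakpoints below are made up for illustration and are not the recipe's:

    def scheduled_float(batch_count: float, schedule: list) -> float:
        # schedule: sorted (batch_count, value) breakpoints; piecewise-linear
        # interpolation, clamped to the end values outside the breakpoints.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return schedule[-1][1]

    # e.g. a dropout rate decaying from 0.3 to 0.1 over the first 20k batches,
    # then held at 0.1, consistent with the steady ans=0.1 dropout_p lines here.
    print(scheduled_float(1964726.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1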
2023-11-22 13:43:22,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1964793.3333333333, ans=0.09899494936611666
2023-11-22 13:43:23,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1964793.3333333333, ans=0.07
2023-11-22 13:43:23,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=15.0
2023-11-22 13:43:23,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=22.5
2023-11-22 13:43:29,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1964860.0, ans=0.2
2023-11-22 13:43:32,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0
2023-11-22 13:43:36,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1964860.0, ans=0.2
2023-11-22 13:43:45,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.79 vs. limit=10.0
2023-11-22 13:43:50,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294750
2023-11-22 13:44:00,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5
2023-11-22 13:44:08,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5
2023-11-22 13:44:10,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1965060.0, ans=0.125
2023-11-22 13:44:18,492 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6200, loss[loss=0.07496, simple_loss=0.09378, pruned_loss=0.01702, audio_tagging_loss=0.01105, over 15169.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.09313, pruned_loss=0.01491, audio_tagging_loss=0.009368, over 3046088.63 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 32.0
2023-11-22 13:44:23,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.56 vs. limit=15.0
2023-11-22 13:44:26,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1965126.6666666667, ans=0.0
2023-11-22 13:44:26,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1965126.6666666667, ans=0.125
2023-11-22 13:44:28,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1965126.6666666667, ans=0.125
2023-11-22 13:44:35,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.37 vs.
limit=12.0 2023-11-22 13:44:36,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1965193.3333333333, ans=0.2 2023-11-22 13:44:44,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=12.0 2023-11-22 13:44:44,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1965260.0, ans=0.125 2023-11-22 13:44:56,196 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294800 2023-11-22 13:44:57,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 7.998e+01 8.582e+01 9.232e+01 1.087e+02, threshold=1.716e+02, percent-clipped=0.0 2023-11-22 13:44:59,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1965326.6666666667, ans=0.0 2023-11-22 13:45:12,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1965393.3333333333, ans=0.125 2023-11-22 13:45:14,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-22 13:45:23,765 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6250, loss[loss=0.06693, simple_loss=0.08015, pruned_loss=0.01537, audio_tagging_loss=0.01148, over 15706.00 frames. ], tot_loss[loss=0.07057, simple_loss=0.09279, pruned_loss=0.01473, audio_tagging_loss=0.009445, over 3046832.35 frames. ], batch size: 60, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:45:30,333 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:45:56,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1965593.3333333333, ans=0.0 2023-11-22 13:46:01,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294850 2023-11-22 13:46:07,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1965660.0, ans=0.125 2023-11-22 13:46:17,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-22 13:46:28,185 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6300, loss[loss=0.07428, simple_loss=0.09682, pruned_loss=0.01752, audio_tagging_loss=0.008348, over 14521.00 frames. ], tot_loss[loss=0.07029, simple_loss=0.09247, pruned_loss=0.01459, audio_tagging_loss=0.009468, over 3038122.45 frames. 
], batch size: 55, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:46:35,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1965793.3333333333, ans=0.1 2023-11-22 13:46:37,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1965793.3333333333, ans=0.2 2023-11-22 13:46:43,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1965860.0, ans=0.1 2023-11-22 13:47:05,602 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294900 2023-11-22 13:47:07,898 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.416e+01 8.996e+01 9.619e+01 1.207e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 13:47:24,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1966060.0, ans=10.0 2023-11-22 13:47:32,516 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6350, loss[loss=0.03855, simple_loss=0.04459, pruned_loss=0.005594, audio_tagging_loss=0.01066, over 14095.00 frames. ], tot_loss[loss=0.07068, simple_loss=0.09275, pruned_loss=0.01468, audio_tagging_loss=0.009624, over 3036621.06 frames. ], batch size: 56, lr: 2.73e-03, grad_scale: 16.0 2023-11-22 13:47:35,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1966126.6666666667, ans=0.1 2023-11-22 13:47:54,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2023-11-22 13:48:08,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2023-11-22 13:48:10,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 294950 2023-11-22 13:48:37,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=12.0 2023-11-22 13:48:38,025 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6400, loss[loss=0.08933, simple_loss=0.1139, pruned_loss=0.02392, audio_tagging_loss=0.008476, over 17220.00 frames. ], tot_loss[loss=0.0708, simple_loss=0.0928, pruned_loss=0.01469, audio_tagging_loss=0.009714, over 3040637.70 frames. ], batch size: 65, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:48:43,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1966460.0, ans=0.1 2023-11-22 13:48:45,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1966460.0, ans=0.125 2023-11-22 13:49:12,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.12 vs. 
limit=15.0 2023-11-22 13:49:15,240 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295000 2023-11-22 13:49:16,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1966660.0, ans=10.0 2023-11-22 13:49:17,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.103e+01 8.926e+01 9.580e+01 1.211e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-22 13:49:40,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.52 vs. limit=10.0 2023-11-22 13:49:42,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1966793.3333333333, ans=0.1 2023-11-22 13:49:43,148 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6450, loss[loss=0.05451, simple_loss=0.06557, pruned_loss=0.007747, audio_tagging_loss=0.01398, over 14584.00 frames. ], tot_loss[loss=0.07054, simple_loss=0.09247, pruned_loss=0.01457, audio_tagging_loss=0.009734, over 3041651.48 frames. ], batch size: 56, lr: 2.73e-03, grad_scale: 32.0 2023-11-22 13:49:44,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1966793.3333333333, ans=0.0 2023-11-22 13:50:01,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2023-11-22 13:50:12,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1966926.6666666667, ans=0.125 2023-11-22 13:50:14,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1966926.6666666667, ans=0.0 2023-11-22 13:50:18,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1966926.6666666667, ans=0.0 2023-11-22 13:50:20,984 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295050 2023-11-22 13:50:27,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1966993.3333333333, ans=0.125 2023-11-22 13:50:28,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=12.0 2023-11-22 13:50:47,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.16 vs. limit=10.0 2023-11-22 13:50:47,823 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6500, loss[loss=0.06587, simple_loss=0.07606, pruned_loss=0.01449, audio_tagging_loss=0.01336, over 15177.00 frames. ], tot_loss[loss=0.07031, simple_loss=0.09185, pruned_loss=0.01459, audio_tagging_loss=0.009797, over 3049166.69 frames. 
], batch size: 58, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:50:57,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1967126.6666666667, ans=0.125 2023-11-22 13:50:58,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:51:06,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1967193.3333333333, ans=0.125 2023-11-22 13:51:07,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=22.5 2023-11-22 13:51:25,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295100 2023-11-22 13:51:27,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.137e+01 8.907e+01 9.781e+01 1.377e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-22 13:51:31,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.72 vs. limit=10.0 2023-11-22 13:51:46,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1967393.3333333333, ans=0.0 2023-11-22 13:51:49,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1967393.3333333333, ans=0.1 2023-11-22 13:51:51,249 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6550, loss[loss=0.06865, simple_loss=0.09664, pruned_loss=0.01126, audio_tagging_loss=0.009078, over 14717.00 frames. ], tot_loss[loss=0.07053, simple_loss=0.09274, pruned_loss=0.01459, audio_tagging_loss=0.009569, over 3044637.25 frames. ], batch size: 53, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:51:56,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1967460.0, ans=0.0 2023-11-22 13:51:59,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0 2023-11-22 13:52:02,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.10 vs. 
limit=15.0 2023-11-22 13:52:12,914 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 13:52:20,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1967593.3333333333, ans=0.125 2023-11-22 13:52:29,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295150 2023-11-22 13:52:44,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1967726.6666666667, ans=0.125 2023-11-22 13:52:47,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1967726.6666666667, ans=0.1 2023-11-22 13:52:53,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1967726.6666666667, ans=0.0 2023-11-22 13:52:56,800 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6600, loss[loss=0.06741, simple_loss=0.08674, pruned_loss=0.01149, audio_tagging_loss=0.01255, over 14389.00 frames. ], tot_loss[loss=0.07045, simple_loss=0.09268, pruned_loss=0.01459, audio_tagging_loss=0.009517, over 3040804.54 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:53:02,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1967793.3333333333, ans=0.2 2023-11-22 13:53:18,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1967860.0, ans=0.0 2023-11-22 13:53:23,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1967926.6666666667, ans=0.125 2023-11-22 13:53:23,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1967926.6666666667, ans=0.0 2023-11-22 13:53:25,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2023-11-22 13:53:31,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1967926.6666666667, ans=0.125 2023-11-22 13:53:34,487 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295200 2023-11-22 13:53:37,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 8.017e+01 8.525e+01 9.445e+01 1.151e+02, threshold=1.705e+02, percent-clipped=0.0 2023-11-22 13:53:38,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1967993.3333333333, ans=0.1 2023-11-22 13:53:47,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. 
limit=10.0 2023-11-22 13:53:55,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1968060.0, ans=0.0 2023-11-22 13:53:57,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1968060.0, ans=0.0 2023-11-22 13:53:59,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1968060.0, ans=0.0 2023-11-22 13:54:02,052 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6650, loss[loss=0.06157, simple_loss=0.07922, pruned_loss=0.0155, audio_tagging_loss=0.006461, over 15755.00 frames. ], tot_loss[loss=0.071, simple_loss=0.09344, pruned_loss=0.01493, audio_tagging_loss=0.009349, over 3033909.37 frames. ], batch size: 60, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:54:14,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2023-11-22 13:54:39,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295250 2023-11-22 13:54:41,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1968326.6666666667, ans=0.125 2023-11-22 13:54:46,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1968326.6666666667, ans=0.125 2023-11-22 13:54:55,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1968393.3333333333, ans=0.0 2023-11-22 13:54:58,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1968393.3333333333, ans=0.125 2023-11-22 13:55:05,826 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6700, loss[loss=0.07249, simple_loss=0.09249, pruned_loss=0.01655, audio_tagging_loss=0.009687, over 15294.00 frames. ], tot_loss[loss=0.0714, simple_loss=0.09416, pruned_loss=0.01509, audio_tagging_loss=0.009229, over 3037771.55 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:55:19,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1968526.6666666667, ans=0.04949747468305833 2023-11-22 13:55:28,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1968526.6666666667, ans=0.125 2023-11-22 13:55:44,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295300 2023-11-22 13:55:46,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.344e+01 8.911e+01 9.672e+01 1.240e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-22 13:55:55,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=12.0 2023-11-22 13:56:04,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1968726.6666666667, ans=0.1 2023-11-22 13:56:11,716 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6750, loss[loss=0.06379, simple_loss=0.08664, pruned_loss=0.01131, audio_tagging_loss=0.009154, over 15029.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09494, pruned_loss=0.01499, audio_tagging_loss=0.009204, over 3035552.86 frames. 
], batch size: 55, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:56:21,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1968793.3333333333, ans=0.125 2023-11-22 13:56:30,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1968860.0, ans=0.1 2023-11-22 13:56:47,801 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295350 2023-11-22 13:57:04,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1969060.0, ans=0.2 2023-11-22 13:57:08,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-22 13:57:09,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1969060.0, ans=0.2 2023-11-22 13:57:15,699 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6800, loss[loss=0.08138, simple_loss=0.1235, pruned_loss=0.01477, audio_tagging_loss=0.004865, over 15913.00 frames. ], tot_loss[loss=0.07164, simple_loss=0.09499, pruned_loss=0.01502, audio_tagging_loss=0.009123, over 3038408.36 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:57:26,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-22 13:57:32,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1969193.3333333333, ans=0.125 2023-11-22 13:57:33,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1969193.3333333333, ans=0.2 2023-11-22 13:57:53,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295400 2023-11-22 13:57:56,104 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.041e+01 8.623e+01 9.531e+01 1.254e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-22 13:57:59,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-11-22 13:58:20,145 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6850, loss[loss=0.06244, simple_loss=0.08576, pruned_loss=0.01239, audio_tagging_loss=0.007169, over 15155.00 frames. ], tot_loss[loss=0.07162, simple_loss=0.09511, pruned_loss=0.01495, audio_tagging_loss=0.009112, over 3034413.81 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:58:27,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1969460.0, ans=0.2 2023-11-22 13:58:27,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1969460.0, ans=0.1 2023-11-22 13:58:45,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.88 vs. 
limit=15.0 2023-11-22 13:58:58,020 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295450 2023-11-22 13:59:05,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1969660.0, ans=0.1 2023-11-22 13:59:19,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1969726.6666666667, ans=0.05 2023-11-22 13:59:24,607 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6900, loss[loss=0.06458, simple_loss=0.07685, pruned_loss=0.01173, audio_tagging_loss=0.01443, over 15557.00 frames. ], tot_loss[loss=0.07083, simple_loss=0.09361, pruned_loss=0.01478, audio_tagging_loss=0.009244, over 3034755.63 frames. ], batch size: 59, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 13:59:58,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1969926.6666666667, ans=0.1 2023-11-22 14:00:02,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295500 2023-11-22 14:00:04,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.094e+01 8.795e+01 9.359e+01 1.337e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 14:00:10,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1969993.3333333333, ans=0.1 2023-11-22 14:00:15,077 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 14:00:22,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0 2023-11-22 14:00:30,081 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 6950, loss[loss=0.07429, simple_loss=0.09329, pruned_loss=0.01798, audio_tagging_loss=0.009665, over 15841.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.09391, pruned_loss=0.01483, audio_tagging_loss=0.009164, over 3033710.73 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:00:36,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. 
2023-11-22 14:00:49,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1970193.3333333333, ans=0.125 2023-11-22 14:00:56,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1970260.0, ans=0.0 2023-11-22 14:01:07,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295550 2023-11-22 14:01:09,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1970326.6666666667, ans=0.1 2023-11-22 14:01:11,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1970326.6666666667, ans=0.125 2023-11-22 14:01:21,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.40 vs. limit=10.0 2023-11-22 14:01:23,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1970393.3333333333, ans=0.2 2023-11-22 14:01:28,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1970393.3333333333, ans=10.0 2023-11-22 14:01:33,774 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7000, loss[loss=0.1028, simple_loss=0.1368, pruned_loss=0.02634, audio_tagging_loss=0.008057, over 15737.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.09358, pruned_loss=0.01478, audio_tagging_loss=0.009241, over 3032004.25 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:01:46,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1970526.6666666667, ans=0.125 2023-11-22 14:01:46,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2023-11-22 14:02:07,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1970593.3333333333, ans=0.0 2023-11-22 14:02:12,245 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295600 2023-11-22 14:02:14,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.600e+01 8.282e+01 8.833e+01 9.554e+01 1.299e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 14:02:24,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=22.5 2023-11-22 14:02:38,737 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7050, loss[loss=0.05532, simple_loss=0.06667, pruned_loss=0.01287, audio_tagging_loss=0.00911, over 14490.00 frames. ], tot_loss[loss=0.07027, simple_loss=0.09283, pruned_loss=0.01448, audio_tagging_loss=0.009374, over 3029632.79 frames.
], batch size: 56, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:02:39,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1970793.3333333333, ans=0.125 2023-11-22 14:03:01,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1970860.0, ans=0.125 2023-11-22 14:03:16,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295650 2023-11-22 14:03:22,445 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:03:24,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.99 vs. limit=22.5 2023-11-22 14:03:41,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1971060.0, ans=0.1 2023-11-22 14:03:43,538 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7100, loss[loss=0.0847, simple_loss=0.1187, pruned_loss=0.01742, audio_tagging_loss=0.007918, over 15130.00 frames. ], tot_loss[loss=0.07075, simple_loss=0.09345, pruned_loss=0.01461, audio_tagging_loss=0.009407, over 3031053.98 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:03:49,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2023-11-22 14:03:56,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1971193.3333333333, ans=0.0 2023-11-22 14:03:57,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1971193.3333333333, ans=0.125 2023-11-22 14:03:58,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1971193.3333333333, ans=0.125 2023-11-22 14:04:07,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1971193.3333333333, ans=0.125 2023-11-22 14:04:10,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1971260.0, ans=0.1 2023-11-22 14:04:15,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2023-11-22 14:04:21,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295700 2023-11-22 14:04:24,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.732e+01 8.394e+01 8.961e+01 9.557e+01 1.218e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-22 14:04:25,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1971326.6666666667, ans=0.125 2023-11-22 14:04:36,143 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:04:43,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.70 vs. 
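limit=12.0

The optim.py:476 lines report five grad-norm statistics and a clipping threshold. The numbers are consistent with (min, 25%, 50%, 75%, max) over a window of recent gradient norms, with threshold = Clipping_scale × median: in the entry above, 2.0 × 8.961e+01 ≈ 1.792e+02. A sketch of that bookkeeping, with illustrative names rather than icefall's actual API:

```python
# Hedged reconstruction of the "grad-norm quartiles ... threshold" line.
import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five summary points of recent gradient norms: min, quartiles, max.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    # Threshold scales the median; norms beyond it would be clipped.
    threshold = clipping_scale * q[2].item()
    pct_clipped = 100.0 * (grad_norms > threshold).float().mean().item()
    return q.tolist(), threshold, pct_clipped

quartiles, threshold, pct = clipping_report(
    torch.tensor([67.32, 83.94, 89.61, 95.57, 121.8]))
print(quartiles, threshold, pct)  # threshold ~179.2, percent-clipped 0.0
```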
2023-11-22 14:04:47,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1971460.0, ans=0.1 2023-11-22 14:04:47,940 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7150, loss[loss=0.08125, simple_loss=0.1084, pruned_loss=0.01956, audio_tagging_loss=0.007487, over 15972.00 frames. ], tot_loss[loss=0.07098, simple_loss=0.09384, pruned_loss=0.01461, audio_tagging_loss=0.009448, over 3036028.39 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:04:49,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1971460.0, ans=0.0 2023-11-22 14:04:50,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1971460.0, ans=0.125 2023-11-22 14:05:03,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1971526.6666666667, ans=0.04949747468305833 2023-11-22 14:05:24,668 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:05:26,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295750 2023-11-22 14:05:31,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1971660.0, ans=0.125 2023-11-22 14:05:45,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1971726.6666666667, ans=0.1 2023-11-22 14:05:46,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1971726.6666666667, ans=0.125 2023-11-22 14:05:48,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1971726.6666666667, ans=0.0 2023-11-22 14:05:51,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1971793.3333333333, ans=0.0 2023-11-22 14:05:52,783 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7200, loss[loss=0.06672, simple_loss=0.08693, pruned_loss=0.01325, audio_tagging_loss=0.01001, over 16281.00 frames. ], tot_loss[loss=0.07035, simple_loss=0.0928, pruned_loss=0.01439, audio_tagging_loss=0.009567, over 3035295.87 frames. ], batch size: 60, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:06:00,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1971793.3333333333, ans=0.125 2023-11-22 14:06:18,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.26 vs.
limit=15.0 2023-11-22 14:06:30,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295800 2023-11-22 14:06:33,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.039e+01 8.548e+01 9.116e+01 1.085e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-22 14:06:43,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1971993.3333333333, ans=0.125 2023-11-22 14:06:54,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1972060.0, ans=0.125 2023-11-22 14:06:56,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1972060.0, ans=0.125 2023-11-22 14:06:56,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1972060.0, ans=0.0 2023-11-22 14:06:58,282 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7250, loss[loss=0.06687, simple_loss=0.09022, pruned_loss=0.01412, audio_tagging_loss=0.007645, over 15625.00 frames. ], tot_loss[loss=0.06986, simple_loss=0.09181, pruned_loss=0.0143, audio_tagging_loss=0.009659, over 3039364.86 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:07:13,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1972193.3333333333, ans=0.125 2023-11-22 14:07:36,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295850 2023-11-22 14:07:38,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-22 14:07:49,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1972393.3333333333, ans=0.125 2023-11-22 14:08:02,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1972460.0, ans=0.1 2023-11-22 14:08:03,643 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7300, loss[loss=0.09558, simple_loss=0.1258, pruned_loss=0.02126, audio_tagging_loss=0.01141, over 15151.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.09296, pruned_loss=0.01455, audio_tagging_loss=0.009615, over 3037831.16 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:08:22,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1972526.6666666667, ans=0.125 2023-11-22 14:08:40,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295900 2023-11-22 14:08:44,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.276e+01 8.894e+01 9.525e+01 1.269e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-22 14:08:45,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1972660.0, ans=0.07 2023-11-22 14:08:58,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1972726.6666666667, ans=0.09899494936611666 2023-11-22 14:09:08,250 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7350, loss[loss=0.06999, simple_loss=0.09903, pruned_loss=0.01248, audio_tagging_loss=0.007997, over 14788.00 frames. 
], tot_loss[loss=0.0705, simple_loss=0.09272, pruned_loss=0.01461, audio_tagging_loss=0.009532, over 3044094.93 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:09:18,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1972793.3333333333, ans=0.1 2023-11-22 14:09:18,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1972793.3333333333, ans=0.0 2023-11-22 14:09:33,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1972926.6666666667, ans=0.0 2023-11-22 14:09:45,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 295950 2023-11-22 14:10:02,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1973060.0, ans=0.2 2023-11-22 14:10:12,123 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7400, loss[loss=0.07076, simple_loss=0.09102, pruned_loss=0.01515, audio_tagging_loss=0.0101, over 15357.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09277, pruned_loss=0.01463, audio_tagging_loss=0.009463, over 3034773.21 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:10:21,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1973126.6666666667, ans=0.05 2023-11-22 14:10:30,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1973193.3333333333, ans=0.0 2023-11-22 14:10:34,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1973193.3333333333, ans=0.1 2023-11-22 14:10:40,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1973260.0, ans=0.1 2023-11-22 14:10:44,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1973260.0, ans=0.125 2023-11-22 14:10:49,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296000 2023-11-22 14:10:56,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.500e+01 8.081e+01 8.804e+01 9.585e+01 1.209e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-22 14:11:14,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1973393.3333333333, ans=0.125 2023-11-22 14:11:19,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2023-11-22 14:11:19,956 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7450, loss[loss=0.07072, simple_loss=0.09406, pruned_loss=0.01586, audio_tagging_loss=0.007839, over 14527.00 frames. ], tot_loss[loss=0.0714, simple_loss=0.09407, pruned_loss=0.01497, audio_tagging_loss=0.009389, over 3042086.63 frames. 
], batch size: 54, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:11:29,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1973460.0, ans=0.0 2023-11-22 14:11:33,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1973526.6666666667, ans=0.125 2023-11-22 14:11:34,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1973526.6666666667, ans=0.1 2023-11-22 14:11:49,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1973593.3333333333, ans=0.125 2023-11-22 14:11:49,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-22 14:11:50,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1973593.3333333333, ans=0.125 2023-11-22 14:11:51,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1973593.3333333333, ans=0.125 2023-11-22 14:11:57,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296050 2023-11-22 14:12:24,594 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7500, loss[loss=0.06732, simple_loss=0.08456, pruned_loss=0.01353, audio_tagging_loss=0.01151, over 14992.00 frames. ], tot_loss[loss=0.07121, simple_loss=0.09379, pruned_loss=0.01498, audio_tagging_loss=0.009332, over 3044418.38 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:12:32,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1973793.3333333333, ans=0.1 2023-11-22 14:12:34,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1973793.3333333333, ans=0.0 2023-11-22 14:12:40,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1973860.0, ans=0.0 2023-11-22 14:12:42,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1973860.0, ans=0.0 2023-11-22 14:13:02,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296100 2023-11-22 14:13:02,491 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:13:06,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.171e+01 8.790e+01 9.530e+01 1.185e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 14:13:09,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1973993.3333333333, ans=0.125 2023-11-22 14:13:10,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1973993.3333333333, ans=0.125 2023-11-22 14:13:12,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1973993.3333333333, ans=0.125 2023-11-22 14:13:15,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1974060.0, ans=0.125 2023-11-22 14:13:28,768 INFO [scaling.py:213] (1/4) 
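ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1974126.6666666667, ans=0.1

Each scaling.py:213 line reports a ScheduledFloat: a module hyper-parameter (dropout p, balancer prob, skip rate, scale_min, ...) whose current value ("ans") is looked up from batch_count. A minimal sketch of a piecewise-linear schedule of that flavor; the class is a toy reimplementation, and the breakpoints are invented for illustration, not the recipe's actual schedules:

```python
# Toy piecewise-linear schedule keyed on batch_count (illustrative only).
import bisect

class ToyScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]           # before the first breakpoint
        if i == len(self.xs):
            return self.ys[-1]          # after the last breakpoint
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

sched = ToyScheduledFloat((0.0, 0.3), (20000.0, 0.125))
# By batch_count ~1.97e6 the schedule has long since settled at its final
# value, which is why the logged "ans" fields barely change here.
print(sched.value(1974126.67))  # 0.125
```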
2023-11-22 14:13:29,557 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7550, loss[loss=0.06372, simple_loss=0.07733, pruned_loss=0.01405, audio_tagging_loss=0.011, over 15703.00 frames. ], tot_loss[loss=0.07146, simple_loss=0.09407, pruned_loss=0.01518, audio_tagging_loss=0.009245, over 3041303.47 frames. ], batch size: 63, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:13:36,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1974126.6666666667, ans=0.125 2023-11-22 14:13:44,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=22.5 2023-11-22 14:13:51,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-22 14:13:52,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1974193.3333333333, ans=0.0 2023-11-22 14:14:07,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296150 2023-11-22 14:14:10,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1974326.6666666667, ans=0.07 2023-11-22 14:14:24,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1974393.3333333333, ans=0.2 2023-11-22 14:14:33,778 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7600, loss[loss=0.08743, simple_loss=0.1113, pruned_loss=0.02263, audio_tagging_loss=0.009132, over 15104.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.09384, pruned_loss=0.01517, audio_tagging_loss=0.009332, over 3040525.66 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:14:49,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1974526.6666666667, ans=0.0 2023-11-22 14:14:56,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0 2023-11-22 14:14:59,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1974593.3333333333, ans=0.125 2023-11-22 14:15:06,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-11-22 14:15:09,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs.
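limit=15.0

The scaling.py:1022 lines compare a whitening metric against a limit; the constraint only activates once the metric exceeds the limit. One plausible reading, stated as an assumption since the exact statistic in scaling.py may differ: the metric measures how far the per-group feature covariance is from white, equaling 1.0 for identity-like covariance and growing as channels become correlated or unequally scaled:

```python
# Hedged sketch of a whitening statistic of the kind logged above.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups as in the log.
    n, c = x.shape
    g = c // num_groups
    metrics = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        cov = (xg.T @ xg) / n
        eigs = torch.linalg.eigvalsh(cov)
        # Ratio of mean squared eigenvalue to squared mean eigenvalue:
        # exactly 1.0 when all eigenvalues are equal (white features).
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return sum(metrics) / num_groups

x = torch.randn(1000, 128)     # roughly white features
print(whitening_metric(x, 4))  # close to 1.0, well under the logged limits
```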
2023-11-22 14:15:11,984 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296200 2023-11-22 14:15:15,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.584e+01 8.244e+01 8.772e+01 9.413e+01 1.274e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-22 14:15:20,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1974660.0, ans=0.1 2023-11-22 14:15:26,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1974726.6666666667, ans=0.2 2023-11-22 14:15:39,247 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7650, loss[loss=0.05535, simple_loss=0.06222, pruned_loss=0.01021, audio_tagging_loss=0.01403, over 14632.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09262, pruned_loss=0.015, audio_tagging_loss=0.00932, over 3039621.12 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:15:47,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1974793.3333333333, ans=0.1 2023-11-22 14:16:03,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1974860.0, ans=0.2 2023-11-22 14:16:10,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1974926.6666666667, ans=0.125 2023-11-22 14:16:15,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2023-11-22 14:16:17,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296250 2023-11-22 14:16:30,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2023-11-22 14:16:44,136 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7700, loss[loss=0.04306, simple_loss=0.04865, pruned_loss=0.008234, audio_tagging_loss=0.01051, over 13014.00 frames. ], tot_loss[loss=0.07078, simple_loss=0.09283, pruned_loss=0.01504, audio_tagging_loss=0.009329, over 3042127.25 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:17:08,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=12.0 2023-11-22 14:17:17,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1975260.0, ans=10.0 2023-11-22 14:17:21,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296300 2023-11-22 14:17:25,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.979e+01 8.266e+01 8.985e+01 9.649e+01 1.326e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-22 14:17:38,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1975393.3333333333, ans=0.1 2023-11-22 14:17:38,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1975393.3333333333, ans=0.125 2023-11-22 14:17:48,831 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7750, loss[loss=0.05508, simple_loss=0.07437, pruned_loss=0.008836, audio_tagging_loss=0.009059, over 15306.00 frames.
], tot_loss[loss=0.07111, simple_loss=0.09332, pruned_loss=0.01508, audio_tagging_loss=0.009366, over 3039704.36 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 32.0 2023-11-22 14:17:54,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1975460.0, ans=0.0 2023-11-22 14:17:58,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1975460.0, ans=0.04949747468305833 2023-11-22 14:18:01,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1975526.6666666667, ans=0.125 2023-11-22 14:18:06,567 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:18:26,550 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296350 2023-11-22 14:18:38,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2023-11-22 14:18:40,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1975726.6666666667, ans=0.125 2023-11-22 14:18:52,718 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7800, loss[loss=0.07668, simple_loss=0.1038, pruned_loss=0.01512, audio_tagging_loss=0.009677, over 15761.00 frames. ], tot_loss[loss=0.07215, simple_loss=0.09505, pruned_loss=0.01535, audio_tagging_loss=0.009267, over 3046617.05 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:18:54,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1975793.3333333333, ans=0.1 2023-11-22 14:18:56,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1975793.3333333333, ans=0.125 2023-11-22 14:18:56,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-22 14:19:28,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1975926.6666666667, ans=0.1 2023-11-22 14:19:31,000 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296400 2023-11-22 14:19:36,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 7.916e+01 8.644e+01 9.515e+01 1.205e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-22 14:19:41,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1975993.3333333333, ans=0.0 2023-11-22 14:19:41,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1975993.3333333333, ans=0.125 2023-11-22 14:19:41,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-22 14:19:42,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.28 vs. 
limit=22.5 2023-11-22 14:19:46,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1976060.0, ans=0.125 2023-11-22 14:19:58,145 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7850, loss[loss=0.07919, simple_loss=0.098, pruned_loss=0.0171, audio_tagging_loss=0.01309, over 15911.00 frames. ], tot_loss[loss=0.0721, simple_loss=0.09494, pruned_loss=0.01528, audio_tagging_loss=0.009352, over 3036728.96 frames. ], batch size: 59, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:20:17,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2023-11-22 14:20:19,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1976193.3333333333, ans=0.125 2023-11-22 14:20:34,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296450 2023-11-22 14:20:35,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-22 14:20:52,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1976393.3333333333, ans=0.125 2023-11-22 14:20:53,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1976393.3333333333, ans=0.125 2023-11-22 14:21:02,729 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7900, loss[loss=0.06653, simple_loss=0.08152, pruned_loss=0.01507, audio_tagging_loss=0.0107, over 15488.00 frames. ], tot_loss[loss=0.07169, simple_loss=0.09415, pruned_loss=0.01514, audio_tagging_loss=0.009475, over 3040944.37 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:21:04,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1976460.0, ans=0.125 2023-11-22 14:21:39,792 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296500 2023-11-22 14:21:44,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.490e+01 8.875e+01 9.857e+01 1.314e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-22 14:22:06,342 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 7950, loss[loss=0.04096, simple_loss=0.04543, pruned_loss=0.005856, audio_tagging_loss=0.01239, over 14778.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.09399, pruned_loss=0.01515, audio_tagging_loss=0.009557, over 3041042.47 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 8.0 2023-11-22 14:22:12,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1976793.3333333333, ans=0.1 2023-11-22 14:22:23,736 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 14:22:44,802 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296550 2023-11-22 14:22:57,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1977060.0, ans=0.0 2023-11-22 14:23:10,972 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8000, loss[loss=0.07304, simple_loss=0.1029, pruned_loss=0.01283, audio_tagging_loss=0.008769, over 16282.00 frames. ], tot_loss[loss=0.07176, simple_loss=0.09405, pruned_loss=0.01512, audio_tagging_loss=0.009613, over 3046173.14 frames. ], batch size: 59, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:23:12,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1977126.6666666667, ans=0.04949747468305833 2023-11-22 14:23:23,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1977193.3333333333, ans=0.125 2023-11-22 14:23:34,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1977193.3333333333, ans=0.125 2023-11-22 14:23:39,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1977260.0, ans=0.125 2023-11-22 14:23:46,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1977260.0, ans=0.125 2023-11-22 14:23:48,290 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296600 2023-11-22 14:23:55,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.469e+01 8.936e+01 9.726e+01 1.360e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-22 14:24:06,218 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:24:11,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1977393.3333333333, ans=0.125 2023-11-22 14:24:12,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-11-22 14:24:16,308 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8050, loss[loss=0.05656, simple_loss=0.067, pruned_loss=0.01181, audio_tagging_loss=0.01124, over 15659.00 frames. ], tot_loss[loss=0.07195, simple_loss=0.09435, pruned_loss=0.01521, audio_tagging_loss=0.009563, over 3041805.46 frames. 
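], batch size: 58, lr: 2.72e-03, grad_scale: 16.0

The grad_scale field tracks mixed-precision dynamic loss scaling: it halves when a step overflows (32.0 to 16.0 near batch 7300, 16.0 to 8.0 at batch 7950 above) and doubles back after a stretch of clean steps (8.0 to 16.0 at batch 8000). A minimal sketch of that policy; the growth interval is an assumption for illustration, not the value this recipe actually uses:

```python
# Hedged sketch of dynamic loss scaling consistent with the logged grad_scale.
def update_grad_scale(scale: float, found_inf: bool, good_steps: int,
                      growth_interval: int = 500):
    if found_inf:
        return scale * 0.5, 0   # overflow: back off and reset the streak
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0   # stable streak: try a larger scale again
    return scale, good_steps

scale, streak = 16.0, 0
scale, streak = update_grad_scale(scale, found_inf=True, good_steps=streak)
print(scale)  # 8.0, as at batch 7950 above
```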
2023-11-22 14:24:28,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1977526.6666666667, ans=0.0 2023-11-22 14:24:39,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1977526.6666666667, ans=0.0 2023-11-22 14:24:45,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1977593.3333333333, ans=0.125 2023-11-22 14:24:51,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1977593.3333333333, ans=0.1 2023-11-22 14:24:52,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296650 2023-11-22 14:25:16,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-22 14:25:20,015 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8100, loss[loss=0.05423, simple_loss=0.07519, pruned_loss=0.007514, audio_tagging_loss=0.009119, over 15299.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.0941, pruned_loss=0.0151, audio_tagging_loss=0.009513, over 3040322.18 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:25:20,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1977793.3333333333, ans=0.125 2023-11-22 14:25:53,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1977926.6666666667, ans=0.0 2023-11-22 14:25:57,383 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296700 2023-11-22 14:26:03,382 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 8.263e+01 8.996e+01 9.612e+01 1.203e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 14:26:03,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1977993.3333333333, ans=0.125 2023-11-22 14:26:07,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1977993.3333333333, ans=0.0 2023-11-22 14:26:09,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1978060.0, ans=0.025 2023-11-22 14:26:11,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1978060.0, ans=0.125 2023-11-22 14:26:22,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.19 vs. limit=10.0 2023-11-22 14:26:23,602 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8150, loss[loss=0.07206, simple_loss=0.09288, pruned_loss=0.01959, audio_tagging_loss=0.006023, over 13887.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09409, pruned_loss=0.01498, audio_tagging_loss=0.009425, over 3042495.96 frames. ], batch size: 53, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:27:00,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296750 2023-11-22 14:27:04,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs.
limit=15.0 2023-11-22 14:27:27,875 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8200, loss[loss=0.07265, simple_loss=0.09917, pruned_loss=0.01537, audio_tagging_loss=0.0077, over 15729.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09525, pruned_loss=0.01507, audio_tagging_loss=0.009207, over 3043779.31 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:27:29,136 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 14:27:30,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1978460.0, ans=0.07 2023-11-22 14:27:48,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1978526.6666666667, ans=0.125 2023-11-22 14:27:51,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2023-11-22 14:28:04,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296800 2023-11-22 14:28:04,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1978660.0, ans=0.0 2023-11-22 14:28:09,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1978660.0, ans=0.0 2023-11-22 14:28:10,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 7.904e+01 8.816e+01 9.431e+01 1.476e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-22 14:28:27,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1978726.6666666667, ans=0.2 2023-11-22 14:28:28,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1978726.6666666667, ans=0.0 2023-11-22 14:28:30,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1978726.6666666667, ans=0.125 2023-11-22 14:28:32,348 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8250, loss[loss=0.03942, simple_loss=0.05037, pruned_loss=0.003335, audio_tagging_loss=0.0109, over 15242.00 frames. ], tot_loss[loss=0.0711, simple_loss=0.09426, pruned_loss=0.01482, audio_tagging_loss=0.009147, over 3047941.22 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:28:35,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. 
limit=15.0 2023-11-22 14:28:41,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1978793.3333333333, ans=0.0 2023-11-22 14:29:01,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1978926.6666666667, ans=0.125 2023-11-22 14:29:04,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1978926.6666666667, ans=0.2 2023-11-22 14:29:05,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1978926.6666666667, ans=0.125 2023-11-22 14:29:09,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296850 2023-11-22 14:29:26,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1979060.0, ans=0.0 2023-11-22 14:29:30,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=12.0 2023-11-22 14:29:35,961 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8300, loss[loss=0.06333, simple_loss=0.08555, pruned_loss=0.01219, audio_tagging_loss=0.008359, over 15566.00 frames. ], tot_loss[loss=0.07144, simple_loss=0.095, pruned_loss=0.01482, audio_tagging_loss=0.009116, over 3044891.87 frames. ], batch size: 60, lr: 2.72e-03, grad_scale: 8.0 2023-11-22 14:29:59,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1979193.3333333333, ans=0.1 2023-11-22 14:30:03,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1979260.0, ans=0.125 2023-11-22 14:30:13,332 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296900 2023-11-22 14:30:19,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1979326.6666666667, ans=0.0 2023-11-22 14:30:20,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.061e+01 8.620e+01 9.639e+01 1.216e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-22 14:30:23,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.06 vs. limit=10.0 2023-11-22 14:30:25,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2023-11-22 14:30:39,592 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8350, loss[loss=0.06575, simple_loss=0.0915, pruned_loss=0.01224, audio_tagging_loss=0.007758, over 16345.00 frames. ], tot_loss[loss=0.07152, simple_loss=0.09515, pruned_loss=0.0148, audio_tagging_loss=0.009147, over 3048218.74 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 8.0 2023-11-22 14:30:40,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.99 vs. 
limit=22.5 2023-11-22 14:30:50,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1979460.0, ans=0.125 2023-11-22 14:30:54,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1979526.6666666667, ans=0.125 2023-11-22 14:31:00,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1979526.6666666667, ans=0.125 2023-11-22 14:31:15,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1979593.3333333333, ans=0.0 2023-11-22 14:31:17,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 296950 2023-11-22 14:31:20,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1979660.0, ans=0.1 2023-11-22 14:31:23,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2023-11-22 14:31:36,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-22 14:31:43,425 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8400, loss[loss=0.07105, simple_loss=0.09036, pruned_loss=0.01524, audio_tagging_loss=0.01063, over 14883.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09384, pruned_loss=0.01465, audio_tagging_loss=0.00915, over 3050312.72 frames. ], batch size: 55, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:32:01,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1979860.0, ans=0.125 2023-11-22 14:32:20,134 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297000 2023-11-22 14:32:24,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1979993.3333333333, ans=0.04949747468305833 2023-11-22 14:32:29,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.093e+01 8.742e+01 9.671e+01 1.399e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 14:32:40,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2023-11-22 14:32:47,268 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8450, loss[loss=0.04862, simple_loss=0.05573, pruned_loss=0.01025, audio_tagging_loss=0.0105, over 14455.00 frames. ], tot_loss[loss=0.07055, simple_loss=0.0933, pruned_loss=0.01466, audio_tagging_loss=0.009237, over 3052954.64 frames. ], batch size: 55, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:32:50,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1980126.6666666667, ans=0.125 2023-11-22 14:32:58,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2023-11-22 14:32:59,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1980193.3333333333, ans=0.0 2023-11-22 14:33:11,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. 
limit=8.0 2023-11-22 14:33:25,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297050 2023-11-22 14:33:52,053 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8500, loss[loss=0.04264, simple_loss=0.05815, pruned_loss=0.005274, audio_tagging_loss=0.008285, over 15263.00 frames. ], tot_loss[loss=0.07071, simple_loss=0.09348, pruned_loss=0.01471, audio_tagging_loss=0.009254, over 3054896.60 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:33:59,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1980460.0, ans=0.05 2023-11-22 14:34:19,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1980593.3333333333, ans=0.0 2023-11-22 14:34:21,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-22 14:34:29,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297100 2023-11-22 14:34:33,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1980660.0, ans=0.0 2023-11-22 14:34:36,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.259e+01 8.972e+01 9.645e+01 1.298e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-22 14:34:55,740 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8550, loss[loss=0.06198, simple_loss=0.07862, pruned_loss=0.01062, audio_tagging_loss=0.01204, over 14863.00 frames. ], tot_loss[loss=0.0711, simple_loss=0.09403, pruned_loss=0.01475, audio_tagging_loss=0.009328, over 3059906.55 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:34:58,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1980793.3333333333, ans=0.2 2023-11-22 14:35:33,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297150 2023-11-22 14:35:36,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1980993.3333333333, ans=0.125 2023-11-22 14:35:59,836 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8600, loss[loss=0.08013, simple_loss=0.1122, pruned_loss=0.01728, audio_tagging_loss=0.006759, over 15628.00 frames. ], tot_loss[loss=0.07139, simple_loss=0.09444, pruned_loss=0.01493, audio_tagging_loss=0.009241, over 3060302.28 frames. ], batch size: 56, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:36:15,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.04 vs. 
limit=15.0 2023-11-22 14:36:21,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1981193.3333333333, ans=0.0 2023-11-22 14:36:28,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1981260.0, ans=0.125 2023-11-22 14:36:33,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1981260.0, ans=0.5 2023-11-22 14:36:37,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297200 2023-11-22 14:36:45,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.334e+01 9.004e+01 9.545e+01 1.536e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-22 14:37:04,293 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8650, loss[loss=0.06368, simple_loss=0.07565, pruned_loss=0.01364, audio_tagging_loss=0.01221, over 14960.00 frames. ], tot_loss[loss=0.07125, simple_loss=0.09417, pruned_loss=0.01483, audio_tagging_loss=0.009333, over 3054423.40 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 16.0 2023-11-22 14:37:04,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1981460.0, ans=0.125 2023-11-22 14:37:17,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1981526.6666666667, ans=0.07 2023-11-22 14:37:35,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2023-11-22 14:37:41,406 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297250 2023-11-22 14:37:44,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1981660.0, ans=0.125 2023-11-22 14:37:46,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1981660.0, ans=0.1 2023-11-22 14:37:51,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1981660.0, ans=0.125 2023-11-22 14:38:00,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1981726.6666666667, ans=0.5 2023-11-22 14:38:03,397 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:38:08,731 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8700, loss[loss=0.06301, simple_loss=0.07573, pruned_loss=0.01354, audio_tagging_loss=0.01161, over 15118.00 frames. ], tot_loss[loss=0.07121, simple_loss=0.09393, pruned_loss=0.0148, audio_tagging_loss=0.00944, over 3051096.61 frames. 
], batch size: 57, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:38:17,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1981793.3333333333, ans=0.0 2023-11-22 14:38:26,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1981860.0, ans=0.125 2023-11-22 14:38:33,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1981926.6666666667, ans=0.0 2023-11-22 14:38:35,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1981926.6666666667, ans=0.1 2023-11-22 14:38:45,910 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297300 2023-11-22 14:38:53,658 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.337e+01 8.929e+01 9.655e+01 1.210e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-22 14:39:01,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-22 14:39:06,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1982060.0, ans=0.0 2023-11-22 14:39:12,710 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8750, loss[loss=0.07472, simple_loss=0.09973, pruned_loss=0.01587, audio_tagging_loss=0.008986, over 15149.00 frames. ], tot_loss[loss=0.07146, simple_loss=0.09395, pruned_loss=0.0149, audio_tagging_loss=0.009583, over 3055402.57 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:39:24,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1982193.3333333333, ans=0.125 2023-11-22 14:39:25,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2023-11-22 14:39:49,723 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297350 2023-11-22 14:39:58,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1982326.6666666667, ans=0.125 2023-11-22 14:40:01,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1982326.6666666667, ans=0.025 2023-11-22 14:40:04,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2023-11-22 14:40:09,634 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:40:16,885 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8800, loss[loss=0.07478, simple_loss=0.1008, pruned_loss=0.01326, audio_tagging_loss=0.0111, over 15645.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09402, pruned_loss=0.01498, audio_tagging_loss=0.009732, over 3050391.50 frames. 
], batch size: 60, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:40:26,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1982460.0, ans=0.0 2023-11-22 14:40:32,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1982526.6666666667, ans=0.2 2023-11-22 14:40:51,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1982593.3333333333, ans=0.05 2023-11-22 14:40:53,907 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297400 2023-11-22 14:41:04,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.228e+01 8.972e+01 9.857e+01 1.350e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-22 14:41:12,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1982726.6666666667, ans=0.125 2023-11-22 14:41:19,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1982793.3333333333, ans=0.0 2023-11-22 14:41:21,496 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8850, loss[loss=0.0627, simple_loss=0.0792, pruned_loss=0.01378, audio_tagging_loss=0.009319, over 15402.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09393, pruned_loss=0.01504, audio_tagging_loss=0.00972, over 3048645.76 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 8.0 2023-11-22 14:41:32,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1982860.0, ans=0.125 2023-11-22 14:41:33,618 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 14:41:37,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1982860.0, ans=0.125 2023-11-22 14:41:58,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297450 2023-11-22 14:42:07,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1982993.3333333333, ans=0.125 2023-11-22 14:42:24,228 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8900, loss[loss=0.07388, simple_loss=0.1073, pruned_loss=0.01328, audio_tagging_loss=0.006948, over 16469.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.09403, pruned_loss=0.01508, audio_tagging_loss=0.009605, over 3052238.49 frames. 
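The WARNING above (and its repeats later in the log) comes from a length filter: these 1-second AudioSet cuts carry only a dummy transcript, and after the frontend's subsampling the 100 input frames shrink to 23, fewer than the 24 BPE tokens, so no valid transducer alignment exists and the cut is dropped. A sketch of such a filter, with the subsampling formula and helper names as assumptions (the formula shown does map 100 frames to 23):

```python
# Illustrative sketch of the frame/token filter behind the WARNING above.
# The helper names are stand-ins, not the exact icefall objects.
def num_frames_after_subsampling(num_frames: int) -> int:
    # Conv frontends shave off edge frames; ((n - 7) // 2 + 1) // 2 is one
    # common pattern and maps 100 input frames to 23 output frames.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    t = num_frames_after_subsampling(num_frames)
    # A transducer alignment needs at least one frame per output token.
    return t >= len(tokens)

tokens = ["▁D", "ummy", "▁", "text"] * 6  # 24 placeholder tokens
print(keep_cut(100, tokens))  # False -> the cut is excluded, as logged
```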
], batch size: 60, lr: 2.71e-03, grad_scale: 8.0 2023-11-22 14:42:29,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1983126.6666666667, ans=0.025 2023-11-22 14:43:01,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297500 2023-11-22 14:43:02,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1983326.6666666667, ans=0.0 2023-11-22 14:43:07,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1983326.6666666667, ans=0.05 2023-11-22 14:43:11,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.250e+01 8.702e+01 9.316e+01 1.378e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 14:43:28,592 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 8950, loss[loss=0.06291, simple_loss=0.08405, pruned_loss=0.01268, audio_tagging_loss=0.008214, over 14568.00 frames. ], tot_loss[loss=0.07165, simple_loss=0.09439, pruned_loss=0.015, audio_tagging_loss=0.009454, over 3057884.03 frames. ], batch size: 55, lr: 2.71e-03, grad_scale: 8.0 2023-11-22 14:43:31,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1983460.0, ans=0.125 2023-11-22 14:43:36,222 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:43:46,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1983526.6666666667, ans=0.0 2023-11-22 14:43:49,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1983526.6666666667, ans=0.125 2023-11-22 14:43:49,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1983526.6666666667, ans=0.0 2023-11-22 14:43:57,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1983593.3333333333, ans=0.125 2023-11-22 14:44:04,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297550 2023-11-22 14:44:09,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1983660.0, ans=0.125 2023-11-22 14:44:09,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1983660.0, ans=0.125 2023-11-22 14:44:14,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2023-11-22 14:44:28,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-22 14:44:31,336 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9000, loss[loss=0.05334, simple_loss=0.06732, pruned_loss=0.01055, audio_tagging_loss=0.009128, over 15894.00 frames. ], tot_loss[loss=0.07197, simple_loss=0.09521, pruned_loss=0.01509, audio_tagging_loss=0.009276, over 3057463.49 frames. 
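In the optim.py:476 records, the clipping threshold consistently equals twice the running median of recent gradient norms (Clipping_scale=2.0; e.g. 2 x 8.702e+01 = 1.740e+02 just above). A hedged reconstruction of that bookkeeping follows; the window size and the exact clipping mechanics are editorial assumptions:

```python
# Sketch of median-based gradient clipping consistent with the log:
# threshold = clipping_scale * running median of recent grad norms.
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent global grad norms
        self.num_clipped = 0
        self.num_steps = 0

    def step(self, params) -> float:
        norm = torch.norm(
            torch.stack([p.grad.norm() for p in params if p.grad is not None])
        )
        self.norms.append(norm.item())
        history = torch.tensor(list(self.norms))
        threshold = self.clipping_scale * history.median().item()
        self.num_steps += 1
        if norm.item() > threshold:         # feeds the percent-clipped stat
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm.item())
        return threshold  # quartiles of `history` give the logged summary
```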
], batch size: 62, lr: 2.71e-03, grad_scale: 8.0 2023-11-22 14:44:31,336 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 14:44:48,529 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.7447, 4.0289, 4.0338, 4.0126], device='cuda:1') 2023-11-22 14:45:11,809 INFO [train_asr.py:1253] (1/4) Epoch 25, validation: loss=0.06003, simple_loss=0.05148, pruned_loss=0.00513, audio_tagging_loss=0.02916, over 4681554.00 frames. 2023-11-22 14:45:11,810 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 14:45:15,754 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 14:45:29,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-22 14:45:37,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=15.0 2023-11-22 14:45:40,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1983926.6666666667, ans=0.2 2023-11-22 14:45:49,666 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297600 2023-11-22 14:45:59,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.485e+01 9.145e+01 9.654e+01 1.276e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-22 14:46:07,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1984060.0, ans=0.0 2023-11-22 14:46:16,679 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9050, loss[loss=0.06206, simple_loss=0.07635, pruned_loss=0.01527, audio_tagging_loss=0.008618, over 14797.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09466, pruned_loss=0.01508, audio_tagging_loss=0.009249, over 3056156.69 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 8.0 2023-11-22 14:46:23,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.62 vs. limit=10.0 2023-11-22 14:46:35,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1984193.3333333333, ans=0.125 2023-11-22 14:46:45,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1984260.0, ans=0.0 2023-11-22 14:46:53,134 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297650 2023-11-22 14:47:00,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1984326.6666666667, ans=0.2 2023-11-22 14:47:07,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1984393.3333333333, ans=0.125 2023-11-22 14:47:19,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1984460.0, ans=0.0 2023-11-22 14:47:21,062 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9100, loss[loss=0.05828, simple_loss=0.07714, pruned_loss=0.01061, audio_tagging_loss=0.009099, over 16103.00 frames. ], tot_loss[loss=0.07165, simple_loss=0.09495, pruned_loss=0.01506, audio_tagging_loss=0.009113, over 3062050.60 frames. 
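The validation block above (train_asr.py:1244-1254) pauses training, evaluates on held-out data without gradients, and reports the peak GPU memory seen so far. A generic PyTorch sketch of that pattern, with model and valid_loader as placeholder names rather than the recipe's objects:

```python
# Generic sketch of the validation pass logged above; `model` and
# `valid_loader` are placeholders, and the (loss, num_frames) return
# is an assumed interface.
import torch

def compute_validation_loss(model, valid_loader, device="cuda:1"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.5f}, "
          f"maximum memory allocated so far is {peak_mb}MB")
```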
], batch size: 60, lr: 2.71e-03, grad_scale: 8.0 2023-11-22 14:47:57,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297700 2023-11-22 14:48:06,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1984660.0, ans=0.1 2023-11-22 14:48:07,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.687e+01 8.161e+01 8.752e+01 9.286e+01 2.425e+02, threshold=1.750e+02, percent-clipped=1.0 2023-11-22 14:48:24,392 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9150, loss[loss=0.05708, simple_loss=0.07309, pruned_loss=0.0121, audio_tagging_loss=0.008434, over 13846.00 frames. ], tot_loss[loss=0.07134, simple_loss=0.09435, pruned_loss=0.01506, audio_tagging_loss=0.009103, over 3055369.75 frames. ], batch size: 55, lr: 2.71e-03, grad_scale: 8.0 2023-11-22 14:48:54,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1984926.6666666667, ans=0.1 2023-11-22 14:49:02,451 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297750 2023-11-22 14:49:24,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1985060.0, ans=0.0 2023-11-22 14:49:25,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1985060.0, ans=0.125 2023-11-22 14:49:26,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1985060.0, ans=0.1 2023-11-22 14:49:26,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1985060.0, ans=0.125 2023-11-22 14:49:28,681 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9200, loss[loss=0.06556, simple_loss=0.08889, pruned_loss=0.01132, audio_tagging_loss=0.009796, over 15010.00 frames. ], tot_loss[loss=0.07127, simple_loss=0.09419, pruned_loss=0.01498, audio_tagging_loss=0.009192, over 3044897.65 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:49:58,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2023-11-22 14:50:04,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1985260.0, ans=0.0 2023-11-22 14:50:05,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297800 2023-11-22 14:50:05,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1985326.6666666667, ans=0.125 2023-11-22 14:50:09,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.71 vs. limit=15.0 2023-11-22 14:50:16,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.283e+01 9.032e+01 9.826e+01 1.218e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-22 14:50:16,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1985326.6666666667, ans=0.125 2023-11-22 14:50:33,437 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9250, loss[loss=0.06907, simple_loss=0.08951, pruned_loss=0.01377, audio_tagging_loss=0.01054, over 16512.00 frames. 
], tot_loss[loss=0.07042, simple_loss=0.09295, pruned_loss=0.01467, audio_tagging_loss=0.009273, over 3046746.34 frames. ], batch size: 60, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:50:34,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-11-22 14:50:38,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2023-11-22 14:50:39,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1985460.0, ans=0.05 2023-11-22 14:50:46,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1985526.6666666667, ans=15.0 2023-11-22 14:50:52,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1985526.6666666667, ans=0.0 2023-11-22 14:51:10,238 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297850 2023-11-22 14:51:12,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1985660.0, ans=0.125 2023-11-22 14:51:21,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1985660.0, ans=0.0 2023-11-22 14:51:37,700 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9300, loss[loss=0.06648, simple_loss=0.08518, pruned_loss=0.01388, audio_tagging_loss=0.01001, over 14194.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09215, pruned_loss=0.01458, audio_tagging_loss=0.009241, over 3049916.76 frames. ], batch size: 52, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:51:46,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1985793.3333333333, ans=0.1 2023-11-22 14:52:00,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1985860.0, ans=0.035 2023-11-22 14:52:14,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297900 2023-11-22 14:52:24,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.630e+01 8.158e+01 8.859e+01 9.535e+01 1.280e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-22 14:52:35,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1986060.0, ans=0.125 2023-11-22 14:52:41,556 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9350, loss[loss=0.07213, simple_loss=0.0927, pruned_loss=0.01506, audio_tagging_loss=0.01072, over 15949.00 frames. ], tot_loss[loss=0.06995, simple_loss=0.09206, pruned_loss=0.01461, audio_tagging_loss=0.00931, over 3036446.10 frames. ], batch size: 62, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:52:49,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.71 vs. 
limit=15.0 2023-11-22 14:53:03,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1986193.3333333333, ans=0.125 2023-11-22 14:53:15,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2023-11-22 14:53:20,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 297950 2023-11-22 14:53:25,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1986326.6666666667, ans=0.0 2023-11-22 14:53:26,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1986326.6666666667, ans=0.0 2023-11-22 14:53:48,019 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9400, loss[loss=0.07284, simple_loss=0.1027, pruned_loss=0.01289, audio_tagging_loss=0.008595, over 15032.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.09295, pruned_loss=0.01481, audio_tagging_loss=0.009458, over 3036570.95 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:53:53,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1986460.0, ans=0.2 2023-11-22 14:54:02,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1986526.6666666667, ans=0.125 2023-11-22 14:54:09,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1986526.6666666667, ans=0.0 2023-11-22 14:54:19,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1986593.3333333333, ans=0.1 2023-11-22 14:54:24,948 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298000 2023-11-22 14:54:35,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.116e+01 8.691e+01 9.663e+01 1.322e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-22 14:54:39,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2023-11-22 14:54:51,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=15.0 2023-11-22 14:54:52,067 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 14:54:52,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2023-11-22 14:54:53,295 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9450, loss[loss=0.06557, simple_loss=0.09295, pruned_loss=0.009622, audio_tagging_loss=0.009476, over 14865.00 frames. ], tot_loss[loss=0.07168, simple_loss=0.0942, pruned_loss=0.0151, audio_tagging_loss=0.009474, over 3042814.78 frames. 
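The scaling.py:1022 Whitening entries compare a per-module anisotropy metric against a limit (metric=11.22 vs. limit=15.0 above, for example); modules whose activations drift too far from a white covariance get nudged back. The recipe's exact metric is not reproduced here; the sketch below uses one plausible stand-in, mean(eig^2) / mean(eig)^2 of the feature covariance, which equals 1.0 for perfectly white features and grows with eigenvalue spread:

```python
# Illustrative whitening metric (editorial assumption, not the recipe's
# formula): mean(eig^2) / mean(eig)^2 of the per-group feature covariance.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups as logged.
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0)
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(1000, 384)
x[:, :10] *= 20.0   # a few dominant directions -> strongly non-white
print(whitening_metric(x))  # roughly 30, above a limit like 15.0
```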
], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:54:59,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1986793.3333333333, ans=0.0 2023-11-22 14:55:09,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0 2023-11-22 14:55:31,002 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298050 2023-11-22 14:55:53,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1987060.0, ans=0.0 2023-11-22 14:55:57,820 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9500, loss[loss=0.07649, simple_loss=0.1005, pruned_loss=0.01708, audio_tagging_loss=0.009181, over 14864.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.09283, pruned_loss=0.01496, audio_tagging_loss=0.00965, over 3045788.26 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:56:10,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1987193.3333333333, ans=0.125 2023-11-22 14:56:25,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1987260.0, ans=0.125 2023-11-22 14:56:30,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1987260.0, ans=0.125 2023-11-22 14:56:35,414 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298100 2023-11-22 14:56:45,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.353e+01 9.126e+01 1.009e+02 1.546e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-22 14:57:02,721 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9550, loss[loss=0.0649, simple_loss=0.08679, pruned_loss=0.01152, audio_tagging_loss=0.009992, over 15761.00 frames. ], tot_loss[loss=0.07055, simple_loss=0.09213, pruned_loss=0.01475, audio_tagging_loss=0.009733, over 3044818.81 frames. ], batch size: 62, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 14:57:03,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2023-11-22 14:57:07,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1987460.0, ans=0.125 2023-11-22 14:57:12,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1987460.0, ans=0.05 2023-11-22 14:57:37,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1987593.3333333333, ans=0.125 2023-11-22 14:57:41,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298150 2023-11-22 14:58:07,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=12.0 2023-11-22 14:58:09,000 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9600, loss[loss=0.06809, simple_loss=0.08961, pruned_loss=0.01502, audio_tagging_loss=0.008269, over 15100.00 frames. ], tot_loss[loss=0.07082, simple_loss=0.09252, pruned_loss=0.01476, audio_tagging_loss=0.009801, over 3044229.31 frames. 
], batch size: 58, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 14:58:10,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1987793.3333333333, ans=0.1 2023-11-22 14:58:13,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1987793.3333333333, ans=0.0 2023-11-22 14:58:15,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1987793.3333333333, ans=0.1 2023-11-22 14:58:17,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1987793.3333333333, ans=0.0 2023-11-22 14:58:20,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1987860.0, ans=0.125 2023-11-22 14:58:38,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1987926.6666666667, ans=0.0 2023-11-22 14:58:46,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298200 2023-11-22 14:58:57,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.183e+01 8.755e+01 9.694e+01 2.318e+02, threshold=1.751e+02, percent-clipped=1.0 2023-11-22 14:59:00,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2023-11-22 14:59:13,501 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9650, loss[loss=0.07958, simple_loss=0.1107, pruned_loss=0.01594, audio_tagging_loss=0.008299, over 15205.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.09326, pruned_loss=0.01507, audio_tagging_loss=0.009763, over 3039926.96 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 14:59:39,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1988260.0, ans=0.0 2023-11-22 14:59:51,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298250 2023-11-22 14:59:56,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2023-11-22 14:59:57,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1988326.6666666667, ans=0.95 2023-11-22 15:00:00,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1988326.6666666667, ans=0.125 2023-11-22 15:00:06,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1988393.3333333333, ans=0.1 2023-11-22 15:00:17,262 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9700, loss[loss=0.05744, simple_loss=0.07478, pruned_loss=0.01291, audio_tagging_loss=0.007146, over 15041.00 frames. ], tot_loss[loss=0.07141, simple_loss=0.09371, pruned_loss=0.01507, audio_tagging_loss=0.009484, over 3041330.72 frames. 
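The grad_scale field in the batch records (8.0, 16.0, and 32.0 at different points in this log) is the fp16 loss scale: a scaler multiplies the loss before backward, doubles the scale after a long run of overflow-free steps, and halves it when gradients overflow. The standard torch.cuda.amp pattern is sketched below with placeholder model, optimizer, and batch; the logged grad_scale plausibly corresponds to scaler.get_scale():

```python
# Standard torch.cuda.amp loss-scaling loop; `model`, `optimizer`, and
# `batch` are placeholders, and init_scale/growth_interval are assumptions.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)                 # assumed interface
    scaler.scale(loss).backward()           # backward on the scaled loss
    scaler.step(optimizer)                  # unscales, skips on overflow
    scaler.update()                         # grows or halves the scale
    return loss.detach(), scaler.get_scale()
```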
], batch size: 57, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:00:39,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1988526.6666666667, ans=0.125 2023-11-22 15:00:41,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1988526.6666666667, ans=0.1 2023-11-22 15:00:50,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1988593.3333333333, ans=0.125 2023-11-22 15:00:55,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298300 2023-11-22 15:00:57,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1988660.0, ans=0.1 2023-11-22 15:01:04,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.401e+01 8.285e+01 8.809e+01 9.678e+01 1.163e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-22 15:01:21,862 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9750, loss[loss=0.06204, simple_loss=0.08928, pruned_loss=0.008001, audio_tagging_loss=0.009396, over 15753.00 frames. ], tot_loss[loss=0.07176, simple_loss=0.09453, pruned_loss=0.01512, audio_tagging_loss=0.009374, over 3040930.48 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:01:27,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1988793.3333333333, ans=0.0 2023-11-22 15:01:31,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1988793.3333333333, ans=0.0 2023-11-22 15:01:46,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1988926.6666666667, ans=0.125 2023-11-22 15:01:58,970 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298350 2023-11-22 15:01:59,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1988993.3333333333, ans=0.125 2023-11-22 15:02:22,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1989060.0, ans=0.125 2023-11-22 15:02:25,377 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9800, loss[loss=0.07498, simple_loss=0.1083, pruned_loss=0.01413, audio_tagging_loss=0.006704, over 15199.00 frames. ], tot_loss[loss=0.07076, simple_loss=0.09333, pruned_loss=0.01477, audio_tagging_loss=0.009325, over 3036990.35 frames. 
], batch size: 54, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:02:27,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1989126.6666666667, ans=0.125 2023-11-22 15:02:32,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1989126.6666666667, ans=0.125 2023-11-22 15:03:02,461 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298400 2023-11-22 15:03:08,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1989326.6666666667, ans=0.125 2023-11-22 15:03:14,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 7.942e+01 8.704e+01 9.561e+01 1.255e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-22 15:03:23,517 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 15:03:29,596 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9850, loss[loss=0.09126, simple_loss=0.1257, pruned_loss=0.02001, audio_tagging_loss=0.008416, over 15224.00 frames. ], tot_loss[loss=0.07159, simple_loss=0.09463, pruned_loss=0.01512, audio_tagging_loss=0.009157, over 3034420.34 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:03:44,397 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:03:51,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1989526.6666666667, ans=0.125 2023-11-22 15:04:06,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298450 2023-11-22 15:04:19,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1989726.6666666667, ans=0.125 2023-11-22 15:04:27,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1989726.6666666667, ans=0.5 2023-11-22 15:04:29,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=1989726.6666666667, ans=15.0 2023-11-22 15:04:33,842 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9900, loss[loss=0.06728, simple_loss=0.09112, pruned_loss=0.01198, audio_tagging_loss=0.009739, over 15656.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09418, pruned_loss=0.01496, audio_tagging_loss=0.009106, over 3035146.76 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:04:35,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1989793.3333333333, ans=0.0 2023-11-22 15:04:45,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1989860.0, ans=0.07 2023-11-22 15:05:00,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. 
limit=15.0 2023-11-22 15:05:10,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298500 2023-11-22 15:05:23,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.333e+01 9.011e+01 9.582e+01 1.423e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-22 15:05:37,736 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 9950, loss[loss=0.06512, simple_loss=0.08224, pruned_loss=0.0129, audio_tagging_loss=0.0111, over 15803.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.09354, pruned_loss=0.01479, audio_tagging_loss=0.009086, over 3041010.67 frames. ], batch size: 60, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:05:50,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1990193.3333333333, ans=0.2 2023-11-22 15:05:59,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1990193.3333333333, ans=0.125 2023-11-22 15:06:11,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1990260.0, ans=0.125 2023-11-22 15:06:13,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1990260.0, ans=0.125 2023-11-22 15:06:15,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298550 2023-11-22 15:06:41,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2023-11-22 15:06:42,066 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10000, loss[loss=0.06587, simple_loss=0.08049, pruned_loss=0.0159, audio_tagging_loss=0.00972, over 14591.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09317, pruned_loss=0.01473, audio_tagging_loss=0.00903, over 3043211.46 frames. ], batch size: 57, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:06:55,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1990526.6666666667, ans=0.125 2023-11-22 15:07:00,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1990526.6666666667, ans=0.125 2023-11-22 15:07:14,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1990593.3333333333, ans=0.09899494936611666 2023-11-22 15:07:19,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298600 2023-11-22 15:07:31,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.143e+01 8.830e+01 9.648e+01 3.146e+02, threshold=1.766e+02, percent-clipped=1.0 2023-11-22 15:07:31,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1990660.0, ans=0.125 2023-11-22 15:07:47,392 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10050, loss[loss=0.06415, simple_loss=0.09181, pruned_loss=0.01108, audio_tagging_loss=0.007172, over 14467.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09336, pruned_loss=0.01466, audio_tagging_loss=0.009137, over 3042077.41 frames. 
], batch size: 54, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:07:55,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1990793.3333333333, ans=0.2 2023-11-22 15:08:01,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0 2023-11-22 15:08:05,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1990860.0, ans=0.125 2023-11-22 15:08:24,267 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298650 2023-11-22 15:08:48,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2023-11-22 15:08:51,066 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10100, loss[loss=0.08105, simple_loss=0.1112, pruned_loss=0.01641, audio_tagging_loss=0.009035, over 16055.00 frames. ], tot_loss[loss=0.07144, simple_loss=0.09452, pruned_loss=0.01501, audio_tagging_loss=0.009172, over 3047862.51 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:08:51,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2023-11-22 15:09:07,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1991193.3333333333, ans=0.125 2023-11-22 15:09:15,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2023-11-22 15:09:28,267 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298700 2023-11-22 15:09:38,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1991326.6666666667, ans=0.125 2023-11-22 15:09:39,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.326e+01 8.430e+01 8.908e+01 9.951e+01 1.277e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-22 15:09:41,553 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 15:09:54,824 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10150, loss[loss=0.06886, simple_loss=0.09694, pruned_loss=0.01302, audio_tagging_loss=0.007369, over 15133.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09444, pruned_loss=0.01519, audio_tagging_loss=0.009256, over 3047390.92 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:10:19,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1991593.3333333333, ans=0.2 2023-11-22 15:10:24,400 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 15:10:30,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1991593.3333333333, ans=0.0 2023-11-22 15:10:31,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298750 2023-11-22 15:10:58,806 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10200, loss[loss=0.06227, simple_loss=0.08079, pruned_loss=0.0104, audio_tagging_loss=0.01148, over 14332.00 frames. ], tot_loss[loss=0.0718, simple_loss=0.0947, pruned_loss=0.01518, audio_tagging_loss=0.009277, over 3054870.33 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:11:08,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1991793.3333333333, ans=0.1 2023-11-22 15:11:21,520 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 15:11:36,011 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298800 2023-11-22 15:11:41,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-22 15:11:48,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.315e+01 8.823e+01 9.660e+01 1.198e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-22 15:12:02,814 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10250, loss[loss=0.07439, simple_loss=0.08965, pruned_loss=0.01896, audio_tagging_loss=0.0106, over 14980.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09464, pruned_loss=0.01521, audio_tagging_loss=0.009363, over 3056269.09 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:12:29,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1992260.0, ans=0.125 2023-11-22 15:12:37,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1992260.0, ans=0.0 2023-11-22 15:12:40,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298850 2023-11-22 15:12:41,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1992326.6666666667, ans=0.125 2023-11-22 15:12:58,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2023-11-22 15:13:00,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1992393.3333333333, ans=0.125 2023-11-22 15:13:06,227 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10300, loss[loss=0.06087, simple_loss=0.08045, pruned_loss=0.01047, audio_tagging_loss=0.01018, over 15103.00 frames. 
], tot_loss[loss=0.07177, simple_loss=0.09444, pruned_loss=0.01515, audio_tagging_loss=0.0094, over 3048120.62 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:13:23,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1992526.6666666667, ans=0.0 2023-11-22 15:13:32,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1992593.3333333333, ans=0.0 2023-11-22 15:13:43,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298900 2023-11-22 15:13:55,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.205e+01 8.813e+01 9.475e+01 1.270e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-22 15:14:03,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-22 15:14:04,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1992726.6666666667, ans=0.125 2023-11-22 15:14:10,581 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10350, loss[loss=0.08071, simple_loss=0.1003, pruned_loss=0.0192, audio_tagging_loss=0.01135, over 14401.00 frames. ], tot_loss[loss=0.07186, simple_loss=0.09415, pruned_loss=0.01526, audio_tagging_loss=0.009523, over 3051711.96 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:14:39,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1992926.6666666667, ans=0.05 2023-11-22 15:14:45,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1992926.6666666667, ans=0.125 2023-11-22 15:14:46,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1992926.6666666667, ans=0.125 2023-11-22 15:14:47,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 298950 2023-11-22 15:14:48,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1992993.3333333333, ans=0.0 2023-11-22 15:15:11,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1993060.0, ans=0.125 2023-11-22 15:15:14,424 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10400, loss[loss=0.06651, simple_loss=0.09702, pruned_loss=0.01151, audio_tagging_loss=0.006487, over 15287.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09413, pruned_loss=0.0152, audio_tagging_loss=0.009573, over 3046320.73 frames. 
], batch size: 56, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:15:15,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1993126.6666666667, ans=0.125 2023-11-22 15:15:33,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1993193.3333333333, ans=0.0 2023-11-22 15:15:34,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1993193.3333333333, ans=0.125 2023-11-22 15:15:42,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1993260.0, ans=0.125 2023-11-22 15:15:49,434 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:15:51,592 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299000 2023-11-22 15:16:04,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1993393.3333333333, ans=0.125 2023-11-22 15:16:05,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.066e+01 8.819e+01 9.372e+01 1.193e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 15:16:05,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1993393.3333333333, ans=0.125 2023-11-22 15:16:09,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1993393.3333333333, ans=0.125 2023-11-22 15:16:18,141 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10450, loss[loss=0.07216, simple_loss=0.1041, pruned_loss=0.01071, audio_tagging_loss=0.009408, over 15877.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09341, pruned_loss=0.01487, audio_tagging_loss=0.009591, over 3047153.72 frames. ], batch size: 57, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:16:19,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1993460.0, ans=0.0 2023-11-22 15:16:55,354 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299050 2023-11-22 15:17:11,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1993726.6666666667, ans=0.0 2023-11-22 15:17:15,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1993726.6666666667, ans=0.0 2023-11-22 15:17:16,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-22 15:17:21,637 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10500, loss[loss=0.07478, simple_loss=0.09266, pruned_loss=0.01531, audio_tagging_loss=0.01315, over 15411.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09364, pruned_loss=0.0149, audio_tagging_loss=0.009485, over 3043361.15 frames. ], batch size: 59, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:17:25,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1993793.3333333333, ans=0.1 2023-11-22 15:17:32,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.64 vs. 
limit=15.0 2023-11-22 15:17:51,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1993926.6666666667, ans=0.125 2023-11-22 15:17:56,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1993926.6666666667, ans=0.0 2023-11-22 15:17:58,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299100 2023-11-22 15:18:13,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.251e+01 8.969e+01 9.494e+01 1.165e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-22 15:18:25,931 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10550, loss[loss=0.0504, simple_loss=0.05599, pruned_loss=0.01162, audio_tagging_loss=0.01079, over 14892.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09318, pruned_loss=0.01475, audio_tagging_loss=0.009383, over 3048747.27 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:18:51,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1994260.0, ans=0.09899494936611666 2023-11-22 15:18:57,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1994260.0, ans=0.2 2023-11-22 15:19:03,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299150 2023-11-22 15:19:07,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1994326.6666666667, ans=0.0 2023-11-22 15:19:10,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1994326.6666666667, ans=0.125 2023-11-22 15:19:11,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1994326.6666666667, ans=0.2 2023-11-22 15:19:12,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1994326.6666666667, ans=0.125 2023-11-22 15:19:29,002 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10600, loss[loss=0.04443, simple_loss=0.05846, pruned_loss=0.004282, audio_tagging_loss=0.01092, over 16381.00 frames. ], tot_loss[loss=0.07056, simple_loss=0.09283, pruned_loss=0.01477, audio_tagging_loss=0.009376, over 3050608.89 frames. 
], batch size: 63, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:19:59,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1994593.3333333333, ans=0.2 2023-11-22 15:20:05,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1994593.3333333333, ans=0.0 2023-11-22 15:20:06,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299200 2023-11-22 15:20:13,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1994660.0, ans=0.0 2023-11-22 15:20:20,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.203e+01 8.764e+01 9.272e+01 1.300e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-22 15:20:27,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1994726.6666666667, ans=0.125 2023-11-22 15:20:28,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1994726.6666666667, ans=0.125 2023-11-22 15:20:32,956 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10650, loss[loss=0.05726, simple_loss=0.07325, pruned_loss=0.01094, audio_tagging_loss=0.009703, over 16929.00 frames. ], tot_loss[loss=0.07091, simple_loss=0.09313, pruned_loss=0.01503, audio_tagging_loss=0.009312, over 3046167.12 frames. ], batch size: 64, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:20:35,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1994793.3333333333, ans=0.0 2023-11-22 15:20:39,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-11-22 15:20:52,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1994860.0, ans=0.125 2023-11-22 15:21:05,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1994926.6666666667, ans=0.0 2023-11-22 15:21:10,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299250 2023-11-22 15:21:18,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1994993.3333333333, ans=0.0 2023-11-22 15:21:20,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1994993.3333333333, ans=0.1 2023-11-22 15:21:37,635 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10700, loss[loss=0.0808, simple_loss=0.1099, pruned_loss=0.01857, audio_tagging_loss=0.007266, over 15206.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.09435, pruned_loss=0.0151, audio_tagging_loss=0.009144, over 3043245.60 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:21:48,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. 
limit=15.0 2023-11-22 15:22:06,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1995260.0, ans=0.125 2023-11-22 15:22:14,150 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299300 2023-11-22 15:22:17,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2023-11-22 15:22:26,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1995326.6666666667, ans=10.0 2023-11-22 15:22:28,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.250e+01 8.791e+01 9.452e+01 1.180e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 15:22:38,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1995393.3333333333, ans=0.07 2023-11-22 15:22:40,690 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10750, loss[loss=0.07577, simple_loss=0.1076, pruned_loss=0.01435, audio_tagging_loss=0.007623, over 14799.00 frames. ], tot_loss[loss=0.07082, simple_loss=0.09356, pruned_loss=0.01491, audio_tagging_loss=0.009132, over 3036359.79 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:22:55,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1995526.6666666667, ans=0.125 2023-11-22 15:23:15,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1995593.3333333333, ans=0.125 2023-11-22 15:23:18,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299350 2023-11-22 15:23:21,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1995660.0, ans=0.0 2023-11-22 15:23:23,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1995660.0, ans=10.0 2023-11-22 15:23:29,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1995660.0, ans=0.09899494936611666 2023-11-22 15:23:34,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1995726.6666666667, ans=0.0 2023-11-22 15:23:39,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1995726.6666666667, ans=0.0 2023-11-22 15:23:44,112 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10800, loss[loss=0.07921, simple_loss=0.1134, pruned_loss=0.01511, audio_tagging_loss=0.007408, over 15802.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.09345, pruned_loss=0.01479, audio_tagging_loss=0.009126, over 3046033.90 frames. ], batch size: 55, lr: 2.71e-03, grad_scale: 32.0 2023-11-22 15:23:53,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1995793.3333333333, ans=0.125 2023-11-22 15:23:56,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. 
limit=10.0 2023-11-22 15:24:12,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1995926.6666666667, ans=0.07 2023-11-22 15:24:14,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2023-11-22 15:24:21,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299400 2023-11-22 15:24:30,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. limit=8.0 2023-11-22 15:24:36,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.569e+01 8.241e+01 8.735e+01 9.368e+01 1.325e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 15:24:49,273 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10850, loss[loss=0.0996, simple_loss=0.1305, pruned_loss=0.02494, audio_tagging_loss=0.00942, over 15332.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.0939, pruned_loss=0.01485, audio_tagging_loss=0.009094, over 3037365.03 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 16.0 2023-11-22 15:24:53,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1996126.6666666667, ans=0.1 2023-11-22 15:24:53,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-22 15:25:21,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1996260.0, ans=0.1 2023-11-22 15:25:22,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1996260.0, ans=0.125 2023-11-22 15:25:25,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299450 2023-11-22 15:25:26,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1996326.6666666667, ans=0.0 2023-11-22 15:25:33,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5 2023-11-22 15:25:49,788 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 15:25:53,358 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10900, loss[loss=0.07402, simple_loss=0.1034, pruned_loss=0.01266, audio_tagging_loss=0.009668, over 14459.00 frames. ], tot_loss[loss=0.07049, simple_loss=0.09329, pruned_loss=0.01465, audio_tagging_loss=0.009198, over 3040350.17 frames. 
], batch size: 55, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:26:05,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1996526.6666666667, ans=0.125 2023-11-22 15:26:30,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299500 2023-11-22 15:26:42,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2023-11-22 15:26:45,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.150e+01 8.424e+01 9.041e+01 9.724e+01 1.556e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-22 15:26:57,234 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 10950, loss[loss=0.0605, simple_loss=0.07648, pruned_loss=0.01001, audio_tagging_loss=0.01226, over 15526.00 frames. ], tot_loss[loss=0.06967, simple_loss=0.09211, pruned_loss=0.01433, audio_tagging_loss=0.009277, over 3041232.45 frames. ], batch size: 57, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:26:58,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1996793.3333333333, ans=0.125 2023-11-22 15:27:29,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1996926.6666666667, ans=0.125 2023-11-22 15:27:34,170 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299550 2023-11-22 15:27:39,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1996993.3333333333, ans=0.2 2023-11-22 15:28:01,241 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11000, loss[loss=0.04987, simple_loss=0.05839, pruned_loss=0.00942, audio_tagging_loss=0.01126, over 15381.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09281, pruned_loss=0.01445, audio_tagging_loss=0.009251, over 3046871.59 frames. ], batch size: 58, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:28:02,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1997126.6666666667, ans=0.04949747468305833 2023-11-22 15:28:05,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0 2023-11-22 15:28:07,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1997126.6666666667, ans=0.2 2023-11-22 15:28:11,072 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 15:28:22,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1997193.3333333333, ans=0.125 2023-11-22 15:28:22,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1997193.3333333333, ans=0.125 2023-11-22 15:28:37,971 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299600 2023-11-22 15:28:47,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1997326.6666666667, ans=0.125 2023-11-22 15:28:53,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.275e+01 8.821e+01 9.536e+01 1.341e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 15:28:59,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1997393.3333333333, ans=0.125 2023-11-22 15:29:05,389 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11050, loss[loss=0.0877, simple_loss=0.1193, pruned_loss=0.02049, audio_tagging_loss=0.007559, over 15003.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09345, pruned_loss=0.01464, audio_tagging_loss=0.009352, over 3046124.04 frames. ], batch size: 56, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:29:23,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1997526.6666666667, ans=0.0 2023-11-22 15:29:42,159 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299650 2023-11-22 15:30:05,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1997726.6666666667, ans=0.025 2023-11-22 15:30:08,988 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11100, loss[loss=0.07376, simple_loss=0.1009, pruned_loss=0.01328, audio_tagging_loss=0.01004, over 14782.00 frames. ], tot_loss[loss=0.07062, simple_loss=0.09303, pruned_loss=0.01461, audio_tagging_loss=0.009496, over 3051476.24 frames. ], batch size: 56, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:30:10,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1997793.3333333333, ans=0.0 2023-11-22 15:30:15,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1997793.3333333333, ans=0.125 2023-11-22 15:30:44,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1997926.6666666667, ans=0.1 2023-11-22 15:30:45,971 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299700 2023-11-22 15:31:01,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.427e+01 8.934e+01 9.666e+01 1.581e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-22 15:31:05,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-11-22 15:31:12,916 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11150, loss[loss=0.07778, simple_loss=0.1032, pruned_loss=0.01637, audio_tagging_loss=0.009798, over 15061.00 frames. ], tot_loss[loss=0.07152, simple_loss=0.09408, pruned_loss=0.01488, audio_tagging_loss=0.009594, over 3048421.73 frames. 
], batch size: 56, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:31:20,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1998126.6666666667, ans=0.1 2023-11-22 15:31:49,997 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299750 2023-11-22 15:32:16,618 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11200, loss[loss=0.08797, simple_loss=0.1107, pruned_loss=0.02429, audio_tagging_loss=0.008328, over 16154.00 frames. ], tot_loss[loss=0.07109, simple_loss=0.09361, pruned_loss=0.01465, audio_tagging_loss=0.009635, over 3043905.62 frames. ], batch size: 60, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:32:21,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1998460.0, ans=0.125 2023-11-22 15:32:23,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1998460.0, ans=0.2 2023-11-22 15:32:23,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1998460.0, ans=0.2 2023-11-22 15:32:26,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1998460.0, ans=0.0 2023-11-22 15:32:48,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1998593.3333333333, ans=0.125 2023-11-22 15:32:53,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1998593.3333333333, ans=0.1 2023-11-22 15:32:54,727 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299800 2023-11-22 15:33:01,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1998660.0, ans=0.125 2023-11-22 15:33:09,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.174e+01 8.782e+01 9.719e+01 1.179e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-22 15:33:22,077 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11250, loss[loss=0.06295, simple_loss=0.07548, pruned_loss=0.01506, audio_tagging_loss=0.01014, over 14514.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.09301, pruned_loss=0.01472, audio_tagging_loss=0.009584, over 3040863.19 frames. ], batch size: 55, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:33:27,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1998793.3333333333, ans=0.125 2023-11-22 15:33:29,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1998793.3333333333, ans=0.125 2023-11-22 15:33:36,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1998860.0, ans=0.07 2023-11-22 15:33:43,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1998860.0, ans=0.1 2023-11-22 15:33:45,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1998860.0, ans=0.125 2023-11-22 15:33:58,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.50 vs. 
limit=15.0 2023-11-22 15:33:58,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299850 2023-11-22 15:34:00,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1998993.3333333333, ans=0.0 2023-11-22 15:34:04,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=12.0 2023-11-22 15:34:10,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1998993.3333333333, ans=0.1 2023-11-22 15:34:22,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1999060.0, ans=0.2 2023-11-22 15:34:25,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1999126.6666666667, ans=0.0 2023-11-22 15:34:26,204 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11300, loss[loss=0.08261, simple_loss=0.1116, pruned_loss=0.01971, audio_tagging_loss=0.007098, over 15365.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09359, pruned_loss=0.01492, audio_tagging_loss=0.009445, over 3040242.93 frames. ], batch size: 56, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:34:38,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2023-11-22 15:35:03,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299900 2023-11-22 15:35:13,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1999326.6666666667, ans=0.0 2023-11-22 15:35:16,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1999393.3333333333, ans=15.0 2023-11-22 15:35:18,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.156e+01 8.491e+01 9.004e+01 9.715e+01 1.251e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-22 15:35:29,973 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11350, loss[loss=0.06517, simple_loss=0.09262, pruned_loss=0.01106, audio_tagging_loss=0.007808, over 15306.00 frames. ], tot_loss[loss=0.07087, simple_loss=0.09335, pruned_loss=0.01486, audio_tagging_loss=0.009336, over 3048980.49 frames. ], batch size: 56, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:35:43,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1999526.6666666667, ans=0.125 2023-11-22 15:36:03,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1999593.3333333333, ans=0.125 2023-11-22 15:36:07,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 299950 2023-11-22 15:36:23,670 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:36:33,736 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11400, loss[loss=0.06925, simple_loss=0.08131, pruned_loss=0.01702, audio_tagging_loss=0.01157, over 14154.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.09275, pruned_loss=0.0147, audio_tagging_loss=0.00933, over 3037501.27 frames. 
], batch size: 57, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:36:49,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1999860.0, ans=0.015 2023-11-22 15:36:56,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1999860.0, ans=0.125 2023-11-22 15:37:10,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300000 2023-11-22 15:37:12,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1999993.3333333333, ans=0.035 2023-11-22 15:37:29,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2000060.0, ans=0.0 2023-11-22 15:37:31,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.071e+01 8.858e+01 9.602e+01 1.299e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-22 15:37:41,007 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11450, loss[loss=0.09432, simple_loss=0.1283, pruned_loss=0.02223, audio_tagging_loss=0.007946, over 14753.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.09366, pruned_loss=0.01478, audio_tagging_loss=0.009227, over 3030491.56 frames. ], batch size: 55, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:37:44,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2000126.6666666667, ans=0.0 2023-11-22 15:37:49,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2000126.6666666667, ans=0.2 2023-11-22 15:38:06,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2000260.0, ans=0.125 2023-11-22 15:38:14,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2000260.0, ans=0.125 2023-11-22 15:38:15,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2000260.0, ans=0.1 2023-11-22 15:38:18,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300050 2023-11-22 15:38:40,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2000393.3333333333, ans=0.125 2023-11-22 15:38:40,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2000393.3333333333, ans=0.0 2023-11-22 15:38:44,816 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11500, loss[loss=0.08204, simple_loss=0.1053, pruned_loss=0.01635, audio_tagging_loss=0.01303, over 15443.00 frames. ], tot_loss[loss=0.07134, simple_loss=0.09418, pruned_loss=0.01499, audio_tagging_loss=0.009263, over 3033826.33 frames. 
], batch size: 58, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:38:46,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2000460.0, ans=0.1 2023-11-22 15:38:54,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2000460.0, ans=0.125 2023-11-22 15:39:22,046 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300100 2023-11-22 15:39:36,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2000726.6666666667, ans=0.0 2023-11-22 15:39:38,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.223e+01 8.870e+01 9.437e+01 1.249e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 15:39:43,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2000726.6666666667, ans=0.0 2023-11-22 15:39:47,758 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11550, loss[loss=0.06974, simple_loss=0.08247, pruned_loss=0.01888, audio_tagging_loss=0.009622, over 13352.00 frames. ], tot_loss[loss=0.07153, simple_loss=0.09425, pruned_loss=0.01514, audio_tagging_loss=0.009261, over 3039878.32 frames. ], batch size: 52, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:39:49,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2000793.3333333333, ans=0.0 2023-11-22 15:40:25,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300150 2023-11-22 15:40:27,715 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 15:40:30,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2000993.3333333333, ans=0.125 2023-11-22 15:40:52,017 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11600, loss[loss=0.0932, simple_loss=0.117, pruned_loss=0.02377, audio_tagging_loss=0.01092, over 14397.00 frames. ], tot_loss[loss=0.07124, simple_loss=0.09408, pruned_loss=0.01499, audio_tagging_loss=0.009209, over 3039634.63 frames. ], batch size: 55, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:41:21,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=12.0 2023-11-22 15:41:21,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.44 vs. 
limit=15.0 2023-11-22 15:41:29,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300200 2023-11-22 15:41:46,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.408e+01 8.800e+01 9.381e+01 1.504e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-22 15:41:48,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2001393.3333333333, ans=0.2 2023-11-22 15:41:56,485 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11650, loss[loss=0.05321, simple_loss=0.06738, pruned_loss=0.008934, audio_tagging_loss=0.01059, over 15106.00 frames. ], tot_loss[loss=0.07115, simple_loss=0.09402, pruned_loss=0.01489, audio_tagging_loss=0.009243, over 3045939.37 frames. ], batch size: 60, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:42:10,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2023-11-22 15:42:19,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2001526.6666666667, ans=0.1 2023-11-22 15:42:21,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2001593.3333333333, ans=0.2 2023-11-22 15:42:24,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2001593.3333333333, ans=0.125 2023-11-22 15:42:29,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2001593.3333333333, ans=0.125 2023-11-22 15:42:33,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300250 2023-11-22 15:42:34,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2001660.0, ans=0.0 2023-11-22 15:42:50,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2001726.6666666667, ans=0.125 2023-11-22 15:42:53,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2001726.6666666667, ans=0.95 2023-11-22 15:42:56,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=22.5 2023-11-22 15:42:59,658 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11700, loss[loss=0.07125, simple_loss=0.0893, pruned_loss=0.01687, audio_tagging_loss=0.009735, over 16436.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09427, pruned_loss=0.01511, audio_tagging_loss=0.009259, over 3045937.66 frames. 
], batch size: 63, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:43:02,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2001793.3333333333, ans=0.125 2023-11-22 15:43:19,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2001860.0, ans=0.0 2023-11-22 15:43:23,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2001860.0, ans=0.5 2023-11-22 15:43:32,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2001926.6666666667, ans=0.125 2023-11-22 15:43:37,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300300 2023-11-22 15:43:43,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2001993.3333333333, ans=0.125 2023-11-22 15:43:48,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2001993.3333333333, ans=0.0 2023-11-22 15:43:52,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2002060.0, ans=0.125 2023-11-22 15:43:53,161 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.322e+01 8.823e+01 9.489e+01 1.251e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-22 15:44:03,414 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11750, loss[loss=0.0774, simple_loss=0.1018, pruned_loss=0.0198, audio_tagging_loss=0.006717, over 14541.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09385, pruned_loss=0.015, audio_tagging_loss=0.009276, over 3041139.86 frames. ], batch size: 57, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:44:19,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2002193.3333333333, ans=0.125 2023-11-22 15:44:21,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2002193.3333333333, ans=0.125 2023-11-22 15:44:39,747 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300350 2023-11-22 15:44:47,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2002326.6666666667, ans=0.125 2023-11-22 15:44:59,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2002393.3333333333, ans=0.1 2023-11-22 15:45:00,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2002393.3333333333, ans=0.2 2023-11-22 15:45:07,027 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11800, loss[loss=0.07775, simple_loss=0.1027, pruned_loss=0.01829, audio_tagging_loss=0.008129, over 15032.00 frames. ], tot_loss[loss=0.07059, simple_loss=0.09293, pruned_loss=0.01485, audio_tagging_loss=0.009274, over 3039601.12 frames. 
], batch size: 57, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:45:16,998 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:45:19,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2002526.6666666667, ans=0.02 2023-11-22 15:45:30,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2002593.3333333333, ans=0.125 2023-11-22 15:45:43,039 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300400 2023-11-22 15:45:46,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2002660.0, ans=0.025 2023-11-22 15:46:01,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.204e+01 8.722e+01 9.366e+01 1.185e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-22 15:46:10,191 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11850, loss[loss=0.06417, simple_loss=0.08868, pruned_loss=0.01138, audio_tagging_loss=0.008459, over 14861.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09232, pruned_loss=0.01479, audio_tagging_loss=0.00943, over 3032923.97 frames. ], batch size: 55, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:46:14,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2002793.3333333333, ans=0.1 2023-11-22 15:46:23,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2002860.0, ans=0.125 2023-11-22 15:46:25,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2002860.0, ans=0.125 2023-11-22 15:46:47,774 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300450 2023-11-22 15:46:48,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2002993.3333333333, ans=0.0 2023-11-22 15:46:58,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2002993.3333333333, ans=0.125 2023-11-22 15:47:12,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2003126.6666666667, ans=0.0 2023-11-22 15:47:13,761 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11900, loss[loss=0.07248, simple_loss=0.1047, pruned_loss=0.01141, audio_tagging_loss=0.008743, over 15945.00 frames. ], tot_loss[loss=0.07069, simple_loss=0.09261, pruned_loss=0.01481, audio_tagging_loss=0.009569, over 3035767.69 frames. ], batch size: 59, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:47:33,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2003193.3333333333, ans=0.0 2023-11-22 15:47:38,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.73 vs. 
limit=12.0 2023-11-22 15:47:45,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2003260.0, ans=0.125 2023-11-22 15:47:47,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2003260.0, ans=0.0 2023-11-22 15:47:50,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300500 2023-11-22 15:47:50,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-11-22 15:47:57,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0 2023-11-22 15:48:05,850 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:48:07,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.243e+01 8.822e+01 9.597e+01 1.131e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 15:48:17,680 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 11950, loss[loss=0.05191, simple_loss=0.06919, pruned_loss=0.009372, audio_tagging_loss=0.007943, over 14496.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.09282, pruned_loss=0.01482, audio_tagging_loss=0.009585, over 3043379.44 frames. ], batch size: 57, lr: 2.70e-03, grad_scale: 16.0 2023-11-22 15:48:17,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2003460.0, ans=0.0 2023-11-22 15:48:31,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2003526.6666666667, ans=0.0 2023-11-22 15:48:39,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2003526.6666666667, ans=0.0 2023-11-22 15:48:41,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2003593.3333333333, ans=0.0 2023-11-22 15:48:41,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2003593.3333333333, ans=0.1 2023-11-22 15:48:41,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2003593.3333333333, ans=0.125 2023-11-22 15:48:54,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300550 2023-11-22 15:49:11,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.97 vs. limit=10.0 2023-11-22 15:49:13,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2003726.6666666667, ans=0.125 2023-11-22 15:49:18,883 INFO [train_asr.py:1221] (1/4) Epoch 25, batch 12000, loss[loss=0.06653, simple_loss=0.08629, pruned_loss=0.013, audio_tagging_loss=0.01038, over 14807.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09173, pruned_loss=0.01458, audio_tagging_loss=0.009778, over 3048272.88 frames. 
], batch size: 58, lr: 2.70e-03, grad_scale: 32.0 2023-11-22 15:49:18,883 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 15:49:38,331 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8423, 5.7216, 5.5431, 5.4334], device='cuda:1') 2023-11-22 15:49:56,094 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2689, 4.2587, 4.4683, 4.4734], device='cuda:1') 2023-11-22 15:49:56,979 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1939, 4.0030, 4.1226, 3.6024, 4.1715, 3.8954, 3.8663, 3.8267], device='cuda:1') 2023-11-22 15:49:59,241 INFO [train_asr.py:1253] (1/4) Epoch 25, validation: loss=0.05961, simple_loss=0.05152, pruned_loss=0.005134, audio_tagging_loss=0.02872, over 4681554.00 frames. 2023-11-22 15:49:59,242 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 15:50:12,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2003860.0, ans=0.1 2023-11-22 15:50:25,523 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:50:25,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2003926.6666666667, ans=0.2 2023-11-22 15:51:02,496 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 0, loss[loss=0.08207, simple_loss=0.07874, pruned_loss=0.01476, audio_tagging_loss=0.02795, over 14727.00 frames. ], tot_loss[loss=0.08207, simple_loss=0.07874, pruned_loss=0.01476, audio_tagging_loss=0.02795, over 14727.00 frames. ], batch size: 58, lr: 2.65e-03, grad_scale: 32.0 2023-11-22 15:51:02,497 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 15:51:37,680 INFO [train_asr.py:1253] (1/4) Epoch 26, validation: loss=0.05869, simple_loss=0.05153, pruned_loss=0.005094, audio_tagging_loss=0.02783, over 4681554.00 frames. 2023-11-22 15:51:37,681 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 15:51:38,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-22 15:51:39,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2003960.0, ans=0.0 2023-11-22 15:51:43,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300600 2023-11-22 15:51:46,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0 2023-11-22 15:52:01,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.448e+01 9.337e+01 1.025e+02 1.392e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-22 15:52:26,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. 
limit=6.0 2023-11-22 15:52:38,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2004226.6666666667, ans=0.5 2023-11-22 15:52:43,338 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 50, loss[loss=0.07467, simple_loss=0.08372, pruned_loss=0.01195, audio_tagging_loss=0.02087, over 15740.00 frames. ], tot_loss[loss=0.07751, simple_loss=0.08965, pruned_loss=0.01413, audio_tagging_loss=0.01855, over 681836.49 frames. ], batch size: 61, lr: 2.65e-03, grad_scale: 32.0 2023-11-22 15:52:43,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2004293.3333333333, ans=0.0 2023-11-22 15:52:48,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300650 2023-11-22 15:53:05,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2004360.0, ans=6.0 2023-11-22 15:53:27,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2004493.3333333333, ans=0.0 2023-11-22 15:53:46,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=12.0 2023-11-22 15:53:48,090 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 100, loss[loss=0.07132, simple_loss=0.08602, pruned_loss=0.01591, audio_tagging_loss=0.01241, over 15326.00 frames. ], tot_loss[loss=0.079, simple_loss=0.09291, pruned_loss=0.01516, audio_tagging_loss=0.01738, over 1207718.55 frames. ], batch size: 55, lr: 2.65e-03, grad_scale: 32.0 2023-11-22 15:53:52,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2004626.6666666667, ans=0.2 2023-11-22 15:53:53,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300700 2023-11-22 15:53:56,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2004626.6666666667, ans=0.125 2023-11-22 15:54:11,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 8.718e+01 9.300e+01 1.020e+02 1.184e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-22 15:54:16,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2004760.0, ans=0.125 2023-11-22 15:54:21,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2004760.0, ans=0.125 2023-11-22 15:54:53,129 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 150, loss[loss=0.07387, simple_loss=0.0926, pruned_loss=0.01678, audio_tagging_loss=0.01078, over 15053.00 frames. ], tot_loss[loss=0.07638, simple_loss=0.09243, pruned_loss=0.01466, audio_tagging_loss=0.0155, over 1609691.01 frames. ], batch size: 57, lr: 2.65e-03, grad_scale: 32.0 2023-11-22 15:54:58,096 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300750 2023-11-22 15:55:04,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. 
limit=15.0 2023-11-22 15:55:04,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2005026.6666666667, ans=0.125 2023-11-22 15:55:11,204 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:55:16,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2005026.6666666667, ans=0.125 2023-11-22 15:55:42,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2005160.0, ans=0.0 2023-11-22 15:55:57,032 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 200, loss[loss=0.06805, simple_loss=0.08958, pruned_loss=0.01255, audio_tagging_loss=0.0107, over 14526.00 frames. ], tot_loss[loss=0.07516, simple_loss=0.09349, pruned_loss=0.01483, audio_tagging_loss=0.01359, over 1926322.68 frames. ], batch size: 56, lr: 2.65e-03, grad_scale: 32.0 2023-11-22 15:56:02,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300800 2023-11-22 15:56:17,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2005360.0, ans=0.125 2023-11-22 15:56:20,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.352e+01 8.945e+01 1.004e+02 1.668e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-22 15:56:38,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=2005493.3333333333, ans=15.0 2023-11-22 15:56:38,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.09 vs. limit=15.0 2023-11-22 15:56:42,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2005493.3333333333, ans=0.1 2023-11-22 15:57:01,874 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 250, loss[loss=0.09046, simple_loss=0.1163, pruned_loss=0.02484, audio_tagging_loss=0.007491, over 14115.00 frames. ], tot_loss[loss=0.07365, simple_loss=0.09346, pruned_loss=0.01468, audio_tagging_loss=0.01224, over 2176323.80 frames. ], batch size: 52, lr: 2.65e-03, grad_scale: 32.0 2023-11-22 15:57:06,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300850 2023-11-22 15:57:22,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2023-11-22 15:57:39,554 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 15:57:43,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=22.5 2023-11-22 15:57:48,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2005826.6666666667, ans=0.125 2023-11-22 15:57:53,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2005893.3333333333, ans=0.2 2023-11-22 15:58:07,213 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 300, loss[loss=0.0772, simple_loss=0.1077, pruned_loss=0.01536, audio_tagging_loss=0.008005, over 15531.00 frames. 
], tot_loss[loss=0.07319, simple_loss=0.09408, pruned_loss=0.01483, audio_tagging_loss=0.01132, over 2374011.60 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 15:58:12,252 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300900 2023-11-22 15:58:30,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.395e+01 9.138e+01 9.781e+01 1.359e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-22 15:58:34,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2006093.3333333333, ans=0.125 2023-11-22 15:58:36,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2006093.3333333333, ans=0.0 2023-11-22 15:58:40,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2006093.3333333333, ans=0.125 2023-11-22 15:58:48,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2006160.0, ans=0.1 2023-11-22 15:58:57,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2006160.0, ans=0.125 2023-11-22 15:59:06,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2006226.6666666667, ans=0.125 2023-11-22 15:59:08,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0 2023-11-22 15:59:12,892 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 350, loss[loss=0.08638, simple_loss=0.108, pruned_loss=0.02453, audio_tagging_loss=0.00783, over 15876.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09393, pruned_loss=0.01481, audio_tagging_loss=0.01077, over 2513765.08 frames. ], batch size: 60, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 15:59:14,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2006293.3333333333, ans=0.125 2023-11-22 15:59:18,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 300950 2023-11-22 15:59:19,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2006293.3333333333, ans=0.125 2023-11-22 15:59:23,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2006293.3333333333, ans=0.125 2023-11-22 16:00:17,745 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 400, loss[loss=0.07184, simple_loss=0.08906, pruned_loss=0.01622, audio_tagging_loss=0.01109, over 15656.00 frames. ], tot_loss[loss=0.07148, simple_loss=0.09313, pruned_loss=0.01448, audio_tagging_loss=0.01043, over 2634752.30 frames. ], batch size: 61, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:00:23,542 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301000 2023-11-22 16:00:32,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2006693.3333333333, ans=0.0 2023-11-22 16:00:38,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. 
limit=15.0 2023-11-22 16:00:42,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.912e+01 8.126e+01 8.693e+01 9.257e+01 1.166e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 16:00:57,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2006826.6666666667, ans=0.125 2023-11-22 16:00:59,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2006826.6666666667, ans=0.0 2023-11-22 16:01:21,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2023-11-22 16:01:23,357 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 450, loss[loss=0.07076, simple_loss=0.09431, pruned_loss=0.01707, audio_tagging_loss=0.006537, over 14938.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.0932, pruned_loss=0.01449, audio_tagging_loss=0.01009, over 2728336.76 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:01:29,077 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301050 2023-11-22 16:01:39,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0 2023-11-22 16:01:50,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2007093.3333333333, ans=0.0 2023-11-22 16:01:55,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2007093.3333333333, ans=0.125 2023-11-22 16:02:27,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-22 16:02:28,746 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 500, loss[loss=0.06565, simple_loss=0.09246, pruned_loss=0.01252, audio_tagging_loss=0.006893, over 14635.00 frames. ], tot_loss[loss=0.07115, simple_loss=0.09347, pruned_loss=0.01454, audio_tagging_loss=0.009876, over 2799452.97 frames. ], batch size: 55, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:02:33,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301100 2023-11-22 16:02:50,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.306e+01 8.908e+01 9.851e+01 1.431e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-22 16:03:23,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. limit=10.0 2023-11-22 16:03:32,319 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 550, loss[loss=0.05981, simple_loss=0.07991, pruned_loss=0.008974, audio_tagging_loss=0.01088, over 14523.00 frames. ], tot_loss[loss=0.07111, simple_loss=0.09353, pruned_loss=0.01458, audio_tagging_loss=0.009772, over 2849914.33 frames. 
], batch size: 55, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:03:36,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2007626.6666666667, ans=0.125 2023-11-22 16:03:37,322 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301150 2023-11-22 16:04:05,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2007760.0, ans=0.0 2023-11-22 16:04:10,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2007826.6666666667, ans=0.1 2023-11-22 16:04:36,986 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 600, loss[loss=0.05959, simple_loss=0.07695, pruned_loss=0.01271, audio_tagging_loss=0.008401, over 13980.00 frames. ], tot_loss[loss=0.07093, simple_loss=0.09327, pruned_loss=0.01458, audio_tagging_loss=0.009718, over 2888351.08 frames. ], batch size: 53, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:04:42,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301200 2023-11-22 16:04:42,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2007960.0, ans=0.0 2023-11-22 16:04:49,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2008026.6666666667, ans=0.0 2023-11-22 16:05:01,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.667e+01 8.281e+01 8.745e+01 9.297e+01 1.116e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 16:05:18,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2008160.0, ans=0.2 2023-11-22 16:05:34,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2008226.6666666667, ans=0.125 2023-11-22 16:05:41,991 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 650, loss[loss=0.06739, simple_loss=0.09128, pruned_loss=0.01197, audio_tagging_loss=0.009788, over 14541.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09297, pruned_loss=0.01451, audio_tagging_loss=0.009634, over 2918536.91 frames. ], batch size: 54, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:05:46,998 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301250 2023-11-22 16:06:10,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2008426.6666666667, ans=0.2 2023-11-22 16:06:15,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2008426.6666666667, ans=0.0 2023-11-22 16:06:20,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2008493.3333333333, ans=0.125 2023-11-22 16:06:21,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2008493.3333333333, ans=0.2 2023-11-22 16:06:41,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2008560.0, ans=0.1 2023-11-22 16:06:45,894 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 700, loss[loss=0.07423, simple_loss=0.09527, pruned_loss=0.01378, audio_tagging_loss=0.01281, over 16144.00 frames. 
], tot_loss[loss=0.07077, simple_loss=0.09318, pruned_loss=0.01454, audio_tagging_loss=0.009642, over 2942183.00 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:06:50,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301300 2023-11-22 16:07:10,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2008693.3333333333, ans=0.0 2023-11-22 16:07:11,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.534e+01 8.183e+01 8.760e+01 9.542e+01 1.269e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-22 16:07:35,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2008826.6666666667, ans=0.2 2023-11-22 16:07:35,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2008826.6666666667, ans=0.0 2023-11-22 16:07:47,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.07 vs. limit=10.0 2023-11-22 16:07:49,624 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 750, loss[loss=0.07958, simple_loss=0.1039, pruned_loss=0.01986, audio_tagging_loss=0.007764, over 15700.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09419, pruned_loss=0.01471, audio_tagging_loss=0.009524, over 2973275.33 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:07:55,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301350 2023-11-22 16:08:02,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2009026.6666666667, ans=0.125 2023-11-22 16:08:18,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. limit=10.0 2023-11-22 16:08:38,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-22 16:08:55,391 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 800, loss[loss=0.08015, simple_loss=0.1014, pruned_loss=0.01997, audio_tagging_loss=0.009483, over 14678.00 frames. ], tot_loss[loss=0.07158, simple_loss=0.09434, pruned_loss=0.01488, audio_tagging_loss=0.009532, over 2986863.28 frames. ], batch size: 55, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:09:00,867 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301400 2023-11-22 16:09:11,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2009360.0, ans=0.125 2023-11-22 16:09:19,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.335e+01 8.906e+01 9.640e+01 1.225e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-22 16:09:26,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2009426.6666666667, ans=0.125 2023-11-22 16:09:44,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0 2023-11-22 16:10:00,022 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 850, loss[loss=0.08731, simple_loss=0.1184, pruned_loss=0.02027, audio_tagging_loss=0.007868, over 16226.00 frames. 
], tot_loss[loss=0.07198, simple_loss=0.09481, pruned_loss=0.015, audio_tagging_loss=0.009578, over 2996687.14 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:10:04,933 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301450 2023-11-22 16:10:20,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2009693.3333333333, ans=0.2 2023-11-22 16:10:38,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-22 16:10:55,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2009893.3333333333, ans=0.125 2023-11-22 16:11:03,350 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 900, loss[loss=0.08297, simple_loss=0.1119, pruned_loss=0.01835, audio_tagging_loss=0.008657, over 15762.00 frames. ], tot_loss[loss=0.07231, simple_loss=0.09508, pruned_loss=0.01522, audio_tagging_loss=0.009554, over 3006310.11 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:11:08,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301500 2023-11-22 16:11:14,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2009960.0, ans=0.07 2023-11-22 16:11:25,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2010026.6666666667, ans=0.125 2023-11-22 16:11:28,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.648e+01 9.209e+01 1.007e+02 1.462e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-22 16:11:35,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2010093.3333333333, ans=0.125 2023-11-22 16:11:37,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2010093.3333333333, ans=0.0 2023-11-22 16:11:43,327 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 16:11:45,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2010160.0, ans=0.1 2023-11-22 16:12:07,505 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 950, loss[loss=0.07646, simple_loss=0.09901, pruned_loss=0.01876, audio_tagging_loss=0.008201, over 15206.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09393, pruned_loss=0.01506, audio_tagging_loss=0.009431, over 3010812.31 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:12:13,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. 
limit=10.0 2023-11-22 16:12:13,792 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301550 2023-11-22 16:12:22,514 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 16:12:38,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2010426.6666666667, ans=0.125 2023-11-22 16:13:11,451 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1000, loss[loss=0.07867, simple_loss=0.106, pruned_loss=0.01576, audio_tagging_loss=0.009916, over 15450.00 frames. ], tot_loss[loss=0.07143, simple_loss=0.09392, pruned_loss=0.01517, audio_tagging_loss=0.009299, over 3018439.47 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:13:15,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2010626.6666666667, ans=0.0 2023-11-22 16:13:16,300 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301600 2023-11-22 16:13:26,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2010693.3333333333, ans=0.125 2023-11-22 16:13:35,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.207e+01 8.822e+01 9.755e+01 1.375e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 16:13:39,723 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 16:13:48,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2010760.0, ans=0.125 2023-11-22 16:14:11,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2010893.3333333333, ans=0.07 2023-11-22 16:14:15,389 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1050, loss[loss=0.0707, simple_loss=0.09851, pruned_loss=0.01473, audio_tagging_loss=0.006715, over 14963.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09398, pruned_loss=0.01522, audio_tagging_loss=0.009171, over 3017805.70 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:14:20,354 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301650 2023-11-22 16:14:45,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2011093.3333333333, ans=0.125 2023-11-22 16:14:59,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.72 vs. 
limit=15.0 2023-11-22 16:15:09,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2011226.6666666667, ans=0.2 2023-11-22 16:15:12,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2011226.6666666667, ans=0.2 2023-11-22 16:15:16,759 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 16:15:17,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2011226.6666666667, ans=0.09899494936611666 2023-11-22 16:15:20,078 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1100, loss[loss=0.05931, simple_loss=0.08159, pruned_loss=0.01188, audio_tagging_loss=0.006628, over 15106.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09324, pruned_loss=0.01511, audio_tagging_loss=0.009173, over 3026330.94 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:15:23,758 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 16:15:25,123 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301700 2023-11-22 16:15:28,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2011293.3333333333, ans=0.125 2023-11-22 16:15:30,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-11-22 16:15:33,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2011360.0, ans=0.125 2023-11-22 16:15:36,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2011360.0, ans=0.125 2023-11-22 16:15:42,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2011360.0, ans=0.125 2023-11-22 16:15:44,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.766e+01 8.159e+01 8.793e+01 9.276e+01 1.252e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 16:15:46,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2011426.6666666667, ans=0.2 2023-11-22 16:16:05,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=12.0 2023-11-22 16:16:17,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2023-11-22 16:16:24,775 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1150, loss[loss=0.08976, simple_loss=0.1251, pruned_loss=0.0187, audio_tagging_loss=0.008492, over 15026.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09315, pruned_loss=0.01495, audio_tagging_loss=0.009191, over 3034011.32 frames. 
], batch size: 56, lr: 2.64e-03, grad_scale: 32.0
2023-11-22 16:16:27,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2011626.6666666667, ans=0.125
2023-11-22 16:16:29,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301750
2023-11-22 16:16:29,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2011626.6666666667, ans=0.125
2023-11-22 16:16:40,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2011693.3333333333, ans=0.0
2023-11-22 16:16:43,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5
2023-11-22 16:17:28,013 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1200, loss[loss=0.06455, simple_loss=0.07981, pruned_loss=0.01391, audio_tagging_loss=0.01073, over 15754.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.09316, pruned_loss=0.01496, audio_tagging_loss=0.00919, over 3033478.68 frames. ], batch size: 59, lr: 2.64e-03, grad_scale: 32.0
2023-11-22 16:17:29,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2023-11-22 16:17:33,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301800
2023-11-22 16:17:33,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2011960.0, ans=0.0
2023-11-22 16:17:53,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2012093.3333333333, ans=0.125
2023-11-22 16:17:54,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.163e+01 8.768e+01 9.430e+01 1.301e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-22 16:18:08,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2012160.0, ans=0.0
2023-11-22 16:18:16,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2012160.0, ans=0.125
2023-11-22 16:18:19,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5
2023-11-22 16:18:31,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2012293.3333333333, ans=0.1
2023-11-22 16:18:32,635 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1250, loss[loss=0.08813, simple_loss=0.1257, pruned_loss=0.02022, audio_tagging_loss=0.005053, over 14710.00 frames. ], tot_loss[loss=0.071, simple_loss=0.09393, pruned_loss=0.01498, audio_tagging_loss=0.009048, over 3034027.88 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 32.0
2023-11-22 16:18:37,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301850
2023-11-22 16:18:37,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2012293.3333333333, ans=0.0
2023-11-22 16:19:06,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0
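To logging precision, the tot_loss fields in the [train_asr.py:1221] entries combine as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. This weighting is inferred from the printed numbers (it holds for every training entry in this stretch, and also for the validation entry at 16:56:24 further down), not quoted from the code. A quick check against the batch 1250 entry above:

```python
# Sanity check of the apparent loss combination, using the batch 1250
# tot_loss values printed above. The 0.5 weight on simple_loss is an
# assumption inferred from the log, not taken from train_asr.py itself.
simple, pruned, tagging = 0.09393, 0.01498, 0.009048
total = 0.5 * simple + pruned + tagging
print(round(total, 3))  # 0.071 -- matches the printed loss=0.071
```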
2023-11-22 16:19:32,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2012560.0, ans=0.0
2023-11-22 16:19:33,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2012560.0, ans=0.0
2023-11-22 16:19:37,024 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1300, loss[loss=0.09557, simple_loss=0.1215, pruned_loss=0.02729, audio_tagging_loss=0.007545, over 15565.00 frames. ], tot_loss[loss=0.07111, simple_loss=0.09395, pruned_loss=0.01503, audio_tagging_loss=0.009102, over 3034661.71 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 32.0
2023-11-22 16:19:42,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301900
2023-11-22 16:19:43,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2012626.6666666667, ans=0.0
2023-11-22 16:20:02,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.148e+01 8.961e+01 9.729e+01 1.373e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-22 16:20:04,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0
2023-11-22 16:20:18,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2012826.6666666667, ans=0.125
2023-11-22 16:20:18,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2012826.6666666667, ans=0.125
2023-11-22 16:20:23,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0
2023-11-22 16:20:28,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2023-11-22 16:20:41,725 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1350, loss[loss=0.06616, simple_loss=0.07951, pruned_loss=0.01462, audio_tagging_loss=0.01178, over 14084.00 frames. ], tot_loss[loss=0.07127, simple_loss=0.09408, pruned_loss=0.01509, audio_tagging_loss=0.00913, over 3044273.74 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 16.0
2023-11-22 16:20:46,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 301950
2023-11-22 16:21:00,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2013026.6666666667, ans=0.0
2023-11-22 16:21:13,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0
2023-11-22 16:21:18,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2013093.3333333333, ans=0.125
2023-11-22 16:21:27,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2013160.0, ans=0.2
2023-11-22 16:21:29,493 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
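The WARNING above is the data filter at work: a one-second AudioSet cut has 100 feature frames, which the encoder frontend subsamples to 23, fewer than its 24 BPE tokens, so no monotonic transducer alignment exists and the cut is dropped. Below is a hedged sketch of that predicate, assuming a ((n - 7) // 2 + 1) // 2 subsampling formula, which reproduces the 100 -> 23 reported in the log; the real check lives in train_asr.py and may differ in detail.

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Sketch of the exclusion rule behind the [train_asr.py:1462] WARNINGs.

    Assumes a conv frontend whose output length is ((n - 7) // 2 + 1) // 2;
    with n=100 that gives 23 frames, matching the log above.
    """
    frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
    # A transducer needs at least one output frame per token.
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded from training
```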
2023-11-22 16:21:39,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2013226.6666666667, ans=0.125
2023-11-22 16:21:46,584 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1400, loss[loss=0.07179, simple_loss=0.09577, pruned_loss=0.01219, audio_tagging_loss=0.01172, over 15516.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.0925, pruned_loss=0.01494, audio_tagging_loss=0.009326, over 3041963.61 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0
2023-11-22 16:21:51,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302000
2023-11-22 16:21:51,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2013293.3333333333, ans=0.0
2023-11-22 16:21:53,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2013293.3333333333, ans=0.0
2023-11-22 16:22:13,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.088e+01 8.833e+01 9.465e+01 1.089e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-22 16:22:16,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0
2023-11-22 16:22:22,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0
2023-11-22 16:22:38,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2023-11-22 16:22:50,769 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1450, loss[loss=0.09943, simple_loss=0.1274, pruned_loss=0.02684, audio_tagging_loss=0.008898, over 15207.00 frames. ], tot_loss[loss=0.07137, simple_loss=0.09388, pruned_loss=0.01514, audio_tagging_loss=0.009291, over 3047158.69 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0
2023-11-22 16:22:55,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=12.0
2023-11-22 16:22:55,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302050
2023-11-22 16:23:16,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2013760.0, ans=0.125
2023-11-22 16:23:18,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2013760.0, ans=0.125
2023-11-22 16:23:21,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.55 vs. limit=22.5
2023-11-22 16:23:53,659 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1500, loss[loss=0.06127, simple_loss=0.07833, pruned_loss=0.01167, audio_tagging_loss=0.01043, over 15127.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09311, pruned_loss=0.01496, audio_tagging_loss=0.009389, over 3047348.26 frames.
], batch size: 58, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:23:59,431 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302100 2023-11-22 16:24:13,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2014026.6666666667, ans=0.125 2023-11-22 16:24:20,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.201e+01 8.781e+01 9.408e+01 1.231e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-22 16:24:35,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2014160.0, ans=0.125 2023-11-22 16:24:42,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2014160.0, ans=0.0 2023-11-22 16:24:48,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2014226.6666666667, ans=0.0 2023-11-22 16:24:58,110 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1550, loss[loss=0.06806, simple_loss=0.08862, pruned_loss=0.01619, audio_tagging_loss=0.007556, over 14753.00 frames. ], tot_loss[loss=0.07134, simple_loss=0.0936, pruned_loss=0.01503, audio_tagging_loss=0.009511, over 3047038.78 frames. ], batch size: 55, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:25:04,312 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302150 2023-11-22 16:25:24,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2014426.6666666667, ans=0.1 2023-11-22 16:25:29,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2014426.6666666667, ans=0.2 2023-11-22 16:25:29,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5 2023-11-22 16:25:38,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2014493.3333333333, ans=0.0 2023-11-22 16:25:39,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2014493.3333333333, ans=0.0 2023-11-22 16:25:43,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2014493.3333333333, ans=0.125 2023-11-22 16:25:43,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2014493.3333333333, ans=0.125 2023-11-22 16:25:46,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2014493.3333333333, ans=0.125 2023-11-22 16:26:03,143 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1600, loss[loss=0.0625, simple_loss=0.08051, pruned_loss=0.01297, audio_tagging_loss=0.00928, over 15407.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.09378, pruned_loss=0.01493, audio_tagging_loss=0.009606, over 3051414.34 frames. 
], batch size: 56, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:26:07,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302200 2023-11-22 16:26:29,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.320e+01 8.949e+01 9.605e+01 1.369e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-22 16:26:56,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-22 16:27:06,574 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1650, loss[loss=0.06088, simple_loss=0.07086, pruned_loss=0.01112, audio_tagging_loss=0.01433, over 14934.00 frames. ], tot_loss[loss=0.07155, simple_loss=0.09416, pruned_loss=0.01488, audio_tagging_loss=0.009588, over 3057786.19 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:27:10,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2014960.0, ans=0.125 2023-11-22 16:27:11,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302250 2023-11-22 16:27:11,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2014960.0, ans=0.125 2023-11-22 16:27:11,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2014960.0, ans=0.1 2023-11-22 16:27:41,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2015093.3333333333, ans=0.125 2023-11-22 16:27:56,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2015226.6666666667, ans=0.04949747468305833 2023-11-22 16:27:59,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-22 16:28:09,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2015293.3333333333, ans=0.0 2023-11-22 16:28:10,465 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1700, loss[loss=0.08028, simple_loss=0.1129, pruned_loss=0.01585, audio_tagging_loss=0.008002, over 15535.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.0939, pruned_loss=0.01478, audio_tagging_loss=0.009607, over 3060901.88 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:28:10,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2015293.3333333333, ans=0.2 2023-11-22 16:28:15,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302300 2023-11-22 16:28:31,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2015360.0, ans=0.0 2023-11-22 16:28:35,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2015426.6666666667, ans=0.125 2023-11-22 16:28:36,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.605e+01 8.196e+01 8.694e+01 9.442e+01 1.208e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 16:29:12,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.80 vs. 
limit=15.0 2023-11-22 16:29:13,279 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1750, loss[loss=0.07373, simple_loss=0.1016, pruned_loss=0.01594, audio_tagging_loss=0.00698, over 14883.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.09378, pruned_loss=0.01485, audio_tagging_loss=0.009518, over 3062344.12 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:29:18,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302350 2023-11-22 16:29:25,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2015693.3333333333, ans=0.1 2023-11-22 16:29:25,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2015693.3333333333, ans=0.125 2023-11-22 16:29:28,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2015693.3333333333, ans=0.125 2023-11-22 16:29:50,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2015826.6666666667, ans=0.0 2023-11-22 16:30:09,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2015893.3333333333, ans=0.125 2023-11-22 16:30:15,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2023-11-22 16:30:17,054 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1800, loss[loss=0.06761, simple_loss=0.08565, pruned_loss=0.01316, audio_tagging_loss=0.01162, over 14480.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09472, pruned_loss=0.01502, audio_tagging_loss=0.009344, over 3067577.41 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:30:17,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2015960.0, ans=0.125 2023-11-22 16:30:22,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302400 2023-11-22 16:30:40,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2016026.6666666667, ans=0.025 2023-11-22 16:30:44,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.100e+01 8.696e+01 9.181e+01 1.134e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 16:31:20,399 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1850, loss[loss=0.07511, simple_loss=0.101, pruned_loss=0.01828, audio_tagging_loss=0.006351, over 16576.00 frames. ], tot_loss[loss=0.07161, simple_loss=0.0944, pruned_loss=0.01506, audio_tagging_loss=0.009351, over 3074558.69 frames. ], batch size: 61, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:31:24,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-22 16:31:24,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.46 vs. limit=10.0 2023-11-22 16:31:25,354 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302450 2023-11-22 16:31:34,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. 
limit=15.0 2023-11-22 16:31:36,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2016360.0, ans=0.125 2023-11-22 16:31:48,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2016426.6666666667, ans=0.125 2023-11-22 16:31:52,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2016426.6666666667, ans=6.0 2023-11-22 16:31:58,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-11-22 16:31:59,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2016493.3333333333, ans=0.0 2023-11-22 16:32:06,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2016493.3333333333, ans=0.125 2023-11-22 16:32:24,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2016560.0, ans=0.125 2023-11-22 16:32:26,254 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1900, loss[loss=0.06465, simple_loss=0.07847, pruned_loss=0.01481, audio_tagging_loss=0.01061, over 14824.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.09374, pruned_loss=0.01493, audio_tagging_loss=0.009223, over 3074577.38 frames. ], batch size: 55, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:32:32,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302500 2023-11-22 16:32:46,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=22.5 2023-11-22 16:32:48,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. limit=6.0 2023-11-22 16:32:52,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.093e+01 8.828e+01 9.639e+01 1.190e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-22 16:33:04,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=22.5 2023-11-22 16:33:17,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2016893.3333333333, ans=0.1 2023-11-22 16:33:29,824 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 1950, loss[loss=0.08334, simple_loss=0.1075, pruned_loss=0.01719, audio_tagging_loss=0.0124, over 15567.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.09396, pruned_loss=0.01491, audio_tagging_loss=0.009125, over 3060606.97 frames. 
], batch size: 59, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:33:34,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302550 2023-11-22 16:33:44,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2017026.6666666667, ans=0.125 2023-11-22 16:33:48,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2017026.6666666667, ans=0.0 2023-11-22 16:33:54,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2017093.3333333333, ans=0.0 2023-11-22 16:34:12,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2023-11-22 16:34:15,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2017160.0, ans=0.125 2023-11-22 16:34:15,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2017160.0, ans=0.125 2023-11-22 16:34:18,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2017160.0, ans=0.0 2023-11-22 16:34:32,527 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2000, loss[loss=0.06931, simple_loss=0.09021, pruned_loss=0.01625, audio_tagging_loss=0.007949, over 15291.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.09286, pruned_loss=0.01479, audio_tagging_loss=0.009177, over 3059847.23 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 32.0 2023-11-22 16:34:37,531 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302600 2023-11-22 16:34:54,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2017360.0, ans=0.1 2023-11-22 16:35:00,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.236e+01 8.987e+01 9.619e+01 1.204e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-22 16:35:27,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2017560.0, ans=0.0 2023-11-22 16:35:27,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2017560.0, ans=0.125 2023-11-22 16:35:37,064 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2050, loss[loss=0.07314, simple_loss=0.09529, pruned_loss=0.0173, audio_tagging_loss=0.008189, over 13662.00 frames. ], tot_loss[loss=0.07071, simple_loss=0.09355, pruned_loss=0.01489, audio_tagging_loss=0.009045, over 3058852.09 frames. ], batch size: 53, lr: 2.64e-03, grad_scale: 8.0 2023-11-22 16:35:42,621 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302650 2023-11-22 16:35:46,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2023-11-22 16:35:48,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2017626.6666666667, ans=0.125 2023-11-22 16:35:51,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.52 vs. 
limit=22.5 2023-11-22 16:35:54,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2017693.3333333333, ans=0.2 2023-11-22 16:36:10,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-22 16:36:41,106 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2100, loss[loss=0.06109, simple_loss=0.08006, pruned_loss=0.01172, audio_tagging_loss=0.009345, over 16625.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.09273, pruned_loss=0.01457, audio_tagging_loss=0.009119, over 3058399.43 frames. ], batch size: 62, lr: 2.64e-03, grad_scale: 8.0 2023-11-22 16:36:46,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302700 2023-11-22 16:36:51,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2017960.0, ans=0.0 2023-11-22 16:37:02,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2018026.6666666667, ans=0.0 2023-11-22 16:37:09,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.575e+01 8.367e+01 8.996e+01 9.804e+01 1.259e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 16:37:30,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2018160.0, ans=0.2 2023-11-22 16:37:31,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-22 16:37:44,335 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2150, loss[loss=0.08053, simple_loss=0.1106, pruned_loss=0.01702, audio_tagging_loss=0.00823, over 14558.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09281, pruned_loss=0.01452, audio_tagging_loss=0.009065, over 3056317.25 frames. ], batch size: 54, lr: 2.64e-03, grad_scale: 8.0 2023-11-22 16:37:49,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302750 2023-11-22 16:37:50,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2018293.3333333333, ans=0.125 2023-11-22 16:38:01,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2018360.0, ans=0.0 2023-11-22 16:38:23,077 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 16:38:47,923 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2200, loss[loss=0.06664, simple_loss=0.09173, pruned_loss=0.01309, audio_tagging_loss=0.007693, over 15159.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09318, pruned_loss=0.0146, audio_tagging_loss=0.009027, over 3061078.72 frames. 
], batch size: 55, lr: 2.64e-03, grad_scale: 8.0 2023-11-22 16:38:52,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302800 2023-11-22 16:39:07,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0 2023-11-22 16:39:16,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.468e+01 8.391e+01 8.919e+01 9.600e+01 1.144e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-22 16:39:18,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2018760.0, ans=0.1 2023-11-22 16:39:24,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2018826.6666666667, ans=0.2 2023-11-22 16:39:30,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2018826.6666666667, ans=15.0 2023-11-22 16:39:41,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2018893.3333333333, ans=0.125 2023-11-22 16:39:46,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.98 vs. limit=6.0 2023-11-22 16:39:51,941 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2250, loss[loss=0.08325, simple_loss=0.1157, pruned_loss=0.01529, audio_tagging_loss=0.01012, over 15098.00 frames. ], tot_loss[loss=0.07066, simple_loss=0.09368, pruned_loss=0.01469, audio_tagging_loss=0.009128, over 3055533.69 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 8.0 2023-11-22 16:39:56,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302850 2023-11-22 16:40:01,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0 2023-11-22 16:40:15,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2019093.3333333333, ans=0.125 2023-11-22 16:40:28,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2019160.0, ans=0.2 2023-11-22 16:40:54,896 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2300, loss[loss=0.07172, simple_loss=0.09253, pruned_loss=0.01548, audio_tagging_loss=0.009976, over 15907.00 frames. ], tot_loss[loss=0.07137, simple_loss=0.09457, pruned_loss=0.01483, audio_tagging_loss=0.009249, over 3056526.16 frames. ], batch size: 59, lr: 2.64e-03, grad_scale: 8.0 2023-11-22 16:40:59,866 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302900 2023-11-22 16:41:11,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. 
limit=6.0 2023-11-22 16:41:15,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2019360.0, ans=0.125 2023-11-22 16:41:19,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2019426.6666666667, ans=0.09899494936611666 2023-11-22 16:41:24,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.135e+01 8.659e+01 9.382e+01 1.572e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-22 16:41:50,900 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 16:41:58,381 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2350, loss[loss=0.05617, simple_loss=0.07295, pruned_loss=0.01139, audio_tagging_loss=0.008297, over 16046.00 frames. ], tot_loss[loss=0.07109, simple_loss=0.09403, pruned_loss=0.01483, audio_tagging_loss=0.009251, over 3052227.47 frames. ], batch size: 60, lr: 2.64e-03, grad_scale: 8.0 2023-11-22 16:42:03,999 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 302950 2023-11-22 16:42:28,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2019760.0, ans=0.125 2023-11-22 16:42:38,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2019826.6666666667, ans=0.0 2023-11-22 16:42:48,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0 2023-11-22 16:42:55,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=22.5 2023-11-22 16:43:02,292 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2400, loss[loss=0.06839, simple_loss=0.09228, pruned_loss=0.01274, audio_tagging_loss=0.009519, over 15204.00 frames. ], tot_loss[loss=0.07112, simple_loss=0.09379, pruned_loss=0.01482, audio_tagging_loss=0.009404, over 3046777.91 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:43:03,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2019960.0, ans=0.0 2023-11-22 16:43:07,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303000 2023-11-22 16:43:11,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. 
limit=15.0 2023-11-22 16:43:31,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.331e+01 8.278e+01 8.806e+01 9.570e+01 1.233e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-22 16:43:38,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2020093.3333333333, ans=0.0 2023-11-22 16:44:05,451 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2450, loss[loss=0.05876, simple_loss=0.07343, pruned_loss=0.01049, audio_tagging_loss=0.01155, over 15593.00 frames. ], tot_loss[loss=0.07105, simple_loss=0.09363, pruned_loss=0.01463, audio_tagging_loss=0.009606, over 3050055.33 frames. ], batch size: 60, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:44:10,408 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303050 2023-11-22 16:44:20,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2020360.0, ans=0.0 2023-11-22 16:44:25,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2020360.0, ans=0.125 2023-11-22 16:44:27,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2020360.0, ans=0.035 2023-11-22 16:44:29,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2023-11-22 16:44:32,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2020426.6666666667, ans=0.0 2023-11-22 16:44:46,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2020493.3333333333, ans=0.0 2023-11-22 16:45:01,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2020560.0, ans=0.125 2023-11-22 16:45:02,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2023-11-22 16:45:08,170 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2500, loss[loss=0.05679, simple_loss=0.07402, pruned_loss=0.009264, audio_tagging_loss=0.01052, over 14982.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09261, pruned_loss=0.01434, audio_tagging_loss=0.009693, over 3049978.54 frames. ], batch size: 57, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:45:13,806 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303100 2023-11-22 16:45:29,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2020693.3333333333, ans=0.025 2023-11-22 16:45:37,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.127e+01 8.699e+01 9.305e+01 1.342e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 16:45:50,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2020826.6666666667, ans=0.125 2023-11-22 16:46:12,046 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2550, loss[loss=0.09197, simple_loss=0.1217, pruned_loss=0.0237, audio_tagging_loss=0.007416, over 15819.00 frames. ], tot_loss[loss=0.07025, simple_loss=0.09268, pruned_loss=0.01438, audio_tagging_loss=0.009525, over 3047916.92 frames. 
], batch size: 57, lr: 2.64e-03, grad_scale: 16.0 2023-11-22 16:46:16,931 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303150 2023-11-22 16:46:23,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-11-22 16:46:26,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2021026.6666666667, ans=0.0 2023-11-22 16:46:31,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2021026.6666666667, ans=0.125 2023-11-22 16:46:33,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2021026.6666666667, ans=0.0 2023-11-22 16:46:42,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2021093.3333333333, ans=0.125 2023-11-22 16:46:47,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2023-11-22 16:46:53,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2021160.0, ans=0.125 2023-11-22 16:46:59,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2021160.0, ans=0.125 2023-11-22 16:47:11,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2021226.6666666667, ans=0.125 2023-11-22 16:47:15,978 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2600, loss[loss=0.09336, simple_loss=0.1301, pruned_loss=0.01825, audio_tagging_loss=0.01008, over 14417.00 frames. ], tot_loss[loss=0.06967, simple_loss=0.09224, pruned_loss=0.01413, audio_tagging_loss=0.009426, over 3049680.06 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:47:20,950 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303200 2023-11-22 16:47:24,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.14 vs. limit=22.5 2023-11-22 16:47:36,244 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 16:47:44,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.809e+01 8.118e+01 8.704e+01 9.634e+01 1.548e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-22 16:47:52,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2021493.3333333333, ans=0.125 2023-11-22 16:47:57,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2021493.3333333333, ans=0.2 2023-11-22 16:48:15,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2021560.0, ans=0.2 2023-11-22 16:48:19,340 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2650, loss[loss=0.0579, simple_loss=0.07293, pruned_loss=0.01323, audio_tagging_loss=0.008206, over 13992.00 frames. ], tot_loss[loss=0.07041, simple_loss=0.0934, pruned_loss=0.01439, audio_tagging_loss=0.009321, over 3049230.45 frames. 
], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:48:19,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2021626.6666666667, ans=0.125 2023-11-22 16:48:24,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303250 2023-11-22 16:48:27,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-22 16:48:34,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-22 16:48:42,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2021693.3333333333, ans=0.125 2023-11-22 16:48:43,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2021693.3333333333, ans=10.0 2023-11-22 16:49:23,147 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2700, loss[loss=0.07165, simple_loss=0.1019, pruned_loss=0.01287, audio_tagging_loss=0.007854, over 15653.00 frames. ], tot_loss[loss=0.0711, simple_loss=0.0942, pruned_loss=0.01469, audio_tagging_loss=0.009311, over 3052254.30 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:49:28,822 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303300 2023-11-22 16:49:52,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-22 16:49:53,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.226e+01 8.902e+01 9.755e+01 1.486e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-22 16:50:17,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2022226.6666666667, ans=0.125 2023-11-22 16:50:23,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2022226.6666666667, ans=0.125 2023-11-22 16:50:26,595 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2750, loss[loss=0.06002, simple_loss=0.07433, pruned_loss=0.01255, audio_tagging_loss=0.01031, over 15238.00 frames. ], tot_loss[loss=0.07069, simple_loss=0.09335, pruned_loss=0.01464, audio_tagging_loss=0.009372, over 3046404.03 frames. ], batch size: 60, lr: 2.63e-03, grad_scale: 8.0 2023-11-22 16:50:31,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303350 2023-11-22 16:50:43,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2022360.0, ans=0.125 2023-11-22 16:50:45,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. 
limit=6.0 2023-11-22 16:50:53,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2022426.6666666667, ans=0.0 2023-11-22 16:50:57,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2022426.6666666667, ans=0.125 2023-11-22 16:51:01,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2022426.6666666667, ans=0.2 2023-11-22 16:51:01,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2022426.6666666667, ans=0.07 2023-11-22 16:51:13,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2022493.3333333333, ans=0.125 2023-11-22 16:51:20,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2023-11-22 16:51:21,032 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 16:51:30,231 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2800, loss[loss=0.06715, simple_loss=0.08716, pruned_loss=0.01444, audio_tagging_loss=0.009126, over 15699.00 frames. ], tot_loss[loss=0.0703, simple_loss=0.09318, pruned_loss=0.01449, audio_tagging_loss=0.009219, over 3044377.99 frames. ], batch size: 60, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:51:35,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303400 2023-11-22 16:51:41,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2022626.6666666667, ans=0.1 2023-11-22 16:51:46,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2023-11-22 16:52:01,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.550e+01 8.038e+01 8.757e+01 9.417e+01 1.241e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-22 16:52:34,462 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2850, loss[loss=0.06113, simple_loss=0.07351, pruned_loss=0.01175, audio_tagging_loss=0.01263, over 16352.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09215, pruned_loss=0.0144, audio_tagging_loss=0.009212, over 3044283.23 frames. ], batch size: 63, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:52:40,051 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303450 2023-11-22 16:52:47,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2023026.6666666667, ans=0.125 2023-11-22 16:53:02,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. 
limit=10.0 2023-11-22 16:53:14,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2023160.0, ans=0.2 2023-11-22 16:53:37,525 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2900, loss[loss=0.06257, simple_loss=0.07673, pruned_loss=0.01421, audio_tagging_loss=0.00999, over 15027.00 frames. ], tot_loss[loss=0.07019, simple_loss=0.09284, pruned_loss=0.0146, audio_tagging_loss=0.009169, over 3043063.22 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:53:42,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303500 2023-11-22 16:53:52,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2023360.0, ans=0.2 2023-11-22 16:54:07,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.261e+01 8.993e+01 9.849e+01 1.244e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 16:54:17,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2023493.3333333333, ans=0.125 2023-11-22 16:54:18,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2023-11-22 16:54:25,766 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 16:54:29,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2023560.0, ans=0.125 2023-11-22 16:54:32,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=12.0 2023-11-22 16:54:39,785 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 2950, loss[loss=0.06156, simple_loss=0.08615, pruned_loss=0.008833, audio_tagging_loss=0.009649, over 14922.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.09378, pruned_loss=0.01465, audio_tagging_loss=0.00911, over 3042336.91 frames. ], batch size: 57, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:54:45,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303550 2023-11-22 16:54:48,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2023-11-22 16:54:49,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2023626.6666666667, ans=0.125 2023-11-22 16:54:57,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.60 vs. limit=15.0 2023-11-22 16:55:01,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2023693.3333333333, ans=0.125 2023-11-22 16:55:43,875 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3000, loss[loss=0.07347, simple_loss=0.09505, pruned_loss=0.01754, audio_tagging_loss=0.008402, over 15189.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.09424, pruned_loss=0.0148, audio_tagging_loss=0.009149, over 3041535.47 frames. 
], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:55:43,876 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 16:56:11,745 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2550, 4.2167, 4.4707, 4.4414], device='cuda:1') 2023-11-22 16:56:24,238 INFO [train_asr.py:1253] (1/4) Epoch 26, validation: loss=0.05863, simple_loss=0.05148, pruned_loss=0.005087, audio_tagging_loss=0.0278, over 4681554.00 frames. 2023-11-22 16:56:24,239 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 16:56:25,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2023-11-22 16:56:29,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303600 2023-11-22 16:56:36,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2024026.6666666667, ans=0.125 2023-11-22 16:56:41,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2024026.6666666667, ans=0.0 2023-11-22 16:56:55,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.309e+01 8.958e+01 9.598e+01 2.915e+02, threshold=1.792e+02, percent-clipped=1.0 2023-11-22 16:57:24,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2024226.6666666667, ans=0.05 2023-11-22 16:57:27,850 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3050, loss[loss=0.06438, simple_loss=0.08379, pruned_loss=0.01286, audio_tagging_loss=0.009619, over 14601.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09418, pruned_loss=0.01498, audio_tagging_loss=0.009307, over 3041317.98 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:57:32,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303650 2023-11-22 16:57:48,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2024360.0, ans=0.125 2023-11-22 16:57:52,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2024360.0, ans=0.125 2023-11-22 16:58:06,787 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 16:58:18,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2024560.0, ans=0.0 2023-11-22 16:58:20,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2024560.0, ans=0.1 2023-11-22 16:58:33,082 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3100, loss[loss=0.08159, simple_loss=0.1064, pruned_loss=0.01795, audio_tagging_loss=0.01044, over 14482.00 frames. ], tot_loss[loss=0.07188, simple_loss=0.09471, pruned_loss=0.01512, audio_tagging_loss=0.00941, over 3040899.01 frames. 
], batch size: 52, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:58:38,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.82 vs. limit=10.0 2023-11-22 16:58:38,671 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303700 2023-11-22 16:58:52,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2024693.3333333333, ans=0.2 2023-11-22 16:58:56,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2024693.3333333333, ans=0.0 2023-11-22 16:59:03,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.373e+01 8.955e+01 9.319e+01 1.385e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-22 16:59:32,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2024893.3333333333, ans=0.125 2023-11-22 16:59:35,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2024960.0, ans=0.125 2023-11-22 16:59:36,707 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3150, loss[loss=0.05577, simple_loss=0.06737, pruned_loss=0.0115, audio_tagging_loss=0.01059, over 15089.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09403, pruned_loss=0.01485, audio_tagging_loss=0.009468, over 3036399.93 frames. ], batch size: 57, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 16:59:41,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303750 2023-11-22 16:59:54,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2023-11-22 17:00:08,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2025093.3333333333, ans=0.1 2023-11-22 17:00:18,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2025160.0, ans=0.125 2023-11-22 17:00:34,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2025226.6666666667, ans=0.0 2023-11-22 17:00:39,346 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3200, loss[loss=0.07405, simple_loss=0.1024, pruned_loss=0.01455, audio_tagging_loss=0.008282, over 15525.00 frames. ], tot_loss[loss=0.07152, simple_loss=0.09402, pruned_loss=0.01494, audio_tagging_loss=0.009569, over 3040822.18 frames. 
], batch size: 57, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:00:44,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303800 2023-11-22 17:00:58,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2025360.0, ans=0.0 2023-11-22 17:01:00,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2025360.0, ans=0.125 2023-11-22 17:01:10,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.209e+01 8.808e+01 9.580e+01 1.231e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-22 17:01:36,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2025560.0, ans=0.125 2023-11-22 17:01:38,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2025560.0, ans=0.125 2023-11-22 17:01:42,692 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3250, loss[loss=0.07741, simple_loss=0.104, pruned_loss=0.01705, audio_tagging_loss=0.008363, over 15289.00 frames. ], tot_loss[loss=0.07196, simple_loss=0.09485, pruned_loss=0.01501, audio_tagging_loss=0.009517, over 3047922.68 frames. ], batch size: 57, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:01:48,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303850 2023-11-22 17:02:03,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2023-11-22 17:02:03,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2023-11-22 17:02:04,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.43 vs. limit=22.5 2023-11-22 17:02:16,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2025760.0, ans=0.2 2023-11-22 17:02:45,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2025960.0, ans=0.2 2023-11-22 17:02:46,300 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3300, loss[loss=0.06463, simple_loss=0.08293, pruned_loss=0.009689, audio_tagging_loss=0.01348, over 14757.00 frames. ], tot_loss[loss=0.0721, simple_loss=0.095, pruned_loss=0.01497, audio_tagging_loss=0.009634, over 3045150.04 frames. ], batch size: 58, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:02:48,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. 
limit=6.0 2023-11-22 17:02:51,232 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303900 2023-11-22 17:02:51,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2025960.0, ans=0.125 2023-11-22 17:02:52,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2025960.0, ans=0.125 2023-11-22 17:03:01,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2026026.6666666667, ans=0.125 2023-11-22 17:03:13,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2026093.3333333333, ans=0.125 2023-11-22 17:03:16,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.259e+01 8.793e+01 9.669e+01 1.578e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 17:03:28,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2026160.0, ans=0.125 2023-11-22 17:03:34,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2026160.0, ans=0.125 2023-11-22 17:03:34,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2026160.0, ans=0.07 2023-11-22 17:03:37,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2026226.6666666667, ans=0.0 2023-11-22 17:03:49,233 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3350, loss[loss=0.06446, simple_loss=0.08273, pruned_loss=0.01452, audio_tagging_loss=0.008574, over 15198.00 frames. ], tot_loss[loss=0.07154, simple_loss=0.0944, pruned_loss=0.01487, audio_tagging_loss=0.009466, over 3056659.39 frames. ], batch size: 57, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:03:54,277 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 303950 2023-11-22 17:04:30,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=22.5 2023-11-22 17:04:32,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2026493.3333333333, ans=0.0 2023-11-22 17:04:37,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2026493.3333333333, ans=0.125 2023-11-22 17:04:41,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2026560.0, ans=0.035 2023-11-22 17:04:45,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2026560.0, ans=0.0 2023-11-22 17:04:51,549 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3400, loss[loss=0.07974, simple_loss=0.09782, pruned_loss=0.01991, audio_tagging_loss=0.01092, over 14712.00 frames. ], tot_loss[loss=0.07112, simple_loss=0.09387, pruned_loss=0.01483, audio_tagging_loss=0.009352, over 3048271.58 frames. 
], batch size: 57, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:04:56,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304000 2023-11-22 17:05:08,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2026693.3333333333, ans=0.0 2023-11-22 17:05:10,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2026693.3333333333, ans=10.0 2023-11-22 17:05:17,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2023-11-22 17:05:20,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2026760.0, ans=0.0 2023-11-22 17:05:26,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.152e+01 8.823e+01 9.410e+01 1.182e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-22 17:05:41,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2026826.6666666667, ans=0.125 2023-11-22 17:05:51,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2026893.3333333333, ans=0.035 2023-11-22 17:05:52,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2023-11-22 17:05:58,782 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3450, loss[loss=0.06434, simple_loss=0.08718, pruned_loss=0.01105, audio_tagging_loss=0.009692, over 15986.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09435, pruned_loss=0.01503, audio_tagging_loss=0.009252, over 3043816.01 frames. ], batch size: 63, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:06:03,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304050 2023-11-22 17:06:19,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2027026.6666666667, ans=0.0 2023-11-22 17:06:34,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2027160.0, ans=0.125 2023-11-22 17:06:45,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2027160.0, ans=0.125 2023-11-22 17:06:53,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2027226.6666666667, ans=0.125 2023-11-22 17:06:55,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2027226.6666666667, ans=0.1 2023-11-22 17:07:01,436 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3500, loss[loss=0.06315, simple_loss=0.08631, pruned_loss=0.0113, audio_tagging_loss=0.0087, over 14258.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09347, pruned_loss=0.01493, audio_tagging_loss=0.009244, over 3039664.36 frames. 
], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:07:06,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304100 2023-11-22 17:07:06,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2027293.3333333333, ans=0.035 2023-11-22 17:07:33,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.704e+01 8.467e+01 8.969e+01 9.686e+01 1.239e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-22 17:07:36,578 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 17:07:43,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=12.0 2023-11-22 17:08:01,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2027560.0, ans=0.05 2023-11-22 17:08:04,510 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3550, loss[loss=0.07133, simple_loss=0.08996, pruned_loss=0.01709, audio_tagging_loss=0.00926, over 15408.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.09412, pruned_loss=0.01493, audio_tagging_loss=0.009188, over 3037405.35 frames. ], batch size: 59, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:08:07,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2027626.6666666667, ans=0.125 2023-11-22 17:08:10,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304150 2023-11-22 17:08:10,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2027626.6666666667, ans=0.2 2023-11-22 17:08:15,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2027626.6666666667, ans=0.125 2023-11-22 17:08:27,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2027693.3333333333, ans=0.0 2023-11-22 17:08:41,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2027760.0, ans=0.04949747468305833 2023-11-22 17:08:59,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2027893.3333333333, ans=0.1 2023-11-22 17:09:08,683 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3600, loss[loss=0.08094, simple_loss=0.1179, pruned_loss=0.0153, audio_tagging_loss=0.006686, over 17479.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.09217, pruned_loss=0.01454, audio_tagging_loss=0.009249, over 3042649.41 frames. ], batch size: 62, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:09:10,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.40 vs. 
limit=12.0 2023-11-22 17:09:14,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304200 2023-11-22 17:09:18,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2027960.0, ans=0.0 2023-11-22 17:09:32,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2028093.3333333333, ans=0.125 2023-11-22 17:09:41,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.177e+01 8.783e+01 9.582e+01 1.117e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-22 17:09:45,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2028160.0, ans=0.1 2023-11-22 17:09:49,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2028160.0, ans=0.125 2023-11-22 17:09:59,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2028226.6666666667, ans=0.1 2023-11-22 17:10:06,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2028226.6666666667, ans=0.125 2023-11-22 17:10:13,403 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3650, loss[loss=0.06009, simple_loss=0.0813, pruned_loss=0.01125, audio_tagging_loss=0.008188, over 14888.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.09241, pruned_loss=0.01465, audio_tagging_loss=0.009277, over 3041037.92 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:10:18,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304250 2023-11-22 17:10:29,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2028360.0, ans=0.1 2023-11-22 17:10:45,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2028426.6666666667, ans=0.2 2023-11-22 17:10:45,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2028426.6666666667, ans=0.2 2023-11-22 17:10:47,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2028426.6666666667, ans=0.125 2023-11-22 17:10:50,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2023-11-22 17:10:51,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2028493.3333333333, ans=0.0 2023-11-22 17:11:16,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2028626.6666666667, ans=0.125 2023-11-22 17:11:16,841 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3700, loss[loss=0.07472, simple_loss=0.08472, pruned_loss=0.02066, audio_tagging_loss=0.0117, over 14472.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.0929, pruned_loss=0.01477, audio_tagging_loss=0.009205, over 3040408.78 frames. 
], batch size: 57, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:11:21,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304300 2023-11-22 17:11:26,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2028626.6666666667, ans=0.125 2023-11-22 17:11:29,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2028693.3333333333, ans=0.2 2023-11-22 17:11:50,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.264e+01 8.888e+01 9.594e+01 1.594e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 17:11:53,373 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 17:12:21,537 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3750, loss[loss=0.0949, simple_loss=0.1296, pruned_loss=0.0208, audio_tagging_loss=0.009302, over 16270.00 frames. ], tot_loss[loss=0.07096, simple_loss=0.09347, pruned_loss=0.01503, audio_tagging_loss=0.009198, over 3045838.90 frames. ], batch size: 59, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:12:26,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304350 2023-11-22 17:13:00,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2029160.0, ans=0.1 2023-11-22 17:13:05,481 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 17:13:14,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2023-11-22 17:13:25,467 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3800, loss[loss=0.06226, simple_loss=0.07673, pruned_loss=0.01106, audio_tagging_loss=0.01283, over 14694.00 frames. ], tot_loss[loss=0.07105, simple_loss=0.09367, pruned_loss=0.01497, audio_tagging_loss=0.009239, over 3044779.35 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:13:29,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2029293.3333333333, ans=0.07 2023-11-22 17:13:30,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304400 2023-11-22 17:13:55,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2029426.6666666667, ans=0.125 2023-11-22 17:14:00,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.257e+01 8.903e+01 9.664e+01 1.355e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-22 17:14:30,800 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3850, loss[loss=0.07025, simple_loss=0.08873, pruned_loss=0.01419, audio_tagging_loss=0.01169, over 14792.00 frames. ], tot_loss[loss=0.07134, simple_loss=0.09424, pruned_loss=0.01497, audio_tagging_loss=0.009248, over 3047036.15 frames. 
], batch size: 58, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:14:35,771 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304450 2023-11-22 17:14:36,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2029626.6666666667, ans=0.0 2023-11-22 17:15:01,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2029760.0, ans=0.0 2023-11-22 17:15:03,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2029760.0, ans=0.125 2023-11-22 17:15:19,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2029826.6666666667, ans=0.125 2023-11-22 17:15:35,605 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3900, loss[loss=0.06014, simple_loss=0.0829, pruned_loss=0.008506, audio_tagging_loss=0.01019, over 15701.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09333, pruned_loss=0.01473, audio_tagging_loss=0.009318, over 3041488.58 frames. ], batch size: 59, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:15:38,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2029960.0, ans=0.125 2023-11-22 17:15:41,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304500 2023-11-22 17:16:01,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2030093.3333333333, ans=0.125 2023-11-22 17:16:08,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.197e+01 8.799e+01 9.518e+01 1.700e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-22 17:16:11,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2030093.3333333333, ans=0.0 2023-11-22 17:16:19,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2030160.0, ans=0.0 2023-11-22 17:16:38,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2030293.3333333333, ans=0.2 2023-11-22 17:16:39,726 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 3950, loss[loss=0.06731, simple_loss=0.08398, pruned_loss=0.01354, audio_tagging_loss=0.01178, over 14528.00 frames. ], tot_loss[loss=0.07073, simple_loss=0.09307, pruned_loss=0.01473, audio_tagging_loss=0.009468, over 3047351.89 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:16:42,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2030293.3333333333, ans=0.035 2023-11-22 17:16:44,837 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304550 2023-11-22 17:16:52,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2030360.0, ans=0.02 2023-11-22 17:17:10,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.74 vs. 
limit=15.0 2023-11-22 17:17:23,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2030493.3333333333, ans=0.125 2023-11-22 17:17:43,613 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4000, loss[loss=0.08557, simple_loss=0.0963, pruned_loss=0.02703, audio_tagging_loss=0.01039, over 16437.00 frames. ], tot_loss[loss=0.07099, simple_loss=0.09307, pruned_loss=0.01484, audio_tagging_loss=0.009616, over 3048025.32 frames. ], batch size: 62, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:17:47,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2030626.6666666667, ans=0.125 2023-11-22 17:17:48,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304600 2023-11-22 17:18:17,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.393e+01 9.096e+01 9.757e+01 1.242e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-22 17:18:48,590 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4050, loss[loss=0.07617, simple_loss=0.09786, pruned_loss=0.01439, audio_tagging_loss=0.01285, over 15928.00 frames. ], tot_loss[loss=0.0714, simple_loss=0.09369, pruned_loss=0.01492, audio_tagging_loss=0.009635, over 3060231.20 frames. ], batch size: 61, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:18:53,058 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 17:18:54,250 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304650 2023-11-22 17:19:08,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2031026.6666666667, ans=0.1 2023-11-22 17:19:08,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2031026.6666666667, ans=0.0 2023-11-22 17:19:13,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2023-11-22 17:19:16,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2031093.3333333333, ans=0.2 2023-11-22 17:19:33,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2031160.0, ans=0.07 2023-11-22 17:19:38,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=22.5 2023-11-22 17:19:52,430 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4100, loss[loss=0.05381, simple_loss=0.06648, pruned_loss=0.009625, audio_tagging_loss=0.01095, over 15872.00 frames. ], tot_loss[loss=0.07161, simple_loss=0.09379, pruned_loss=0.01503, audio_tagging_loss=0.009685, over 3052396.45 frames. 
], batch size: 60, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:19:57,311 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304700 2023-11-22 17:20:11,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2031360.0, ans=0.0 2023-11-22 17:20:22,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2031426.6666666667, ans=0.1 2023-11-22 17:20:25,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.289e+01 8.885e+01 9.485e+01 1.195e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 17:20:27,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-11-22 17:20:39,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2031493.3333333333, ans=0.0 2023-11-22 17:20:56,261 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4150, loss[loss=0.06931, simple_loss=0.09327, pruned_loss=0.01218, audio_tagging_loss=0.0105, over 15019.00 frames. ], tot_loss[loss=0.07191, simple_loss=0.09453, pruned_loss=0.01522, audio_tagging_loss=0.009424, over 3046026.31 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:20:59,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2031626.6666666667, ans=0.125 2023-11-22 17:21:01,813 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304750 2023-11-22 17:21:12,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2023-11-22 17:21:26,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2031760.0, ans=0.1 2023-11-22 17:21:42,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2031826.6666666667, ans=0.1 2023-11-22 17:21:43,420 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 17:21:51,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-22 17:22:00,325 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4200, loss[loss=0.05511, simple_loss=0.07488, pruned_loss=0.008961, audio_tagging_loss=0.008705, over 14726.00 frames. ], tot_loss[loss=0.07171, simple_loss=0.0943, pruned_loss=0.01522, audio_tagging_loss=0.009341, over 3045577.95 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:22:05,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304800 2023-11-22 17:22:08,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.09 vs. 
limit=12.0 2023-11-22 17:22:10,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2031960.0, ans=0.1 2023-11-22 17:22:11,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2031960.0, ans=0.125 2023-11-22 17:22:14,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2023-11-22 17:22:21,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2032026.6666666667, ans=0.125 2023-11-22 17:22:34,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.472e+01 8.973e+01 9.606e+01 1.148e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-22 17:22:34,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2032093.3333333333, ans=0.95 2023-11-22 17:23:04,821 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4250, loss[loss=0.0879, simple_loss=0.1143, pruned_loss=0.02175, audio_tagging_loss=0.008994, over 15749.00 frames. ], tot_loss[loss=0.07198, simple_loss=0.09505, pruned_loss=0.0152, audio_tagging_loss=0.009255, over 3048689.12 frames. ], batch size: 60, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:23:09,748 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304850 2023-11-22 17:23:19,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2032360.0, ans=0.95 2023-11-22 17:23:28,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2032426.6666666667, ans=0.1 2023-11-22 17:24:08,356 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4300, loss[loss=0.07981, simple_loss=0.1111, pruned_loss=0.01542, audio_tagging_loss=0.008851, over 14978.00 frames. ], tot_loss[loss=0.07162, simple_loss=0.09468, pruned_loss=0.01509, audio_tagging_loss=0.009186, over 3046975.24 frames. 
], batch size: 54, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:24:09,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2032626.6666666667, ans=0.1 2023-11-22 17:24:09,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2032626.6666666667, ans=0.0 2023-11-22 17:24:12,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2032626.6666666667, ans=0.0 2023-11-22 17:24:13,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304900 2023-11-22 17:24:17,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2032626.6666666667, ans=0.0 2023-11-22 17:24:24,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2032693.3333333333, ans=0.125 2023-11-22 17:24:44,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.878e+01 8.402e+01 8.884e+01 9.738e+01 1.155e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 17:24:45,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2032760.0, ans=0.125 2023-11-22 17:25:07,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2023-11-22 17:25:13,318 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4350, loss[loss=0.0585, simple_loss=0.07595, pruned_loss=0.01171, audio_tagging_loss=0.008814, over 15944.00 frames. ], tot_loss[loss=0.07188, simple_loss=0.09522, pruned_loss=0.01515, audio_tagging_loss=0.009125, over 3053562.91 frames. ], batch size: 62, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:25:13,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2032960.0, ans=0.125 2023-11-22 17:25:19,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 304950 2023-11-22 17:25:19,781 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 17:25:24,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2032960.0, ans=0.2 2023-11-22 17:25:52,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2033160.0, ans=0.125 2023-11-22 17:25:54,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.20 vs. limit=22.5 2023-11-22 17:26:09,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2033226.6666666667, ans=0.0 2023-11-22 17:26:18,663 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4400, loss[loss=0.0601, simple_loss=0.08006, pruned_loss=0.009351, audio_tagging_loss=0.01072, over 15121.00 frames. ], tot_loss[loss=0.07165, simple_loss=0.09472, pruned_loss=0.01513, audio_tagging_loss=0.009163, over 3050229.94 frames. 
], batch size: 57, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:26:23,519 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305000 2023-11-22 17:26:44,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2033426.6666666667, ans=0.0 2023-11-22 17:26:52,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.342e+01 8.115e+01 8.926e+01 9.630e+01 1.276e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-22 17:27:22,543 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4450, loss[loss=0.06375, simple_loss=0.08247, pruned_loss=0.01293, audio_tagging_loss=0.009583, over 14535.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09456, pruned_loss=0.01501, audio_tagging_loss=0.009089, over 3050784.81 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:27:27,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305050 2023-11-22 17:27:36,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2033693.3333333333, ans=0.125 2023-11-22 17:28:25,533 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4500, loss[loss=0.06968, simple_loss=0.09372, pruned_loss=0.01488, audio_tagging_loss=0.00793, over 15175.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09457, pruned_loss=0.01506, audio_tagging_loss=0.009102, over 3049792.21 frames. ], batch size: 58, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:28:31,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305100 2023-11-22 17:28:37,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=2033960.0, ans=0.2 2023-11-22 17:29:01,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.181e+01 8.834e+01 9.479e+01 1.227e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 17:29:05,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2034160.0, ans=0.2 2023-11-22 17:29:09,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2023-11-22 17:29:16,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2034226.6666666667, ans=0.0 2023-11-22 17:29:21,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2034226.6666666667, ans=0.125 2023-11-22 17:29:24,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2034226.6666666667, ans=0.125 2023-11-22 17:29:31,643 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4550, loss[loss=0.07093, simple_loss=0.09467, pruned_loss=0.01245, audio_tagging_loss=0.01114, over 15784.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.09443, pruned_loss=0.01485, audio_tagging_loss=0.009007, over 3051507.83 frames. 
], batch size: 59, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:29:36,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305150 2023-11-22 17:29:39,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2034293.3333333333, ans=0.0 2023-11-22 17:29:41,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2034293.3333333333, ans=0.2 2023-11-22 17:29:49,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2034360.0, ans=0.0 2023-11-22 17:29:50,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2023-11-22 17:29:53,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2023-11-22 17:30:12,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2034493.3333333333, ans=0.125 2023-11-22 17:30:21,204 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 17:30:23,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2034560.0, ans=0.0 2023-11-22 17:30:34,624 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4600, loss[loss=0.07045, simple_loss=0.09523, pruned_loss=0.01199, audio_tagging_loss=0.01084, over 15191.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09344, pruned_loss=0.01485, audio_tagging_loss=0.00915, over 3047284.19 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:30:36,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-22 17:30:39,755 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305200 2023-11-22 17:30:52,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2034693.3333333333, ans=0.0 2023-11-22 17:31:10,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2034760.0, ans=0.125 2023-11-22 17:31:11,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.212e+01 8.747e+01 9.317e+01 1.226e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 17:31:31,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2034893.3333333333, ans=0.125 2023-11-22 17:31:38,364 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4650, loss[loss=0.06534, simple_loss=0.08622, pruned_loss=0.01032, audio_tagging_loss=0.01191, over 15594.00 frames. ], tot_loss[loss=0.07059, simple_loss=0.09297, pruned_loss=0.01484, audio_tagging_loss=0.009267, over 3047130.18 frames. 
], batch size: 62, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:31:43,286 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305250 2023-11-22 17:31:49,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2034960.0, ans=0.125 2023-11-22 17:32:01,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2035026.6666666667, ans=0.125 2023-11-22 17:32:13,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2035093.3333333333, ans=0.0 2023-11-22 17:32:38,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-11-22 17:32:43,680 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4700, loss[loss=0.06918, simple_loss=0.09369, pruned_loss=0.01289, audio_tagging_loss=0.009443, over 15748.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.0921, pruned_loss=0.01458, audio_tagging_loss=0.009401, over 3047359.58 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:32:49,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305300 2023-11-22 17:33:05,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2035360.0, ans=0.125 2023-11-22 17:33:08,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2035426.6666666667, ans=0.0 2023-11-22 17:33:15,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2035426.6666666667, ans=0.5 2023-11-22 17:33:16,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2035426.6666666667, ans=0.0 2023-11-22 17:33:18,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.537e+01 8.112e+01 8.766e+01 9.442e+01 1.103e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-22 17:33:32,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2023-11-22 17:33:41,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2035560.0, ans=0.0 2023-11-22 17:33:47,865 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4750, loss[loss=0.07547, simple_loss=0.1081, pruned_loss=0.01466, audio_tagging_loss=0.006752, over 14818.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.09259, pruned_loss=0.01447, audio_tagging_loss=0.009454, over 3039776.34 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:33:52,952 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305350 2023-11-22 17:34:00,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.09 vs. 
limit=15.0 2023-11-22 17:34:04,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2035693.3333333333, ans=0.0 2023-11-22 17:34:09,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2035693.3333333333, ans=0.2 2023-11-22 17:34:11,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-11-22 17:34:23,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2035760.0, ans=0.125 2023-11-22 17:34:25,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2035826.6666666667, ans=0.1 2023-11-22 17:34:42,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.08 vs. limit=10.0 2023-11-22 17:34:51,749 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4800, loss[loss=0.08154, simple_loss=0.1049, pruned_loss=0.02124, audio_tagging_loss=0.007872, over 15095.00 frames. ], tot_loss[loss=0.07052, simple_loss=0.09311, pruned_loss=0.01449, audio_tagging_loss=0.009478, over 3044955.59 frames. ], batch size: 57, lr: 2.63e-03, grad_scale: 32.0 2023-11-22 17:34:56,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305400 2023-11-22 17:35:28,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.644e+01 8.281e+01 8.887e+01 9.672e+01 1.150e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 17:35:31,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2036160.0, ans=0.0 2023-11-22 17:35:44,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2036226.6666666667, ans=0.0 2023-11-22 17:35:57,043 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4850, loss[loss=0.06642, simple_loss=0.07985, pruned_loss=0.01726, audio_tagging_loss=0.009244, over 15332.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.09302, pruned_loss=0.01447, audio_tagging_loss=0.009532, over 3053320.28 frames. ], batch size: 60, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:36:02,691 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305450 2023-11-22 17:36:18,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. 
limit=15.0 2023-11-22 17:36:48,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2036560.0, ans=0.1 2023-11-22 17:36:52,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2036560.0, ans=0.2 2023-11-22 17:36:52,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2036560.0, ans=0.125 2023-11-22 17:36:57,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2036560.0, ans=0.0 2023-11-22 17:36:57,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2036560.0, ans=0.125 2023-11-22 17:37:01,309 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4900, loss[loss=0.08549, simple_loss=0.1169, pruned_loss=0.02006, audio_tagging_loss=0.006984, over 16264.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.09328, pruned_loss=0.01458, audio_tagging_loss=0.009515, over 3058441.15 frames. ], batch size: 57, lr: 2.63e-03, grad_scale: 16.0 2023-11-22 17:37:02,728 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 17:37:02,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2036626.6666666667, ans=0.125 2023-11-22 17:37:06,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305500 2023-11-22 17:37:27,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2036760.0, ans=0.05 2023-11-22 17:37:30,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2036760.0, ans=0.0 2023-11-22 17:37:38,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.464e+01 8.092e+01 8.921e+01 9.687e+01 1.263e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-22 17:37:55,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2036893.3333333333, ans=0.125 2023-11-22 17:38:00,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. limit=6.0 2023-11-22 17:38:04,866 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 4950, loss[loss=0.06189, simple_loss=0.07565, pruned_loss=0.0144, audio_tagging_loss=0.009669, over 15953.00 frames. ], tot_loss[loss=0.0708, simple_loss=0.09383, pruned_loss=0.01459, audio_tagging_loss=0.009292, over 3057985.10 frames. ], batch size: 60, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:38:07,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. 
limit=15.0 2023-11-22 17:38:09,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305550 2023-11-22 17:38:15,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2036960.0, ans=0.125 2023-11-22 17:38:18,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2037026.6666666667, ans=0.2 2023-11-22 17:38:47,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2037160.0, ans=10.0 2023-11-22 17:38:48,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2037160.0, ans=0.125 2023-11-22 17:39:00,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2037226.6666666667, ans=0.0 2023-11-22 17:39:01,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2037226.6666666667, ans=0.0 2023-11-22 17:39:10,211 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5000, loss[loss=0.08091, simple_loss=0.1032, pruned_loss=0.01822, audio_tagging_loss=0.01108, over 15465.00 frames. ], tot_loss[loss=0.07024, simple_loss=0.09318, pruned_loss=0.01448, audio_tagging_loss=0.009174, over 3063387.19 frames. ], batch size: 58, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:39:15,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305600 2023-11-22 17:39:28,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2037360.0, ans=0.0 2023-11-22 17:39:37,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2037426.6666666667, ans=0.125 2023-11-22 17:39:41,695 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 17:39:47,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.198e+01 8.797e+01 9.538e+01 1.250e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 17:39:47,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2037493.3333333333, ans=0.2 2023-11-22 17:39:57,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2037493.3333333333, ans=0.125 2023-11-22 17:39:58,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2037493.3333333333, ans=0.0 2023-11-22 17:40:01,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2037560.0, ans=0.125 2023-11-22 17:40:05,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2037560.0, ans=0.5 2023-11-22 17:40:06,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2037560.0, ans=0.0 2023-11-22 17:40:15,622 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5050, loss[loss=0.05839, simple_loss=0.08189, pruned_loss=0.007978, audio_tagging_loss=0.009467, over 15287.00 frames. ], tot_loss[loss=0.07052, simple_loss=0.09384, pruned_loss=0.01453, audio_tagging_loss=0.009067, over 3057650.36 frames. 
], batch size: 57, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:40:21,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305650 2023-11-22 17:40:27,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=15.0 2023-11-22 17:41:46,524 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5100, loss[loss=0.06631, simple_loss=0.08087, pruned_loss=0.01378, audio_tagging_loss=0.0121, over 14948.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.09281, pruned_loss=0.01447, audio_tagging_loss=0.009178, over 3047878.63 frames. ], batch size: 57, lr: 2.62e-03, grad_scale: 8.0 2023-11-22 17:41:54,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305700 2023-11-22 17:42:05,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2038026.6666666667, ans=0.125 2023-11-22 17:42:27,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2038093.3333333333, ans=0.0 2023-11-22 17:42:41,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.755e+01 8.235e+01 8.784e+01 9.418e+01 1.512e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-22 17:42:51,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0 2023-11-22 17:43:03,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2023-11-22 17:43:12,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2038226.6666666667, ans=0.125 2023-11-22 17:43:19,596 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5150, loss[loss=0.07446, simple_loss=0.09507, pruned_loss=0.0171, audio_tagging_loss=0.009824, over 14655.00 frames. ], tot_loss[loss=0.07044, simple_loss=0.09322, pruned_loss=0.01468, audio_tagging_loss=0.00915, over 3052051.29 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 8.0 2023-11-22 17:43:21,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2038293.3333333333, ans=0.0 2023-11-22 17:43:27,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305750 2023-11-22 17:43:53,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2023-11-22 17:43:56,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2038426.6666666667, ans=0.0 2023-11-22 17:43:56,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2038426.6666666667, ans=0.05 2023-11-22 17:44:02,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2023-11-22 17:44:25,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2023-11-22 17:44:28,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2038493.3333333333, ans=0.125 2023-11-22 17:44:35,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2038560.0, ans=0.125 2023-11-22 17:44:39,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2038560.0, ans=0.0 2023-11-22 17:44:52,030 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5200, loss[loss=0.06428, simple_loss=0.09007, pruned_loss=0.01145, audio_tagging_loss=0.007792, over 15260.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.09478, pruned_loss=0.01496, audio_tagging_loss=0.009123, over 3048580.94 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:44:59,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305800 2023-11-22 17:45:20,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2038693.3333333333, ans=0.125 2023-11-22 17:45:22,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2038693.3333333333, ans=0.125 2023-11-22 17:45:37,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2038760.0, ans=0.125 2023-11-22 17:45:48,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.193e+01 8.836e+01 9.389e+01 1.241e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 17:46:01,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-22 17:46:08,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2038893.3333333333, ans=0.125 2023-11-22 17:46:14,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2038893.3333333333, ans=0.125 2023-11-22 17:46:24,763 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5250, loss[loss=0.08952, simple_loss=0.1131, pruned_loss=0.02048, audio_tagging_loss=0.01249, over 16027.00 frames. ], tot_loss[loss=0.07183, simple_loss=0.09528, pruned_loss=0.01513, audio_tagging_loss=0.009065, over 3042119.96 frames. ], batch size: 61, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:46:32,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305850 2023-11-22 17:46:56,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2039026.6666666667, ans=0.0 2023-11-22 17:47:57,858 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5300, loss[loss=0.1028, simple_loss=0.1382, pruned_loss=0.02713, audio_tagging_loss=0.006574, over 15442.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.09499, pruned_loss=0.01515, audio_tagging_loss=0.009084, over 3042355.80 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:47:58,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2039293.3333333333, ans=0.1 2023-11-22 17:48:02,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.31 vs. 
limit=22.5 2023-11-22 17:48:05,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305900 2023-11-22 17:48:53,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.483e+01 8.855e+01 9.483e+01 1.164e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-22 17:48:55,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2023-11-22 17:49:03,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2039493.3333333333, ans=0.125 2023-11-22 17:49:24,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=2039560.0, ans=0.2 2023-11-22 17:49:30,727 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5350, loss[loss=0.07158, simple_loss=0.09999, pruned_loss=0.01403, audio_tagging_loss=0.007553, over 15687.00 frames. ], tot_loss[loss=0.07161, simple_loss=0.09492, pruned_loss=0.01504, audio_tagging_loss=0.009102, over 3044847.04 frames. ], batch size: 58, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:49:38,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 305950 2023-11-22 17:49:51,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=2039693.3333333333, ans=15.0 2023-11-22 17:50:35,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=12.0 2023-11-22 17:50:38,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-22 17:50:45,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2039893.3333333333, ans=0.0 2023-11-22 17:50:57,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2023-11-22 17:51:03,653 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5400, loss[loss=0.07775, simple_loss=0.1054, pruned_loss=0.01222, audio_tagging_loss=0.01284, over 15195.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.09628, pruned_loss=0.01532, audio_tagging_loss=0.009135, over 3052936.50 frames. ], batch size: 58, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:51:11,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306000 2023-11-22 17:51:51,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2023-11-22 17:51:59,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.288e+01 8.843e+01 9.633e+01 1.187e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 17:52:36,561 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5450, loss[loss=0.03886, simple_loss=0.03996, pruned_loss=0.004438, audio_tagging_loss=0.01444, over 12900.00 frames. ], tot_loss[loss=0.07161, simple_loss=0.09464, pruned_loss=0.01512, audio_tagging_loss=0.009165, over 3039064.51 frames. 
], batch size: 50, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:52:44,751 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306050 2023-11-22 17:53:08,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2040360.0, ans=0.1 2023-11-22 17:53:10,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-11-22 17:53:14,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2040426.6666666667, ans=0.1 2023-11-22 17:53:25,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2040426.6666666667, ans=0.125 2023-11-22 17:54:03,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2040560.0, ans=0.0 2023-11-22 17:54:10,064 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5500, loss[loss=0.07259, simple_loss=0.1009, pruned_loss=0.01383, audio_tagging_loss=0.008314, over 14253.00 frames. ], tot_loss[loss=0.07175, simple_loss=0.09507, pruned_loss=0.01504, audio_tagging_loss=0.009175, over 3036335.31 frames. ], batch size: 52, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:54:14,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-22 17:54:17,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306100 2023-11-22 17:54:17,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2040626.6666666667, ans=0.125 2023-11-22 17:54:43,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2040693.3333333333, ans=0.125 2023-11-22 17:54:45,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=12.0 2023-11-22 17:55:05,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.451e+01 8.930e+01 9.689e+01 1.282e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-22 17:55:28,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2040893.3333333333, ans=0.125 2023-11-22 17:55:32,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2040893.3333333333, ans=0.0 2023-11-22 17:55:42,884 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5550, loss[loss=0.05803, simple_loss=0.07279, pruned_loss=0.01058, audio_tagging_loss=0.01105, over 14018.00 frames. ], tot_loss[loss=0.07247, simple_loss=0.09594, pruned_loss=0.01533, audio_tagging_loss=0.009177, over 3043952.80 frames. 
], batch size: 53, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:55:48,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2040960.0, ans=0.125 2023-11-22 17:55:50,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306150 2023-11-22 17:57:07,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2041226.6666666667, ans=0.95 2023-11-22 17:57:14,872 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5600, loss[loss=0.07472, simple_loss=0.107, pruned_loss=0.01266, audio_tagging_loss=0.008569, over 15337.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.09546, pruned_loss=0.01515, audio_tagging_loss=0.009312, over 3043204.87 frames. ], batch size: 56, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 17:57:22,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306200 2023-11-22 17:57:24,311 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 17:57:45,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0 2023-11-22 17:58:02,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0 2023-11-22 17:58:03,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2041426.6666666667, ans=0.0 2023-11-22 17:58:11,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.138e+01 8.696e+01 9.555e+01 1.151e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 17:58:20,039 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 17:58:26,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2023-11-22 17:58:38,260 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5650, loss[loss=0.07612, simple_loss=0.09438, pruned_loss=0.01593, audio_tagging_loss=0.013, over 18038.00 frames. ], tot_loss[loss=0.07205, simple_loss=0.09531, pruned_loss=0.01506, audio_tagging_loss=0.009328, over 3051374.83 frames. ], batch size: 68, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 17:58:43,793 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306250 2023-11-22 17:58:52,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-22 17:58:58,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. 
limit=15.0 2023-11-22 17:59:07,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2041760.0, ans=0.125 2023-11-22 17:59:07,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2041760.0, ans=0.125 2023-11-22 17:59:12,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2041760.0, ans=0.0 2023-11-22 17:59:15,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2041826.6666666667, ans=0.0 2023-11-22 17:59:21,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2041826.6666666667, ans=0.0 2023-11-22 17:59:23,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2041826.6666666667, ans=0.125 2023-11-22 17:59:23,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2041826.6666666667, ans=0.1 2023-11-22 17:59:30,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2041893.3333333333, ans=0.125 2023-11-22 17:59:42,492 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5700, loss[loss=0.07563, simple_loss=0.1029, pruned_loss=0.01466, audio_tagging_loss=0.009499, over 14315.00 frames. ], tot_loss[loss=0.0722, simple_loss=0.0956, pruned_loss=0.01507, audio_tagging_loss=0.009334, over 3053778.77 frames. ], batch size: 53, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 17:59:45,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2023-11-22 17:59:46,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2041960.0, ans=0.0 2023-11-22 17:59:47,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306300 2023-11-22 18:00:10,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-22 18:00:22,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.203e+01 8.769e+01 9.418e+01 1.173e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-22 18:00:39,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-22 18:00:46,245 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5750, loss[loss=0.08147, simple_loss=0.1014, pruned_loss=0.01989, audio_tagging_loss=0.01089, over 14720.00 frames. ], tot_loss[loss=0.07183, simple_loss=0.0947, pruned_loss=0.01515, audio_tagging_loss=0.009332, over 3054088.93 frames. 
], batch size: 55, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:00:51,437 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306350 2023-11-22 18:01:02,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2042360.0, ans=0.125 2023-11-22 18:01:07,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2042360.0, ans=0.0 2023-11-22 18:01:08,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=22.5 2023-11-22 18:01:14,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2042426.6666666667, ans=0.125 2023-11-22 18:01:32,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2042493.3333333333, ans=0.125 2023-11-22 18:01:36,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-11-22 18:01:50,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2042626.6666666667, ans=0.2 2023-11-22 18:01:51,689 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5800, loss[loss=0.06087, simple_loss=0.0824, pruned_loss=0.0118, audio_tagging_loss=0.007871, over 15033.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.09471, pruned_loss=0.01529, audio_tagging_loss=0.009198, over 3050558.13 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:01:54,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2042626.6666666667, ans=0.0 2023-11-22 18:01:57,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306400 2023-11-22 18:01:58,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5 2023-11-22 18:01:59,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2042626.6666666667, ans=0.1 2023-11-22 18:02:15,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2042693.3333333333, ans=0.1 2023-11-22 18:02:31,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.101e+01 8.686e+01 9.623e+01 1.169e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-22 18:02:56,980 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5850, loss[loss=0.0446, simple_loss=0.05774, pruned_loss=0.00695, audio_tagging_loss=0.008783, over 13880.00 frames. ], tot_loss[loss=0.07151, simple_loss=0.09463, pruned_loss=0.01504, audio_tagging_loss=0.009146, over 3049038.89 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:03:02,020 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306450 2023-11-22 18:03:04,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2023-11-22 18:03:17,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2043026.6666666667, ans=0.1 2023-11-22 18:03:23,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2043093.3333333333, ans=0.0 2023-11-22 18:03:24,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2043093.3333333333, ans=0.2 2023-11-22 18:03:33,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2043093.3333333333, ans=0.035 2023-11-22 18:03:54,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2043226.6666666667, ans=0.125 2023-11-22 18:04:06,897 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5900, loss[loss=0.07211, simple_loss=0.1016, pruned_loss=0.015, audio_tagging_loss=0.006314, over 15720.00 frames. ], tot_loss[loss=0.07186, simple_loss=0.09534, pruned_loss=0.01512, audio_tagging_loss=0.009066, over 3050493.30 frames. ], batch size: 57, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:04:13,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306500 2023-11-22 18:04:14,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2023-11-22 18:04:28,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2043360.0, ans=0.0 2023-11-22 18:05:02,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.655e+01 8.183e+01 8.699e+01 9.524e+01 1.128e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 18:05:02,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2043493.3333333333, ans=0.125 2023-11-22 18:05:13,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2043493.3333333333, ans=0.125 2023-11-22 18:05:29,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0 2023-11-22 18:05:35,774 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 5950, loss[loss=0.06214, simple_loss=0.08896, pruned_loss=0.009307, audio_tagging_loss=0.008354, over 16015.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09487, pruned_loss=0.01488, audio_tagging_loss=0.009066, over 3051440.03 frames. 
], batch size: 59, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:05:37,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2043626.6666666667, ans=0.2 2023-11-22 18:05:42,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2043626.6666666667, ans=0.125 2023-11-22 18:05:43,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306550 2023-11-22 18:06:06,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2043693.3333333333, ans=0.125 2023-11-22 18:06:09,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.89 vs. limit=10.0 2023-11-22 18:07:05,362 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6000, loss[loss=0.06818, simple_loss=0.07794, pruned_loss=0.01904, audio_tagging_loss=0.01016, over 17012.00 frames. ], tot_loss[loss=0.07122, simple_loss=0.0943, pruned_loss=0.01495, audio_tagging_loss=0.009115, over 3044839.42 frames. ], batch size: 65, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:07:05,363 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 18:07:56,219 INFO [train_asr.py:1253] (1/4) Epoch 26, validation: loss=0.05819, simple_loss=0.05149, pruned_loss=0.005105, audio_tagging_loss=0.02734, over 4681554.00 frames. 2023-11-22 18:07:56,221 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 18:08:01,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306600 2023-11-22 18:08:10,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2044026.6666666667, ans=0.1 2023-11-22 18:08:36,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.220e+01 8.809e+01 9.457e+01 1.359e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-22 18:08:37,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2044160.0, ans=0.0 2023-11-22 18:08:43,055 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 18:08:56,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2023-11-22 18:09:01,888 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6050, loss[loss=0.05821, simple_loss=0.07933, pruned_loss=0.008109, audio_tagging_loss=0.01044, over 14185.00 frames. ], tot_loss[loss=0.0717, simple_loss=0.09513, pruned_loss=0.01505, audio_tagging_loss=0.009089, over 3049427.77 frames. ], batch size: 53, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:09:06,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306650 2023-11-22 18:09:08,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. 
limit=10.0 2023-11-22 18:09:22,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2023-11-22 18:09:32,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2044426.6666666667, ans=0.2 2023-11-22 18:09:35,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2044426.6666666667, ans=0.125 2023-11-22 18:09:50,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0 2023-11-22 18:10:00,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2044560.0, ans=0.125 2023-11-22 18:10:04,712 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6100, loss[loss=0.07666, simple_loss=0.1095, pruned_loss=0.01395, audio_tagging_loss=0.007965, over 16411.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09467, pruned_loss=0.01479, audio_tagging_loss=0.009203, over 3055409.38 frames. ], batch size: 58, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:10:10,236 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306700 2023-11-22 18:10:10,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2044626.6666666667, ans=0.0 2023-11-22 18:10:11,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2044626.6666666667, ans=0.0 2023-11-22 18:10:16,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2044693.3333333333, ans=0.125 2023-11-22 18:10:33,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=2044760.0, ans=12.0 2023-11-22 18:10:42,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2044826.6666666667, ans=0.035 2023-11-22 18:10:46,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.417e+01 8.381e+01 8.850e+01 9.348e+01 1.200e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-22 18:10:46,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2044826.6666666667, ans=0.125 2023-11-22 18:10:51,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2044826.6666666667, ans=0.5 2023-11-22 18:11:08,990 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6150, loss[loss=0.06817, simple_loss=0.08452, pruned_loss=0.0147, audio_tagging_loss=0.01121, over 16010.00 frames. ], tot_loss[loss=0.07086, simple_loss=0.09376, pruned_loss=0.01474, audio_tagging_loss=0.009239, over 3046230.99 frames. 
], batch size: 62, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:11:13,843 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306750 2023-11-22 18:11:33,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2045093.3333333333, ans=0.0 2023-11-22 18:11:48,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=22.5 2023-11-22 18:11:54,077 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:12:13,101 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6200, loss[loss=0.07017, simple_loss=0.09253, pruned_loss=0.01309, audio_tagging_loss=0.01081, over 15250.00 frames. ], tot_loss[loss=0.07079, simple_loss=0.09341, pruned_loss=0.01474, audio_tagging_loss=0.009347, over 3050093.17 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:12:15,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2023-11-22 18:12:18,686 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306800 2023-11-22 18:12:25,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2045360.0, ans=0.125 2023-11-22 18:12:53,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.049e+01 8.588e+01 9.515e+01 1.238e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-22 18:12:56,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-22 18:13:17,035 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6250, loss[loss=0.08049, simple_loss=0.1033, pruned_loss=0.01899, audio_tagging_loss=0.009868, over 14271.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09264, pruned_loss=0.01456, audio_tagging_loss=0.0095, over 3051009.46 frames. ], batch size: 56, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:13:22,082 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306850 2023-11-22 18:13:24,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2045626.6666666667, ans=0.0 2023-11-22 18:13:36,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2045693.3333333333, ans=0.2 2023-11-22 18:13:37,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2045693.3333333333, ans=0.035 2023-11-22 18:13:38,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-22 18:13:43,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2045760.0, ans=0.125 2023-11-22 18:14:07,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.33 vs. 
limit=15.0 2023-11-22 18:14:13,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2045893.3333333333, ans=0.05 2023-11-22 18:14:18,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2045893.3333333333, ans=0.0 2023-11-22 18:14:19,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2045893.3333333333, ans=0.125 2023-11-22 18:14:21,616 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6300, loss[loss=0.06534, simple_loss=0.08147, pruned_loss=0.01422, audio_tagging_loss=0.01038, over 16594.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09332, pruned_loss=0.01474, audio_tagging_loss=0.009494, over 3046689.61 frames. ], batch size: 64, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:14:26,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306900 2023-11-22 18:14:26,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2045960.0, ans=0.2 2023-11-22 18:14:40,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-11-22 18:14:42,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2046026.6666666667, ans=15.0 2023-11-22 18:14:44,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2046026.6666666667, ans=0.035 2023-11-22 18:15:00,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2046160.0, ans=0.1 2023-11-22 18:15:00,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=12.0 2023-11-22 18:15:02,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.359e+01 8.414e+01 9.060e+01 9.921e+01 1.444e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-22 18:15:19,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2046226.6666666667, ans=0.5 2023-11-22 18:15:21,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2046226.6666666667, ans=0.125 2023-11-22 18:15:25,747 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6350, loss[loss=0.06692, simple_loss=0.1004, pruned_loss=0.01004, audio_tagging_loss=0.006685, over 15371.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09224, pruned_loss=0.0146, audio_tagging_loss=0.009543, over 3051689.26 frames. ], batch size: 56, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:15:31,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 306950 2023-11-22 18:15:36,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2046293.3333333333, ans=0.125 2023-11-22 18:15:45,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. 
limit=15.0 2023-11-22 18:16:03,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=22.5 2023-11-22 18:16:29,687 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6400, loss[loss=0.08834, simple_loss=0.1339, pruned_loss=0.01681, audio_tagging_loss=0.004569, over 16748.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09258, pruned_loss=0.01456, audio_tagging_loss=0.009533, over 3051663.75 frames. ], batch size: 58, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:16:34,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307000 2023-11-22 18:16:36,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0 2023-11-22 18:16:45,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=22.5 2023-11-22 18:16:50,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2046693.3333333333, ans=0.0 2023-11-22 18:16:55,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=22.5 2023-11-22 18:16:59,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0 2023-11-22 18:17:10,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.418e+01 9.159e+01 1.026e+02 1.241e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-22 18:17:19,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2046893.3333333333, ans=0.1 2023-11-22 18:17:19,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2046893.3333333333, ans=0.2 2023-11-22 18:17:33,067 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6450, loss[loss=0.08323, simple_loss=0.1051, pruned_loss=0.02109, audio_tagging_loss=0.009595, over 14836.00 frames. ], tot_loss[loss=0.07067, simple_loss=0.09273, pruned_loss=0.01465, audio_tagging_loss=0.009662, over 3047285.73 frames. ], batch size: 54, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:17:34,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2046960.0, ans=0.125 2023-11-22 18:17:38,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307050 2023-11-22 18:17:50,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2047026.6666666667, ans=0.0 2023-11-22 18:18:14,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2047160.0, ans=0.125 2023-11-22 18:18:21,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2047160.0, ans=0.2 2023-11-22 18:18:24,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2047226.6666666667, ans=0.0 2023-11-22 18:18:36,813 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6500, loss[loss=0.0916, simple_loss=0.1167, pruned_loss=0.02585, audio_tagging_loss=0.007398, over 15617.00 frames. 
], tot_loss[loss=0.07097, simple_loss=0.09325, pruned_loss=0.01475, audio_tagging_loss=0.009602, over 3055886.04 frames. ], batch size: 56, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:18:42,962 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307100 2023-11-22 18:18:53,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2047360.0, ans=0.125 2023-11-22 18:19:04,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2047426.6666666667, ans=0.2 2023-11-22 18:19:19,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.186e+01 8.231e+01 8.746e+01 9.661e+01 1.657e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 18:19:29,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2047560.0, ans=0.125 2023-11-22 18:19:30,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2047560.0, ans=0.07 2023-11-22 18:19:41,391 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6550, loss[loss=0.07764, simple_loss=0.1032, pruned_loss=0.01783, audio_tagging_loss=0.008226, over 15832.00 frames. ], tot_loss[loss=0.07096, simple_loss=0.09324, pruned_loss=0.01487, audio_tagging_loss=0.009465, over 3057544.38 frames. ], batch size: 58, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:19:41,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2047626.6666666667, ans=0.125 2023-11-22 18:19:45,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2047626.6666666667, ans=0.125 2023-11-22 18:19:46,421 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307150 2023-11-22 18:19:53,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2047693.3333333333, ans=22.5 2023-11-22 18:20:07,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2047760.0, ans=0.125 2023-11-22 18:20:10,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2047760.0, ans=0.2 2023-11-22 18:20:21,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2047826.6666666667, ans=0.0 2023-11-22 18:20:28,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2023-11-22 18:20:35,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2023-11-22 18:20:37,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2047893.3333333333, ans=0.125 2023-11-22 18:20:44,970 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6600, loss[loss=0.06797, simple_loss=0.08854, pruned_loss=0.01511, audio_tagging_loss=0.008592, over 13998.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.09355, pruned_loss=0.01496, audio_tagging_loss=0.009333, over 3054807.16 frames. 
], batch size: 53, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:20:50,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307200 2023-11-22 18:21:02,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2048026.6666666667, ans=0.125 2023-11-22 18:21:12,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2048093.3333333333, ans=0.125 2023-11-22 18:21:13,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2048093.3333333333, ans=0.125 2023-11-22 18:21:27,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.541e+01 8.316e+01 8.739e+01 9.394e+01 1.270e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 18:21:32,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2048160.0, ans=0.1 2023-11-22 18:21:42,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2048226.6666666667, ans=0.0 2023-11-22 18:21:49,011 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6650, loss[loss=0.05989, simple_loss=0.07358, pruned_loss=0.01123, audio_tagging_loss=0.01187, over 15304.00 frames. ], tot_loss[loss=0.07087, simple_loss=0.0933, pruned_loss=0.01491, audio_tagging_loss=0.009302, over 3053150.40 frames. ], batch size: 59, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:21:54,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307250 2023-11-22 18:22:11,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2048360.0, ans=0.125 2023-11-22 18:22:21,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2048426.6666666667, ans=0.0 2023-11-22 18:22:25,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2048493.3333333333, ans=0.1 2023-11-22 18:22:35,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2048493.3333333333, ans=0.0 2023-11-22 18:22:38,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2048560.0, ans=0.125 2023-11-22 18:22:53,622 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6700, loss[loss=0.063, simple_loss=0.0821, pruned_loss=0.01257, audio_tagging_loss=0.009387, over 15811.00 frames. ], tot_loss[loss=0.07124, simple_loss=0.09407, pruned_loss=0.01502, audio_tagging_loss=0.009185, over 3049058.59 frames. ], batch size: 59, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:22:58,503 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307300 2023-11-22 18:23:12,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2048693.3333333333, ans=0.125 2023-11-22 18:23:12,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.75 vs. 
limit=12.0 2023-11-22 18:23:14,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2048693.3333333333, ans=0.125 2023-11-22 18:23:29,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2048826.6666666667, ans=0.2 2023-11-22 18:23:35,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.221e+01 9.103e+01 9.902e+01 1.404e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-22 18:23:56,351 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6750, loss[loss=0.04978, simple_loss=0.05135, pruned_loss=0.0124, audio_tagging_loss=0.0117, over 16211.00 frames. ], tot_loss[loss=0.07083, simple_loss=0.09346, pruned_loss=0.01494, audio_tagging_loss=0.009161, over 3047159.67 frames. ], batch size: 64, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:23:57,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2023-11-22 18:24:01,259 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307350 2023-11-22 18:24:01,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2048960.0, ans=0.125 2023-11-22 18:24:04,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=15.0 2023-11-22 18:24:06,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2048960.0, ans=0.125 2023-11-22 18:24:17,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2049026.6666666667, ans=0.04949747468305833 2023-11-22 18:24:25,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-22 18:24:31,749 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:24:51,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2049226.6666666667, ans=0.1 2023-11-22 18:24:59,200 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6800, loss[loss=0.06502, simple_loss=0.08782, pruned_loss=0.01126, audio_tagging_loss=0.009846, over 15091.00 frames. ], tot_loss[loss=0.0703, simple_loss=0.09296, pruned_loss=0.01465, audio_tagging_loss=0.009171, over 3036530.19 frames. 
2023-11-22 18:25:01,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2049293.3333333333, ans=0.0 2023-11-22 18:25:04,176 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307400 2023-11-22 18:25:40,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.331e+01 8.948e+01 9.524e+01 1.247e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-22 18:25:56,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2049560.0, ans=0.125 2023-11-22 18:26:03,616 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6850, loss[loss=0.08368, simple_loss=0.1076, pruned_loss=0.02165, audio_tagging_loss=0.008218, over 15584.00 frames. ], tot_loss[loss=0.06942, simple_loss=0.09161, pruned_loss=0.0144, audio_tagging_loss=0.00922, over 3036278.67 frames. ], batch size: 54, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:26:05,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2049626.6666666667, ans=0.1 2023-11-22 18:26:08,461 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307450 2023-11-22 18:26:32,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2049760.0, ans=0.0 2023-11-22 18:26:34,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2049760.0, ans=0.1 2023-11-22 18:27:06,589 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6900, loss[loss=0.06764, simple_loss=0.09257, pruned_loss=0.01212, audio_tagging_loss=0.009236, over 15419.00 frames. ], tot_loss[loss=0.07044, simple_loss=0.09325, pruned_loss=0.01469, audio_tagging_loss=0.009123, over 3039321.09 frames. ], batch size: 58, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:27:06,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2049960.0, ans=0.125 2023-11-22 18:27:11,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307500 2023-11-22 18:27:22,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2050026.6666666667, ans=0.0 2023-11-22 18:27:36,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2050093.3333333333, ans=0.2 2023-11-22 18:27:49,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.211e+01 8.819e+01 9.704e+01 1.350e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 18:27:53,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2050160.0, ans=0.0 2023-11-22 18:27:56,099 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
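The WARNING above shows why these short AudioSet placeholder cuts keep getting dropped: a 1-second cut has 100 feature frames, only 23 after the ~4x subsampling, but 24 BPE tokens, and a transducer cannot align more output symbols than encoder frames. A sketch of the implied filter; the predicate is inferred from the logged counts, not copied from train_asr.py:1462:

```python
# Sketch of the cut filter implied by the WARNING above. The frame counts
# are taken directly from the log line (100 before subsampling -> 23 after);
# the keep/drop rule is an inference: a transducer cannot emit more symbols
# than it has encoder frames.

def keep_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
    return num_frames_after_subsampling >= num_tokens

print(keep_cut(23, 24))   # False -> the cut is excluded, as in the WARNING
```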
2023-11-22 18:27:59,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2050226.6666666667, ans=0.125 2023-11-22 18:28:10,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2023-11-22 18:28:11,158 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 6950, loss[loss=0.0742, simple_loss=0.1006, pruned_loss=0.01521, audio_tagging_loss=0.00867, over 15471.00 frames. ], tot_loss[loss=0.07056, simple_loss=0.09332, pruned_loss=0.01475, audio_tagging_loss=0.009155, over 3044777.40 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:28:16,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307550 2023-11-22 18:28:16,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2050293.3333333333, ans=0.125 2023-11-22 18:28:43,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2050426.6666666667, ans=0.07 2023-11-22 18:28:54,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2050493.3333333333, ans=0.125 2023-11-22 18:28:54,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2050493.3333333333, ans=0.1 2023-11-22 18:28:57,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2050493.3333333333, ans=0.1 2023-11-22 18:29:01,021 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:29:17,593 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7000, loss[loss=0.08909, simple_loss=0.1243, pruned_loss=0.01997, audio_tagging_loss=0.006977, over 14422.00 frames. ], tot_loss[loss=0.07067, simple_loss=0.09354, pruned_loss=0.01478, audio_tagging_loss=0.009127, over 3039874.73 frames. ], batch size: 53, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:29:23,242 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307600 2023-11-22 18:29:59,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.302e+01 8.887e+01 9.651e+01 1.249e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 18:30:16,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=12.0 2023-11-22 18:30:17,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2050893.3333333333, ans=0.125 2023-11-22 18:30:22,291 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7050, loss[loss=0.06051, simple_loss=0.06691, pruned_loss=0.01508, audio_tagging_loss=0.01198, over 14774.00 frames. ], tot_loss[loss=0.07086, simple_loss=0.09362, pruned_loss=0.01478, audio_tagging_loss=0.009264, over 3045930.14 frames. ], batch size: 59, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:30:27,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307650 2023-11-22 18:30:31,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0
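The scaling.py:213 lines each report a named ScheduledFloat: a module constant (dropout probability, skip rate, balancer probability, ...) whose current value ans is looked up from batch_count. By batch_count ≈ 2.05e6 these have long since settled at their final constants (0.125, 0.1, 0.0, 0.2, ...). A plausible piecewise-linear reading of such a schedule, with invented breakpoints:

```python
# Sketch of a piecewise-linear schedule that could produce the
# "ScheduledFloat: name=..., batch_count=..., ans=..." lines above.
# The breakpoints below are invented for illustration; at batch_count
# ~2e6 any such value would long since sit at its final constant.
import bisect

class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]):
        self.xs = [p[0] for p in points]   # batch_count breakpoints
        self.ys = [p[1] for p in points]   # values at the breakpoints

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))  # invented breakpoints
print(dropout_p.value(2048160.0))   # 0.1, matching ans=0.1 in the log
```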
2023-11-22 18:30:42,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2051026.6666666667, ans=0.0 2023-11-22 18:30:55,519 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:31:09,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2051160.0, ans=0.0 2023-11-22 18:31:21,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2051226.6666666667, ans=0.2 2023-11-22 18:31:27,263 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7100, loss[loss=0.08278, simple_loss=0.1075, pruned_loss=0.01845, audio_tagging_loss=0.01059, over 16404.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09316, pruned_loss=0.01469, audio_tagging_loss=0.009364, over 3045956.91 frames. ], batch size: 59, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:31:32,413 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307700 2023-11-22 18:31:32,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2051293.3333333333, ans=0.125 2023-11-22 18:31:53,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2051426.6666666667, ans=0.0 2023-11-22 18:32:04,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2051426.6666666667, ans=0.0 2023-11-22 18:32:11,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.336e+01 9.020e+01 9.845e+01 2.040e+02, threshold=1.804e+02, percent-clipped=1.0 2023-11-22 18:32:33,400 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7150, loss[loss=0.07126, simple_loss=0.09, pruned_loss=0.01594, audio_tagging_loss=0.01032, over 15685.00 frames. ], tot_loss[loss=0.07082, simple_loss=0.09349, pruned_loss=0.01472, audio_tagging_loss=0.00935, over 3048431.37 frames. ], batch size: 59, lr: 2.62e-03, grad_scale: 16.0 2023-11-22 18:32:38,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307750 2023-11-22 18:32:44,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2051693.3333333333, ans=0.1 2023-11-22 18:32:45,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=2051693.3333333333, ans=0.2 2023-11-22 18:32:45,928 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:32:53,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2051693.3333333333, ans=0.125 2023-11-22 18:32:57,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-22 18:33:04,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=8.0 2023-11-22 18:33:13,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0
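The scaling.py:1022 lines print a per-module whitening diagnostic: a metric for how far the channel covariance is from isotropic, compared against a limit (the entries above all sit below their limits). One plausible definition, assumed here rather than taken from scaling.py, is the mean-squared eigenvalue over the squared-mean eigenvalue of the covariance, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions:

```python
# Sketch of a whitening metric like the "metric=... vs. limit=..." lines.
# Assumed definition: num_channels * sum(eig^2) / sum(eig)^2 over the
# channel covariance of each group; 1.0 for white features, larger for
# less-white ones. This is an illustration, not scaling.py's formula.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    num_frames, num_channels = x.shape
    g = num_channels // num_groups
    metrics = []
    for k in range(num_groups):
        xg = x[:, k * g:(k + 1) * g]
        cov = (xg.T @ xg) / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append(g * (eigs ** 2).sum() / eigs.sum() ** 2)
    return float(torch.stack(metrics).mean())

x = torch.randn(1000, 512) * torch.linspace(0.1, 3.0, 512)  # non-white input
print(round(whitening_metric(x), 2))   # > 1.0; white input would give ~1.0
```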
2023-11-22 18:33:15,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2023-11-22 18:33:36,858 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7200, loss[loss=0.08265, simple_loss=0.1112, pruned_loss=0.01856, audio_tagging_loss=0.008497, over 14783.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.09379, pruned_loss=0.01471, audio_tagging_loss=0.009349, over 3048289.37 frames. ], batch size: 55, lr: 2.62e-03, grad_scale: 32.0 2023-11-22 18:33:41,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2051960.0, ans=0.07 2023-11-22 18:33:42,466 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307800 2023-11-22 18:33:42,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2051960.0, ans=0.07 2023-11-22 18:33:49,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2023-11-22 18:34:19,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2052160.0, ans=0.125 2023-11-22 18:34:20,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.282e+01 8.868e+01 9.678e+01 1.341e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 18:34:33,941 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:34:37,609 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:34:40,987 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7250, loss[loss=0.06635, simple_loss=0.08548, pruned_loss=0.01244, audio_tagging_loss=0.01118, over 14961.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09393, pruned_loss=0.01477, audio_tagging_loss=0.009425, over 3049234.30 frames. ], batch size: 58, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:34:46,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307850 2023-11-22 18:35:14,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2052426.6666666667, ans=6.0 2023-11-22 18:35:17,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2052426.6666666667, ans=0.0 2023-11-22 18:35:21,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2052493.3333333333, ans=0.07 2023-11-22 18:35:37,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2023-11-22 18:35:45,813 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7300, loss[loss=0.06047, simple_loss=0.07437, pruned_loss=0.0145, audio_tagging_loss=0.008785, over 14342.00 frames. ], tot_loss[loss=0.07131, simple_loss=0.09408, pruned_loss=0.01496, audio_tagging_loss=0.009316, over 3050875.28 frames. ], batch size: 58, lr: 2.61e-03, grad_scale: 32.0
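Note that batch_count in the schedule lines is not the raw optimizer step: just above, batch idx 307800 coincides with batch_count ≈ 2.052e6, a ratio of almost exactly 20/3 ≈ 6.67, which suggests a duration-normalized count. The factor below is inferred from that logged ratio, not read from the training script:

```python
# The batch_count driving the schedules runs ~6.67x ahead of the optimizer
# batch idx (e.g. idx 307800 vs. batch_count ~2052000 above). A
# duration-normalized count would explain this; the scale is inferred.

def adjusted_batch_count(batch_idx: int, scale: float = 20.0 / 3.0) -> float:
    return batch_idx * scale

print(adjusted_batch_count(307800))   # ~2052000, near the logged batch_count
```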
2023-11-22 18:35:46,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2052626.6666666667, ans=0.07 2023-11-22 18:35:51,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307900 2023-11-22 18:36:22,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2052826.6666666667, ans=0.125 2023-11-22 18:36:28,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.606e+01 8.001e+01 8.613e+01 9.353e+01 1.295e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-22 18:36:49,234 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7350, loss[loss=0.0705, simple_loss=0.09995, pruned_loss=0.01275, audio_tagging_loss=0.007776, over 15931.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09426, pruned_loss=0.01506, audio_tagging_loss=0.009193, over 3052558.46 frames. ], batch size: 60, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:36:54,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 307950 2023-11-22 18:36:55,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2052960.0, ans=0.0 2023-11-22 18:37:25,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2053093.3333333333, ans=0.1 2023-11-22 18:37:52,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2053293.3333333333, ans=0.0 2023-11-22 18:37:53,614 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7400, loss[loss=0.07401, simple_loss=0.09793, pruned_loss=0.01262, audio_tagging_loss=0.01243, over 15406.00 frames. ], tot_loss[loss=0.0707, simple_loss=0.09345, pruned_loss=0.01485, audio_tagging_loss=0.009129, over 3045802.20 frames. ], batch size: 57, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:37:58,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308000 2023-11-22 18:38:12,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2053360.0, ans=0.125 2023-11-22 18:38:15,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2053360.0, ans=0.125 2023-11-22 18:38:23,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2053426.6666666667, ans=0.0 2023-11-22 18:38:40,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.692e+01 8.189e+01 8.818e+01 9.680e+01 1.265e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 18:38:50,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2053560.0, ans=0.125 2023-11-22 18:39:02,139 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7450, loss[loss=0.0746, simple_loss=0.09448, pruned_loss=0.01579, audio_tagging_loss=0.01157, over 15253.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.09408, pruned_loss=0.0149, audio_tagging_loss=0.00901, over 3043058.15 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:39:06,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=22.5
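grad_scale in the per-batch lines bounces between 16.0 and 32.0 (32.0 through batch 7050, 16.0 at 7100 and 7150, 32.0 again from 7200): the signature of dynamic fp16 loss scaling, which halves the scale on overflow and doubles it after a stretch of clean steps. A sketch in the spirit of torch.cuda.amp.GradScaler; the growth parameters below are invented:

```python
# Sketch of the dynamic loss scaling suggested by grad_scale oscillating
# between 16.0 and 32.0 above (fp16 training). Parameters are invented;
# this mirrors GradScaler's behaviour in shape only.

class LossScaleSketch:
    def __init__(self, scale=16.0, growth_interval=500, backoff=0.5, growth=2.0):
        self.scale = scale
        self.growth_interval = growth_interval
        self.backoff = backoff
        self.growth = growth
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:                      # overflow: halve and restart count
            self.scale *= self.backoff
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= self.growth  # stable run: double the scale
        return self.scale

scaler = LossScaleSketch(scale=16.0)
for step in range(500):
    scaler.update(found_inf=False)
print(scaler.scale)   # 32.0 after one clean growth interval
```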
2023-11-22 18:39:08,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308050 2023-11-22 18:39:11,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2053626.6666666667, ans=0.125 2023-11-22 18:39:46,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.89 vs. limit=10.0 2023-11-22 18:40:07,059 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7500, loss[loss=0.08254, simple_loss=0.1137, pruned_loss=0.0173, audio_tagging_loss=0.008389, over 15873.00 frames. ], tot_loss[loss=0.07117, simple_loss=0.09463, pruned_loss=0.01489, audio_tagging_loss=0.00897, over 3044765.25 frames. ], batch size: 58, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:40:07,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2053960.0, ans=0.0 2023-11-22 18:40:09,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2053960.0, ans=0.125 2023-11-22 18:40:12,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308100 2023-11-22 18:40:18,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2054026.6666666667, ans=0.125 2023-11-22 18:40:52,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.267e+01 8.775e+01 9.244e+01 1.175e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-22 18:41:10,654 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7550, loss[loss=0.0752, simple_loss=0.1012, pruned_loss=0.01497, audio_tagging_loss=0.009626, over 15994.00 frames. ], tot_loss[loss=0.07099, simple_loss=0.09435, pruned_loss=0.0148, audio_tagging_loss=0.009023, over 3054018.04 frames. ], batch size: 58, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:41:16,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308150 2023-11-22 18:41:29,109 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:41:33,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2054360.0, ans=0.0 2023-11-22 18:41:34,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2054360.0, ans=0.125 2023-11-22 18:41:51,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2023-11-22 18:41:52,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2054493.3333333333, ans=0.0 2023-11-22 18:41:59,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2054493.3333333333, ans=0.125 2023-11-22 18:42:14,497 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7600, loss[loss=0.0683, simple_loss=0.09977, pruned_loss=0.0121, audio_tagging_loss=0.006317, over 13889.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.09279, pruned_loss=0.0144, audio_tagging_loss=0.009086, over 3046673.17 frames. ], batch size: 54, lr: 2.61e-03, grad_scale: 32.0
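Alongside the current batch's loss, every train_asr.py:1221 line carries a tot_loss[...] aggregate "over ~3.05e6 frames": a frame-weighted running average across recent batches. The decay policy below is an assumption, chosen only to show the shape of such bookkeeping; the real tracker's window and reset rules may differ:

```python
# Sketch of the frame-weighted running average behind the
# "tot_loss[... over N frames.]" fields. The decay constant is invented;
# it merely keeps the effective window finite.

class RunningLoss:
    def __init__(self, decay: float = 0.98):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: int) -> tuple[float, float]:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames, self.frames

avg = RunningLoss()
for _ in range(300):
    tot, n = avg.update(0.07, 15500)      # ~15.5k frames/batch, as in the log
print(round(tot, 5), int(n))              # tot_loss ~0.07 over ~775k frames
```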
2023-11-22 18:42:16,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2054626.6666666667, ans=0.0 2023-11-22 18:42:20,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308200 2023-11-22 18:42:32,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2054693.3333333333, ans=0.125 2023-11-22 18:42:40,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-11-22 18:42:56,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2054826.6666666667, ans=0.0 2023-11-22 18:42:59,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 7.973e+01 8.478e+01 9.439e+01 1.214e+02, threshold=1.696e+02, percent-clipped=0.0 2023-11-22 18:43:15,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2054893.3333333333, ans=0.125 2023-11-22 18:43:20,405 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7650, loss[loss=0.05724, simple_loss=0.07896, pruned_loss=0.01094, audio_tagging_loss=0.006816, over 14307.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09253, pruned_loss=0.01419, audio_tagging_loss=0.009049, over 3051993.42 frames. ], batch size: 54, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:43:25,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308250 2023-11-22 18:43:26,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2023-11-22 18:43:48,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2055093.3333333333, ans=0.125 2023-11-22 18:43:52,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2055093.3333333333, ans=0.0 2023-11-22 18:44:12,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2055226.6666666667, ans=10.0 2023-11-22 18:44:13,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2055226.6666666667, ans=0.2 2023-11-22 18:44:24,538 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7700, loss[loss=0.08304, simple_loss=0.1122, pruned_loss=0.01804, audio_tagging_loss=0.008916, over 15139.00 frames. ], tot_loss[loss=0.06979, simple_loss=0.09271, pruned_loss=0.01433, audio_tagging_loss=0.009103, over 3052890.06 frames.
], batch size: 55, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:44:29,490 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308300 2023-11-22 18:44:49,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2055360.0, ans=0.0 2023-11-22 18:45:04,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2055493.3333333333, ans=0.125 2023-11-22 18:45:10,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.353e+01 9.009e+01 9.552e+01 1.417e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-22 18:45:20,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2055560.0, ans=0.125 2023-11-22 18:45:29,283 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7750, loss[loss=0.08229, simple_loss=0.1203, pruned_loss=0.015, audio_tagging_loss=0.007154, over 16186.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09346, pruned_loss=0.01435, audio_tagging_loss=0.009062, over 3052720.50 frames. ], batch size: 59, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:45:34,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308350 2023-11-22 18:46:12,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2055826.6666666667, ans=0.125 2023-11-22 18:46:13,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2055826.6666666667, ans=0.0 2023-11-22 18:46:18,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2023-11-22 18:46:28,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2055893.3333333333, ans=0.05 2023-11-22 18:46:33,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2055960.0, ans=0.0 2023-11-22 18:46:34,449 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7800, loss[loss=0.05322, simple_loss=0.06881, pruned_loss=0.007219, audio_tagging_loss=0.01159, over 15059.00 frames. ], tot_loss[loss=0.07066, simple_loss=0.09399, pruned_loss=0.01451, audio_tagging_loss=0.009153, over 3043757.24 frames. ], batch size: 57, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:46:39,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308400 2023-11-22 18:46:48,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2056026.6666666667, ans=0.0 2023-11-22 18:46:51,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2056026.6666666667, ans=0.0 2023-11-22 18:46:52,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. 
limit=22.5 2023-11-22 18:47:02,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2056093.3333333333, ans=0.0 2023-11-22 18:47:07,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2056093.3333333333, ans=0.125 2023-11-22 18:47:21,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.337e+01 8.829e+01 9.514e+01 1.389e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-22 18:47:21,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2056160.0, ans=0.0 2023-11-22 18:47:31,213 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:47:32,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2056226.6666666667, ans=0.5 2023-11-22 18:47:33,566 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 18:47:38,094 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7850, loss[loss=0.07599, simple_loss=0.1048, pruned_loss=0.01372, audio_tagging_loss=0.009885, over 15598.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09396, pruned_loss=0.01465, audio_tagging_loss=0.009264, over 3040190.56 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:47:43,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308450 2023-11-22 18:47:53,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2023-11-22 18:47:58,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2056360.0, ans=0.0 2023-11-22 18:48:01,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2056360.0, ans=0.125 2023-11-22 18:48:03,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2056426.6666666667, ans=0.0 2023-11-22 18:48:07,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2056426.6666666667, ans=0.125 2023-11-22 18:48:12,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2056426.6666666667, ans=0.0 2023-11-22 18:48:15,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2056426.6666666667, ans=0.125 2023-11-22 18:48:19,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2056493.3333333333, ans=0.2 2023-11-22 18:48:25,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.30 vs. limit=22.5 2023-11-22 18:48:39,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2056560.0, ans=0.125 2023-11-22 18:48:41,449 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7900, loss[loss=0.09881, simple_loss=0.1401, pruned_loss=0.02197, audio_tagging_loss=0.006806, over 16445.00 frames. 
], tot_loss[loss=0.07097, simple_loss=0.09398, pruned_loss=0.01463, audio_tagging_loss=0.009347, over 3053845.94 frames. ], batch size: 58, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:48:46,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308500 2023-11-22 18:48:49,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2056626.6666666667, ans=0.125 2023-11-22 18:48:53,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=22.5 2023-11-22 18:48:55,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2056693.3333333333, ans=0.07 2023-11-22 18:49:07,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2023-11-22 18:49:27,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 8.185e+01 8.852e+01 9.450e+01 1.416e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-22 18:49:39,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2023-11-22 18:49:46,611 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 7950, loss[loss=0.08965, simple_loss=0.112, pruned_loss=0.02244, audio_tagging_loss=0.01119, over 15631.00 frames. ], tot_loss[loss=0.07128, simple_loss=0.0939, pruned_loss=0.0148, audio_tagging_loss=0.009527, over 3049674.00 frames. ], batch size: 55, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:49:51,670 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308550 2023-11-22 18:50:00,022 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 18:50:02,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2057026.6666666667, ans=0.0 2023-11-22 18:50:29,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2057160.0, ans=0.125 2023-11-22 18:50:38,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2057226.6666666667, ans=0.2 2023-11-22 18:50:49,791 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8000, loss[loss=0.05554, simple_loss=0.07142, pruned_loss=0.01032, audio_tagging_loss=0.009509, over 16055.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09293, pruned_loss=0.01478, audio_tagging_loss=0.009658, over 3040600.32 frames. ], batch size: 61, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:50:52,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.95 vs. 
limit=22.5 2023-11-22 18:50:54,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308600 2023-11-22 18:51:01,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2023-11-22 18:51:20,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-11-22 18:51:25,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2057426.6666666667, ans=0.0 2023-11-22 18:51:36,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.209e+01 8.705e+01 9.523e+01 1.226e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-22 18:51:53,694 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8050, loss[loss=0.09704, simple_loss=0.1304, pruned_loss=0.02455, audio_tagging_loss=0.007281, over 14560.00 frames. ], tot_loss[loss=0.07114, simple_loss=0.09309, pruned_loss=0.0149, audio_tagging_loss=0.009696, over 3045064.77 frames. ], batch size: 55, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:51:54,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2057626.6666666667, ans=0.125 2023-11-22 18:51:58,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308650 2023-11-22 18:52:23,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2057760.0, ans=0.0 2023-11-22 18:52:25,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2057760.0, ans=0.0 2023-11-22 18:52:30,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2057760.0, ans=0.1 2023-11-22 18:52:30,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2057760.0, ans=0.07 2023-11-22 18:52:56,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2057893.3333333333, ans=0.0 2023-11-22 18:52:59,055 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8100, loss[loss=0.06995, simple_loss=0.09655, pruned_loss=0.0135, audio_tagging_loss=0.008175, over 14124.00 frames. ], tot_loss[loss=0.07119, simple_loss=0.09358, pruned_loss=0.01489, audio_tagging_loss=0.009512, over 3045977.82 frames. ], batch size: 53, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:53:04,567 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308700 2023-11-22 18:53:36,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2058160.0, ans=0.0 2023-11-22 18:53:36,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2058160.0, ans=0.0 2023-11-22 18:53:45,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.206e+01 8.841e+01 9.625e+01 1.237e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-22 18:54:03,365 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8150, loss[loss=0.06591, simple_loss=0.08785, pruned_loss=0.01424, audio_tagging_loss=0.007736, over 15607.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.09354, pruned_loss=0.01477, audio_tagging_loss=0.009404, over 3048505.35 frames. 
], batch size: 60, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:54:08,438 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308750 2023-11-22 18:54:42,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2058493.3333333333, ans=0.0 2023-11-22 18:55:07,320 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8200, loss[loss=0.0772, simple_loss=0.0969, pruned_loss=0.01737, audio_tagging_loss=0.01139, over 14596.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09307, pruned_loss=0.01461, audio_tagging_loss=0.009329, over 3045525.10 frames. ], batch size: 54, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:55:07,340 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 18:55:12,450 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308800 2023-11-22 18:55:19,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2023-11-22 18:55:25,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2058693.3333333333, ans=0.0 2023-11-22 18:55:43,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2058760.0, ans=0.125 2023-11-22 18:55:46,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2058826.6666666667, ans=0.1 2023-11-22 18:55:47,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2058826.6666666667, ans=0.2 2023-11-22 18:55:55,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.302e+01 8.890e+01 9.670e+01 1.288e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 18:56:11,855 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8250, loss[loss=0.06931, simple_loss=0.08753, pruned_loss=0.01528, audio_tagging_loss=0.01027, over 14633.00 frames. ], tot_loss[loss=0.07059, simple_loss=0.09332, pruned_loss=0.01468, audio_tagging_loss=0.009243, over 3044793.83 frames. ], batch size: 55, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:56:16,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-22 18:56:17,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308850 2023-11-22 18:56:18,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. 
limit=10.0 2023-11-22 18:56:31,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2059026.6666666667, ans=0.125 2023-11-22 18:56:48,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2059160.0, ans=0.1 2023-11-22 18:56:53,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2059160.0, ans=0.0 2023-11-22 18:57:00,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2059160.0, ans=0.07 2023-11-22 18:57:03,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-22 18:57:04,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2059226.6666666667, ans=0.125 2023-11-22 18:57:05,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2059226.6666666667, ans=0.125 2023-11-22 18:57:06,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2059226.6666666667, ans=0.1 2023-11-22 18:57:16,296 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8300, loss[loss=0.07134, simple_loss=0.1011, pruned_loss=0.01267, audio_tagging_loss=0.008112, over 16472.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.09407, pruned_loss=0.01483, audio_tagging_loss=0.009166, over 3039549.58 frames. ], batch size: 62, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:57:21,467 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308900 2023-11-22 18:57:21,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2059293.3333333333, ans=0.125 2023-11-22 18:57:35,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2059360.0, ans=0.0 2023-11-22 18:57:59,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2023-11-22 18:58:03,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.357e+01 8.920e+01 9.669e+01 1.328e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-22 18:58:14,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2059560.0, ans=0.125 2023-11-22 18:58:20,134 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8350, loss[loss=0.06423, simple_loss=0.08379, pruned_loss=0.01237, audio_tagging_loss=0.009961, over 15401.00 frames. ], tot_loss[loss=0.07086, simple_loss=0.09426, pruned_loss=0.01471, audio_tagging_loss=0.009021, over 3042934.56 frames. 
], batch size: 61, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 18:58:22,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2059626.6666666667, ans=0.1 2023-11-22 18:58:25,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 308950 2023-11-22 18:58:56,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-11-22 18:59:22,826 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8400, loss[loss=0.07031, simple_loss=0.09216, pruned_loss=0.01412, audio_tagging_loss=0.01011, over 15348.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.09323, pruned_loss=0.01471, audio_tagging_loss=0.009102, over 3050949.40 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 18:59:29,019 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309000 2023-11-22 18:59:31,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2059960.0, ans=0.125 2023-11-22 18:59:34,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2023-11-22 18:59:44,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2060026.6666666667, ans=0.1 2023-11-22 18:59:48,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2060093.3333333333, ans=0.125 2023-11-22 19:00:04,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2060160.0, ans=0.125 2023-11-22 19:00:10,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.438e+01 8.273e+01 8.789e+01 9.898e+01 1.441e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-22 19:00:22,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-22 19:00:23,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2060226.6666666667, ans=0.125 2023-11-22 19:00:28,055 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8450, loss[loss=0.05648, simple_loss=0.0833, pruned_loss=0.007367, audio_tagging_loss=0.007464, over 15439.00 frames. ], tot_loss[loss=0.07045, simple_loss=0.09332, pruned_loss=0.01468, audio_tagging_loss=0.009112, over 3052523.12 frames. 
], batch size: 56, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:00:33,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309050 2023-11-22 19:00:33,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2060293.3333333333, ans=0.2 2023-11-22 19:00:38,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2060293.3333333333, ans=0.125 2023-11-22 19:01:00,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2060426.6666666667, ans=0.125 2023-11-22 19:01:02,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2060426.6666666667, ans=0.0 2023-11-22 19:01:10,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2060493.3333333333, ans=0.125 2023-11-22 19:01:18,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2060560.0, ans=0.0 2023-11-22 19:01:31,972 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8500, loss[loss=0.07033, simple_loss=0.08654, pruned_loss=0.01672, audio_tagging_loss=0.01033, over 15912.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.09314, pruned_loss=0.01457, audio_tagging_loss=0.009082, over 3051679.63 frames. ], batch size: 60, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:01:36,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309100 2023-11-22 19:01:40,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2060626.6666666667, ans=0.125 2023-11-22 19:02:11,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2060826.6666666667, ans=0.125 2023-11-22 19:02:13,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=12.0 2023-11-22 19:02:19,382 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.300e+01 8.864e+01 9.381e+01 1.141e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-22 19:02:29,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2060893.3333333333, ans=0.125 2023-11-22 19:02:33,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2060893.3333333333, ans=0.125 2023-11-22 19:02:35,152 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8550, loss[loss=0.05751, simple_loss=0.07096, pruned_loss=0.01106, audio_tagging_loss=0.01097, over 15230.00 frames. ], tot_loss[loss=0.07082, simple_loss=0.09374, pruned_loss=0.01465, audio_tagging_loss=0.009297, over 3060682.10 frames. 
], batch size: 59, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:02:39,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2060960.0, ans=0.125 2023-11-22 19:02:41,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309150 2023-11-22 19:02:49,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2061026.6666666667, ans=0.125 2023-11-22 19:02:54,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2061026.6666666667, ans=0.125 2023-11-22 19:03:07,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2061093.3333333333, ans=0.125 2023-11-22 19:03:11,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2061093.3333333333, ans=0.125 2023-11-22 19:03:17,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2061160.0, ans=10.0 2023-11-22 19:03:37,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2061226.6666666667, ans=0.1 2023-11-22 19:03:40,630 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8600, loss[loss=0.07533, simple_loss=0.09674, pruned_loss=0.01901, audio_tagging_loss=0.007942, over 15341.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.09386, pruned_loss=0.01467, audio_tagging_loss=0.009214, over 3063073.30 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:03:40,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2061293.3333333333, ans=0.125 2023-11-22 19:03:45,527 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309200 2023-11-22 19:03:48,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2061293.3333333333, ans=0.125 2023-11-22 19:03:51,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2061293.3333333333, ans=0.125 2023-11-22 19:03:57,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2061360.0, ans=0.0 2023-11-22 19:03:57,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2061360.0, ans=0.0 2023-11-22 19:04:07,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2023-11-22 19:04:17,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.97 vs. 
limit=15.0 2023-11-22 19:04:27,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.278e+01 9.087e+01 9.783e+01 1.157e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-22 19:04:29,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2061493.3333333333, ans=0.125 2023-11-22 19:04:43,837 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8650, loss[loss=0.05387, simple_loss=0.0699, pruned_loss=0.009512, audio_tagging_loss=0.009412, over 15106.00 frames. ], tot_loss[loss=0.07111, simple_loss=0.09419, pruned_loss=0.01475, audio_tagging_loss=0.009265, over 3049942.73 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:04:49,428 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309250 2023-11-22 19:04:49,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0 2023-11-22 19:04:51,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0 2023-11-22 19:05:11,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2023-11-22 19:05:33,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2061893.3333333333, ans=0.125 2023-11-22 19:05:37,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2061893.3333333333, ans=0.0 2023-11-22 19:05:47,815 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8700, loss[loss=0.06362, simple_loss=0.0805, pruned_loss=0.01185, audio_tagging_loss=0.01152, over 15734.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.09453, pruned_loss=0.01483, audio_tagging_loss=0.009324, over 3050071.86 frames. ], batch size: 59, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:05:53,011 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309300 2023-11-22 19:06:00,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-22 19:06:11,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2062026.6666666667, ans=0.125 2023-11-22 19:06:30,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2062160.0, ans=0.0 2023-11-22 19:06:32,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2062160.0, ans=0.125 2023-11-22 19:06:35,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2023-11-22 19:06:36,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.375e+01 8.886e+01 9.746e+01 1.312e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 19:06:38,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. 
limit=22.5 2023-11-22 19:06:43,066 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 19:06:46,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-22 19:06:52,431 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8750, loss[loss=0.0697, simple_loss=0.09149, pruned_loss=0.014, audio_tagging_loss=0.009952, over 14154.00 frames. ], tot_loss[loss=0.07192, simple_loss=0.0953, pruned_loss=0.015, audio_tagging_loss=0.009276, over 3048076.94 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:06:55,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2062293.3333333333, ans=0.125 2023-11-22 19:06:57,382 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309350 2023-11-22 19:07:09,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-22 19:07:42,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2062560.0, ans=0.125 2023-11-22 19:07:55,105 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8800, loss[loss=0.07678, simple_loss=0.09969, pruned_loss=0.01781, audio_tagging_loss=0.009119, over 14787.00 frames. ], tot_loss[loss=0.07215, simple_loss=0.09554, pruned_loss=0.01498, audio_tagging_loss=0.009399, over 3048630.80 frames. ], batch size: 57, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:08:00,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309400 2023-11-22 19:08:23,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2062760.0, ans=0.125 2023-11-22 19:08:33,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.24 vs. limit=6.0 2023-11-22 19:08:43,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.328e+01 8.323e+01 8.895e+01 9.421e+01 1.402e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-22 19:08:58,974 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8850, loss[loss=0.06633, simple_loss=0.08966, pruned_loss=0.01222, audio_tagging_loss=0.009277, over 14169.00 frames. ], tot_loss[loss=0.07193, simple_loss=0.09527, pruned_loss=0.01489, audio_tagging_loss=0.009409, over 3042830.10 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:09:03,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309450 2023-11-22 19:09:10,419 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 19:09:17,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2063026.6666666667, ans=0.125 2023-11-22 19:09:37,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2063160.0, ans=0.0 2023-11-22 19:09:44,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2063160.0, ans=0.1 2023-11-22 19:10:02,961 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8900, loss[loss=0.06796, simple_loss=0.09567, pruned_loss=0.01118, audio_tagging_loss=0.008941, over 15809.00 frames. ], tot_loss[loss=0.07211, simple_loss=0.09556, pruned_loss=0.01499, audio_tagging_loss=0.009337, over 3045546.48 frames. ], batch size: 59, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:10:07,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309500 2023-11-22 19:10:10,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2063293.3333333333, ans=0.125 2023-11-22 19:10:45,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2063493.3333333333, ans=0.125 2023-11-22 19:10:50,969 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.152e+01 8.729e+01 9.572e+01 1.153e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-22 19:10:52,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2063560.0, ans=0.1 2023-11-22 19:11:02,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=22.5 2023-11-22 19:11:05,624 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 8950, loss[loss=0.06718, simple_loss=0.08493, pruned_loss=0.01474, audio_tagging_loss=0.009982, over 15090.00 frames. ], tot_loss[loss=0.07127, simple_loss=0.09453, pruned_loss=0.01476, audio_tagging_loss=0.009247, over 3053482.09 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:11:10,646 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309550 2023-11-22 19:11:28,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2063693.3333333333, ans=0.07 2023-11-22 19:11:32,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2023-11-22 19:11:37,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2023-11-22 19:11:47,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2063826.6666666667, ans=0.0 2023-11-22 19:11:59,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2063893.3333333333, ans=0.2 2023-11-22 19:12:07,968 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9000, loss[loss=0.05458, simple_loss=0.06236, pruned_loss=0.01186, audio_tagging_loss=0.01153, over 14734.00 frames. ], tot_loss[loss=0.07104, simple_loss=0.09411, pruned_loss=0.01473, audio_tagging_loss=0.009254, over 3050523.52 frames. 
], batch size: 57, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:12:07,969 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 19:12:45,252 INFO [train_asr.py:1253] (1/4) Epoch 26, validation: loss=0.0595, simple_loss=0.05137, pruned_loss=0.00505, audio_tagging_loss=0.02877, over 4681554.00 frames. 2023-11-22 19:12:45,253 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 19:12:48,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2063960.0, ans=0.125 2023-11-22 19:12:50,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309600 2023-11-22 19:13:11,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2064093.3333333333, ans=0.04949747468305833 2023-11-22 19:13:14,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0 2023-11-22 19:13:16,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=2064093.3333333333, ans=0.02 2023-11-22 19:13:21,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2064160.0, ans=0.5 2023-11-22 19:13:27,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.67 vs. limit=10.0 2023-11-22 19:13:34,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.292e+01 8.198e+01 8.954e+01 9.788e+01 1.376e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-22 19:13:40,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2064226.6666666667, ans=0.125 2023-11-22 19:13:43,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2064226.6666666667, ans=0.125 2023-11-22 19:13:48,327 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9050, loss[loss=0.07492, simple_loss=0.1029, pruned_loss=0.01723, audio_tagging_loss=0.006248, over 16821.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09471, pruned_loss=0.01495, audio_tagging_loss=0.009191, over 3051611.53 frames. ], batch size: 66, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:13:49,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.85 vs. 
limit=22.5 2023-11-22 19:13:51,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2064293.3333333333, ans=0.125 2023-11-22 19:13:53,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309650 2023-11-22 19:13:59,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2064360.0, ans=0.125 2023-11-22 19:14:10,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2064360.0, ans=0.0 2023-11-22 19:14:21,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2064426.6666666667, ans=0.125 2023-11-22 19:14:35,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-22 19:14:50,832 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9100, loss[loss=0.0616, simple_loss=0.0831, pruned_loss=0.009919, audio_tagging_loss=0.01013, over 15345.00 frames. ], tot_loss[loss=0.07138, simple_loss=0.09489, pruned_loss=0.01484, audio_tagging_loss=0.009089, over 3056491.18 frames. ], batch size: 57, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:14:52,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-22 19:14:55,767 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309700 2023-11-22 19:14:55,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2064626.6666666667, ans=0.125 2023-11-22 19:15:00,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-22 19:15:07,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2064693.3333333333, ans=0.0 2023-11-22 19:15:09,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2064693.3333333333, ans=0.2 2023-11-22 19:15:21,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2064760.0, ans=0.0 2023-11-22 19:15:21,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2064760.0, ans=0.1 2023-11-22 19:15:26,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2064760.0, ans=0.0 2023-11-22 19:15:26,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2064760.0, ans=0.1 2023-11-22 19:15:30,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2064826.6666666667, ans=0.0 2023-11-22 19:15:39,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.862e+01 8.178e+01 9.169e+01 9.914e+01 1.442e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-22 19:15:54,992 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9150, loss[loss=0.08218, simple_loss=0.1074, pruned_loss=0.01894, audio_tagging_loss=0.009516, over 14847.00 frames. 
], tot_loss[loss=0.071, simple_loss=0.09417, pruned_loss=0.01476, audio_tagging_loss=0.009151, over 3056164.13 frames. ], batch size: 54, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:15:55,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0 2023-11-22 19:15:59,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309750 2023-11-22 19:16:06,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2065026.6666666667, ans=0.2 2023-11-22 19:16:21,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=15.0 2023-11-22 19:16:55,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2065226.6666666667, ans=0.125 2023-11-22 19:16:57,882 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9200, loss[loss=0.07242, simple_loss=0.09928, pruned_loss=0.01471, audio_tagging_loss=0.00807, over 15083.00 frames. ], tot_loss[loss=0.07109, simple_loss=0.09438, pruned_loss=0.01482, audio_tagging_loss=0.009077, over 3052925.63 frames. ], batch size: 55, lr: 2.61e-03, grad_scale: 32.0 2023-11-22 19:17:02,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309800 2023-11-22 19:17:27,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2065426.6666666667, ans=0.125 2023-11-22 19:17:34,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2023-11-22 19:17:36,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2065493.3333333333, ans=0.125 2023-11-22 19:17:48,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.424e+01 8.996e+01 9.911e+01 1.165e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 19:17:52,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2065560.0, ans=10.0 2023-11-22 19:17:56,920 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 19:17:59,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2065626.6666666667, ans=0.125 2023-11-22 19:18:00,307 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9250, loss[loss=0.06357, simple_loss=0.09267, pruned_loss=0.007058, audio_tagging_loss=0.01018, over 14996.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09303, pruned_loss=0.01456, audio_tagging_loss=0.009137, over 3057440.65 frames. 
], batch size: 55, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:18:05,400 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309850 2023-11-22 19:18:09,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2065626.6666666667, ans=0.125 2023-11-22 19:18:11,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2065693.3333333333, ans=0.125 2023-11-22 19:18:29,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2065760.0, ans=0.1 2023-11-22 19:18:39,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2065826.6666666667, ans=0.125 2023-11-22 19:19:04,401 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9300, loss[loss=0.0924, simple_loss=0.1239, pruned_loss=0.02419, audio_tagging_loss=0.00624, over 16711.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09281, pruned_loss=0.01463, audio_tagging_loss=0.009176, over 3055544.93 frames. ], batch size: 62, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:19:09,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309900 2023-11-22 19:19:19,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2066026.6666666667, ans=0.125 2023-11-22 19:19:31,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2066093.3333333333, ans=0.1 2023-11-22 19:19:46,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2066160.0, ans=0.07 2023-11-22 19:19:51,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2066160.0, ans=0.0 2023-11-22 19:19:56,382 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 7.933e+01 8.549e+01 9.311e+01 1.177e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-22 19:20:02,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2066226.6666666667, ans=0.05 2023-11-22 19:20:09,193 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9350, loss[loss=0.05834, simple_loss=0.06846, pruned_loss=0.01373, audio_tagging_loss=0.01038, over 14905.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09255, pruned_loss=0.01456, audio_tagging_loss=0.009167, over 3050027.50 frames. ], batch size: 57, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:20:14,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 309950 2023-11-22 19:20:40,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2066426.6666666667, ans=0.125 2023-11-22 19:20:49,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2066493.3333333333, ans=0.1 2023-11-22 19:21:08,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0 2023-11-22 19:21:11,939 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9400, loss[loss=0.07726, simple_loss=0.1039, pruned_loss=0.01746, audio_tagging_loss=0.007836, over 15382.00 frames. 
], tot_loss[loss=0.07039, simple_loss=0.09312, pruned_loss=0.01464, audio_tagging_loss=0.009194, over 3053684.37 frames. ], batch size: 58, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:21:13,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2066626.6666666667, ans=0.125 2023-11-22 19:21:16,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310000 2023-11-22 19:21:22,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2066626.6666666667, ans=0.125 2023-11-22 19:21:23,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2066693.3333333333, ans=0.0 2023-11-22 19:21:29,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2066693.3333333333, ans=0.125 2023-11-22 19:21:29,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2066693.3333333333, ans=0.1 2023-11-22 19:21:30,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2066693.3333333333, ans=0.125 2023-11-22 19:21:33,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2023-11-22 19:21:54,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2023-11-22 19:22:03,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.287e+01 9.048e+01 9.651e+01 1.161e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-22 19:22:13,661 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 19:22:16,572 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9450, loss[loss=0.06062, simple_loss=0.08482, pruned_loss=0.00911, audio_tagging_loss=0.009106, over 14873.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09311, pruned_loss=0.01463, audio_tagging_loss=0.009283, over 3054561.49 frames. 
], batch size: 55, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:22:22,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310050 2023-11-22 19:22:26,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2066960.0, ans=0.125 2023-11-22 19:22:34,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2067026.6666666667, ans=0.025 2023-11-22 19:22:38,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2067026.6666666667, ans=0.09899494936611666 2023-11-22 19:22:48,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2067093.3333333333, ans=0.04949747468305833 2023-11-22 19:22:53,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2067160.0, ans=0.125 2023-11-22 19:23:06,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2067226.6666666667, ans=0.09899494936611666 2023-11-22 19:23:20,798 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9500, loss[loss=0.07152, simple_loss=0.09722, pruned_loss=0.01317, audio_tagging_loss=0.009741, over 15177.00 frames. ], tot_loss[loss=0.07098, simple_loss=0.09351, pruned_loss=0.01488, audio_tagging_loss=0.009348, over 3056941.84 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 8.0 2023-11-22 19:23:26,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310100 2023-11-22 19:23:29,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2067293.3333333333, ans=0.125 2023-11-22 19:23:31,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2067293.3333333333, ans=0.5 2023-11-22 19:23:39,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2067360.0, ans=0.1 2023-11-22 19:23:42,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2023-11-22 19:23:46,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5 2023-11-22 19:24:12,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.144e+01 8.401e+01 8.999e+01 9.659e+01 1.129e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-22 19:24:22,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2067560.0, ans=0.0 2023-11-22 19:24:24,399 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9550, loss[loss=0.06953, simple_loss=0.08543, pruned_loss=0.01341, audio_tagging_loss=0.01342, over 14647.00 frames. ], tot_loss[loss=0.07163, simple_loss=0.09437, pruned_loss=0.01502, audio_tagging_loss=0.009423, over 3052670.48 frames. 
], batch size: 57, lr: 2.61e-03, grad_scale: 8.0 2023-11-22 19:24:27,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2067626.6666666667, ans=0.1 2023-11-22 19:24:29,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310150 2023-11-22 19:24:39,242 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 19:25:15,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-22 19:25:27,151 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9600, loss[loss=0.1027, simple_loss=0.137, pruned_loss=0.02695, audio_tagging_loss=0.007267, over 15231.00 frames. ], tot_loss[loss=0.07209, simple_loss=0.09509, pruned_loss=0.01511, audio_tagging_loss=0.009429, over 3052759.02 frames. ], batch size: 56, lr: 2.61e-03, grad_scale: 16.0 2023-11-22 19:25:30,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2067960.0, ans=0.125 2023-11-22 19:25:32,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310200 2023-11-22 19:25:34,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2067960.0, ans=0.125 2023-11-22 19:26:08,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2023-11-22 19:26:13,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2068160.0, ans=0.025 2023-11-22 19:26:15,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2068160.0, ans=0.125 2023-11-22 19:26:20,659 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.239e+01 8.821e+01 9.524e+01 1.134e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 19:26:32,855 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9650, loss[loss=0.06672, simple_loss=0.08957, pruned_loss=0.01478, audio_tagging_loss=0.007158, over 15447.00 frames. ], tot_loss[loss=0.07194, simple_loss=0.09475, pruned_loss=0.01511, audio_tagging_loss=0.009447, over 3046727.88 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:26:37,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310250 2023-11-22 19:26:42,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2068293.3333333333, ans=0.1 2023-11-22 19:27:24,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5 2023-11-22 19:27:35,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=22.5 2023-11-22 19:27:36,279 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9700, loss[loss=0.09538, simple_loss=0.1321, pruned_loss=0.02475, audio_tagging_loss=0.004599, over 16323.00 frames. ], tot_loss[loss=0.07135, simple_loss=0.0939, pruned_loss=0.01508, audio_tagging_loss=0.009318, over 3046998.10 frames. 
], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:27:39,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2068626.6666666667, ans=0.125 2023-11-22 19:27:40,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.14 vs. limit=15.0 2023-11-22 19:27:41,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310300 2023-11-22 19:27:42,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2068626.6666666667, ans=0.1 2023-11-22 19:27:49,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2068693.3333333333, ans=0.125 2023-11-22 19:27:49,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2023-11-22 19:27:54,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2023-11-22 19:27:57,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2068693.3333333333, ans=0.125 2023-11-22 19:28:06,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2068760.0, ans=0.125 2023-11-22 19:28:29,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 8.323e+01 8.859e+01 9.612e+01 1.303e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-22 19:28:29,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2068893.3333333333, ans=0.0 2023-11-22 19:28:39,872 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9750, loss[loss=0.06231, simple_loss=0.08465, pruned_loss=0.01377, audio_tagging_loss=0.006219, over 15748.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.09379, pruned_loss=0.01496, audio_tagging_loss=0.009171, over 3051183.97 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:28:41,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2068960.0, ans=0.1 2023-11-22 19:28:45,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310350 2023-11-22 19:28:45,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2068960.0, ans=0.125 2023-11-22 19:28:47,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2068960.0, ans=15.0 2023-11-22 19:28:48,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0 2023-11-22 19:28:52,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2069026.6666666667, ans=0.2 2023-11-22 19:29:15,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. 
limit=15.0 2023-11-22 19:29:16,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2069093.3333333333, ans=0.125 2023-11-22 19:29:28,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2069160.0, ans=0.125 2023-11-22 19:29:28,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2069160.0, ans=0.0 2023-11-22 19:29:44,818 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9800, loss[loss=0.09117, simple_loss=0.1267, pruned_loss=0.02113, audio_tagging_loss=0.006704, over 15547.00 frames. ], tot_loss[loss=0.07092, simple_loss=0.09358, pruned_loss=0.01502, audio_tagging_loss=0.009121, over 3045282.19 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:29:45,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=12.0 2023-11-22 19:29:46,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2069293.3333333333, ans=0.1 2023-11-22 19:29:49,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310400 2023-11-22 19:29:52,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2069293.3333333333, ans=0.0 2023-11-22 19:29:57,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2069360.0, ans=0.1 2023-11-22 19:30:37,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.432e+01 8.460e+01 9.204e+01 9.900e+01 1.495e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-22 19:30:41,213 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 19:30:48,376 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9850, loss[loss=0.07353, simple_loss=0.08949, pruned_loss=0.01892, audio_tagging_loss=0.009865, over 15152.00 frames. ], tot_loss[loss=0.07077, simple_loss=0.0936, pruned_loss=0.01488, audio_tagging_loss=0.009088, over 3047247.04 frames. 
], batch size: 57, lr: 2.60e-03, grad_scale: 8.0 2023-11-22 19:30:49,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2069626.6666666667, ans=0.125 2023-11-22 19:30:53,388 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310450 2023-11-22 19:30:56,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2069626.6666666667, ans=0.0 2023-11-22 19:31:11,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2069693.3333333333, ans=0.125 2023-11-22 19:31:26,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2069826.6666666667, ans=0.0 2023-11-22 19:31:38,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2069893.3333333333, ans=0.125 2023-11-22 19:31:40,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2069893.3333333333, ans=0.125 2023-11-22 19:31:51,987 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9900, loss[loss=0.0537, simple_loss=0.06099, pruned_loss=0.01146, audio_tagging_loss=0.01174, over 16110.00 frames. ], tot_loss[loss=0.07076, simple_loss=0.09373, pruned_loss=0.01492, audio_tagging_loss=0.00898, over 3043528.05 frames. ], batch size: 63, lr: 2.60e-03, grad_scale: 8.0 2023-11-22 19:31:53,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2069960.0, ans=0.0 2023-11-22 19:31:56,919 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310500 2023-11-22 19:32:01,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2069960.0, ans=0.125 2023-11-22 19:32:22,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2070093.3333333333, ans=0.125 2023-11-22 19:32:34,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2070160.0, ans=0.2 2023-11-22 19:32:40,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2070160.0, ans=0.125 2023-11-22 19:32:44,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2070226.6666666667, ans=0.125 2023-11-22 19:32:45,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.326e+01 8.797e+01 9.739e+01 1.395e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 19:32:56,589 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 9950, loss[loss=0.06992, simple_loss=0.09142, pruned_loss=0.01508, audio_tagging_loss=0.009133, over 15086.00 frames. ], tot_loss[loss=0.07019, simple_loss=0.09323, pruned_loss=0.01461, audio_tagging_loss=0.008961, over 3048339.90 frames. 
], batch size: 57, lr: 2.60e-03, grad_scale: 8.0 2023-11-22 19:33:01,497 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310550 2023-11-22 19:33:07,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2070360.0, ans=0.1 2023-11-22 19:33:09,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-22 19:33:10,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2023-11-22 19:33:35,403 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 19:33:52,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2070560.0, ans=0.0 2023-11-22 19:33:53,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2070560.0, ans=0.125 2023-11-22 19:33:59,655 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10000, loss[loss=0.05004, simple_loss=0.06241, pruned_loss=0.0079, audio_tagging_loss=0.01094, over 15316.00 frames. ], tot_loss[loss=0.06983, simple_loss=0.09287, pruned_loss=0.0144, audio_tagging_loss=0.008999, over 3046204.37 frames. ], batch size: 58, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:34:03,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2070626.6666666667, ans=0.0 2023-11-22 19:34:04,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310600 2023-11-22 19:34:14,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=22.5 2023-11-22 19:34:53,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.266e+01 8.772e+01 9.539e+01 1.390e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-22 19:34:59,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2070893.3333333333, ans=0.1 2023-11-22 19:35:03,290 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10050, loss[loss=0.05113, simple_loss=0.06898, pruned_loss=0.009735, audio_tagging_loss=0.006907, over 15472.00 frames. ], tot_loss[loss=0.06977, simple_loss=0.09272, pruned_loss=0.01439, audio_tagging_loss=0.009014, over 3048463.22 frames. ], batch size: 58, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:35:08,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310650 2023-11-22 19:35:09,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2070960.0, ans=0.0 2023-11-22 19:35:17,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2023-11-22 19:35:22,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2071026.6666666667, ans=0.125 2023-11-22 19:35:26,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. 
limit=15.0 2023-11-22 19:35:34,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2071093.3333333333, ans=0.0 2023-11-22 19:35:44,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2071160.0, ans=0.125 2023-11-22 19:36:08,960 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10100, loss[loss=0.05428, simple_loss=0.06954, pruned_loss=0.007807, audio_tagging_loss=0.01171, over 15144.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09252, pruned_loss=0.01421, audio_tagging_loss=0.009113, over 3049069.23 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:36:14,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310700 2023-11-22 19:36:18,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2071293.3333333333, ans=0.1 2023-11-22 19:36:18,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2023-11-22 19:36:33,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2071426.6666666667, ans=0.125 2023-11-22 19:36:40,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2071426.6666666667, ans=0.125 2023-11-22 19:36:57,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2071493.3333333333, ans=0.125 2023-11-22 19:36:59,225 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 19:37:02,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.311e+01 8.891e+01 9.853e+01 2.191e+02, threshold=1.778e+02, percent-clipped=1.0 2023-11-22 19:37:03,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2071560.0, ans=0.0 2023-11-22 19:37:12,784 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10150, loss[loss=0.06249, simple_loss=0.08452, pruned_loss=0.01062, audio_tagging_loss=0.009612, over 16425.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09279, pruned_loss=0.01434, audio_tagging_loss=0.009116, over 3057765.98 frames. ], batch size: 61, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:37:17,774 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310750 2023-11-22 19:37:31,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2071693.3333333333, ans=0.95 2023-11-22 19:37:41,914 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 19:37:48,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2071760.0, ans=0.0 2023-11-22 19:37:51,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2023-11-22 19:37:56,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2071826.6666666667, ans=0.2 2023-11-22 19:38:00,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=12.0 2023-11-22 19:38:11,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2071893.3333333333, ans=0.0 2023-11-22 19:38:15,735 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10200, loss[loss=0.0603, simple_loss=0.08037, pruned_loss=0.01112, audio_tagging_loss=0.00899, over 14690.00 frames. ], tot_loss[loss=0.07049, simple_loss=0.0937, pruned_loss=0.01459, audio_tagging_loss=0.009047, over 3057062.42 frames. ], batch size: 55, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:38:20,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310800 2023-11-22 19:38:39,614 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 19:38:43,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2072093.3333333333, ans=0.0 2023-11-22 19:38:55,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2072160.0, ans=0.04949747468305833 2023-11-22 19:39:08,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.131e+01 8.876e+01 9.472e+01 1.289e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-22 19:39:09,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2072226.6666666667, ans=0.1 2023-11-22 19:39:15,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0 2023-11-22 19:39:19,376 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10250, loss[loss=0.06863, simple_loss=0.09038, pruned_loss=0.01524, audio_tagging_loss=0.008192, over 15562.00 frames. ], tot_loss[loss=0.07059, simple_loss=0.09345, pruned_loss=0.01475, audio_tagging_loss=0.009117, over 3060419.51 frames. 
], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:39:25,547 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310850 2023-11-22 19:39:29,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2072293.3333333333, ans=0.125 2023-11-22 19:39:33,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. limit=10.0 2023-11-22 19:39:34,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2072360.0, ans=0.2 2023-11-22 19:39:41,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2072360.0, ans=0.1 2023-11-22 19:39:45,499 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 19:39:46,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2072426.6666666667, ans=0.0 2023-11-22 19:39:46,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2072426.6666666667, ans=0.125 2023-11-22 19:39:55,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2072426.6666666667, ans=0.0 2023-11-22 19:39:58,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2072493.3333333333, ans=0.0 2023-11-22 19:40:05,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2072493.3333333333, ans=0.04949747468305833 2023-11-22 19:40:08,417 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 19:40:22,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2072626.6666666667, ans=0.1 2023-11-22 19:40:23,794 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10300, loss[loss=0.06839, simple_loss=0.09142, pruned_loss=0.01447, audio_tagging_loss=0.008211, over 16613.00 frames. ], tot_loss[loss=0.07064, simple_loss=0.09342, pruned_loss=0.01477, audio_tagging_loss=0.009164, over 3055530.18 frames. 
], batch size: 64, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:40:27,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2072626.6666666667, ans=0.125 2023-11-22 19:40:27,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2072626.6666666667, ans=0.1 2023-11-22 19:40:28,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310900 2023-11-22 19:40:37,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2072693.3333333333, ans=0.05 2023-11-22 19:40:53,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2072760.0, ans=0.125 2023-11-22 19:40:53,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2072760.0, ans=0.125 2023-11-22 19:40:56,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2072760.0, ans=0.125 2023-11-22 19:41:05,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2072826.6666666667, ans=0.125 2023-11-22 19:41:16,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2072893.3333333333, ans=0.125 2023-11-22 19:41:16,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.279e+01 8.744e+01 9.443e+01 1.295e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 19:41:17,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2072893.3333333333, ans=0.0 2023-11-22 19:41:18,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2072893.3333333333, ans=0.125 2023-11-22 19:41:18,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2072893.3333333333, ans=0.125 2023-11-22 19:41:26,752 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10350, loss[loss=0.06626, simple_loss=0.0868, pruned_loss=0.01331, audio_tagging_loss=0.009553, over 15582.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09326, pruned_loss=0.01475, audio_tagging_loss=0.009243, over 3061174.56 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:41:29,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2072960.0, ans=0.125 2023-11-22 19:41:31,724 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 310950 2023-11-22 19:41:39,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2073026.6666666667, ans=0.0 2023-11-22 19:42:23,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2073226.6666666667, ans=0.125 2023-11-22 19:42:30,611 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10400, loss[loss=0.07142, simple_loss=0.08997, pruned_loss=0.02034, audio_tagging_loss=0.006099, over 14185.00 frames. 
], tot_loss[loss=0.07085, simple_loss=0.09353, pruned_loss=0.01478, audio_tagging_loss=0.009313, over 3057205.84 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 19:42:36,826 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311000 2023-11-22 19:42:56,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2073426.6666666667, ans=0.125 2023-11-22 19:43:25,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.233e+01 8.739e+01 9.492e+01 1.200e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-22 19:43:26,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2073560.0, ans=10.0 2023-11-22 19:43:27,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2073560.0, ans=0.0 2023-11-22 19:43:36,097 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10450, loss[loss=0.06722, simple_loss=0.08649, pruned_loss=0.01375, audio_tagging_loss=0.01022, over 15038.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.09324, pruned_loss=0.01478, audio_tagging_loss=0.009342, over 3051703.56 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 19:43:41,079 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311050 2023-11-22 19:43:44,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2073626.6666666667, ans=0.125 2023-11-22 19:44:13,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2073826.6666666667, ans=0.125 2023-11-22 19:44:25,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2073826.6666666667, ans=0.0 2023-11-22 19:44:37,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2073893.3333333333, ans=0.125 2023-11-22 19:44:39,307 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10500, loss[loss=0.09357, simple_loss=0.1266, pruned_loss=0.02224, audio_tagging_loss=0.008059, over 15877.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.093, pruned_loss=0.01467, audio_tagging_loss=0.009223, over 3045856.99 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:44:44,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311100 2023-11-22 19:44:54,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2074026.6666666667, ans=0.125 2023-11-22 19:45:06,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2074093.3333333333, ans=0.125 2023-11-22 19:45:12,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2074093.3333333333, ans=0.0 2023-11-22 19:45:22,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. 
limit=10.0 2023-11-22 19:45:33,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 7.970e+01 8.549e+01 9.373e+01 1.177e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-22 19:45:38,570 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 19:45:42,203 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10550, loss[loss=0.06421, simple_loss=0.08962, pruned_loss=0.012, audio_tagging_loss=0.007401, over 16320.00 frames. ], tot_loss[loss=0.06977, simple_loss=0.09225, pruned_loss=0.01451, audio_tagging_loss=0.009127, over 3042022.08 frames. ], batch size: 60, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:45:47,928 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311150 2023-11-22 19:45:58,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-11-22 19:45:59,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.03 vs. limit=10.0 2023-11-22 19:46:06,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2074360.0, ans=0.125 2023-11-22 19:46:20,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2074493.3333333333, ans=0.0 2023-11-22 19:46:20,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2074493.3333333333, ans=0.2 2023-11-22 19:46:47,682 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10600, loss[loss=0.06942, simple_loss=0.09532, pruned_loss=0.01456, audio_tagging_loss=0.007201, over 15504.00 frames. ], tot_loss[loss=0.06983, simple_loss=0.09238, pruned_loss=0.01452, audio_tagging_loss=0.009118, over 3043017.95 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:46:53,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311200 2023-11-22 19:47:13,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2074760.0, ans=0.2 2023-11-22 19:47:14,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2074760.0, ans=0.0 2023-11-22 19:47:24,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2074826.6666666667, ans=0.0 2023-11-22 19:47:43,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.586e+01 8.272e+01 8.914e+01 9.712e+01 1.198e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-22 19:47:51,578 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10650, loss[loss=0.0656, simple_loss=0.08605, pruned_loss=0.01462, audio_tagging_loss=0.007948, over 14703.00 frames. ], tot_loss[loss=0.07024, simple_loss=0.09299, pruned_loss=0.01466, audio_tagging_loss=0.009082, over 3039604.72 frames. 
], batch size: 54, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:47:54,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2074960.0, ans=0.125 2023-11-22 19:47:56,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311250 2023-11-22 19:47:57,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2074960.0, ans=0.125 2023-11-22 19:48:07,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2075026.6666666667, ans=0.0 2023-11-22 19:48:08,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2075026.6666666667, ans=0.125 2023-11-22 19:48:16,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2075093.3333333333, ans=0.125 2023-11-22 19:48:26,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2075093.3333333333, ans=0.2 2023-11-22 19:48:54,725 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10700, loss[loss=0.06381, simple_loss=0.08572, pruned_loss=0.01084, audio_tagging_loss=0.01011, over 14871.00 frames. ], tot_loss[loss=0.06967, simple_loss=0.09232, pruned_loss=0.01443, audio_tagging_loss=0.009084, over 3036660.61 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:49:00,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311300 2023-11-22 19:49:09,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2075360.0, ans=0.2 2023-11-22 19:49:30,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=12.0 2023-11-22 19:49:32,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2075493.3333333333, ans=0.0 2023-11-22 19:49:33,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-22 19:49:44,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2075560.0, ans=0.125 2023-11-22 19:49:50,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.233e+01 8.994e+01 9.619e+01 1.201e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 19:50:00,035 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10750, loss[loss=0.06396, simple_loss=0.08483, pruned_loss=0.01341, audio_tagging_loss=0.008129, over 15608.00 frames. ], tot_loss[loss=0.06938, simple_loss=0.09214, pruned_loss=0.01424, audio_tagging_loss=0.009069, over 3034601.80 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:50:00,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.91 vs. 
limit=22.5 2023-11-22 19:50:01,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2075626.6666666667, ans=0.07 2023-11-22 19:50:04,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2075626.6666666667, ans=0.125 2023-11-22 19:50:05,147 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311350 2023-11-22 19:50:24,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-22 19:50:40,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2075826.6666666667, ans=0.0 2023-11-22 19:50:41,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2075826.6666666667, ans=0.2 2023-11-22 19:50:44,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2023-11-22 19:51:04,218 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10800, loss[loss=0.07284, simple_loss=0.09538, pruned_loss=0.01617, audio_tagging_loss=0.008978, over 15183.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09248, pruned_loss=0.01439, audio_tagging_loss=0.009149, over 3035400.45 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 19:51:09,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311400 2023-11-22 19:51:14,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=22.5 2023-11-22 19:51:38,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-22 19:51:59,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.263e+01 8.884e+01 9.522e+01 1.111e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 19:52:07,948 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10850, loss[loss=0.07377, simple_loss=0.1099, pruned_loss=0.0136, audio_tagging_loss=0.005202, over 15618.00 frames. ], tot_loss[loss=0.07039, simple_loss=0.09348, pruned_loss=0.0146, audio_tagging_loss=0.009054, over 3035223.28 frames. ], batch size: 55, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 19:52:09,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2076293.3333333333, ans=0.2 2023-11-22 19:52:13,035 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311450 2023-11-22 19:52:32,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2076360.0, ans=10.0 2023-11-22 19:52:59,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0 2023-11-22 19:53:03,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2076560.0, ans=0.0 2023-11-22 19:53:07,032 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 19:53:13,061 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10900, loss[loss=0.05579, simple_loss=0.05766, pruned_loss=0.01207, audio_tagging_loss=0.0149, over 13829.00 frames. ], tot_loss[loss=0.07054, simple_loss=0.09355, pruned_loss=0.01466, audio_tagging_loss=0.00911, over 3030517.25 frames. ], batch size: 55, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:53:17,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2076626.6666666667, ans=0.125 2023-11-22 19:53:17,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311500 2023-11-22 19:53:28,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2076693.3333333333, ans=0.1 2023-11-22 19:53:48,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2076760.0, ans=0.09899494936611666 2023-11-22 19:53:49,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-22 19:54:09,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.142e+01 8.836e+01 9.377e+01 1.085e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 19:54:10,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2076893.3333333333, ans=0.125 2023-11-22 19:54:16,917 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 10950, loss[loss=0.06534, simple_loss=0.07852, pruned_loss=0.01382, audio_tagging_loss=0.01227, over 15380.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.0925, pruned_loss=0.01445, audio_tagging_loss=0.009182, over 3032500.79 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:54:21,914 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311550 2023-11-22 19:54:24,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2076960.0, ans=0.0 2023-11-22 19:54:25,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2076960.0, ans=0.1 2023-11-22 19:54:32,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2077026.6666666667, ans=0.125 2023-11-22 19:54:32,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2077026.6666666667, ans=0.07 2023-11-22 19:54:43,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2077093.3333333333, ans=0.0 2023-11-22 19:55:01,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2077160.0, ans=0.125 2023-11-22 19:55:21,199 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11000, loss[loss=0.07495, simple_loss=0.1028, pruned_loss=0.01401, audio_tagging_loss=0.00952, over 14513.00 frames. 
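
The WARNING above drops a one-second AudioSet cut whose encoder output (23 frames after subsampling) is shorter than its 24-token placeholder transcript; a transducer has to emit each token at some frame, so it cannot align U tokens to T < U frames. A sketch of that filter, assuming the criterion really is T >= U (the actual check in train_asr.py may be stricter):

    def keep_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
        # A transducer needs at least as many encoder frames as output
        # tokens; 23 frames vs. 24 tokens fails, so the cut is excluded.
        return num_frames_after_subsampling >= num_tokens

    keep_cut(23, 24)  # -> False, matching the WARNING above
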
], tot_loss[loss=0.07032, simple_loss=0.09308, pruned_loss=0.01455, audio_tagging_loss=0.009233, over 3029489.38 frames. ], batch size: 54, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:55:26,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311600 2023-11-22 19:55:30,044 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 19:55:45,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2077360.0, ans=0.0 2023-11-22 19:55:46,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0 2023-11-22 19:55:48,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2023-11-22 19:55:54,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2077426.6666666667, ans=0.125 2023-11-22 19:56:09,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2077493.3333333333, ans=0.2 2023-11-22 19:56:12,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-11-22 19:56:13,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2077560.0, ans=0.1 2023-11-22 19:56:17,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.941e+01 8.294e+01 8.863e+01 9.568e+01 1.311e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-22 19:56:19,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2077560.0, ans=0.125 2023-11-22 19:56:19,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-22 19:56:26,009 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11050, loss[loss=0.06966, simple_loss=0.09128, pruned_loss=0.01411, audio_tagging_loss=0.009917, over 15419.00 frames. ], tot_loss[loss=0.07053, simple_loss=0.09337, pruned_loss=0.01457, audio_tagging_loss=0.009277, over 3039399.24 frames. 
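
Each optim.py line prints five grad-norm order statistics (apparently min, 25%, median, 75%, max) followed by a clipping threshold, and throughout this stretch the threshold equals Clipping_scale times the median (e.g. 2.0 * 8.863e+01 = 1.773e+02 just above, and 2.0 * 8.739e+01 = 1.748e+02 earlier). A sketch under that assumption; the rule is inferred from the logged numbers, not copied from optim.py:

    import torch

    def clip_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
        # Assumed rule (it reproduces the logged thresholds): clip gradients
        # at clipping_scale times the median of recently observed grad norms.
        return clipping_scale * recent_grad_norms.median().item()

On that reading, percent-clipped=0.0 simply counts how often a batch's gradient norm exceeded the current threshold.
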
], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:56:31,558 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311650 2023-11-22 19:57:02,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2077826.6666666667, ans=0.2 2023-11-22 19:57:19,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2077893.3333333333, ans=0.1 2023-11-22 19:57:25,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2077893.3333333333, ans=0.07 2023-11-22 19:57:29,634 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11100, loss[loss=0.07816, simple_loss=0.1046, pruned_loss=0.01612, audio_tagging_loss=0.009749, over 15114.00 frames. ], tot_loss[loss=0.07002, simple_loss=0.09235, pruned_loss=0.01439, audio_tagging_loss=0.009457, over 3038420.53 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:57:34,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311700 2023-11-22 19:57:34,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2077960.0, ans=0.1 2023-11-22 19:58:15,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2078160.0, ans=0.125 2023-11-22 19:58:25,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.159e+01 8.862e+01 9.664e+01 1.542e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-22 19:58:27,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2023-11-22 19:58:32,651 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11150, loss[loss=0.09106, simple_loss=0.1309, pruned_loss=0.01803, audio_tagging_loss=0.007579, over 15391.00 frames. ], tot_loss[loss=0.07078, simple_loss=0.09345, pruned_loss=0.01453, audio_tagging_loss=0.009521, over 3042528.91 frames. ], batch size: 53, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 19:58:37,526 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311750 2023-11-22 19:58:44,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2078360.0, ans=0.0 2023-11-22 19:58:50,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2078360.0, ans=0.125 2023-11-22 19:58:54,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2078360.0, ans=0.0 2023-11-22 19:59:08,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2023-11-22 19:59:14,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2078493.3333333333, ans=0.1 2023-11-22 19:59:37,903 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11200, loss[loss=0.06527, simple_loss=0.08715, pruned_loss=0.01233, audio_tagging_loss=0.009364, over 14876.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09217, pruned_loss=0.01426, audio_tagging_loss=0.009663, over 3040508.84 frames. 
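
The scaling.py:213 lines each evaluate a named ScheduledFloat (a dropout probability, skip rate, or balancer bound) at the current batch_count and print the result as ans. A minimal sketch of what such a batch-count-indexed schedule might look like; the knots below are invented for illustration, and only the interface (called at a batch count, returns a float) is taken from the log:

    class ScheduledFloat:
        """Sketch (assumed behavior): piecewise-linear in the global batch
        count and flat past the last knot -- which would explain why 'ans'
        is constant (0.125, 0.0, 0.2, ...) this late in training."""

        def __init__(self, *knots: tuple[float, float]):
            self.knots = sorted(knots)

        def __call__(self, batch_count: float) -> float:
            k = self.knots
            if batch_count <= k[0][0]:
                return k[0][1]
            for (x0, y0), (x1, y1) in zip(k, k[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            return k[-1][1]

    prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))  # hypothetical knots
    prob(2076626.67)  # -> 0.125, matching the 'ans=0.125' lines above
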
], batch size: 55, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 19:59:43,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311800 2023-11-22 19:59:47,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2078626.6666666667, ans=0.125 2023-11-22 19:59:56,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2078693.3333333333, ans=0.0 2023-11-22 20:00:12,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2078760.0, ans=0.125 2023-11-22 20:00:34,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.732e+01 7.967e+01 8.796e+01 9.529e+01 1.148e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 20:00:42,241 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11250, loss[loss=0.06495, simple_loss=0.07974, pruned_loss=0.01203, audio_tagging_loss=0.01305, over 15270.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09216, pruned_loss=0.01435, audio_tagging_loss=0.009575, over 3038476.17 frames. ], batch size: 58, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 20:00:47,393 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311850 2023-11-22 20:00:58,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2079026.6666666667, ans=0.0 2023-11-22 20:01:03,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2079026.6666666667, ans=0.125 2023-11-22 20:01:25,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2079160.0, ans=0.1 2023-11-22 20:01:37,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2079226.6666666667, ans=0.125 2023-11-22 20:01:45,786 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11300, loss[loss=0.04435, simple_loss=0.05163, pruned_loss=0.008165, audio_tagging_loss=0.01037, over 14271.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09235, pruned_loss=0.01441, audio_tagging_loss=0.009422, over 3039458.34 frames. ], batch size: 54, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 20:01:46,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2079293.3333333333, ans=0.2 2023-11-22 20:01:50,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311900 2023-11-22 20:01:57,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.81 vs. limit=10.0 2023-11-22 20:02:12,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2079426.6666666667, ans=0.125 2023-11-22 20:02:18,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.93 vs. 
limit=22.5 2023-11-22 20:02:24,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2079493.3333333333, ans=0.0 2023-11-22 20:02:34,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2079560.0, ans=0.125 2023-11-22 20:02:41,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.136e+01 8.665e+01 9.544e+01 1.497e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-22 20:02:47,793 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11350, loss[loss=0.07094, simple_loss=0.09949, pruned_loss=0.0147, audio_tagging_loss=0.006498, over 16162.00 frames. ], tot_loss[loss=0.07024, simple_loss=0.09301, pruned_loss=0.01448, audio_tagging_loss=0.009252, over 3046613.66 frames. ], batch size: 62, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:02:53,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 311950 2023-11-22 20:03:05,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2079693.3333333333, ans=0.2 2023-11-22 20:03:05,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2079693.3333333333, ans=0.2 2023-11-22 20:03:07,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2079693.3333333333, ans=0.125 2023-11-22 20:03:10,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=15.0 2023-11-22 20:03:24,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2079826.6666666667, ans=0.125 2023-11-22 20:03:26,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2079826.6666666667, ans=0.125 2023-11-22 20:03:26,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2079826.6666666667, ans=0.125 2023-11-22 20:03:43,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-22 20:03:50,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2079893.3333333333, ans=0.1 2023-11-22 20:03:52,244 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11400, loss[loss=0.09255, simple_loss=0.1246, pruned_loss=0.0247, audio_tagging_loss=0.005572, over 15250.00 frames. ], tot_loss[loss=0.0706, simple_loss=0.09352, pruned_loss=0.01464, audio_tagging_loss=0.009202, over 3047435.37 frames. 
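
The Whitening lines report a per-module statistic of the activation covariance against a limit (e.g. metric=12.22 vs. limit=15.0 above). A plausible form for such a metric, heavily hedged since the exact formula is not shown in the log: a score that is 1.0 for a perfectly white (identity-proportional) covariance and grows as a few directions dominate, with the Whiten module pushing back when it drifts past the limit.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # Sketch (assumed, not lifted from scaling.py): ratio of the mean
        # squared eigenvalue to the squared mean eigenvalue of the feature
        # covariance; equals 1.0 when activations are white.
        flat = x.reshape(-1, x.shape[-1]).float()
        cov = flat.T @ flat / flat.shape[0]
        eigs = torch.linalg.eigvalsh(cov).clamp(min=0.0)
        return (eigs * eigs).mean() / eigs.mean().clamp(min=1e-20) ** 2
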
], batch size: 55, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:03:57,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312000 2023-11-22 20:04:47,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2080226.6666666667, ans=0.0 2023-11-22 20:04:48,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2080226.6666666667, ans=0.0 2023-11-22 20:04:52,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.235e+01 8.932e+01 9.432e+01 1.169e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-22 20:04:53,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2080226.6666666667, ans=0.125 2023-11-22 20:04:59,054 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11450, loss[loss=0.05119, simple_loss=0.05977, pruned_loss=0.007749, audio_tagging_loss=0.01355, over 14885.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09326, pruned_loss=0.01458, audio_tagging_loss=0.009174, over 3049699.55 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:05:04,007 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312050 2023-11-22 20:05:12,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2080360.0, ans=0.0 2023-11-22 20:05:20,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2080360.0, ans=0.0 2023-11-22 20:05:25,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2023-11-22 20:05:31,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2023-11-22 20:06:02,409 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11500, loss[loss=0.08372, simple_loss=0.1188, pruned_loss=0.01848, audio_tagging_loss=0.005843, over 15576.00 frames. ], tot_loss[loss=0.07059, simple_loss=0.09343, pruned_loss=0.01478, audio_tagging_loss=0.009094, over 3051010.78 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:06:03,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2080626.6666666667, ans=10.0 2023-11-22 20:06:07,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312100 2023-11-22 20:06:21,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2080693.3333333333, ans=0.125 2023-11-22 20:06:26,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2080693.3333333333, ans=0.0 2023-11-22 20:06:27,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2080760.0, ans=0.0 2023-11-22 20:06:49,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. 
limit=6.0 2023-11-22 20:06:55,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2080893.3333333333, ans=0.1 2023-11-22 20:07:00,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.951e+01 8.062e+01 8.847e+01 9.578e+01 1.218e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-22 20:07:06,745 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11550, loss[loss=0.08655, simple_loss=0.1235, pruned_loss=0.01801, audio_tagging_loss=0.006781, over 15963.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.09383, pruned_loss=0.01476, audio_tagging_loss=0.009056, over 3051558.41 frames. ], batch size: 55, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:07:11,678 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312150 2023-11-22 20:07:13,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=15.0 2023-11-22 20:07:42,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-11-22 20:07:43,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-22 20:07:43,836 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 20:07:53,049 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:08:01,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2081226.6666666667, ans=0.125 2023-11-22 20:08:01,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2081226.6666666667, ans=0.0 2023-11-22 20:08:09,801 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11600, loss[loss=0.05866, simple_loss=0.07853, pruned_loss=0.01046, audio_tagging_loss=0.008937, over 14836.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.0948, pruned_loss=0.01485, audio_tagging_loss=0.008957, over 3047849.80 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 20:08:14,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312200 2023-11-22 20:08:17,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2081293.3333333333, ans=0.0 2023-11-22 20:08:37,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2081426.6666666667, ans=0.125 2023-11-22 20:08:50,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2081493.3333333333, ans=0.07 2023-11-22 20:08:51,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=15.0 2023-11-22 20:09:00,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2081560.0, ans=0.2 2023-11-22 20:09:08,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.935e+01 8.330e+01 9.074e+01 9.765e+01 1.554e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-22 20:09:11,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-11-22 20:09:13,386 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11650, loss[loss=0.05555, simple_loss=0.07518, pruned_loss=0.009295, audio_tagging_loss=0.008659, over 14299.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09448, pruned_loss=0.0149, audio_tagging_loss=0.009021, over 3040573.10 frames. ], batch size: 55, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:09:19,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312250 2023-11-22 20:09:33,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2081693.3333333333, ans=0.2 2023-11-22 20:09:45,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2081760.0, ans=0.0 2023-11-22 20:09:47,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2081760.0, ans=0.0 2023-11-22 20:09:49,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2081760.0, ans=0.2 2023-11-22 20:09:53,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2081826.6666666667, ans=0.125 2023-11-22 20:09:54,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2081826.6666666667, ans=0.125 2023-11-22 20:10:03,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2081893.3333333333, ans=0.2 2023-11-22 20:10:16,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2081893.3333333333, ans=0.0 2023-11-22 20:10:18,460 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11700, loss[loss=0.07382, simple_loss=0.1005, pruned_loss=0.01249, audio_tagging_loss=0.01108, over 14347.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.09419, pruned_loss=0.01491, audio_tagging_loss=0.009067, over 3044378.84 frames. ], batch size: 54, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:10:23,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-22 20:10:23,959 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312300 2023-11-22 20:10:36,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=12.0 2023-11-22 20:10:58,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. 
limit=22.5 2023-11-22 20:11:11,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2082226.6666666667, ans=0.2 2023-11-22 20:11:17,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.850e+01 8.195e+01 8.722e+01 9.532e+01 1.280e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-22 20:11:22,123 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11750, loss[loss=0.1004, simple_loss=0.1279, pruned_loss=0.02758, audio_tagging_loss=0.008925, over 15439.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.09407, pruned_loss=0.01489, audio_tagging_loss=0.009105, over 3046515.78 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:11:24,774 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:11:27,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312350 2023-11-22 20:11:29,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2082293.3333333333, ans=0.2 2023-11-22 20:11:39,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=2082360.0, ans=0.1 2023-11-22 20:11:40,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2082360.0, ans=0.1 2023-11-22 20:11:47,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2082426.6666666667, ans=0.0 2023-11-22 20:11:54,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-22 20:11:57,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=15.0 2023-11-22 20:12:00,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2082493.3333333333, ans=0.0 2023-11-22 20:12:15,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2082560.0, ans=0.0 2023-11-22 20:12:25,269 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11800, loss[loss=0.06578, simple_loss=0.08645, pruned_loss=0.01308, audio_tagging_loss=0.009476, over 15478.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.09388, pruned_loss=0.01493, audio_tagging_loss=0.009143, over 3042544.99 frames. 
], batch size: 58, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:12:30,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312400 2023-11-22 20:12:35,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2082626.6666666667, ans=0.125 2023-11-22 20:12:53,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2082760.0, ans=0.125 2023-11-22 20:13:00,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2082760.0, ans=0.0 2023-11-22 20:13:03,325 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:13:13,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.11 vs. limit=22.5 2023-11-22 20:13:24,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.351e+01 8.869e+01 9.580e+01 1.621e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-22 20:13:30,308 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11850, loss[loss=0.05097, simple_loss=0.0611, pruned_loss=0.008397, audio_tagging_loss=0.01203, over 15733.00 frames. ], tot_loss[loss=0.07169, simple_loss=0.09474, pruned_loss=0.01509, audio_tagging_loss=0.009223, over 3043561.60 frames. ], batch size: 61, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:13:35,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312450 2023-11-22 20:13:43,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2083026.6666666667, ans=0.125 2023-11-22 20:13:48,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2083026.6666666667, ans=0.2 2023-11-22 20:13:51,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2083026.6666666667, ans=0.125 2023-11-22 20:13:55,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2083093.3333333333, ans=0.0 2023-11-22 20:14:10,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2083160.0, ans=0.125 2023-11-22 20:14:16,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2083160.0, ans=0.1 2023-11-22 20:14:20,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2083226.6666666667, ans=0.2 2023-11-22 20:14:25,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2083226.6666666667, ans=0.1 2023-11-22 20:14:34,571 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11900, loss[loss=0.06874, simple_loss=0.08615, pruned_loss=0.01539, audio_tagging_loss=0.01028, over 15336.00 frames. ], tot_loss[loss=0.07189, simple_loss=0.09488, pruned_loss=0.01507, audio_tagging_loss=0.009377, over 3050077.64 frames. 
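
The grad_scale field in the tot_loss lines flips between 16.0 and 32.0 every few hundred batches, the signature of dynamic loss scaling in mixed-precision training: back off immediately when an inf/nan gradient appears, grow again after a long run of finite gradients. A sketch of that GradScaler-style update; the constants are illustrative defaults, not read from the log:

    def update_grad_scale(scale: float, found_inf: bool,
                          steps_since_growth: int,
                          growth_interval: int = 500) -> float:
        if found_inf:
            return scale * 0.5       # overflow: halve the scale at once
        if steps_since_growth >= growth_interval:
            return scale * 2.0       # stable: try a larger scale again
        return scale
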
], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:14:35,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2083293.3333333333, ans=0.125 2023-11-22 20:14:37,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2083293.3333333333, ans=0.1 2023-11-22 20:14:37,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2023-11-22 20:14:39,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312500 2023-11-22 20:14:42,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2083293.3333333333, ans=0.1 2023-11-22 20:14:50,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2023-11-22 20:14:52,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2083360.0, ans=0.0 2023-11-22 20:15:26,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2083560.0, ans=0.015 2023-11-22 20:15:28,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2083560.0, ans=0.125 2023-11-22 20:15:31,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2083560.0, ans=0.0 2023-11-22 20:15:32,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.017e+01 8.415e+01 9.018e+01 9.656e+01 1.551e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-22 20:15:36,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2083626.6666666667, ans=0.1 2023-11-22 20:15:37,471 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 11950, loss[loss=0.07425, simple_loss=0.09973, pruned_loss=0.01407, audio_tagging_loss=0.01031, over 14984.00 frames. ], tot_loss[loss=0.07195, simple_loss=0.09494, pruned_loss=0.01505, audio_tagging_loss=0.00943, over 3061499.61 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 16.0 2023-11-22 20:15:40,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2083626.6666666667, ans=0.0 2023-11-22 20:15:42,438 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312550 2023-11-22 20:16:05,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=12.0 2023-11-22 20:16:26,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2083893.3333333333, ans=0.2 2023-11-22 20:16:38,908 INFO [train_asr.py:1221] (1/4) Epoch 26, batch 12000, loss[loss=0.0671, simple_loss=0.09224, pruned_loss=0.01209, audio_tagging_loss=0.008896, over 15957.00 frames. ], tot_loss[loss=0.07201, simple_loss=0.09487, pruned_loss=0.01506, audio_tagging_loss=0.009513, over 3065430.08 frames. 
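
tot_loss is evidently not a plain epoch average: the "over N frames" count is fractional and saturates near 3e6 frames, which suggests an exponentially decayed, frame-weighted accumulator. A sketch of that reading; the decay constant is a guess:

    class RunningLoss:
        def __init__(self, decay: float = 0.995):  # decay value is a guess
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float,
                   batch_frames: float) -> tuple[float, float]:
            self.weighted_loss = (self.weighted_loss * self.decay
                                  + batch_loss * batch_frames)
            self.frames = self.frames * self.decay + batch_frames
            # (running loss, effective frame count) -- the two numbers
            # printed as "tot_loss[...] over N frames".
            return self.weighted_loss / self.frames, self.frames

This also matches the epoch-27 warm-up below, where the frame count restarts around 6.9e5 at batch 50 and climbs back toward 3e6 as the window refills.
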
], batch size: 60, lr: 2.60e-03, grad_scale: 32.0 2023-11-22 20:16:38,909 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 20:16:59,025 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4474, 3.1573, 4.2639, 4.0488, 4.1048, 4.1802, 3.9920, 4.2328], device='cuda:1') 2023-11-22 20:17:20,958 INFO [train_asr.py:1253] (1/4) Epoch 26, validation: loss=0.05885, simple_loss=0.05139, pruned_loss=0.00512, audio_tagging_loss=0.02803, over 4681554.00 frames. 2023-11-22 20:17:20,960 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 20:17:25,647 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312600 2023-11-22 20:17:30,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.63 vs. limit=22.5 2023-11-22 20:17:43,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2023-11-22 20:18:24,317 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 0, loss[loss=0.0848, simple_loss=0.112, pruned_loss=0.0108, audio_tagging_loss=0.01802, over 14434.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.112, pruned_loss=0.0108, audio_tagging_loss=0.01802, over 14434.00 frames. ], batch size: 55, lr: 2.55e-03, grad_scale: 32.0 2023-11-22 20:18:24,317 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 20:18:40,381 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1259, 4.5314, 5.1862, 4.8812], device='cuda:1') 2023-11-22 20:19:01,955 INFO [train_asr.py:1253] (1/4) Epoch 27, validation: loss=0.05818, simple_loss=0.05133, pruned_loss=0.005046, audio_tagging_loss=0.02747, over 4681554.00 frames. 2023-11-22 20:19:01,956 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 20:19:22,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2084180.0, ans=0.1 2023-11-22 20:19:23,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2084180.0, ans=0.125 2023-11-22 20:19:28,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2084246.6666666667, ans=0.1 2023-11-22 20:19:29,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2084246.6666666667, ans=0.0 2023-11-22 20:19:32,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.281e+01 9.255e+01 1.009e+02 1.305e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-22 20:19:34,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.77 vs. limit=10.0 2023-11-22 20:19:43,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312650 2023-11-22 20:19:45,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=22.5 2023-11-22 20:19:50,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. 
limit=15.0 2023-11-22 20:19:59,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2084380.0, ans=0.125 2023-11-22 20:20:00,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2084380.0, ans=0.2 2023-11-22 20:20:00,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2084380.0, ans=0.0 2023-11-22 20:20:06,125 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 50, loss[loss=0.09985, simple_loss=0.1218, pruned_loss=0.02519, audio_tagging_loss=0.01377, over 14987.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.09228, pruned_loss=0.01493, audio_tagging_loss=0.01851, over 689058.64 frames. ], batch size: 54, lr: 2.55e-03, grad_scale: 16.0 2023-11-22 20:20:11,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2084446.6666666667, ans=0.125 2023-11-22 20:20:12,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2084446.6666666667, ans=0.0 2023-11-22 20:20:25,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2084513.3333333333, ans=0.09899494936611666 2023-11-22 20:20:34,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2084580.0, ans=0.5 2023-11-22 20:20:47,048 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312700 2023-11-22 20:21:03,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2084713.3333333333, ans=0.0 2023-11-22 20:21:12,346 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 100, loss[loss=0.07407, simple_loss=0.08515, pruned_loss=0.01292, audio_tagging_loss=0.01857, over 16056.00 frames. ], tot_loss[loss=0.07954, simple_loss=0.09476, pruned_loss=0.01498, audio_tagging_loss=0.01719, over 1210945.31 frames. ], batch size: 62, lr: 2.55e-03, grad_scale: 16.0 2023-11-22 20:21:18,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2084780.0, ans=0.025 2023-11-22 20:21:20,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2084780.0, ans=0.125 2023-11-22 20:21:27,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2023-11-22 20:21:35,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2084846.6666666667, ans=0.1 2023-11-22 20:21:36,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. 
limit=22.5 2023-11-22 20:21:42,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.923e+01 9.571e+01 1.006e+02 1.303e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-22 20:21:51,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312750 2023-11-22 20:21:58,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2084980.0, ans=0.125 2023-11-22 20:22:01,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2084980.0, ans=0.125 2023-11-22 20:22:17,556 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 150, loss[loss=0.09714, simple_loss=0.131, pruned_loss=0.02096, audio_tagging_loss=0.01069, over 15443.00 frames. ], tot_loss[loss=0.0778, simple_loss=0.09488, pruned_loss=0.01504, audio_tagging_loss=0.01532, over 1621048.88 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:22:27,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2085113.3333333333, ans=0.07 2023-11-22 20:22:47,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2085246.6666666667, ans=0.125 2023-11-22 20:22:50,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2085246.6666666667, ans=0.5 2023-11-22 20:22:52,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2085246.6666666667, ans=0.0 2023-11-22 20:22:57,487 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312800 2023-11-22 20:23:22,026 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 200, loss[loss=0.08098, simple_loss=0.1144, pruned_loss=0.01602, audio_tagging_loss=0.007765, over 14643.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.09415, pruned_loss=0.01505, audio_tagging_loss=0.01359, over 1936357.43 frames. ], batch size: 57, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:23:26,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2085446.6666666667, ans=0.0 2023-11-22 20:23:30,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2085446.6666666667, ans=0.0 2023-11-22 20:23:50,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2085580.0, ans=0.125 2023-11-22 20:23:53,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.443e+01 8.985e+01 9.584e+01 1.364e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-22 20:23:57,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2023-11-22 20:24:03,027 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312850 2023-11-22 20:24:28,012 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 250, loss[loss=0.08165, simple_loss=0.1185, pruned_loss=0.01588, audio_tagging_loss=0.006502, over 14512.00 frames. ], tot_loss[loss=0.07433, simple_loss=0.09437, pruned_loss=0.01498, audio_tagging_loss=0.01217, over 2186742.96 frames. 
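
The validation block at the epoch 26/27 boundary above also reports peak device memory ("Maximum memory allocated so far is 25607MB"). A one-liner for that report, assuming it wraps torch.cuda.max_memory_allocated (which tracks the caching allocator's high-water mark in bytes):

    import torch

    def peak_memory_mb(device: torch.device | None = None) -> int:
        # Peak bytes allocated since process start (or last reset), in MB.
        return torch.cuda.max_memory_allocated(device) // (1024 * 1024)

    print(f"Maximum memory allocated so far is {peak_memory_mb()}MB")
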
], batch size: 55, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:24:43,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2023-11-22 20:24:48,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2085846.6666666667, ans=0.0 2023-11-22 20:25:01,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2085913.3333333333, ans=0.125 2023-11-22 20:25:04,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2085980.0, ans=0.95 2023-11-22 20:25:07,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312900 2023-11-22 20:25:19,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2023-11-22 20:25:31,863 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 300, loss[loss=0.06827, simple_loss=0.08736, pruned_loss=0.01568, audio_tagging_loss=0.008909, over 15958.00 frames. ], tot_loss[loss=0.07396, simple_loss=0.09511, pruned_loss=0.01513, audio_tagging_loss=0.01128, over 2380203.22 frames. ], batch size: 61, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:25:35,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2023-11-22 20:25:38,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2086113.3333333333, ans=0.125 2023-11-22 20:25:47,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2086180.0, ans=0.125 2023-11-22 20:25:51,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2086180.0, ans=0.0 2023-11-22 20:26:03,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.485e+01 8.248e+01 9.049e+01 9.988e+01 1.263e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-22 20:26:06,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2086246.6666666667, ans=0.09899494936611666 2023-11-22 20:26:12,558 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 312950 2023-11-22 20:26:15,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.90 vs. limit=22.5 2023-11-22 20:26:20,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2086313.3333333333, ans=0.1 2023-11-22 20:26:22,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2086380.0, ans=0.125 2023-11-22 20:26:36,486 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 350, loss[loss=0.07885, simple_loss=0.1048, pruned_loss=0.01767, audio_tagging_loss=0.008801, over 15451.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.09526, pruned_loss=0.01496, audio_tagging_loss=0.01069, over 2527580.48 frames. 
], batch size: 58, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:26:40,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2086446.6666666667, ans=0.125 2023-11-22 20:26:44,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2086446.6666666667, ans=0.125 2023-11-22 20:26:46,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2086446.6666666667, ans=0.0 2023-11-22 20:26:53,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2086513.3333333333, ans=0.125 2023-11-22 20:27:07,781 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:27:14,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-11-22 20:27:16,463 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313000 2023-11-22 20:27:25,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2086646.6666666667, ans=0.1 2023-11-22 20:27:35,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2086713.3333333333, ans=0.2 2023-11-22 20:27:42,156 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 400, loss[loss=0.06843, simple_loss=0.07701, pruned_loss=0.01512, audio_tagging_loss=0.0148, over 15026.00 frames. ], tot_loss[loss=0.07265, simple_loss=0.095, pruned_loss=0.01483, audio_tagging_loss=0.01032, over 2641607.22 frames. ], batch size: 57, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:27:58,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2086846.6666666667, ans=0.0 2023-11-22 20:27:58,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2086846.6666666667, ans=0.0 2023-11-22 20:28:11,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2086913.3333333333, ans=0.0 2023-11-22 20:28:12,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.142e+01 8.725e+01 9.349e+01 1.086e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-22 20:28:21,052 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313050 2023-11-22 20:28:24,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2086980.0, ans=0.125 2023-11-22 20:28:40,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2087046.6666666667, ans=0.0 2023-11-22 20:28:45,809 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 450, loss[loss=0.0778, simple_loss=0.1052, pruned_loss=0.01528, audio_tagging_loss=0.009909, over 15088.00 frames. ], tot_loss[loss=0.07194, simple_loss=0.09459, pruned_loss=0.01469, audio_tagging_loss=0.009953, over 2728514.25 frames. 
], batch size: 56, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:28:53,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2087113.3333333333, ans=0.125 2023-11-22 20:29:21,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2087246.6666666667, ans=0.0 2023-11-22 20:29:25,387 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313100 2023-11-22 20:29:49,306 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 500, loss[loss=0.07479, simple_loss=0.09954, pruned_loss=0.0133, audio_tagging_loss=0.01173, over 14905.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09422, pruned_loss=0.0146, audio_tagging_loss=0.009744, over 2787819.30 frames. ], batch size: 54, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:29:54,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2087446.6666666667, ans=0.125 2023-11-22 20:29:58,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2087446.6666666667, ans=0.0 2023-11-22 20:29:59,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2087446.6666666667, ans=0.125 2023-11-22 20:30:15,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2087580.0, ans=0.125 2023-11-22 20:30:19,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.170e+01 8.718e+01 9.505e+01 1.331e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-22 20:30:28,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313150 2023-11-22 20:30:46,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-11-22 20:30:47,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2087713.3333333333, ans=0.0 2023-11-22 20:30:54,212 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 550, loss[loss=0.08248, simple_loss=0.107, pruned_loss=0.01855, audio_tagging_loss=0.01044, over 15114.00 frames. ], tot_loss[loss=0.0706, simple_loss=0.09301, pruned_loss=0.01441, audio_tagging_loss=0.009687, over 2845724.64 frames. ], batch size: 57, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:31:01,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2087780.0, ans=0.0 2023-11-22 20:31:22,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2087913.3333333333, ans=0.2 2023-11-22 20:31:32,779 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313200 2023-11-22 20:31:43,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2023-11-22 20:31:45,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2088046.6666666667, ans=0.125 2023-11-22 20:31:57,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. 
limit=12.0 2023-11-22 20:31:57,890 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 600, loss[loss=0.05758, simple_loss=0.07878, pruned_loss=0.009293, audio_tagging_loss=0.0089, over 15044.00 frames. ], tot_loss[loss=0.07055, simple_loss=0.093, pruned_loss=0.01444, audio_tagging_loss=0.00961, over 2891195.73 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:31:59,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2088113.3333333333, ans=0.1 2023-11-22 20:32:05,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-11-22 20:32:28,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.589e+01 9.222e+01 1.004e+02 1.377e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-22 20:32:37,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-22 20:32:37,971 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313250 2023-11-22 20:32:46,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2088313.3333333333, ans=0.125 2023-11-22 20:33:01,057 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 650, loss[loss=0.04921, simple_loss=0.068, pruned_loss=0.009299, audio_tagging_loss=0.00591, over 13998.00 frames. ], tot_loss[loss=0.07058, simple_loss=0.09297, pruned_loss=0.01444, audio_tagging_loss=0.009655, over 2926664.04 frames. ], batch size: 55, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:33:01,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2088446.6666666667, ans=0.125 2023-11-22 20:33:30,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2088580.0, ans=0.125 2023-11-22 20:33:42,021 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313300 2023-11-22 20:34:06,054 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 700, loss[loss=0.07457, simple_loss=0.1027, pruned_loss=0.01661, audio_tagging_loss=0.006627, over 15136.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.09358, pruned_loss=0.01456, audio_tagging_loss=0.009679, over 2958503.98 frames. ], batch size: 57, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:34:36,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.373e+01 8.817e+01 9.744e+01 1.427e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-22 20:34:41,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5 2023-11-22 20:34:45,566 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313350 2023-11-22 20:35:02,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=22.5 2023-11-22 20:35:11,618 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 750, loss[loss=0.0679, simple_loss=0.1008, pruned_loss=0.00942, audio_tagging_loss=0.00809, over 15798.00 frames. ], tot_loss[loss=0.07114, simple_loss=0.09399, pruned_loss=0.01456, audio_tagging_loss=0.009589, over 2984397.31 frames. 
], batch size: 59, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:35:19,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2089113.3333333333, ans=0.09899494936611666 2023-11-22 20:35:22,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2089180.0, ans=0.0 2023-11-22 20:35:23,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-22 20:35:24,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2089180.0, ans=0.0 2023-11-22 20:35:24,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-22 20:35:52,101 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313400 2023-11-22 20:36:04,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2089380.0, ans=0.125 2023-11-22 20:36:15,350 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 800, loss[loss=0.08926, simple_loss=0.1235, pruned_loss=0.0171, audio_tagging_loss=0.01038, over 14703.00 frames. ], tot_loss[loss=0.07085, simple_loss=0.09335, pruned_loss=0.01455, audio_tagging_loss=0.009634, over 2994744.49 frames. ], batch size: 53, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:36:22,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2089446.6666666667, ans=0.04949747468305833 2023-11-22 20:36:24,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2023-11-22 20:36:45,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2023-11-22 20:36:47,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.197e+01 8.696e+01 9.566e+01 1.219e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-22 20:36:55,831 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313450 2023-11-22 20:36:58,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2089646.6666666667, ans=0.125 2023-11-22 20:36:59,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2089646.6666666667, ans=0.0 2023-11-22 20:37:03,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2089646.6666666667, ans=0.125 2023-11-22 20:37:10,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2089713.3333333333, ans=0.0 2023-11-22 20:37:18,945 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 850, loss[loss=0.06484, simple_loss=0.08751, pruned_loss=0.01014, audio_tagging_loss=0.01095, over 15168.00 frames. ], tot_loss[loss=0.0713, simple_loss=0.09396, pruned_loss=0.01471, audio_tagging_loss=0.009606, over 3005989.18 frames. 
], batch size: 58, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:37:25,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2089780.0, ans=0.125 2023-11-22 20:37:42,366 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:37:59,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313500 2023-11-22 20:38:07,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0 2023-11-22 20:38:12,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2090046.6666666667, ans=0.125 2023-11-22 20:38:13,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0 2023-11-22 20:38:14,626 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:38:21,345 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:38:24,673 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 900, loss[loss=0.05792, simple_loss=0.07255, pruned_loss=0.009517, audio_tagging_loss=0.01213, over 15068.00 frames. ], tot_loss[loss=0.07157, simple_loss=0.0943, pruned_loss=0.01483, audio_tagging_loss=0.009591, over 3019353.53 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:38:26,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2090113.3333333333, ans=0.1 2023-11-22 20:38:35,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2090180.0, ans=0.1 2023-11-22 20:38:38,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2090180.0, ans=0.125 2023-11-22 20:38:39,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2090180.0, ans=0.07 2023-11-22 20:38:44,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2090180.0, ans=0.125 2023-11-22 20:38:46,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2023-11-22 20:38:53,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.563e+01 8.151e+01 8.989e+01 9.712e+01 1.407e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-22 20:38:57,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2023-11-22 20:39:04,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313550 2023-11-22 20:39:20,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2090380.0, ans=0.2 2023-11-22 20:39:27,630 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 950, loss[loss=0.07351, simple_loss=0.09946, pruned_loss=0.01351, audio_tagging_loss=0.01028, over 15493.00 frames. 
], tot_loss[loss=0.07159, simple_loss=0.09454, pruned_loss=0.0148, audio_tagging_loss=0.00952, over 3024734.24 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:39:35,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2090446.6666666667, ans=0.0 2023-11-22 20:40:07,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313600 2023-11-22 20:40:13,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2090646.6666666667, ans=0.0 2023-11-22 20:40:25,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.56 vs. limit=15.0 2023-11-22 20:40:28,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=22.5 2023-11-22 20:40:30,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2090780.0, ans=0.125 2023-11-22 20:40:31,708 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1000, loss[loss=0.07059, simple_loss=0.1019, pruned_loss=0.01232, audio_tagging_loss=0.007334, over 16033.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.09419, pruned_loss=0.01457, audio_tagging_loss=0.009362, over 3039560.86 frames. ], batch size: 61, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:40:45,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2090846.6666666667, ans=0.125 2023-11-22 20:40:50,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2090846.6666666667, ans=0.2 2023-11-22 20:40:51,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-22 20:40:52,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2090846.6666666667, ans=0.125 2023-11-22 20:40:55,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2023-11-22 20:40:57,977 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 20:41:00,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2090913.3333333333, ans=0.2 2023-11-22 20:41:03,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.088e+01 8.617e+01 9.308e+01 1.248e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-22 20:41:11,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313650 2023-11-22 20:41:15,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2090980.0, ans=0.2 2023-11-22 20:41:36,657 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1050, loss[loss=0.04493, simple_loss=0.05788, pruned_loss=0.004894, audio_tagging_loss=0.0111, over 13903.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.09468, pruned_loss=0.0147, audio_tagging_loss=0.009226, over 3040087.86 frames. ], batch size: 55, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:41:47,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2091180.0, ans=0.125 2023-11-22 20:42:15,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313700 2023-11-22 20:42:16,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2091313.3333333333, ans=0.125 2023-11-22 20:42:21,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2091313.3333333333, ans=0.5 2023-11-22 20:42:39,779 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1100, loss[loss=0.05722, simple_loss=0.07526, pruned_loss=0.01153, audio_tagging_loss=0.008058, over 14707.00 frames. ], tot_loss[loss=0.07053, simple_loss=0.09349, pruned_loss=0.01461, audio_tagging_loss=0.009174, over 3034251.61 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:42:41,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2091446.6666666667, ans=0.05 2023-11-22 20:42:42,297 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 20:42:48,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2091446.6666666667, ans=0.125 2023-11-22 20:43:12,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.123e+01 8.810e+01 9.567e+01 1.299e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-22 20:43:20,182 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313750 2023-11-22 20:43:37,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2091713.3333333333, ans=10.0 2023-11-22 20:43:41,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. 
limit=6.0 2023-11-22 20:43:43,606 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1150, loss[loss=0.0763, simple_loss=0.1, pruned_loss=0.01904, audio_tagging_loss=0.007251, over 15317.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09251, pruned_loss=0.01453, audio_tagging_loss=0.009224, over 3030356.88 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:43:49,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2091780.0, ans=0.0 2023-11-22 20:43:51,371 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 20:44:17,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2091913.3333333333, ans=0.125 2023-11-22 20:44:23,699 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313800 2023-11-22 20:44:30,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2091980.0, ans=0.0 2023-11-22 20:44:31,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2091980.0, ans=0.125 2023-11-22 20:44:42,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.27 vs. limit=10.0 2023-11-22 20:44:49,084 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1200, loss[loss=0.05749, simple_loss=0.07174, pruned_loss=0.008176, audio_tagging_loss=0.01344, over 15269.00 frames. ], tot_loss[loss=0.07025, simple_loss=0.09295, pruned_loss=0.01454, audio_tagging_loss=0.009235, over 3035209.55 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:44:56,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2092113.3333333333, ans=0.2 2023-11-22 20:45:01,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2092180.0, ans=0.0 2023-11-22 20:45:04,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-22 20:45:06,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. 
limit=12.0 2023-11-22 20:45:19,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.331e+01 9.096e+01 9.827e+01 1.345e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-22 20:45:22,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2092246.6666666667, ans=0.2 2023-11-22 20:45:26,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2092313.3333333333, ans=0.125 2023-11-22 20:45:27,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313850 2023-11-22 20:45:30,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2092313.3333333333, ans=0.2 2023-11-22 20:45:48,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2092380.0, ans=0.0 2023-11-22 20:45:52,585 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1250, loss[loss=0.06, simple_loss=0.07522, pruned_loss=0.009009, audio_tagging_loss=0.01338, over 16633.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09191, pruned_loss=0.01428, audio_tagging_loss=0.00922, over 3044551.38 frames. ], batch size: 64, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:45:52,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2092446.6666666667, ans=0.0 2023-11-22 20:46:06,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2092513.3333333333, ans=0.125 2023-11-22 20:46:23,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2092580.0, ans=0.0 2023-11-22 20:46:32,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313900 2023-11-22 20:46:33,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=22.5 2023-11-22 20:46:43,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=12.0 2023-11-22 20:46:49,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2092713.3333333333, ans=0.1 2023-11-22 20:46:53,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2092713.3333333333, ans=0.025 2023-11-22 20:46:56,818 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1300, loss[loss=0.04972, simple_loss=0.06005, pruned_loss=0.009815, audio_tagging_loss=0.009878, over 14962.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09127, pruned_loss=0.01419, audio_tagging_loss=0.009218, over 3037196.16 frames. ], batch size: 60, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:47:07,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2023-11-22 20:47:25,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2092913.3333333333, ans=0.0 2023-11-22 20:47:29,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.264e+01 8.633e+01 9.270e+01 1.307e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-22 20:47:29,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2092913.3333333333, ans=0.125 2023-11-22 20:47:31,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2092913.3333333333, ans=0.2 2023-11-22 20:47:36,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 313950 2023-11-22 20:47:53,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2093046.6666666667, ans=0.0 2023-11-22 20:48:01,425 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1350, loss[loss=0.07901, simple_loss=0.1041, pruned_loss=0.01566, audio_tagging_loss=0.0113, over 14273.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09196, pruned_loss=0.01429, audio_tagging_loss=0.009268, over 3034672.82 frames. ], batch size: 54, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:48:06,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2093113.3333333333, ans=0.1 2023-11-22 20:48:22,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2093180.0, ans=0.0 2023-11-22 20:48:32,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=12.0 2023-11-22 20:48:37,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2023-11-22 20:48:38,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2093313.3333333333, ans=0.0 2023-11-22 20:48:40,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314000 2023-11-22 20:48:47,388 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 20:49:05,249 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1400, loss[loss=0.08515, simple_loss=0.1117, pruned_loss=0.02128, audio_tagging_loss=0.008033, over 15505.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09153, pruned_loss=0.01422, audio_tagging_loss=0.009346, over 3032922.84 frames. 
], batch size: 54, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:49:33,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2093580.0, ans=0.0 2023-11-22 20:49:38,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.151e+01 8.829e+01 9.680e+01 1.142e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-22 20:49:43,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2093646.6666666667, ans=0.0 2023-11-22 20:49:44,908 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314050 2023-11-22 20:50:00,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2093713.3333333333, ans=0.2 2023-11-22 20:50:07,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2093780.0, ans=0.125 2023-11-22 20:50:08,773 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1450, loss[loss=0.06852, simple_loss=0.09189, pruned_loss=0.01356, audio_tagging_loss=0.009007, over 15589.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.09175, pruned_loss=0.01431, audio_tagging_loss=0.009435, over 3039875.12 frames. ], batch size: 61, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:50:49,255 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314100 2023-11-22 20:50:54,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2093980.0, ans=10.0 2023-11-22 20:50:56,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2093980.0, ans=0.125 2023-11-22 20:51:12,965 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1500, loss[loss=0.08535, simple_loss=0.08944, pruned_loss=0.02865, audio_tagging_loss=0.01198, over 14395.00 frames. ], tot_loss[loss=0.06997, simple_loss=0.09212, pruned_loss=0.0145, audio_tagging_loss=0.009413, over 3036946.25 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:51:19,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.29 vs. limit=22.5 2023-11-22 20:51:35,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2023-11-22 20:51:46,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.611e+01 8.220e+01 8.857e+01 9.629e+01 1.228e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-22 20:51:46,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2094246.6666666667, ans=0.2 2023-11-22 20:51:48,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-11-22 20:51:52,795 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314150 2023-11-22 20:51:59,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2094313.3333333333, ans=0.04949747468305833 2023-11-22 20:52:17,411 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1550, loss[loss=0.07033, simple_loss=0.08123, pruned_loss=0.01775, audio_tagging_loss=0.01196, over 14185.00 frames. 
], tot_loss[loss=0.06999, simple_loss=0.09199, pruned_loss=0.01453, audio_tagging_loss=0.009467, over 3033119.13 frames. ], batch size: 57, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 20:52:21,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2023-11-22 20:52:45,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2094580.0, ans=0.1 2023-11-22 20:52:45,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.09 vs. limit=10.0 2023-11-22 20:52:46,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2094580.0, ans=0.1 2023-11-22 20:52:51,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2094580.0, ans=0.125 2023-11-22 20:52:55,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2094646.6666666667, ans=0.125 2023-11-22 20:52:57,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314200 2023-11-22 20:53:03,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2094646.6666666667, ans=0.2 2023-11-22 20:53:03,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-22 20:53:07,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2094713.3333333333, ans=0.95 2023-11-22 20:53:18,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2094713.3333333333, ans=0.0 2023-11-22 20:53:21,762 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1600, loss[loss=0.0612, simple_loss=0.08358, pruned_loss=0.01216, audio_tagging_loss=0.00724, over 16094.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09288, pruned_loss=0.01449, audio_tagging_loss=0.009405, over 3041332.92 frames. ], batch size: 61, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:53:54,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.862e+01 8.130e+01 8.806e+01 9.580e+01 1.162e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-22 20:54:01,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314250 2023-11-22 20:54:01,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2094980.0, ans=0.0 2023-11-22 20:54:21,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2095046.6666666667, ans=0.125 2023-11-22 20:54:25,227 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1650, loss[loss=0.08114, simple_loss=0.1107, pruned_loss=0.01652, audio_tagging_loss=0.009254, over 14925.00 frames. ], tot_loss[loss=0.06995, simple_loss=0.09217, pruned_loss=0.01437, audio_tagging_loss=0.009496, over 3041135.52 frames. 
], batch size: 55, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:54:34,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2095113.3333333333, ans=0.125 2023-11-22 20:54:54,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2095246.6666666667, ans=0.125 2023-11-22 20:55:05,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314300 2023-11-22 20:55:13,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2095313.3333333333, ans=0.0 2023-11-22 20:55:30,014 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1700, loss[loss=0.06408, simple_loss=0.0914, pruned_loss=0.01013, audio_tagging_loss=0.008254, over 15221.00 frames. ], tot_loss[loss=0.07029, simple_loss=0.09287, pruned_loss=0.01436, audio_tagging_loss=0.009492, over 3046340.06 frames. ], batch size: 56, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:55:30,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2095446.6666666667, ans=0.125 2023-11-22 20:55:37,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2095446.6666666667, ans=0.125 2023-11-22 20:55:54,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2095580.0, ans=0.1 2023-11-22 20:55:57,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2095580.0, ans=0.125 2023-11-22 20:55:58,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2023-11-22 20:56:02,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 8.274e+01 8.793e+01 9.384e+01 1.139e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-22 20:56:10,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314350 2023-11-22 20:56:15,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2095646.6666666667, ans=0.125 2023-11-22 20:56:22,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2095713.3333333333, ans=0.0 2023-11-22 20:56:33,705 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1750, loss[loss=0.0582, simple_loss=0.07643, pruned_loss=0.01086, audio_tagging_loss=0.009121, over 14139.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09296, pruned_loss=0.01434, audio_tagging_loss=0.009439, over 3045760.78 frames. 
], batch size: 53, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:56:48,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2095846.6666666667, ans=0.125 2023-11-22 20:56:49,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2095846.6666666667, ans=0.125 2023-11-22 20:56:50,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2095846.6666666667, ans=0.125 2023-11-22 20:56:53,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2095846.6666666667, ans=0.125 2023-11-22 20:57:00,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2095913.3333333333, ans=0.2 2023-11-22 20:57:00,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2095913.3333333333, ans=0.125 2023-11-22 20:57:14,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314400 2023-11-22 20:57:21,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2095980.0, ans=0.1 2023-11-22 20:57:22,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2023-11-22 20:57:38,151 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1800, loss[loss=0.08633, simple_loss=0.1229, pruned_loss=0.01881, audio_tagging_loss=0.006051, over 16237.00 frames. ], tot_loss[loss=0.07048, simple_loss=0.09354, pruned_loss=0.01439, audio_tagging_loss=0.009326, over 3049337.43 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:58:04,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2096246.6666666667, ans=0.125 2023-11-22 20:58:11,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.025e+01 8.684e+01 9.480e+01 1.547e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-22 20:58:17,926 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314450 2023-11-22 20:58:29,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2096380.0, ans=0.025 2023-11-22 20:58:42,919 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1850, loss[loss=0.05111, simple_loss=0.06898, pruned_loss=0.006427, audio_tagging_loss=0.01019, over 14762.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09366, pruned_loss=0.01456, audio_tagging_loss=0.009236, over 3050944.59 frames. 
], batch size: 55, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:58:56,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2096513.3333333333, ans=0.125 2023-11-22 20:59:19,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2096646.6666666667, ans=0.125 2023-11-22 20:59:22,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314500 2023-11-22 20:59:37,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2096713.3333333333, ans=0.0 2023-11-22 20:59:41,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2096713.3333333333, ans=0.0 2023-11-22 20:59:46,119 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1900, loss[loss=0.06307, simple_loss=0.08655, pruned_loss=0.01064, audio_tagging_loss=0.009146, over 15510.00 frames. ], tot_loss[loss=0.0702, simple_loss=0.09298, pruned_loss=0.01448, audio_tagging_loss=0.009228, over 3053024.40 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 20:59:55,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2096780.0, ans=0.1 2023-11-22 21:00:14,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2096913.3333333333, ans=0.1 2023-11-22 21:00:19,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 7.994e+01 8.463e+01 9.067e+01 1.382e+02, threshold=1.693e+02, percent-clipped=0.0 2023-11-22 21:00:26,055 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314550 2023-11-22 21:00:38,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2097046.6666666667, ans=0.125 2023-11-22 21:00:43,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2097046.6666666667, ans=0.125 2023-11-22 21:00:49,515 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 1950, loss[loss=0.0662, simple_loss=0.08991, pruned_loss=0.01347, audio_tagging_loss=0.007776, over 15391.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09252, pruned_loss=0.01423, audio_tagging_loss=0.009213, over 3053374.36 frames. 
], batch size: 59, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 21:00:50,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2097113.3333333333, ans=0.1 2023-11-22 21:01:03,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2097180.0, ans=0.125 2023-11-22 21:01:23,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2097246.6666666665, ans=0.125 2023-11-22 21:01:28,633 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314600 2023-11-22 21:01:31,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2097313.3333333335, ans=0.2 2023-11-22 21:01:32,972 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 21:01:54,198 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2000, loss[loss=0.06856, simple_loss=0.09243, pruned_loss=0.01462, audio_tagging_loss=0.007731, over 13339.00 frames. ], tot_loss[loss=0.06993, simple_loss=0.09256, pruned_loss=0.01435, audio_tagging_loss=0.009293, over 3053409.68 frames. ], batch size: 52, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 21:02:04,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2097446.6666666665, ans=0.125 2023-11-22 21:02:07,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-22 21:02:25,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2097580.0, ans=0.125 2023-11-22 21:02:26,118 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.293e+01 8.864e+01 9.588e+01 1.359e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-22 21:02:33,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314650 2023-11-22 21:02:35,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2097646.6666666665, ans=0.125 2023-11-22 21:02:40,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2097646.6666666665, ans=0.125 2023-11-22 21:02:42,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=15.0 2023-11-22 21:02:54,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2097713.3333333335, ans=0.5 2023-11-22 21:02:56,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2097780.0, ans=0.0 2023-11-22 21:02:57,053 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2050, loss[loss=0.08681, simple_loss=0.1122, pruned_loss=0.02126, audio_tagging_loss=0.009422, over 14897.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09285, pruned_loss=0.01451, audio_tagging_loss=0.009278, over 3053914.22 frames. 
], batch size: 55, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 21:03:29,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2097913.3333333335, ans=0.1 2023-11-22 21:03:37,209 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314700 2023-11-22 21:03:38,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2097980.0, ans=0.125 2023-11-22 21:03:42,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2097980.0, ans=0.0 2023-11-22 21:03:48,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2098046.6666666665, ans=0.0 2023-11-22 21:03:49,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2098046.6666666665, ans=0.0 2023-11-22 21:04:00,576 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2100, loss[loss=0.04187, simple_loss=0.05308, pruned_loss=0.006969, audio_tagging_loss=0.008356, over 15211.00 frames. ], tot_loss[loss=0.06957, simple_loss=0.09219, pruned_loss=0.01425, audio_tagging_loss=0.009226, over 3051539.73 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 21:04:26,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-22 21:04:27,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2023-11-22 21:04:28,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2098246.6666666665, ans=0.2 2023-11-22 21:04:34,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.684e+01 9.132e+01 9.760e+01 1.245e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-22 21:04:40,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314750 2023-11-22 21:04:42,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2098313.3333333335, ans=0.0 2023-11-22 21:04:45,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2098313.3333333335, ans=0.09899494936611666 2023-11-22 21:04:57,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2098380.0, ans=0.0 2023-11-22 21:05:05,787 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2150, loss[loss=0.07542, simple_loss=0.1048, pruned_loss=0.01517, audio_tagging_loss=0.007873, over 15939.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09286, pruned_loss=0.01441, audio_tagging_loss=0.009166, over 3051286.89 frames. ], batch size: 57, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 21:05:23,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2098513.3333333335, ans=0.125 2023-11-22 21:05:35,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. 
limit=15.0 2023-11-22 21:05:36,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2023-11-22 21:05:36,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2098580.0, ans=0.125 2023-11-22 21:05:42,535 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 21:05:43,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314800 2023-11-22 21:05:49,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2098646.6666666665, ans=0.1 2023-11-22 21:06:08,663 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2200, loss[loss=0.0636, simple_loss=0.09206, pruned_loss=0.009668, audio_tagging_loss=0.007899, over 15219.00 frames. ], tot_loss[loss=0.06924, simple_loss=0.09164, pruned_loss=0.01423, audio_tagging_loss=0.009183, over 3052479.68 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 21:06:13,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2098780.0, ans=0.2 2023-11-22 21:06:24,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2098846.6666666665, ans=0.2 2023-11-22 21:06:42,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.202e+01 8.834e+01 9.532e+01 1.382e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-22 21:06:42,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2098913.3333333335, ans=0.125 2023-11-22 21:06:48,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314850 2023-11-22 21:07:00,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2099046.6666666665, ans=0.125 2023-11-22 21:07:11,838 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2250, loss[loss=0.08032, simple_loss=0.1095, pruned_loss=0.01881, audio_tagging_loss=0.006744, over 14139.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.09268, pruned_loss=0.01439, audio_tagging_loss=0.009154, over 3045619.15 frames. ], batch size: 53, lr: 2.54e-03, grad_scale: 32.0 2023-11-22 21:07:12,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2099113.3333333335, ans=0.2 2023-11-22 21:07:35,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2099180.0, ans=0.0 2023-11-22 21:07:51,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314900 2023-11-22 21:08:16,696 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2300, loss[loss=0.06715, simple_loss=0.08748, pruned_loss=0.01513, audio_tagging_loss=0.008279, over 15462.00 frames. ], tot_loss[loss=0.06987, simple_loss=0.09247, pruned_loss=0.01446, audio_tagging_loss=0.009173, over 3043471.32 frames. 
], batch size: 59, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 21:08:31,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2099513.3333333335, ans=0.0 2023-11-22 21:08:50,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.386e+01 8.086e+01 8.698e+01 9.161e+01 1.218e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-22 21:08:50,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2099580.0, ans=0.025 2023-11-22 21:08:55,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 314950 2023-11-22 21:08:58,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2099646.6666666665, ans=0.0 2023-11-22 21:09:01,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2099646.6666666665, ans=0.2 2023-11-22 21:09:13,523 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 21:09:15,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2099713.3333333335, ans=0.1 2023-11-22 21:09:20,817 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2350, loss[loss=0.07931, simple_loss=0.1157, pruned_loss=0.01249, audio_tagging_loss=0.008946, over 16180.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.09197, pruned_loss=0.0143, audio_tagging_loss=0.009332, over 3046985.04 frames. ], batch size: 59, lr: 2.54e-03, grad_scale: 16.0 2023-11-22 21:09:34,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2099846.6666666665, ans=0.1 2023-11-22 21:09:40,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2099846.6666666665, ans=0.0 2023-11-22 21:09:52,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2099913.3333333335, ans=0.0 2023-11-22 21:09:56,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0 2023-11-22 21:10:00,540 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315000 2023-11-22 21:10:10,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2099980.0, ans=0.2 2023-11-22 21:10:13,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2100046.6666666665, ans=0.125 2023-11-22 21:10:24,661 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2400, loss[loss=0.05718, simple_loss=0.06954, pruned_loss=0.01056, audio_tagging_loss=0.01185, over 16247.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.09199, pruned_loss=0.01425, audio_tagging_loss=0.009328, over 3047671.70 frames. 
2023-11-22 21:11:00,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.417e+01 9.057e+01 9.650e+01 1.239e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-22 21:11:04,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315050
2023-11-22 21:11:06,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2100313.3333333335, ans=0.0
2023-11-22 21:11:28,994 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2450, loss[loss=0.05997, simple_loss=0.07811, pruned_loss=0.01053, audio_tagging_loss=0.01039, over 16396.00 frames. ], tot_loss[loss=0.0699, simple_loss=0.09246, pruned_loss=0.01429, audio_tagging_loss=0.009378, over 3054761.49 frames. ], batch size: 62, lr: 2.54e-03, grad_scale: 16.0
2023-11-22 21:11:42,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2100513.3333333335, ans=0.0
2023-11-22 21:11:47,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2100513.3333333335, ans=0.0
2023-11-22 21:12:08,056 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315100
2023-11-22 21:12:33,362 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2500, loss[loss=0.07856, simple_loss=0.1021, pruned_loss=0.01771, audio_tagging_loss=0.009813, over 14772.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09262, pruned_loss=0.01436, audio_tagging_loss=0.009473, over 3055203.85 frames. ], batch size: 54, lr: 2.54e-03, grad_scale: 16.0
2023-11-22 21:12:45,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0
2023-11-22 21:13:08,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.214e+01 8.930e+01 9.863e+01 1.228e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-22 21:13:13,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315150
2023-11-22 21:13:28,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2101046.6666666665, ans=0.125
2023-11-22 21:13:36,994 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2550, loss[loss=0.06921, simple_loss=0.09875, pruned_loss=0.01269, audio_tagging_loss=0.007153, over 15654.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09233, pruned_loss=0.01439, audio_tagging_loss=0.009441, over 3051999.75 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 16.0
2023-11-22 21:13:41,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0
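The scaling.py:213 records print the current value ("ans") of a ScheduledFloat: a scalar hyperparameter (dropout probabilities, skip rates, scale floors) that follows a piecewise-linear schedule in batch_count. A sketch under that assumption; the breakpoints below are illustrative, not taken from the recipe. (The fractional batch counts such as 2100046.6666666665 suggest batch_count advances by a duration-based fraction per step rather than by 1, but that is an inference from the log.)

class ScheduledFloatSketch:
    def __init__(self, *points):  # points: sorted (batch_count, value) pairs
        self.points = points

    def value(self, batch_count: float) -> float:
        (b0, v0) = self.points[0]
        if batch_count <= b0:
            return v0
        for (b1, v1) in self.points[1:]:
            if batch_count <= b1:  # linear interpolation on this segment
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            (b0, v0) = (b1, v1)
        return v0  # flat after the last breakpoint

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(2100046.67))  # -> 0.1, like the dropout_p records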
2023-11-22 21:13:54,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2101180.0, ans=0.125
2023-11-22 21:13:57,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2101180.0, ans=0.125
2023-11-22 21:13:57,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2101180.0, ans=0.125
2023-11-22 21:14:17,282 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315200
2023-11-22 21:14:23,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0
2023-11-22 21:14:36,009 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 21:14:40,763 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2600, loss[loss=0.05757, simple_loss=0.07084, pruned_loss=0.00981, audio_tagging_loss=0.01234, over 15455.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09252, pruned_loss=0.01447, audio_tagging_loss=0.009271, over 3048539.51 frames. ], batch size: 59, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:15:05,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2101513.3333333335, ans=0.1
2023-11-22 21:15:17,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.989e+01 8.300e+01 8.753e+01 9.403e+01 1.364e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-22 21:15:21,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315250
2023-11-22 21:15:46,458 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2650, loss[loss=0.06038, simple_loss=0.07642, pruned_loss=0.0128, audio_tagging_loss=0.00937, over 14722.00 frames. ], tot_loss[loss=0.0701, simple_loss=0.09285, pruned_loss=0.01449, audio_tagging_loss=0.009188, over 3053218.52 frames. ], batch size: 55, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:15:47,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0
2023-11-22 21:16:18,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2101913.3333333335, ans=0.125
2023-11-22 21:16:26,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315300
2023-11-22 21:16:27,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2101980.0, ans=0.125
2023-11-22 21:16:27,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.60 vs. limit=15.0
2023-11-22 21:16:30,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2101980.0, ans=0.025
2023-11-22 21:16:47,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2102046.6666666665, ans=0.125
2023-11-22 21:16:50,638 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2700, loss[loss=0.07786, simple_loss=0.1073, pruned_loss=0.01817, audio_tagging_loss=0.006056, over 15090.00 frames. ], tot_loss[loss=0.06975, simple_loss=0.09234, pruned_loss=0.01439, audio_tagging_loss=0.009189, over 3050511.34 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:17:03,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2102180.0, ans=0.0
2023-11-22 21:17:04,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2102180.0, ans=0.0
2023-11-22 21:17:26,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.121e+01 8.686e+01 9.401e+01 1.913e+02, threshold=1.737e+02, percent-clipped=1.0
2023-11-22 21:17:30,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315350
2023-11-22 21:17:42,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2102380.0, ans=0.125
2023-11-22 21:17:53,224 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 21:17:54,235 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2750, loss[loss=0.0671, simple_loss=0.09687, pruned_loss=0.01078, audio_tagging_loss=0.00788, over 14525.00 frames. ], tot_loss[loss=0.06997, simple_loss=0.09268, pruned_loss=0.01443, audio_tagging_loss=0.009209, over 3044272.77 frames. ], batch size: 57, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:18:06,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2102446.6666666665, ans=0.125
2023-11-22 21:18:23,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. limit=6.0
2023-11-22 21:18:24,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2102580.0, ans=0.125
2023-11-22 21:18:34,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315400
2023-11-22 21:18:40,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=10.0
2023-11-22 21:18:45,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2102713.3333333335, ans=0.05
2023-11-22 21:18:50,946 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 21:18:57,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2102713.3333333335, ans=0.1
2023-11-22 21:19:00,004 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2800, loss[loss=0.05666, simple_loss=0.07459, pruned_loss=0.007291, audio_tagging_loss=0.01207, over 15755.00 frames. ], tot_loss[loss=0.07029, simple_loss=0.09303, pruned_loss=0.01454, audio_tagging_loss=0.009234, over 3051249.54 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 32.0
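The running totals above are consistent with a fixed weighting of the three loss parts: loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. A quick check against the batch 2750 averages just logged (the 0.5 factor is inferred from the numbers, not read out of the training script here):

simple_loss, pruned_loss, audio_tagging_loss = 0.09268, 0.01443, 0.009209
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
assert abs(loss - 0.06997) < 5e-5  # matches tot_loss[loss=0.06997, ...]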
2023-11-22 21:19:10,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2102780.0, ans=0.2
2023-11-22 21:19:23,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2102846.6666666665, ans=0.125
2023-11-22 21:19:26,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2102913.3333333335, ans=0.125
2023-11-22 21:19:36,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 8.174e+01 8.854e+01 9.674e+01 2.365e+02, threshold=1.771e+02, percent-clipped=1.0
2023-11-22 21:19:40,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315450
2023-11-22 21:19:40,528 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 21:19:52,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2103046.6666666665, ans=0.125
2023-11-22 21:19:54,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2103046.6666666665, ans=0.125
2023-11-22 21:20:03,955 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2850, loss[loss=0.06153, simple_loss=0.09276, pruned_loss=0.009701, audio_tagging_loss=0.005444, over 15193.00 frames. ], tot_loss[loss=0.06931, simple_loss=0.09171, pruned_loss=0.01429, audio_tagging_loss=0.009172, over 3051772.46 frames. ], batch size: 59, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:20:11,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2103113.3333333335, ans=0.0
2023-11-22 21:20:15,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2103180.0, ans=0.0
2023-11-22 21:20:44,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315500
2023-11-22 21:20:46,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2103313.3333333335, ans=0.125
2023-11-22 21:21:03,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2103380.0, ans=0.2
2023-11-22 21:21:08,307 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2900, loss[loss=0.07324, simple_loss=0.09212, pruned_loss=0.01878, audio_tagging_loss=0.008395, over 15121.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09209, pruned_loss=0.01456, audio_tagging_loss=0.009096, over 3055748.87 frames. ], batch size: 58, lr: 2.53e-03, grad_scale: 16.0
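The grad_scale field keeps flipping between 16.0 and 32.0, the signature of dynamic loss scaling for fp16 training: the scale is doubled after a long enough run of overflow-free steps and halved when an overflow is detected. A sketch of that control loop; the growth interval and factors mirror common defaults and are assumptions here.

class LossScaleSketch:
    def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def step(self, found_inf: bool) -> float:
        if found_inf:            # overflow: halve the scale, skip this update
            self.scale *= 0.5
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.growth_interval:
                self.scale *= 2.0    # try a larger scale again
                self.good_steps = 0
        return self.scale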
2023-11-22 21:21:09,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2103446.6666666665, ans=0.125
2023-11-22 21:21:23,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2103513.3333333335, ans=15.0
2023-11-22 21:21:29,858 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-22 21:21:45,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.599e+01 8.319e+01 9.006e+01 9.775e+01 1.156e+02, threshold=1.801e+02, percent-clipped=0.0
2023-11-22 21:21:47,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=12.0
2023-11-22 21:21:48,833 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315550
2023-11-22 21:21:56,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2103646.6666666665, ans=0.0
2023-11-22 21:22:13,396 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 2950, loss[loss=0.05741, simple_loss=0.07555, pruned_loss=0.009193, audio_tagging_loss=0.01044, over 15430.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.09185, pruned_loss=0.01451, audio_tagging_loss=0.009179, over 3054698.35 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:22:26,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2103846.6666666665, ans=0.125
2023-11-22 21:22:30,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2103846.6666666665, ans=15.0
2023-11-22 21:22:30,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2103846.6666666665, ans=0.0
2023-11-22 21:22:53,209 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315600
2023-11-22 21:23:04,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0
2023-11-22 21:23:13,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0
2023-11-22 21:23:17,722 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3000, loss[loss=0.06092, simple_loss=0.075, pruned_loss=0.01092, audio_tagging_loss=0.01251, over 14914.00 frames. ], tot_loss[loss=0.07068, simple_loss=0.09317, pruned_loss=0.01487, audio_tagging_loss=0.00922, over 3058703.15 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:23:17,723 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 21:23:46,248 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1069, 3.0426, 3.2854, 2.9376, 3.7963, 3.8049, 3.3495, 3.1854], device='cuda:1')
2023-11-22 21:23:59,568 INFO [train_asr.py:1253] (1/4) Epoch 27, validation: loss=0.058, simple_loss=0.05133, pruned_loss=0.005079, audio_tagging_loss=0.02726, over 4681554.00 frames.
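During the validation pass, zipformer.py logs a small tensor of attention-weight entropies per self-attention module; that the eight entries above are one value per head is an inference from the tensor length, not something the log states. A sketch of how such a diagnostic is computed:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), rows softmax-normalized.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)                           # one entropy per head

attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # 8 values, like the logged tensor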
2023-11-22 21:23:59,569 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-22 21:24:01,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2104113.3333333335, ans=0.1
2023-11-22 21:24:05,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2104113.3333333335, ans=0.125
2023-11-22 21:24:06,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2104113.3333333335, ans=0.0
2023-11-22 21:24:06,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0
2023-11-22 21:24:07,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0
2023-11-22 21:24:15,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0
2023-11-22 21:24:36,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.281e+01 8.900e+01 9.558e+01 2.196e+02, threshold=1.780e+02, percent-clipped=1.0
2023-11-22 21:24:39,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315650
2023-11-22 21:24:45,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2104313.3333333335, ans=0.09899494936611666
2023-11-22 21:24:50,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2104380.0, ans=0.0
2023-11-22 21:24:52,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2104380.0, ans=0.125
2023-11-22 21:25:04,742 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3050, loss[loss=0.0726, simple_loss=0.08558, pruned_loss=0.01793, audio_tagging_loss=0.01188, over 14422.00 frames. ], tot_loss[loss=0.0712, simple_loss=0.09393, pruned_loss=0.01503, audio_tagging_loss=0.009203, over 3055540.69 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:25:30,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2104580.0, ans=0.125
2023-11-22 21:25:41,324 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 21:25:41,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2104646.6666666665, ans=0.1
2023-11-22 21:25:43,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315700
2023-11-22 21:26:07,718 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3100, loss[loss=0.06686, simple_loss=0.08544, pruned_loss=0.01252, audio_tagging_loss=0.01162, over 15155.00 frames. ], tot_loss[loss=0.0718, simple_loss=0.09437, pruned_loss=0.01529, audio_tagging_loss=0.009325, over 3051828.18 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 16.0
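"Maximum memory allocated so far" is the CUDA allocator's high-water mark for this rank's device, typically obtained roughly as below (whether the recipe divides by 2**20 or by 1e6 is not visible from the log):

import torch

if torch.cuda.is_available():
    peak_mb = torch.cuda.max_memory_allocated() // 2**20
    print(f"Maximum memory allocated so far is {peak_mb}MB")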
2023-11-22 21:26:21,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0
2023-11-22 21:26:35,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2104913.3333333335, ans=0.1
2023-11-22 21:26:38,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2104913.3333333335, ans=0.125
2023-11-22 21:26:43,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2104913.3333333335, ans=0.125
2023-11-22 21:26:44,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.175e+01 8.886e+01 9.764e+01 1.279e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-22 21:26:46,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315750
2023-11-22 21:26:56,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2105046.6666666665, ans=0.2
2023-11-22 21:27:09,958 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3150, loss[loss=0.0807, simple_loss=0.1031, pruned_loss=0.01961, audio_tagging_loss=0.009546, over 15495.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.09444, pruned_loss=0.01517, audio_tagging_loss=0.009381, over 3050275.50 frames. ], batch size: 58, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:27:31,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2023-11-22 21:27:45,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=22.5
2023-11-22 21:27:46,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2105246.6666666665, ans=0.0
2023-11-22 21:27:50,026 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315800
2023-11-22 21:28:16,011 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3200, loss[loss=0.09723, simple_loss=0.1284, pruned_loss=0.02162, audio_tagging_loss=0.0114, over 14991.00 frames. ], tot_loss[loss=0.0723, simple_loss=0.09541, pruned_loss=0.01515, audio_tagging_loss=0.009448, over 3052243.15 frames. ], batch size: 55, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:28:18,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2105446.6666666665, ans=0.0
2023-11-22 21:28:26,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2105446.6666666665, ans=0.125
2023-11-22 21:28:40,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2105580.0, ans=0.125
2023-11-22 21:28:43,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0
2023-11-22 21:28:44,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.26 vs. limit=22.5
2023-11-22 21:28:53,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.364e+01 8.971e+01 9.873e+01 1.649e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-22 21:28:55,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315850
2023-11-22 21:28:56,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2105646.6666666665, ans=10.0
2023-11-22 21:29:13,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2105713.3333333335, ans=0.125
2023-11-22 21:29:20,012 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3250, loss[loss=0.0725, simple_loss=0.1048, pruned_loss=0.01112, audio_tagging_loss=0.00898, over 16251.00 frames. ], tot_loss[loss=0.07198, simple_loss=0.09506, pruned_loss=0.01496, audio_tagging_loss=0.009494, over 3051423.45 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:29:20,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2105780.0, ans=0.125
2023-11-22 21:29:32,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2105846.6666666665, ans=0.125
2023-11-22 21:29:39,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2023-11-22 21:29:40,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2105846.6666666665, ans=0.1
2023-11-22 21:29:45,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0
2023-11-22 21:29:48,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2105913.3333333335, ans=0.2
2023-11-22 21:30:00,793 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315900
2023-11-22 21:30:03,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2105980.0, ans=0.125
2023-11-22 21:30:03,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5
2023-11-22 21:30:04,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
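The Whitening records compare a per-module metric against a limit. The metric measures how far the (grouped) feature covariance is from a multiple of the identity: it is >= 1.0, equals 1.0 for perfectly "white" features, and a gradient penalty would only engage once it exceeds the limit, which fits most logged metrics sitting below their limits. A sketch of one way to compute such a metric; this mirrors the idea, not necessarily the project's exact code.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    x = x.reshape(-1, x.shape[-1])                      # (frames, channels)
    cpg = x.shape[-1] // num_groups                     # channels per group
    x = x.reshape(-1, num_groups, cpg).transpose(0, 1)  # (groups, frames, cpg)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / x.shape[1]            # per-group covariance
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (cov ** 2).sum() / (num_groups * cpg)
    return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

print(whitening_metric(torch.randn(10000, 256), num_groups=1))  # close to 1.0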
2023-11-22 21:30:09,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2105980.0, ans=0.1
2023-11-22 21:30:15,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2106046.6666666665, ans=0.0
2023-11-22 21:30:20,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2106046.6666666665, ans=0.125
2023-11-22 21:30:24,045 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3300, loss[loss=0.08139, simple_loss=0.1137, pruned_loss=0.01682, audio_tagging_loss=0.007727, over 14377.00 frames. ], tot_loss[loss=0.07167, simple_loss=0.09423, pruned_loss=0.01493, audio_tagging_loss=0.009629, over 3055236.22 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:30:34,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2106113.3333333335, ans=0.125
2023-11-22 21:30:35,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2106113.3333333335, ans=0.125
2023-11-22 21:31:02,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.403e+01 9.072e+01 9.867e+01 1.352e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-22 21:31:03,806 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 315950
2023-11-22 21:31:04,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2106313.3333333335, ans=0.2
2023-11-22 21:31:06,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0
2023-11-22 21:31:07,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2106313.3333333335, ans=0.0
2023-11-22 21:31:12,827 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 21:31:23,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2106380.0, ans=0.125
2023-11-22 21:31:25,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=15.0
2023-11-22 21:31:27,784 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3350, loss[loss=0.1049, simple_loss=0.1374, pruned_loss=0.02739, audio_tagging_loss=0.008805, over 15950.00 frames. ], tot_loss[loss=0.07154, simple_loss=0.09414, pruned_loss=0.015, audio_tagging_loss=0.009472, over 3055377.80 frames. ], batch size: 57, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:31:57,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2106580.0, ans=0.0
2023-11-22 21:31:58,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2106580.0, ans=0.0
2023-11-22 21:32:07,802 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316000
2023-11-22 21:32:14,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2106646.6666666665, ans=0.125
2023-11-22 21:32:15,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5
2023-11-22 21:32:25,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0
2023-11-22 21:32:35,460 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3400, loss[loss=0.06562, simple_loss=0.08017, pruned_loss=0.01313, audio_tagging_loss=0.0124, over 14220.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09444, pruned_loss=0.01497, audio_tagging_loss=0.00931, over 3057446.52 frames. ], batch size: 57, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:32:40,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2106780.0, ans=0.0
2023-11-22 21:33:00,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2106913.3333333335, ans=0.125
2023-11-22 21:33:01,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0
2023-11-22 21:33:13,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0
2023-11-22 21:33:14,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.214e+01 8.729e+01 9.303e+01 1.198e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-22 21:33:15,711 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316050
2023-11-22 21:33:39,066 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3450, loss[loss=0.05005, simple_loss=0.05858, pruned_loss=0.01012, audio_tagging_loss=0.01063, over 15577.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.09412, pruned_loss=0.01481, audio_tagging_loss=0.009207, over 3059850.33 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:33:44,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2107113.3333333335, ans=10.0
2023-11-22 21:34:02,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2107180.0, ans=0.0
2023-11-22 21:34:07,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2107246.6666666665, ans=0.0
2023-11-22 21:34:18,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316100
2023-11-22 21:34:21,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2107313.3333333335, ans=0.0
2023-11-22 21:34:23,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2107313.3333333335, ans=0.125
2023-11-22 21:34:33,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0
2023-11-22 21:34:43,092 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3500, loss[loss=0.07438, simple_loss=0.1037, pruned_loss=0.0157, audio_tagging_loss=0.006811, over 15893.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.09412, pruned_loss=0.01482, audio_tagging_loss=0.009193, over 3058537.91 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:34:43,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5
2023-11-22 21:34:55,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5
2023-11-22 21:35:13,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2107580.0, ans=0.035
2023-11-22 21:35:14,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2107580.0, ans=0.0
2023-11-22 21:35:16,215 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 21:35:20,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.375e+01 8.106e+01 8.631e+01 9.458e+01 1.678e+02, threshold=1.726e+02, percent-clipped=0.0
2023-11-22 21:35:22,339 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316150
2023-11-22 21:35:47,501 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3550, loss[loss=0.05512, simple_loss=0.06704, pruned_loss=0.01019, audio_tagging_loss=0.01141, over 15175.00 frames. ], tot_loss[loss=0.07097, simple_loss=0.09414, pruned_loss=0.01481, audio_tagging_loss=0.009088, over 3059573.46 frames. ], batch size: 58, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:35:49,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0
2023-11-22 21:36:00,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2107846.6666666665, ans=0.09899494936611666
2023-11-22 21:36:03,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2107846.6666666665, ans=0.0
2023-11-22 21:36:26,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2107980.0, ans=0.025
2023-11-22 21:36:27,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316200
2023-11-22 21:36:31,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0
2023-11-22 21:36:51,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0
2023-11-22 21:36:51,768 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3600, loss[loss=0.1059, simple_loss=0.1466, pruned_loss=0.02555, audio_tagging_loss=0.007029, over 14771.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.09468, pruned_loss=0.01489, audio_tagging_loss=0.00903, over 3055888.57 frames. ], batch size: 53, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:36:55,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2108113.3333333335, ans=0.0
2023-11-22 21:36:59,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2108113.3333333335, ans=0.0
2023-11-22 21:37:22,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2108246.6666666665, ans=0.125
2023-11-22 21:37:30,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.329e+01 8.258e+01 8.715e+01 9.542e+01 1.505e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-22 21:37:31,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316250
2023-11-22 21:37:31,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2108313.3333333335, ans=0.125
2023-11-22 21:37:37,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2108313.3333333335, ans=0.125
2023-11-22 21:37:41,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2108380.0, ans=0.125
2023-11-22 21:37:56,481 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3650, loss[loss=0.07931, simple_loss=0.098, pruned_loss=0.0206, audio_tagging_loss=0.009716, over 15291.00 frames. ], tot_loss[loss=0.07073, simple_loss=0.09382, pruned_loss=0.01475, audio_tagging_loss=0.009072, over 3051446.25 frames. ], batch size: 57, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:38:02,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5
2023-11-22 21:38:04,790 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 21:38:21,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2108580.0, ans=0.125
2023-11-22 21:38:28,767 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 21:38:32,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2108580.0, ans=0.125
2023-11-22 21:38:34,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2108646.6666666665, ans=0.1
2023-11-22 21:38:35,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316300
2023-11-22 21:38:50,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2108713.3333333335, ans=0.2
2023-11-22 21:38:59,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2108780.0, ans=0.05
2023-11-22 21:39:00,750 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3700, loss[loss=0.06049, simple_loss=0.07764, pruned_loss=0.01197, audio_tagging_loss=0.009704, over 16418.00 frames. ], tot_loss[loss=0.07019, simple_loss=0.09268, pruned_loss=0.01465, audio_tagging_loss=0.009194, over 3047515.49 frames. ], batch size: 62, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:39:02,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2108780.0, ans=0.125
2023-11-22 21:39:08,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0
2023-11-22 21:39:32,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. limit=10.0
2023-11-22 21:39:40,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.416e+01 9.143e+01 1.024e+02 1.510e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-22 21:39:40,634 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316350
2023-11-22 21:39:45,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2108980.0, ans=0.0
2023-11-22 21:39:46,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=12.0
2023-11-22 21:39:47,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2108980.0, ans=0.125
2023-11-22 21:40:05,175 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3750, loss[loss=0.05709, simple_loss=0.07857, pruned_loss=0.008481, audio_tagging_loss=0.009328, over 14981.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.09338, pruned_loss=0.01465, audio_tagging_loss=0.00917, over 3054236.38 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:40:24,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5
2023-11-22 21:40:28,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0
2023-11-22 21:40:34,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2109246.6666666665, ans=0.035
2023-11-22 21:40:37,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2023-11-22 21:40:37,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2109246.6666666665, ans=0.125
2023-11-22 21:40:45,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316400
2023-11-22 21:40:45,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0
2023-11-22 21:40:50,968 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 21:40:54,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2109313.3333333335, ans=0.125
2023-11-22 21:40:59,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2109380.0, ans=0.125
2023-11-22 21:41:08,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2109380.0, ans=0.2
2023-11-22 21:41:10,540 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3800, loss[loss=0.05956, simple_loss=0.07323, pruned_loss=0.01214, audio_tagging_loss=0.01081, over 14343.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09321, pruned_loss=0.01448, audio_tagging_loss=0.009254, over 3059338.21 frames. ], batch size: 55, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:41:14,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2109446.6666666665, ans=0.0
2023-11-22 21:41:26,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2109513.3333333335, ans=0.125
2023-11-22 21:41:46,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2109580.0, ans=0.125
2023-11-22 21:41:50,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.525e+01 8.963e+01 9.873e+01 1.311e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-22 21:41:50,271 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316450
2023-11-22 21:41:50,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2109646.6666666665, ans=0.125
2023-11-22 21:42:03,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2109713.3333333335, ans=0.2
2023-11-22 21:42:03,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0
2023-11-22 21:42:12,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2023-11-22 21:42:14,505 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3850, loss[loss=0.06977, simple_loss=0.08634, pruned_loss=0.01659, audio_tagging_loss=0.01001, over 14648.00 frames. ], tot_loss[loss=0.07018, simple_loss=0.09277, pruned_loss=0.01449, audio_tagging_loss=0.009313, over 3053711.56 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:42:41,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2109913.3333333335, ans=0.125
2023-11-22 21:42:48,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0
2023-11-22 21:42:53,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316500
2023-11-22 21:42:55,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2109980.0, ans=0.125
2023-11-22 21:43:00,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2109980.0, ans=0.1
2023-11-22 21:43:16,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2110113.3333333335, ans=0.015
2023-11-22 21:43:17,578 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3900, loss[loss=0.07641, simple_loss=0.1009, pruned_loss=0.01589, audio_tagging_loss=0.01007, over 15599.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09283, pruned_loss=0.01452, audio_tagging_loss=0.009329, over 3053587.36 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 16.0
2023-11-22 21:43:18,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0
2023-11-22 21:43:20,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2110113.3333333335, ans=0.125
2023-11-22 21:43:22,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2110113.3333333335, ans=0.125
2023-11-22 21:43:32,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110180.0, ans=0.1
2023-11-22 21:43:33,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2110180.0, ans=0.0
2023-11-22 21:43:36,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0
2023-11-22 21:43:38,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2110180.0, ans=0.0
2023-11-22 21:43:43,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2110246.6666666665, ans=0.0
2023-11-22 21:43:57,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.359e+01 8.893e+01 9.456e+01 1.423e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-22 21:43:57,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316550
2023-11-22 21:43:57,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5
2023-11-22 21:44:10,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2110380.0, ans=0.125
2023-11-22 21:44:21,477 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 3950, loss[loss=0.09005, simple_loss=0.1096, pruned_loss=0.02282, audio_tagging_loss=0.01245, over 15208.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09231, pruned_loss=0.01436, audio_tagging_loss=0.00948, over 3046557.40 frames. ], batch size: 55, lr: 2.53e-03, grad_scale: 16.0
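Each train_asr.py:1221 record pairs the current batch's loss (over ~15k frames) with tot_loss over roughly 3e6 frames, which reads as a frame-weighted, exponentially decayed running average. A sketch under that assumption; the decay value below is illustrative, though 15000 frames per batch with decay 0.995 does settle near the ~3M-frame window seen in the log.

class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        # Frame-weighted sums with exponential forgetting; the steady-state
        # frame count is batch_frames / (1 - decay), i.e. ~3e6 here.
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the logged per-frame tot_loss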
2023-11-22 21:44:26,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2110446.6666666665, ans=0.125
2023-11-22 21:44:29,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2110446.6666666665, ans=0.1
2023-11-22 21:44:40,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2110513.3333333335, ans=0.125
2023-11-22 21:44:43,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2110513.3333333335, ans=0.125
2023-11-22 21:44:44,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2110513.3333333335, ans=0.125
2023-11-22 21:44:50,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2110580.0, ans=0.125
2023-11-22 21:45:00,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316600
2023-11-22 21:45:00,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2110646.6666666665, ans=0.125
2023-11-22 21:45:25,356 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4000, loss[loss=0.0671, simple_loss=0.08828, pruned_loss=0.01313, audio_tagging_loss=0.009829, over 16861.00 frames. ], tot_loss[loss=0.07076, simple_loss=0.09359, pruned_loss=0.0146, audio_tagging_loss=0.00937, over 3051908.45 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:45:32,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5
2023-11-22 21:45:34,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2110780.0, ans=0.125
2023-11-22 21:45:40,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2110846.6666666665, ans=0.2
2023-11-22 21:45:45,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2110846.6666666665, ans=0.125
2023-11-22 21:45:46,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2110846.6666666665, ans=0.5
2023-11-22 21:46:00,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110913.3333333335, ans=0.1
2023-11-22 21:46:02,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2110980.0, ans=0.0
2023-11-22 21:46:02,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5
2023-11-22 21:46:04,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.309e+01 8.921e+01 9.654e+01 1.157e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-22 21:46:04,445 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316650
2023-11-22 21:46:13,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0
2023-11-22 21:46:13,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2110980.0, ans=0.125
2023-11-22 21:46:21,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2111046.6666666665, ans=0.07
2023-11-22 21:46:25,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2111046.6666666665, ans=0.2
2023-11-22 21:46:28,455 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4050, loss[loss=0.06516, simple_loss=0.08683, pruned_loss=0.01368, audio_tagging_loss=0.008062, over 15567.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.09399, pruned_loss=0.01463, audio_tagging_loss=0.009461, over 3050660.52 frames. ], batch size: 59, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:46:31,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2111113.3333333335, ans=0.0
2023-11-22 21:46:32,065 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 21:46:51,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2111180.0, ans=0.125
2023-11-22 21:47:01,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2111246.6666666665, ans=0.0
2023-11-22 21:47:07,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316700
2023-11-22 21:47:17,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2111380.0, ans=0.0
2023-11-22 21:47:31,752 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4100, loss[loss=0.05583, simple_loss=0.07406, pruned_loss=0.008397, audio_tagging_loss=0.0104, over 16106.00 frames. ], tot_loss[loss=0.07159, simple_loss=0.0949, pruned_loss=0.01476, audio_tagging_loss=0.009378, over 3050733.60 frames. ], batch size: 62, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:47:40,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2111446.6666666665, ans=0.0
2023-11-22 21:47:40,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2111446.6666666665, ans=0.125
2023-11-22 21:47:41,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0
2023-11-22 21:48:10,807 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.312e+01 9.101e+01 1.011e+02 1.374e+02, threshold=1.820e+02, percent-clipped=0.0
2023-11-22 21:48:10,954 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316750
2023-11-22 21:48:32,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2111713.3333333335, ans=0.125
2023-11-22 21:48:36,916 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4150, loss[loss=0.08221, simple_loss=0.109, pruned_loss=0.01878, audio_tagging_loss=0.008922, over 15141.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.09382, pruned_loss=0.01462, audio_tagging_loss=0.009301, over 3047428.41 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:48:38,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0
2023-11-22 21:48:52,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2111846.6666666665, ans=0.0
2023-11-22 21:49:16,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316800
2023-11-22 21:49:23,167 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 21:49:23,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2111980.0, ans=0.015
2023-11-22 21:49:26,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2111980.0, ans=0.025
2023-11-22 21:49:26,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2111980.0, ans=0.125
2023-11-22 21:49:40,866 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4200, loss[loss=0.03273, simple_loss=0.03994, pruned_loss=0.002152, audio_tagging_loss=0.01061, over 16299.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09302, pruned_loss=0.01442, audio_tagging_loss=0.009212, over 3047933.96 frames. ], batch size: 65, lr: 2.53e-03, grad_scale: 32.0
2023-11-22 21:49:42,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2112113.3333333335, ans=0.05
2023-11-22 21:49:47,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0
limit=6.0 2023-11-22 21:49:57,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2112180.0, ans=15.0 2023-11-22 21:50:02,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2112180.0, ans=0.0 2023-11-22 21:50:15,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2112246.6666666665, ans=0.125 2023-11-22 21:50:19,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.97 vs. limit=10.0 2023-11-22 21:50:20,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.143e+01 8.631e+01 9.392e+01 1.233e+02, threshold=1.726e+02, percent-clipped=0.0 2023-11-22 21:50:20,554 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316850 2023-11-22 21:50:20,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2112313.3333333335, ans=0.125 2023-11-22 21:50:23,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2112313.3333333335, ans=0.0 2023-11-22 21:50:43,599 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4250, loss[loss=0.08669, simple_loss=0.1167, pruned_loss=0.01849, audio_tagging_loss=0.009857, over 15114.00 frames. ], tot_loss[loss=0.07049, simple_loss=0.09405, pruned_loss=0.01441, audio_tagging_loss=0.009053, over 3049923.45 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 21:50:58,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2112513.3333333335, ans=0.07 2023-11-22 21:50:59,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2112513.3333333335, ans=0.125 2023-11-22 21:51:10,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2112580.0, ans=0.2 2023-11-22 21:51:15,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2023-11-22 21:51:16,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2112580.0, ans=0.0 2023-11-22 21:51:23,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316900 2023-11-22 21:51:33,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2112713.3333333335, ans=0.04949747468305833 2023-11-22 21:51:42,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-22 21:51:42,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2112713.3333333335, ans=0.125 2023-11-22 21:51:47,701 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4300, loss[loss=0.08842, simple_loss=0.1219, pruned_loss=0.0201, audio_tagging_loss=0.007389, over 15623.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09384, pruned_loss=0.01443, audio_tagging_loss=0.009027, over 3048990.80 frames. 
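A pattern worth noting in the loss entries above: each reported total is consistent with a fixed linear combination of its components, loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss. This is only an observation recovered from the printed numbers, not a claim about the exact weighting code in train_asr.py. A quick check against two of the values logged above:

    # Values copied from the entries above:
    # tot_loss of "Epoch 27, batch 4050" and the per-batch loss of "batch 4100".
    cases = [
        # (loss, simple_loss, pruned_loss, audio_tagging_loss)
        (0.07108, 0.09399, 0.01463, 0.009461),
        (0.05583, 0.07406, 0.008397, 0.0104),
    ]
    for loss, simple, pruned, at in cases:
        recon = 0.5 * simple + pruned + at
        assert abs(recon - loss) < 5e-5, (recon, loss)
    print("loss ~= 0.5*simple_loss + pruned_loss + audio_tagging_loss")

The same decomposition holds for every loss line in this stretch of the log.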
], batch size: 57, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 21:51:54,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2112780.0, ans=0.125 2023-11-22 21:51:55,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2112780.0, ans=0.125 2023-11-22 21:52:10,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2112846.6666666665, ans=0.125 2023-11-22 21:52:26,442 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.241e+01 8.941e+01 9.538e+01 1.182e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-22 21:52:26,588 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 316950 2023-11-22 21:52:39,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2113046.6666666665, ans=0.0 2023-11-22 21:52:40,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2023-11-22 21:52:51,083 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4350, loss[loss=0.06526, simple_loss=0.09468, pruned_loss=0.01036, audio_tagging_loss=0.007553, over 15461.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.09404, pruned_loss=0.01438, audio_tagging_loss=0.009066, over 3046452.35 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 21:52:53,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-11-22 21:52:58,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2113113.3333333335, ans=0.125 2023-11-22 21:52:58,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2113113.3333333335, ans=0.0 2023-11-22 21:53:01,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2113113.3333333335, ans=0.1 2023-11-22 21:53:06,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2113180.0, ans=0.0 2023-11-22 21:53:06,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2113180.0, ans=0.125 2023-11-22 21:53:09,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2113180.0, ans=0.2 2023-11-22 21:53:12,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2113180.0, ans=0.125 2023-11-22 21:53:28,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2113246.6666666665, ans=0.125 2023-11-22 21:53:31,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317000 2023-11-22 21:53:42,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2113380.0, ans=0.0 2023-11-22 21:53:55,216 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4400, loss[loss=0.08326, simple_loss=0.1103, 
pruned_loss=0.01699, audio_tagging_loss=0.0111, over 15799.00 frames. ], tot_loss[loss=0.071, simple_loss=0.09505, pruned_loss=0.01443, audio_tagging_loss=0.009048, over 3040004.26 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 21:53:55,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2113446.6666666665, ans=0.125 2023-11-22 21:53:57,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2113446.6666666665, ans=0.125 2023-11-22 21:54:15,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2023-11-22 21:54:19,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-22 21:54:20,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2113580.0, ans=0.1 2023-11-22 21:54:34,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.698e+01 8.507e+01 9.216e+01 9.885e+01 1.286e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-22 21:54:35,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317050 2023-11-22 21:54:45,111 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 21:54:48,642 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 21:54:52,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2113713.3333333335, ans=0.125 2023-11-22 21:54:54,821 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 21:54:59,485 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4450, loss[loss=0.08053, simple_loss=0.1039, pruned_loss=0.01945, audio_tagging_loss=0.009136, over 16088.00 frames. ], tot_loss[loss=0.07122, simple_loss=0.09532, pruned_loss=0.01457, audio_tagging_loss=0.008988, over 3045688.35 frames. 
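The [optim.py:476] entries above print grad-norm quartiles next to the clipping threshold, and in every instance here the threshold equals Clipping_scale times the middle quartile (the median): 2.0 * 8.921e+01 = 1.784e+02 in the first such entry, 2.0 * 9.216e+01 ≈ 1.843e+02 in the one just above. A minimal sketch of that kind of median-based adaptive clipping follows; only the threshold rule is taken from the log, while the window length and bookkeeping are assumptions, so this is not the actual optim.py implementation:

    import statistics
    from collections import deque

    import torch

    def clip_by_median(params, recent_norms: deque, clipping_scale: float = 2.0):
        # Global grad norm over all parameters that currently hold gradients.
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        # Rolling window of past norms, e.g. deque(maxlen=...); length is assumed.
        recent_norms.append(norm)
        threshold = clipping_scale * statistics.median(recent_norms)
        if norm > threshold:  # such batches would show up as percent-clipped > 0
            for g in grads:
                g.mul_(threshold / norm)
        return norm, threshold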
], batch size: 61, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 21:55:05,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2113780.0, ans=0.1 2023-11-22 21:55:06,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2113780.0, ans=0.0 2023-11-22 21:55:14,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2113846.6666666665, ans=0.125 2023-11-22 21:55:39,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317100 2023-11-22 21:55:42,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2113980.0, ans=0.0 2023-11-22 21:55:53,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2114046.6666666665, ans=0.0 2023-11-22 21:55:59,562 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-22 21:56:04,066 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4500, loss[loss=0.08053, simple_loss=0.1128, pruned_loss=0.01556, audio_tagging_loss=0.008548, over 15849.00 frames. ], tot_loss[loss=0.07056, simple_loss=0.09436, pruned_loss=0.01441, audio_tagging_loss=0.008965, over 3047638.12 frames. ], batch size: 57, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 21:56:44,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.228e+01 8.995e+01 9.708e+01 1.219e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 21:56:45,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317150 2023-11-22 21:56:46,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-11-22 21:57:01,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2114380.0, ans=0.0 2023-11-22 21:57:08,294 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4550, loss[loss=0.1001, simple_loss=0.1411, pruned_loss=0.02328, audio_tagging_loss=0.006259, over 15761.00 frames. ], tot_loss[loss=0.07092, simple_loss=0.09477, pruned_loss=0.01447, audio_tagging_loss=0.009063, over 3042838.15 frames. ], batch size: 59, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 21:57:29,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2023-11-22 21:57:48,850 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317200 2023-11-22 21:57:50,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-22 21:57:57,939 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 21:57:58,096 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 21:58:00,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2114713.3333333335, ans=0.125 2023-11-22 21:58:13,824 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4600, loss[loss=0.06703, simple_loss=0.09119, pruned_loss=0.0123, audio_tagging_loss=0.009134, over 14414.00 frames. ], tot_loss[loss=0.06987, simple_loss=0.09268, pruned_loss=0.01428, audio_tagging_loss=0.009246, over 3043337.24 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 21:58:46,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2114913.3333333335, ans=0.2 2023-11-22 21:58:53,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317250 2023-11-22 21:58:54,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.032e+01 8.605e+01 9.322e+01 1.270e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-22 21:59:02,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2114980.0, ans=0.1 2023-11-22 21:59:05,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2115046.6666666665, ans=0.125 2023-11-22 21:59:18,579 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4650, loss[loss=0.06488, simple_loss=0.0746, pruned_loss=0.01319, audio_tagging_loss=0.01439, over 14738.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09228, pruned_loss=0.01421, audio_tagging_loss=0.009311, over 3037314.14 frames. ], batch size: 58, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 21:59:27,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2115113.3333333335, ans=0.2 2023-11-22 21:59:55,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2115313.3333333335, ans=0.1 2023-11-22 21:59:57,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317300 2023-11-22 22:00:00,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2115313.3333333335, ans=0.1 2023-11-22 22:00:04,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2023-11-22 22:00:09,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2115380.0, ans=0.2 2023-11-22 22:00:15,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2115380.0, ans=0.125 2023-11-22 22:00:20,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2115446.6666666665, ans=0.07 2023-11-22 22:00:21,163 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4700, loss[loss=0.0868, simple_loss=0.1169, pruned_loss=0.0194, audio_tagging_loss=0.008936, over 15834.00 frames. ], tot_loss[loss=0.06973, simple_loss=0.09223, pruned_loss=0.0142, audio_tagging_loss=0.009406, over 3043809.86 frames. 
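The WARNING entries ending just above drop AudioSet cuts whose 100-frame input subsamples to 23 frames while the dummy transcript tokenizes to 24 tokens: the label sequence is longer than the subsampled frame sequence, so the transducer loss has no valid alignment for such a cut. A hedged sketch of that filter predicate; the names and the exact subsampled-length formula are illustrative, though (100 - 7) // 4 = 23 does reproduce the logged value:

    def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
        # Illustrative length formula; the log only shows 100 frames -> 23 frames,
        # consistent with factor-4 subsampling plus some edge trimming.
        frames_after = (num_frames - 7) // subsampling_factor
        return frames_after >= num_tokens

    # The excluded cuts above: 100 frames -> 23 after subsampling, 24 tokens.
    assert keep_cut(100, 24) is False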
], batch size: 56, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 22:00:27,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2115446.6666666665, ans=0.125 2023-11-22 22:00:35,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2115513.3333333335, ans=0.0 2023-11-22 22:00:57,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2115580.0, ans=0.0 2023-11-22 22:01:00,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317350 2023-11-22 22:01:02,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.187e+01 8.736e+01 9.527e+01 1.154e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 22:01:03,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2115646.6666666665, ans=0.0 2023-11-22 22:01:14,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2115713.3333333335, ans=0.1 2023-11-22 22:01:19,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2115713.3333333335, ans=0.1 2023-11-22 22:01:24,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2115780.0, ans=0.0 2023-11-22 22:01:25,034 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4750, loss[loss=0.06631, simple_loss=0.08071, pruned_loss=0.01268, audio_tagging_loss=0.01328, over 14159.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.09235, pruned_loss=0.01439, audio_tagging_loss=0.009466, over 3042938.30 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 22:01:38,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2115846.6666666665, ans=0.0 2023-11-22 22:01:47,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2115846.6666666665, ans=0.0 2023-11-22 22:01:58,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2115913.3333333335, ans=0.0 2023-11-22 22:02:03,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317400 2023-11-22 22:02:07,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2023-11-22 22:02:29,389 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4800, loss[loss=0.07471, simple_loss=0.09855, pruned_loss=0.01676, audio_tagging_loss=0.008675, over 15426.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.0928, pruned_loss=0.01438, audio_tagging_loss=0.009615, over 3042300.56 frames. 
], batch size: 54, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 22:02:47,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2116180.0, ans=0.0 2023-11-22 22:03:07,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2116313.3333333335, ans=0.035 2023-11-22 22:03:09,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317450 2023-11-22 22:03:10,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.616e+01 8.423e+01 9.043e+01 9.752e+01 1.234e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-22 22:03:10,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2116313.3333333335, ans=0.1 2023-11-22 22:03:16,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2116313.3333333335, ans=0.125 2023-11-22 22:03:20,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2116380.0, ans=0.125 2023-11-22 22:03:33,796 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4850, loss[loss=0.05837, simple_loss=0.07526, pruned_loss=0.01096, audio_tagging_loss=0.009784, over 14106.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09164, pruned_loss=0.01407, audio_tagging_loss=0.009696, over 3048523.38 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 32.0 2023-11-22 22:03:38,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-22 22:03:38,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=22.5 2023-11-22 22:03:43,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2116446.6666666665, ans=0.125 2023-11-22 22:03:52,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2116513.3333333335, ans=0.0 2023-11-22 22:04:13,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317500 2023-11-22 22:04:21,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2116646.6666666665, ans=0.125 2023-11-22 22:04:31,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2116713.3333333335, ans=0.0 2023-11-22 22:04:38,119 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4900, loss[loss=0.06829, simple_loss=0.08549, pruned_loss=0.01417, audio_tagging_loss=0.01138, over 15786.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09238, pruned_loss=0.01432, audio_tagging_loss=0.009628, over 3052221.19 frames. ], batch size: 62, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 22:04:53,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.87 vs. 
limit=15.0 2023-11-22 22:05:18,301 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317550 2023-11-22 22:05:20,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.184e+01 8.772e+01 9.580e+01 1.151e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-22 22:05:30,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2117046.6666666665, ans=0.0 2023-11-22 22:05:34,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2117046.6666666665, ans=0.025 2023-11-22 22:05:43,001 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 4950, loss[loss=0.04876, simple_loss=0.05776, pruned_loss=0.01051, audio_tagging_loss=0.009364, over 15759.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09195, pruned_loss=0.01425, audio_tagging_loss=0.009495, over 3049996.10 frames. ], batch size: 63, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 22:05:46,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2117113.3333333335, ans=0.1 2023-11-22 22:05:54,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0 2023-11-22 22:06:07,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.95 vs. limit=22.5 2023-11-22 22:06:12,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2117246.6666666665, ans=10.0 2023-11-22 22:06:13,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2117246.6666666665, ans=0.1 2023-11-22 22:06:22,933 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317600 2023-11-22 22:06:38,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2117380.0, ans=0.2 2023-11-22 22:06:46,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2117446.6666666665, ans=0.0 2023-11-22 22:06:46,915 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5000, loss[loss=0.09434, simple_loss=0.1262, pruned_loss=0.0224, audio_tagging_loss=0.008837, over 15033.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09204, pruned_loss=0.01427, audio_tagging_loss=0.009405, over 3043775.68 frames. 
], batch size: 55, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 22:07:07,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2117513.3333333335, ans=0.2 2023-11-22 22:07:27,298 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317650 2023-11-22 22:07:28,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2117646.6666666665, ans=0.125 2023-11-22 22:07:29,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.065e+01 8.747e+01 9.520e+01 1.397e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-22 22:07:32,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2117646.6666666665, ans=0.1 2023-11-22 22:07:47,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2023-11-22 22:07:51,027 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5050, loss[loss=0.08707, simple_loss=0.122, pruned_loss=0.01917, audio_tagging_loss=0.006904, over 15486.00 frames. ], tot_loss[loss=0.0696, simple_loss=0.09199, pruned_loss=0.01431, audio_tagging_loss=0.009292, over 3044161.64 frames. ], batch size: 58, lr: 2.53e-03, grad_scale: 16.0 2023-11-22 22:08:31,699 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317700 2023-11-22 22:08:32,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2117980.0, ans=0.125 2023-11-22 22:08:42,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2118046.6666666665, ans=0.125 2023-11-22 22:08:42,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-22 22:08:50,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2118046.6666666665, ans=0.125 2023-11-22 22:08:56,509 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5100, loss[loss=0.04437, simple_loss=0.0514, pruned_loss=0.008009, audio_tagging_loss=0.01066, over 14743.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.09159, pruned_loss=0.01416, audio_tagging_loss=0.009237, over 3038251.78 frames. 
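Most of the volume in this log comes from [scaling.py:213] ScheduledFloat entries: hyperparameters such as dropout probabilities, skip rates and balancer targets whose current value (ans) is looked up as a function of batch_count. The natural reading is a schedule interpolated over batch count; a minimal piecewise-linear sketch is below, with invented breakpoints, since the recipe's actual schedules are not shown in the log:

    import bisect

    class ScheduledFloatSketch:
        # Piecewise-linear function of batch_count; breakpoints here are made up.
        def __init__(self, *points):  # points: (batch_count, value), ascending
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    skip_rate = ScheduledFloatSketch((0.0, 0.1), (20000.0, 0.0))
    print(skip_rate(2118180.0))  # far past the last breakpoint: 0.0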
], batch size: 57, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:08:58,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2118113.3333333335, ans=0.07 2023-11-22 22:09:09,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2118180.0, ans=0.1 2023-11-22 22:09:11,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2118180.0, ans=0.1 2023-11-22 22:09:18,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2118180.0, ans=0.125 2023-11-22 22:09:26,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2118246.6666666665, ans=0.0 2023-11-22 22:09:36,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317750 2023-11-22 22:09:38,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.209e+01 8.781e+01 9.360e+01 1.201e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-22 22:09:39,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2118313.3333333335, ans=0.0 2023-11-22 22:09:51,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2118380.0, ans=0.2 2023-11-22 22:09:53,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2023-11-22 22:10:00,177 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5150, loss[loss=0.06676, simple_loss=0.08059, pruned_loss=0.01703, audio_tagging_loss=0.009444, over 14433.00 frames. ], tot_loss[loss=0.06879, simple_loss=0.09083, pruned_loss=0.0141, audio_tagging_loss=0.009271, over 3039031.15 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:10:17,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2118513.3333333335, ans=0.125 2023-11-22 22:10:23,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0 2023-11-22 22:10:40,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317800 2023-11-22 22:10:43,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2118646.6666666665, ans=0.125 2023-11-22 22:10:54,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2118713.3333333335, ans=0.2 2023-11-22 22:10:59,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2118713.3333333335, ans=0.5 2023-11-22 22:11:04,852 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5200, loss[loss=0.04631, simple_loss=0.05716, pruned_loss=0.007373, audio_tagging_loss=0.01035, over 15494.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09167, pruned_loss=0.01419, audio_tagging_loss=0.009258, over 3043721.61 frames. 
], batch size: 60, lr: 2.52e-03, grad_scale: 32.0 2023-11-22 22:11:27,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2118846.6666666665, ans=0.07 2023-11-22 22:11:43,254 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 22:11:44,352 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317850 2023-11-22 22:11:44,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2118980.0, ans=0.125 2023-11-22 22:11:46,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.652e+01 9.262e+01 9.962e+01 1.336e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-22 22:12:09,253 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5250, loss[loss=0.07489, simple_loss=0.107, pruned_loss=0.01234, audio_tagging_loss=0.009045, over 15403.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09228, pruned_loss=0.01434, audio_tagging_loss=0.009237, over 3045459.68 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:12:22,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2119180.0, ans=0.1 2023-11-22 22:12:27,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2119180.0, ans=0.125 2023-11-22 22:12:28,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2119180.0, ans=0.0 2023-11-22 22:12:32,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2119180.0, ans=0.1 2023-11-22 22:12:36,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2023-11-22 22:12:48,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317900 2023-11-22 22:12:57,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2119313.3333333335, ans=0.125 2023-11-22 22:13:12,500 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5300, loss[loss=0.06706, simple_loss=0.09707, pruned_loss=0.01265, audio_tagging_loss=0.00587, over 15672.00 frames. ], tot_loss[loss=0.07041, simple_loss=0.09342, pruned_loss=0.01453, audio_tagging_loss=0.009164, over 3049271.96 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:13:12,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2119446.6666666665, ans=0.05 2023-11-22 22:13:53,105 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 317950 2023-11-22 22:13:56,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.353e+01 8.983e+01 9.482e+01 1.198e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-22 22:14:16,242 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5350, loss[loss=0.05251, simple_loss=0.06547, pruned_loss=0.01249, audio_tagging_loss=0.007279, over 14189.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.09461, pruned_loss=0.01473, audio_tagging_loss=0.008986, over 3047116.22 frames. 
], batch size: 53, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:14:28,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2023-11-22 22:14:52,750 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-22 22:14:56,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318000 2023-11-22 22:15:21,779 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5400, loss[loss=0.06695, simple_loss=0.08925, pruned_loss=0.01248, audio_tagging_loss=0.009852, over 14729.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.09329, pruned_loss=0.01454, audio_tagging_loss=0.009091, over 3047319.98 frames. ], batch size: 53, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:15:26,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2120113.3333333335, ans=0.0 2023-11-22 22:15:28,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2120113.3333333335, ans=0.125 2023-11-22 22:15:35,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2120180.0, ans=0.125 2023-11-22 22:15:36,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2120180.0, ans=0.125 2023-11-22 22:16:00,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318050 2023-11-22 22:16:01,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2023-11-22 22:16:04,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.392e+01 8.997e+01 9.617e+01 1.276e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-22 22:16:06,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2120313.3333333335, ans=0.1 2023-11-22 22:16:06,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2023-11-22 22:16:25,801 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5450, loss[loss=0.08771, simple_loss=0.1139, pruned_loss=0.02282, audio_tagging_loss=0.007967, over 15757.00 frames. ], tot_loss[loss=0.07077, simple_loss=0.09387, pruned_loss=0.01473, audio_tagging_loss=0.009098, over 3051003.10 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:17:06,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318100 2023-11-22 22:17:29,792 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5500, loss[loss=0.0586, simple_loss=0.07206, pruned_loss=0.01331, audio_tagging_loss=0.009263, over 15532.00 frames. ], tot_loss[loss=0.07048, simple_loss=0.0934, pruned_loss=0.01463, audio_tagging_loss=0.009143, over 3051254.92 frames. 
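Since every training entry follows the same fixed format, the running averages can be recovered from the raw log for quick plotting. A small parser matched to the tot_loss[...] syntax printed above, assuming each entry sits on a single line of the file as in the original log; the path is hypothetical:

    import re

    PAT = re.compile(
        r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([0-9.]+), "
        r"simple_loss=([0-9.]+), pruned_loss=([0-9.]+), "
        r"audio_tagging_loss=([0-9.]+)"
    )

    def read_tot_loss(path):
        # Yields (epoch, batch_in_epoch, tot_loss) for each [train_asr.py:1221] entry.
        with open(path) as f:
            for line in f:
                m = PAT.search(line)
                if m:
                    yield int(m.group(1)), int(m.group(2)), float(m.group(3))

    # e.g. curve = list(read_tot_loss("train-log.txt"))  # hypothetical file name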
], batch size: 59, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:17:34,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2120780.0, ans=0.1 2023-11-22 22:17:52,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2120846.6666666665, ans=0.125 2023-11-22 22:18:04,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2120913.3333333335, ans=0.2 2023-11-22 22:18:05,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2120913.3333333335, ans=0.125 2023-11-22 22:18:08,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2120980.0, ans=0.125 2023-11-22 22:18:09,910 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318150 2023-11-22 22:18:13,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.175e+01 8.992e+01 9.514e+01 1.142e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-22 22:18:14,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-22 22:18:17,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2120980.0, ans=0.1 2023-11-22 22:18:26,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2023-11-22 22:18:33,988 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5550, loss[loss=0.0995, simple_loss=0.1237, pruned_loss=0.02835, audio_tagging_loss=0.009295, over 14089.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.09235, pruned_loss=0.01438, audio_tagging_loss=0.009321, over 3048356.35 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:18:45,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2121113.3333333335, ans=0.0 2023-11-22 22:18:56,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2121180.0, ans=0.125 2023-11-22 22:19:03,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2121246.6666666665, ans=0.2 2023-11-22 22:19:03,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2121246.6666666665, ans=0.2 2023-11-22 22:19:13,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318200 2023-11-22 22:19:22,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. 
limit=15.0 2023-11-22 22:19:23,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2121313.3333333335, ans=0.2 2023-11-22 22:19:27,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2121380.0, ans=0.125 2023-11-22 22:19:39,028 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5600, loss[loss=0.05111, simple_loss=0.06069, pruned_loss=0.007564, audio_tagging_loss=0.0132, over 15879.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.093, pruned_loss=0.01453, audio_tagging_loss=0.009371, over 3051553.99 frames. ], batch size: 63, lr: 2.52e-03, grad_scale: 32.0 2023-11-22 22:19:52,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2121513.3333333335, ans=0.04949747468305833 2023-11-22 22:20:18,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318250 2023-11-22 22:20:19,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2121646.6666666665, ans=0.0 2023-11-22 22:20:22,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.364e+01 9.057e+01 9.975e+01 1.357e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-22 22:20:25,265 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 22:20:42,424 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5650, loss[loss=0.08917, simple_loss=0.1154, pruned_loss=0.02297, audio_tagging_loss=0.008501, over 15532.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.09359, pruned_loss=0.01474, audio_tagging_loss=0.009418, over 3052151.55 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 32.0 2023-11-22 22:20:45,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2121780.0, ans=0.125 2023-11-22 22:20:46,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2023-11-22 22:20:47,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2121780.0, ans=0.1 2023-11-22 22:20:47,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2121780.0, ans=0.0 2023-11-22 22:20:55,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2121846.6666666665, ans=0.1 2023-11-22 22:21:04,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. 
limit=15.0 2023-11-22 22:21:07,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2121913.3333333335, ans=0.125 2023-11-22 22:21:22,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318300 2023-11-22 22:21:22,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2121980.0, ans=0.0 2023-11-22 22:21:24,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2121980.0, ans=0.125 2023-11-22 22:21:32,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-22 22:21:35,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2122046.6666666665, ans=0.125 2023-11-22 22:21:38,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2122046.6666666665, ans=0.0 2023-11-22 22:21:39,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2122046.6666666665, ans=0.0 2023-11-22 22:21:43,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2122046.6666666665, ans=0.0 2023-11-22 22:21:46,024 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5700, loss[loss=0.06646, simple_loss=0.08982, pruned_loss=0.0135, audio_tagging_loss=0.008052, over 15372.00 frames. ], tot_loss[loss=0.07089, simple_loss=0.0932, pruned_loss=0.01487, audio_tagging_loss=0.009416, over 3054085.59 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 32.0 2023-11-22 22:22:22,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2122246.6666666665, ans=0.125 2023-11-22 22:22:25,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318350 2023-11-22 22:22:28,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2122313.3333333335, ans=0.125 2023-11-22 22:22:28,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2023-11-22 22:22:29,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.246e+01 8.801e+01 9.549e+01 1.194e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-22 22:22:40,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2122380.0, ans=0.0 2023-11-22 22:22:50,409 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5750, loss[loss=0.1022, simple_loss=0.1439, pruned_loss=0.02586, audio_tagging_loss=0.004387, over 15088.00 frames. ], tot_loss[loss=0.07105, simple_loss=0.09365, pruned_loss=0.01492, audio_tagging_loss=0.009306, over 3056501.90 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0 2023-11-22 22:22:52,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2122446.6666666665, ans=0.125 2023-11-22 22:22:56,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.86 vs. 
limit=22.5 2023-11-22 22:23:01,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=2122446.6666666665, ans=12.0 2023-11-22 22:23:24,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5 2023-11-22 22:23:25,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2122580.0, ans=0.0 2023-11-22 22:23:28,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2122646.6666666665, ans=0.1 2023-11-22 22:23:29,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318400 2023-11-22 22:23:39,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=22.5 2023-11-22 22:23:43,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2122713.3333333335, ans=0.1 2023-11-22 22:23:52,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=22.5 2023-11-22 22:23:53,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2122780.0, ans=0.125 2023-11-22 22:23:54,259 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5800, loss[loss=0.07638, simple_loss=0.1066, pruned_loss=0.0144, audio_tagging_loss=0.008663, over 14680.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.09298, pruned_loss=0.0148, audio_tagging_loss=0.009215, over 3052976.02 frames. ], batch size: 53, lr: 2.52e-03, grad_scale: 32.0 2023-11-22 22:24:05,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2122846.6666666665, ans=0.125 2023-11-22 22:24:20,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2122913.3333333335, ans=0.2 2023-11-22 22:24:34,766 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318450 2023-11-22 22:24:36,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2122980.0, ans=0.0 2023-11-22 22:24:36,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-11-22 22:24:38,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.593e+01 8.198e+01 8.735e+01 9.439e+01 1.113e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-22 22:24:46,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2023-11-22 22:24:53,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2123046.6666666665, ans=0.125 2023-11-22 22:24:58,469 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5850, loss[loss=0.06793, simple_loss=0.09173, pruned_loss=0.01359, audio_tagging_loss=0.008473, over 15022.00 frames. ], tot_loss[loss=0.0706, simple_loss=0.09329, pruned_loss=0.01473, audio_tagging_loss=0.009225, over 3053740.70 frames. 
], batch size: 59, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:25:23,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2123246.6666666665, ans=0.0 2023-11-22 22:25:38,297 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318500 2023-11-22 22:25:56,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2123380.0, ans=0.0 2023-11-22 22:26:03,124 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5900, loss[loss=0.07718, simple_loss=0.1057, pruned_loss=0.01841, audio_tagging_loss=0.005946, over 15181.00 frames. ], tot_loss[loss=0.07135, simple_loss=0.09435, pruned_loss=0.015, audio_tagging_loss=0.009182, over 3049848.14 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 8.0 2023-11-22 22:26:11,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2123446.6666666665, ans=0.125 2023-11-22 22:26:15,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=22.5 2023-11-22 22:26:42,782 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318550 2023-11-22 22:26:43,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2123646.6666666665, ans=0.05 2023-11-22 22:26:48,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.253e+01 8.842e+01 9.499e+01 1.797e+02, threshold=1.768e+02, percent-clipped=1.0 2023-11-22 22:26:54,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2123713.3333333335, ans=0.125 2023-11-22 22:26:54,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2123713.3333333335, ans=0.125 2023-11-22 22:27:06,795 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 5950, loss[loss=0.05897, simple_loss=0.07186, pruned_loss=0.01103, audio_tagging_loss=0.01201, over 14793.00 frames. ], tot_loss[loss=0.0705, simple_loss=0.09327, pruned_loss=0.01467, audio_tagging_loss=0.009187, over 3049693.16 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 8.0 2023-11-22 22:27:35,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2123913.3333333335, ans=0.125 2023-11-22 22:27:46,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318600 2023-11-22 22:27:48,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2123980.0, ans=0.0 2023-11-22 22:28:10,916 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6000, loss[loss=0.05429, simple_loss=0.06779, pruned_loss=0.009878, audio_tagging_loss=0.01052, over 14552.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09292, pruned_loss=0.01451, audio_tagging_loss=0.009141, over 3046565.88 frames. 
], batch size: 57, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:28:10,917 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-22 22:28:32,066 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5872, 2.9119, 2.8261, 2.6013, 3.1015, 3.0782, 3.2310, 3.1666], device='cuda:1') 2023-11-22 22:28:54,207 INFO [train_asr.py:1253] (1/4) Epoch 27, validation: loss=0.05853, simple_loss=0.05134, pruned_loss=0.005103, audio_tagging_loss=0.02775, over 4681554.00 frames. 2023-11-22 22:28:54,208 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-22 22:29:34,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318650 2023-11-22 22:29:39,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2124313.3333333335, ans=0.0 2023-11-22 22:29:40,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.303e+01 8.890e+01 9.466e+01 1.522e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-22 22:29:41,915 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 22:29:50,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2124380.0, ans=0.1 2023-11-22 22:29:54,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2124380.0, ans=0.2 2023-11-22 22:29:57,683 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6050, loss[loss=0.05637, simple_loss=0.07058, pruned_loss=0.0125, audio_tagging_loss=0.008572, over 14798.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09271, pruned_loss=0.01449, audio_tagging_loss=0.0091, over 3045643.48 frames. ], batch size: 58, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:30:17,797 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-22 22:30:23,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2124580.0, ans=0.1 2023-11-22 22:30:25,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0 2023-11-22 22:30:38,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318700 2023-11-22 22:31:02,046 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6100, loss[loss=0.08435, simple_loss=0.1203, pruned_loss=0.01654, audio_tagging_loss=0.007644, over 16762.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09255, pruned_loss=0.01434, audio_tagging_loss=0.009079, over 3049774.62 frames. ], batch size: 61, lr: 2.52e-03, grad_scale: 16.0 2023-11-22 22:31:18,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. 
2023-11-22 22:31:29,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2124913.3333333335, ans=0.125
2023-11-22 22:31:33,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2124913.3333333335, ans=0.125
2023-11-22 22:31:35,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0
2023-11-22 22:31:39,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.39 vs. limit=22.5
2023-11-22 22:31:41,956 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318750
2023-11-22 22:31:44,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2124980.0, ans=0.0
2023-11-22 22:31:48,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.282e+01 8.867e+01 9.593e+01 1.255e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-22 22:32:06,852 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6150, loss[loss=0.05512, simple_loss=0.07294, pruned_loss=0.008042, audio_tagging_loss=0.01061, over 14579.00 frames. ], tot_loss[loss=0.06992, simple_loss=0.09308, pruned_loss=0.0143, audio_tagging_loss=0.009077, over 3047009.05 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:32:14,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2125113.3333333335, ans=0.1
2023-11-22 22:32:17,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2125113.3333333335, ans=0.2
2023-11-22 22:32:47,443 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318800
2023-11-22 22:32:47,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2125313.3333333335, ans=0.125
2023-11-22 22:33:11,342 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6200, loss[loss=0.07523, simple_loss=0.08706, pruned_loss=0.01838, audio_tagging_loss=0.01332, over 14790.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.09292, pruned_loss=0.01451, audio_tagging_loss=0.009261, over 3042571.70 frames. ], batch size: 58, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:33:21,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0
2023-11-22 22:33:30,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2125513.3333333335, ans=0.125
2023-11-22 22:33:31,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2023-11-22 22:33:47,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. limit=10.0
2023-11-22 22:33:52,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318850
2023-11-22 22:33:57,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.282e+01 8.300e+01 8.948e+01 9.929e+01 3.016e+02, threshold=1.790e+02, percent-clipped=1.0
2023-11-22 22:34:02,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2125713.3333333335, ans=0.125
2023-11-22 22:34:13,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 22:34:15,988 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6250, loss[loss=0.04427, simple_loss=0.05206, pruned_loss=0.006608, audio_tagging_loss=0.01164, over 14696.00 frames. ], tot_loss[loss=0.06998, simple_loss=0.09246, pruned_loss=0.0144, audio_tagging_loss=0.009339, over 3037576.99 frames. ], batch size: 59, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:34:44,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2125913.3333333335, ans=0.0
2023-11-22 22:34:48,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=8.0
2023-11-22 22:34:55,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318900
2023-11-22 22:34:59,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2125980.0, ans=0.0
2023-11-22 22:35:02,256 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-22 22:35:13,730 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-22 22:35:20,690 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6300, loss[loss=0.06629, simple_loss=0.08667, pruned_loss=0.01316, audio_tagging_loss=0.009788, over 15073.00 frames. ], tot_loss[loss=0.06991, simple_loss=0.09233, pruned_loss=0.01436, audio_tagging_loss=0.009393, over 3043794.19 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:35:24,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2126113.3333333335, ans=0.125
2023-11-22 22:35:33,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. limit=10.0
2023-11-22 22:35:34,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2023-11-22 22:35:42,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2126180.0, ans=0.04949747468305833
2023-11-22 22:35:50,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0
2023-11-22 22:35:53,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2126246.6666666665, ans=0.125
2023-11-22 22:35:55,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2126246.6666666665, ans=0.2
2023-11-22 22:36:01,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 318950
2023-11-22 22:36:07,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.041e+01 8.820e+01 9.703e+01 1.205e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-22 22:36:12,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0
2023-11-22 22:36:13,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2126380.0, ans=0.0
2023-11-22 22:36:25,212 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6350, loss[loss=0.07108, simple_loss=0.09554, pruned_loss=0.01519, audio_tagging_loss=0.00811, over 15020.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.0933, pruned_loss=0.0145, audio_tagging_loss=0.009498, over 3040933.87 frames. ], batch size: 58, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:36:50,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2126580.0, ans=0.125
2023-11-22 22:37:05,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319000
2023-11-22 22:37:28,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2126780.0, ans=0.1
2023-11-22 22:37:29,128 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6400, loss[loss=0.07498, simple_loss=0.09671, pruned_loss=0.01592, audio_tagging_loss=0.0107, over 14003.00 frames. ], tot_loss[loss=0.07037, simple_loss=0.09273, pruned_loss=0.01446, audio_tagging_loss=0.009543, over 3037970.39 frames. ], batch size: 54, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:37:35,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2126780.0, ans=0.125
2023-11-22 22:37:39,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2126780.0, ans=0.125
2023-11-22 22:37:49,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. limit=10.0
2023-11-22 22:37:56,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2126913.3333333335, ans=0.1
2023-11-22 22:38:08,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319050
2023-11-22 22:38:15,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.479e+01 8.884e+01 9.505e+01 1.366e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-22 22:38:32,760 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6450, loss[loss=0.07929, simple_loss=0.1108, pruned_loss=0.01654, audio_tagging_loss=0.007365, over 14324.00 frames. ], tot_loss[loss=0.07033, simple_loss=0.09245, pruned_loss=0.01445, audio_tagging_loss=0.009651, over 3033756.45 frames. ], batch size: 54, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:38:37,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0
2023-11-22 22:38:50,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2127180.0, ans=0.125
2023-11-22 22:39:07,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2127246.6666666665, ans=0.125
2023-11-22 22:39:08,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2127313.3333333335, ans=0.2
2023-11-22 22:39:11,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319100
2023-11-22 22:39:20,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2127313.3333333335, ans=0.0
2023-11-22 22:39:36,758 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6500, loss[loss=0.05868, simple_loss=0.0835, pruned_loss=0.009114, audio_tagging_loss=0.007819, over 15691.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09217, pruned_loss=0.01431, audio_tagging_loss=0.009635, over 3037652.31 frames. ], batch size: 60, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:39:41,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2127446.6666666665, ans=10.0
2023-11-22 22:39:42,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0
2023-11-22 22:39:44,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2127446.6666666665, ans=0.5
2023-11-22 22:39:46,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2127446.6666666665, ans=0.05
2023-11-22 22:39:49,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2127513.3333333335, ans=0.125
2023-11-22 22:39:57,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2127513.3333333335, ans=0.125
2023-11-22 22:40:08,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2127580.0, ans=0.2
2023-11-22 22:40:17,206 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319150
2023-11-22 22:40:17,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2127646.6666666665, ans=0.125
2023-11-22 22:40:19,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2127646.6666666665, ans=0.035
2023-11-22 22:40:23,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2127646.6666666665, ans=0.2
2023-11-22 22:40:24,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.470e+01 8.995e+01 9.586e+01 1.282e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-22 22:40:25,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2127646.6666666665, ans=0.0
2023-11-22 22:40:40,342 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6550, loss[loss=0.04841, simple_loss=0.06253, pruned_loss=0.007037, audio_tagging_loss=0.0101, over 14707.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.09245, pruned_loss=0.01436, audio_tagging_loss=0.009466, over 3046672.27 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:40:43,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2127780.0, ans=0.95
2023-11-22 22:40:58,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2127846.6666666665, ans=0.1
2023-11-22 22:41:12,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2127913.3333333335, ans=0.0
2023-11-22 22:41:21,055 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319200
2023-11-22 22:41:28,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2127980.0, ans=0.07
2023-11-22 22:41:35,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2128046.6666666665, ans=0.125
2023-11-22 22:41:38,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2023-11-22 22:41:45,842 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6600, loss[loss=0.09601, simple_loss=0.136, pruned_loss=0.01824, audio_tagging_loss=0.00977, over 16365.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09238, pruned_loss=0.01428, audio_tagging_loss=0.009385, over 3048804.41 frames. ], batch size: 58, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:42:01,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2128180.0, ans=0.125
2023-11-22 22:42:11,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0
2023-11-22 22:42:18,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2128246.6666666665, ans=0.0
2023-11-22 22:42:23,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2128313.3333333335, ans=0.125
2023-11-22 22:42:25,566 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319250
2023-11-22 22:42:33,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.485e+01 9.023e+01 9.901e+01 1.531e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-22 22:42:49,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5
2023-11-22 22:42:49,966 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6650, loss[loss=0.06422, simple_loss=0.08816, pruned_loss=0.01188, audio_tagging_loss=0.008263, over 15022.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09164, pruned_loss=0.01422, audio_tagging_loss=0.009498, over 3047870.09 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:42:53,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2128446.6666666665, ans=0.0
2023-11-22 22:42:56,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2128446.6666666665, ans=0.125
2023-11-22 22:43:00,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2128446.6666666665, ans=0.125
2023-11-22 22:43:29,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319300
2023-11-22 22:43:53,675 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6700, loss[loss=0.08206, simple_loss=0.1057, pruned_loss=0.01883, audio_tagging_loss=0.01036, over 15533.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.09206, pruned_loss=0.01443, audio_tagging_loss=0.009347, over 3049680.22 frames. ], batch size: 59, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:43:59,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0
2023-11-22 22:44:01,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0
2023-11-22 22:44:01,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0
2023-11-22 22:44:03,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2128780.0, ans=0.0
2023-11-22 22:44:07,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0
2023-11-22 22:44:34,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319350
2023-11-22 22:44:41,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.124e+01 8.630e+01 9.282e+01 1.213e+02, threshold=1.726e+02, percent-clipped=0.0
2023-11-22 22:44:54,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2129046.6666666665, ans=0.0
2023-11-22 22:44:58,814 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6750, loss[loss=0.0632, simple_loss=0.08501, pruned_loss=0.009783, audio_tagging_loss=0.01091, over 15356.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09247, pruned_loss=0.01445, audio_tagging_loss=0.009355, over 3042677.80 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 22:45:08,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2129113.3333333335, ans=0.125
2023-11-22 22:45:08,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0
2023-11-22 22:45:30,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2129246.6666666665, ans=0.2
2023-11-22 22:45:37,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319400
2023-11-22 22:45:42,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2129313.3333333335, ans=0.125
2023-11-22 22:46:03,689 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6800, loss[loss=0.05669, simple_loss=0.07691, pruned_loss=0.008255, audio_tagging_loss=0.00998, over 15519.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09291, pruned_loss=0.01455, audio_tagging_loss=0.009219, over 3044619.19 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:46:10,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2129446.6666666665, ans=0.125
2023-11-22 22:46:27,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2129580.0, ans=0.125
2023-11-22 22:46:28,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2129580.0, ans=0.125
2023-11-22 22:46:28,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0
2023-11-22 22:46:43,183 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319450
2023-11-22 22:46:50,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.253e+01 9.028e+01 9.739e+01 1.351e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-22 22:46:52,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2129646.6666666665, ans=0.0
2023-11-22 22:46:55,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2129713.3333333335, ans=0.125
2023-11-22 22:46:56,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2129713.3333333335, ans=0.125
2023-11-22 22:47:00,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0
2023-11-22 22:47:07,609 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6850, loss[loss=0.05667, simple_loss=0.07886, pruned_loss=0.00948, audio_tagging_loss=0.007756, over 15426.00 frames. ], tot_loss[loss=0.07015, simple_loss=0.09294, pruned_loss=0.01454, audio_tagging_loss=0.009134, over 3044275.42 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:47:10,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2129780.0, ans=0.0
2023-11-22 22:47:46,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2129980.0, ans=0.125
2023-11-22 22:47:47,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319500
2023-11-22 22:47:54,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2129980.0, ans=0.125
2023-11-22 22:47:55,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2129980.0, ans=0.125
2023-11-22 22:47:57,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0
2023-11-22 22:48:11,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=8.0
2023-11-22 22:48:11,920 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6900, loss[loss=0.08597, simple_loss=0.1193, pruned_loss=0.01879, audio_tagging_loss=0.007528, over 15576.00 frames. ], tot_loss[loss=0.07008, simple_loss=0.09295, pruned_loss=0.01453, audio_tagging_loss=0.009074, over 3041590.34 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:48:22,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2130113.3333333335, ans=0.1
2023-11-22 22:48:23,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
2023-11-22 22:48:34,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2130180.0, ans=0.2
2023-11-22 22:48:38,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2130246.6666666665, ans=0.0
2023-11-22 22:48:40,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2130246.6666666665, ans=0.125
2023-11-22 22:48:51,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319550
2023-11-22 22:48:59,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.191e+01 8.934e+01 9.580e+01 1.102e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-22 22:49:02,185 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 22:49:15,974 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 6950, loss[loss=0.08861, simple_loss=0.1259, pruned_loss=0.01932, audio_tagging_loss=0.006344, over 14705.00 frames. ], tot_loss[loss=0.07035, simple_loss=0.09324, pruned_loss=0.01457, audio_tagging_loss=0.009162, over 3042540.80 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:49:45,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2130580.0, ans=0.125
2023-11-22 22:49:52,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=12.0
2023-11-22 22:49:55,216 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319600
2023-11-22 22:50:04,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2130646.6666666665, ans=0.125
2023-11-22 22:50:17,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0
2023-11-22 22:50:19,643 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7000, loss[loss=0.05744, simple_loss=0.07239, pruned_loss=0.01218, audio_tagging_loss=0.009071, over 15361.00 frames. ], tot_loss[loss=0.07069, simple_loss=0.09354, pruned_loss=0.01472, audio_tagging_loss=0.009191, over 3040778.99 frames. ], batch size: 60, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:50:22,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2130780.0, ans=0.2
2023-11-22 22:50:34,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2130846.6666666665, ans=0.125
2023-11-22 22:50:42,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.64 vs. limit=15.0
2023-11-22 22:50:43,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2130846.6666666665, ans=0.2
2023-11-22 22:50:45,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2130913.3333333335, ans=0.2
2023-11-22 22:50:59,627 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319650
2023-11-22 22:51:07,652 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.087e+01 8.954e+01 9.728e+01 1.359e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-22 22:51:19,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2131046.6666666665, ans=0.0
2023-11-22 22:51:20,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2131046.6666666665, ans=0.025
2023-11-22 22:51:23,710 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7050, loss[loss=0.06874, simple_loss=0.09539, pruned_loss=0.01298, audio_tagging_loss=0.008067, over 15259.00 frames. ], tot_loss[loss=0.07039, simple_loss=0.0931, pruned_loss=0.01456, audio_tagging_loss=0.009287, over 3047463.78 frames. ], batch size: 57, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:51:47,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2131180.0, ans=0.0
2023-11-22 22:51:51,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0
2023-11-22 22:52:03,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319700
2023-11-22 22:52:11,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0
2023-11-22 22:52:16,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2131380.0, ans=0.2
2023-11-22 22:52:21,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.02 vs. limit=22.5
2023-11-22 22:52:24,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2131380.0, ans=0.2
2023-11-22 22:52:28,751 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7100, loss[loss=0.06591, simple_loss=0.08332, pruned_loss=0.01383, audio_tagging_loss=0.01042, over 14401.00 frames. ], tot_loss[loss=0.07016, simple_loss=0.09266, pruned_loss=0.01447, audio_tagging_loss=0.009358, over 3041024.88 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:52:33,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2131446.6666666665, ans=0.0
2023-11-22 22:52:34,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5
2023-11-22 22:52:35,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0
2023-11-22 22:52:39,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2131513.3333333335, ans=0.0
2023-11-22 22:52:46,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2131513.3333333335, ans=0.125
2023-11-22 22:52:55,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2131580.0, ans=0.0
2023-11-22 22:53:07,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319750
2023-11-22 22:53:15,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.625e+01 7.978e+01 8.785e+01 9.427e+01 1.365e+02, threshold=1.757e+02, percent-clipped=0.0
2023-11-22 22:53:17,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-11-22 22:53:31,759 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7150, loss[loss=0.06491, simple_loss=0.08974, pruned_loss=0.009745, audio_tagging_loss=0.01029, over 15631.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.09296, pruned_loss=0.01441, audio_tagging_loss=0.009383, over 3046274.34 frames. ], batch size: 58, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:53:43,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2131846.6666666665, ans=0.5
2023-11-22 22:54:00,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2131913.3333333335, ans=0.125
2023-11-22 22:54:07,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2131913.3333333335, ans=0.0
2023-11-22 22:54:09,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2131980.0, ans=0.125
2023-11-22 22:54:11,793 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319800
2023-11-22 22:54:12,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2131980.0, ans=0.0
2023-11-22 22:54:18,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2131980.0, ans=0.05
2023-11-22 22:54:24,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2132046.6666666665, ans=0.125
2023-11-22 22:54:35,964 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7200, loss[loss=0.08704, simple_loss=0.1303, pruned_loss=0.01696, audio_tagging_loss=0.004945, over 15696.00 frames. ], tot_loss[loss=0.07067, simple_loss=0.09365, pruned_loss=0.01442, audio_tagging_loss=0.009424, over 3049299.60 frames. ], batch size: 58, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:55:14,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=22.5
2023-11-22 22:55:15,839 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319850
2023-11-22 22:55:19,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2132313.3333333335, ans=0.125
2023-11-22 22:55:23,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.207e+01 8.744e+01 9.394e+01 1.258e+02, threshold=1.749e+02, percent-clipped=0.0
2023-11-22 22:55:24,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2132313.3333333335, ans=0.125
2023-11-22 22:55:39,869 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7250, loss[loss=0.08811, simple_loss=0.1074, pruned_loss=0.02447, audio_tagging_loss=0.009958, over 14629.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.09414, pruned_loss=0.01466, audio_tagging_loss=0.009453, over 3053839.65 frames. ], batch size: 54, lr: 2.52e-03, grad_scale: 32.0
2023-11-22 22:55:51,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2132513.3333333335, ans=10.0
2023-11-22 22:55:59,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2132513.3333333335, ans=0.2
2023-11-22 22:56:01,456 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 22:56:18,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319900
2023-11-22 22:56:25,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2132646.6666666665, ans=0.125
2023-11-22 22:56:39,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2132713.3333333335, ans=0.125
2023-11-22 22:56:43,080 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7300, loss[loss=0.07222, simple_loss=0.0939, pruned_loss=0.01354, audio_tagging_loss=0.01172, over 14605.00 frames. ], tot_loss[loss=0.07113, simple_loss=0.0941, pruned_loss=0.01477, audio_tagging_loss=0.009309, over 3050777.94 frames. ], batch size: 54, lr: 2.52e-03, grad_scale: 8.0
2023-11-22 22:56:44,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0
2023-11-22 22:57:09,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2132913.3333333335, ans=15.0
2023-11-22 22:57:10,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=8.0
2023-11-22 22:57:21,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2132980.0, ans=0.125
2023-11-22 22:57:21,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2132980.0, ans=0.07
2023-11-22 22:57:22,983 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 319950
2023-11-22 22:57:23,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2132980.0, ans=0.125
2023-11-22 22:57:24,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2132980.0, ans=0.2
2023-11-22 22:57:32,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.257e+01 8.750e+01 9.620e+01 1.168e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-22 22:57:44,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0
2023-11-22 22:57:46,960 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7350, loss[loss=0.06524, simple_loss=0.08468, pruned_loss=0.01617, audio_tagging_loss=0.00673, over 15309.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09334, pruned_loss=0.01464, audio_tagging_loss=0.009163, over 3053486.21 frames. ], batch size: 58, lr: 2.52e-03, grad_scale: 8.0
2023-11-22 22:57:51,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2133113.3333333335, ans=0.0
2023-11-22 22:58:02,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2133180.0, ans=0.1
2023-11-22 22:58:03,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2133180.0, ans=0.0
2023-11-22 22:58:03,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2133180.0, ans=0.07
2023-11-22 22:58:16,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5
2023-11-22 22:58:20,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2133246.6666666665, ans=0.1
2023-11-22 22:58:23,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2133313.3333333335, ans=0.05
2023-11-22 22:58:26,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320000
2023-11-22 22:58:26,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5
2023-11-22 22:58:43,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2133380.0, ans=0.5
2023-11-22 22:58:53,948 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7400, loss[loss=0.08476, simple_loss=0.1192, pruned_loss=0.01742, audio_tagging_loss=0.007735, over 14780.00 frames. ], tot_loss[loss=0.07079, simple_loss=0.09388, pruned_loss=0.01475, audio_tagging_loss=0.009096, over 3055658.96 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 8.0
2023-11-22 22:59:03,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2133446.6666666665, ans=0.125
2023-11-22 22:59:09,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2133513.3333333335, ans=0.125
2023-11-22 22:59:12,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2133513.3333333335, ans=0.125
2023-11-22 22:59:28,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2133580.0, ans=0.0
2023-11-22 22:59:33,171 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320050
2023-11-22 22:59:33,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0
2023-11-22 22:59:34,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2133646.6666666665, ans=0.0
2023-11-22 22:59:43,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.110e+01 8.793e+01 9.338e+01 3.637e+02, threshold=1.759e+02, percent-clipped=1.0
2023-11-22 22:59:57,615 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7450, loss[loss=0.05813, simple_loss=0.07146, pruned_loss=0.01322, audio_tagging_loss=0.009179, over 15282.00 frames. ], tot_loss[loss=0.07071, simple_loss=0.09367, pruned_loss=0.0147, audio_tagging_loss=0.009182, over 3050415.10 frames. ], batch size: 59, lr: 2.52e-03, grad_scale: 8.0
2023-11-22 23:00:01,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2133780.0, ans=0.5
2023-11-22 23:00:13,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2133846.6666666665, ans=0.0
2023-11-22 23:00:38,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320100
2023-11-22 23:01:01,246 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7500, loss[loss=0.07971, simple_loss=0.106, pruned_loss=0.01747, audio_tagging_loss=0.009255, over 14366.00 frames. ], tot_loss[loss=0.07083, simple_loss=0.09362, pruned_loss=0.01482, audio_tagging_loss=0.009202, over 3050880.05 frames. ], batch size: 53, lr: 2.52e-03, grad_scale: 8.0
2023-11-22 23:01:01,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2134113.3333333335, ans=0.1
2023-11-22 23:01:11,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0
2023-11-22 23:01:20,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2134180.0, ans=0.0
2023-11-22 23:01:36,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2134246.6666666665, ans=0.0
2023-11-22 23:01:41,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320150
2023-11-22 23:01:42,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0
2023-11-22 23:01:44,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2134313.3333333335, ans=0.125
2023-11-22 23:01:45,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0
2023-11-22 23:01:51,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.201e+01 8.778e+01 9.335e+01 1.224e+02, threshold=1.756e+02, percent-clipped=0.0
2023-11-22 23:01:54,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2134380.0, ans=0.0
2023-11-22 23:02:04,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0
2023-11-22 23:02:05,931 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7550, loss[loss=0.07374, simple_loss=0.1005, pruned_loss=0.01386, audio_tagging_loss=0.009627, over 15942.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09288, pruned_loss=0.01477, audio_tagging_loss=0.009174, over 3057093.53 frames. ], batch size: 61, lr: 2.52e-03, grad_scale: 8.0
2023-11-22 23:02:25,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2134513.3333333335, ans=0.0
2023-11-22 23:02:28,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5
2023-11-22 23:02:45,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320200
2023-11-22 23:02:49,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2134646.6666666665, ans=0.125
2023-11-22 23:03:10,699 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7600, loss[loss=0.07659, simple_loss=0.103, pruned_loss=0.01483, audio_tagging_loss=0.01028, over 14884.00 frames. ], tot_loss[loss=0.07042, simple_loss=0.09323, pruned_loss=0.01471, audio_tagging_loss=0.009098, over 3060205.97 frames. ], batch size: 54, lr: 2.52e-03, grad_scale: 16.0
2023-11-22 23:03:50,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320250
2023-11-22 23:03:51,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2134980.0, ans=0.0
2023-11-22 23:03:52,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2134980.0, ans=0.0
2023-11-22 23:03:56,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=15.0
2023-11-22 23:04:00,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0
2023-11-22 23:04:00,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.533e+01 8.089e+01 8.781e+01 9.511e+01 1.232e+02, threshold=1.756e+02, percent-clipped=0.0
2023-11-22 23:04:06,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2135046.6666666665, ans=0.125
2023-11-22 23:04:09,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2135046.6666666665, ans=0.125
2023-11-22 23:04:13,917 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7650, loss[loss=0.06566, simple_loss=0.08689, pruned_loss=0.01301, audio_tagging_loss=0.009204, over 15954.00 frames. ], tot_loss[loss=0.07025, simple_loss=0.09295, pruned_loss=0.01468, audio_tagging_loss=0.009098, over 3055615.94 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:04:19,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0
2023-11-22 23:04:27,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2135180.0, ans=0.2
2023-11-22 23:04:53,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320300
2023-11-22 23:04:53,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2135313.3333333335, ans=0.125
2023-11-22 23:04:55,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0
2023-11-22 23:04:57,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2135313.3333333335, ans=0.0
2023-11-22 23:05:01,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2135313.3333333335, ans=0.0
2023-11-22 23:05:18,677 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7700, loss[loss=0.06671, simple_loss=0.08788, pruned_loss=0.0136, audio_tagging_loss=0.009165, over 14894.00 frames. ], tot_loss[loss=0.0706, simple_loss=0.09383, pruned_loss=0.01463, audio_tagging_loss=0.009059, over 3052706.49 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:05:26,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2135446.6666666665, ans=0.125
2023-11-22 23:05:57,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2135646.6666666665, ans=0.0
2023-11-22 23:05:57,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2135646.6666666665, ans=0.1
2023-11-22 23:05:59,433 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320350
2023-11-22 23:06:10,433 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.326e+01 9.020e+01 9.879e+01 1.267e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-22 23:06:15,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0
2023-11-22 23:06:25,306 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7750, loss[loss=0.05295, simple_loss=0.06966, pruned_loss=0.01074, audio_tagging_loss=0.007373, over 14680.00 frames. ], tot_loss[loss=0.07075, simple_loss=0.09395, pruned_loss=0.01468, audio_tagging_loss=0.009098, over 3057920.08 frames. ], batch size: 55, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:06:42,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2135846.6666666665, ans=22.5
2023-11-22 23:06:50,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2135913.3333333335, ans=0.1
2023-11-22 23:07:05,782 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320400
2023-11-22 23:07:28,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2136113.3333333335, ans=0.125
2023-11-22 23:07:29,748 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7800, loss[loss=0.0672, simple_loss=0.08414, pruned_loss=0.01311, audio_tagging_loss=0.01201, over 15293.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.09493, pruned_loss=0.0149, audio_tagging_loss=0.009134, over 3052834.76 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:07:58,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2136246.6666666665, ans=0.0
2023-11-22 23:07:59,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2136246.6666666665, ans=0.2
2023-11-22 23:07:59,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0
2023-11-22 23:08:10,010 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320450
2023-11-22 23:08:14,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=15.0
2023-11-22 23:08:17,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2136313.3333333335, ans=0.125
2023-11-22 23:08:19,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.253e+01 8.929e+01 9.795e+01 1.214e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-22 23:08:32,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=15.0
2023-11-22 23:08:33,294 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7850, loss[loss=0.06616, simple_loss=0.09225, pruned_loss=0.01177, audio_tagging_loss=0.008276, over 16408.00 frames. ], tot_loss[loss=0.0709, simple_loss=0.09393, pruned_loss=0.01465, audio_tagging_loss=0.009286, over 3055619.58 frames. ], batch size: 62, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:08:35,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5
2023-11-22 23:08:39,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2136446.6666666665, ans=0.125
2023-11-22 23:09:12,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5
2023-11-22 23:09:14,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320500
2023-11-22 23:09:25,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2136713.3333333335, ans=0.0
2023-11-22 23:09:25,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2136713.3333333335, ans=0.0
2023-11-22 23:09:37,535 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-22 23:09:39,726 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7900, loss[loss=0.08394, simple_loss=0.1143, pruned_loss=0.01962, audio_tagging_loss=0.007153, over 15594.00 frames. ], tot_loss[loss=0.07172, simple_loss=0.09498, pruned_loss=0.01488, audio_tagging_loss=0.009342, over 3048483.05 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:09:51,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0
2023-11-22 23:09:56,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2136846.6666666665, ans=0.125
2023-11-22 23:09:59,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2136846.6666666665, ans=0.125
2023-11-22 23:10:09,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2136913.3333333335, ans=0.125
2023-11-22 23:10:11,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2136913.3333333335, ans=0.0
2023-11-22 23:10:13,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2136913.3333333335, ans=0.1
2023-11-22 23:10:19,151 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320550
2023-11-22 23:10:25,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2136980.0, ans=0.125
2023-11-22 23:10:27,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2136980.0, ans=0.125
2023-11-22 23:10:29,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.096e+01 8.900e+01 9.590e+01 1.237e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-22 23:10:41,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2137046.6666666665, ans=0.0
2023-11-22 23:10:42,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2137113.3333333335, ans=0.0
2023-11-22 23:10:43,603 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 7950, loss[loss=0.06617, simple_loss=0.09028, pruned_loss=0.01097, audio_tagging_loss=0.01005, over 15756.00 frames. ], tot_loss[loss=0.07121, simple_loss=0.09407, pruned_loss=0.0147, audio_tagging_loss=0.009472, over 3040874.20 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:10:44,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2137113.3333333335, ans=0.125
2023-11-22 23:10:45,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5
2023-11-22 23:10:52,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2137113.3333333335, ans=0.125
2023-11-22 23:10:58,098 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
Number of tokens: 24 2023-11-22 23:11:00,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2137180.0, ans=0.2 2023-11-22 23:11:04,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2137180.0, ans=0.125 2023-11-22 23:11:23,998 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320600 2023-11-22 23:11:39,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2137380.0, ans=0.125 2023-11-22 23:11:39,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2137380.0, ans=0.125 2023-11-22 23:11:39,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-11-22 23:11:47,659 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8000, loss[loss=0.06584, simple_loss=0.08848, pruned_loss=0.01231, audio_tagging_loss=0.009298, over 14890.00 frames. ], tot_loss[loss=0.07085, simple_loss=0.09329, pruned_loss=0.01465, audio_tagging_loss=0.009563, over 3038693.24 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:12:12,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-22 23:12:13,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2137580.0, ans=0.05 2023-11-22 23:12:15,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2137580.0, ans=0.1 2023-11-22 23:12:27,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320650 2023-11-22 23:12:30,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2137646.6666666665, ans=0.0 2023-11-22 23:12:35,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2137646.6666666665, ans=0.1 2023-11-22 23:12:37,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.271e+01 8.710e+01 9.655e+01 1.155e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-22 23:12:37,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2137713.3333333335, ans=0.0 2023-11-22 23:12:51,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2137780.0, ans=0.0 2023-11-22 23:12:52,990 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8050, loss[loss=0.08819, simple_loss=0.1143, pruned_loss=0.02026, audio_tagging_loss=0.01079, over 15467.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.09376, pruned_loss=0.01483, audio_tagging_loss=0.009548, over 3038239.29 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:12:56,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. 
limit=15.0 2023-11-22 23:13:02,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2137780.0, ans=0.0 2023-11-22 23:13:32,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320700 2023-11-22 23:13:35,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-22 23:13:36,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2137980.0, ans=0.1 2023-11-22 23:13:40,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2137980.0, ans=0.07 2023-11-22 23:13:51,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2138046.6666666665, ans=0.1 2023-11-22 23:13:57,279 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8100, loss[loss=0.07731, simple_loss=0.1057, pruned_loss=0.01645, audio_tagging_loss=0.007987, over 15814.00 frames. ], tot_loss[loss=0.07093, simple_loss=0.09361, pruned_loss=0.01466, audio_tagging_loss=0.009466, over 3040578.90 frames. ], batch size: 61, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:14:01,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2138113.3333333335, ans=0.1 2023-11-22 23:14:01,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2138113.3333333335, ans=0.95 2023-11-22 23:14:37,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320750 2023-11-22 23:14:38,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2138313.3333333335, ans=0.125 2023-11-22 23:14:47,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.289e+01 8.980e+01 9.464e+01 1.714e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-22 23:14:49,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2138380.0, ans=0.1 2023-11-22 23:15:01,311 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8150, loss[loss=0.05746, simple_loss=0.07663, pruned_loss=0.01059, audio_tagging_loss=0.008553, over 14987.00 frames. ], tot_loss[loss=0.07106, simple_loss=0.09409, pruned_loss=0.01473, audio_tagging_loss=0.009295, over 3045571.94 frames. 
], batch size: 58, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:15:05,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2138446.6666666665, ans=0.1 2023-11-22 23:15:21,594 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 23:15:41,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320800 2023-11-22 23:15:44,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2138646.6666666665, ans=0.125 2023-11-22 23:15:52,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2138713.3333333335, ans=0.2 2023-11-22 23:16:05,064 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 23:16:06,071 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8200, loss[loss=0.0507, simple_loss=0.06342, pruned_loss=0.009044, audio_tagging_loss=0.009949, over 15143.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.09445, pruned_loss=0.01465, audio_tagging_loss=0.009203, over 3038285.09 frames. ], batch size: 60, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:16:06,132 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 23:16:08,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=12.0 2023-11-22 23:16:12,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.46 vs. 
limit=22.5 2023-11-22 23:16:27,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2138846.6666666665, ans=0.0 2023-11-22 23:16:28,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2138846.6666666665, ans=0.125 2023-11-22 23:16:33,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2138913.3333333335, ans=0.2 2023-11-22 23:16:43,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2138980.0, ans=0.2 2023-11-22 23:16:45,554 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320850 2023-11-22 23:16:56,813 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.281e+01 9.046e+01 9.469e+01 1.170e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-22 23:16:59,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2139046.6666666665, ans=0.1 2023-11-22 23:17:10,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2139113.3333333335, ans=0.125 2023-11-22 23:17:11,154 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8250, loss[loss=0.05429, simple_loss=0.06654, pruned_loss=0.0105, audio_tagging_loss=0.01051, over 15967.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09342, pruned_loss=0.0145, audio_tagging_loss=0.009128, over 3039642.54 frames. ], batch size: 62, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:17:20,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2139113.3333333335, ans=0.125 2023-11-22 23:17:20,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2139113.3333333335, ans=0.1 2023-11-22 23:17:23,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2139180.0, ans=0.07 2023-11-22 23:17:34,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2139180.0, ans=0.0 2023-11-22 23:17:51,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320900 2023-11-22 23:17:58,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2139313.3333333335, ans=0.125 2023-11-22 23:18:15,858 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8300, loss[loss=0.08017, simple_loss=0.1024, pruned_loss=0.01893, audio_tagging_loss=0.01004, over 16055.00 frames. ], tot_loss[loss=0.07033, simple_loss=0.09344, pruned_loss=0.01445, audio_tagging_loss=0.009159, over 3044167.77 frames. ], batch size: 61, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:18:19,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2139446.6666666665, ans=0.0 2023-11-22 23:18:22,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.91 vs. 
limit=10.0 2023-11-22 23:18:31,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2139513.3333333335, ans=0.125 2023-11-22 23:18:47,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2139580.0, ans=0.125 2023-11-22 23:18:50,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2139580.0, ans=0.0 2023-11-22 23:18:56,530 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 320950 2023-11-22 23:19:00,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2139646.6666666665, ans=0.1 2023-11-22 23:19:05,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.400e+01 9.007e+01 9.767e+01 1.180e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-22 23:19:11,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2139713.3333333335, ans=0.0 2023-11-22 23:19:19,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2139780.0, ans=0.04949747468305833 2023-11-22 23:19:20,895 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8350, loss[loss=0.06611, simple_loss=0.0825, pruned_loss=0.01409, audio_tagging_loss=0.01077, over 14652.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.09344, pruned_loss=0.01455, audio_tagging_loss=0.009131, over 3040118.46 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:19:36,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2139846.6666666665, ans=0.125 2023-11-22 23:19:39,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-22 23:19:59,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321000 2023-11-22 23:20:05,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2139980.0, ans=0.0 2023-11-22 23:20:24,980 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8400, loss[loss=0.06421, simple_loss=0.08211, pruned_loss=0.01229, audio_tagging_loss=0.01086, over 15043.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.09402, pruned_loss=0.01469, audio_tagging_loss=0.009113, over 3042499.00 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:20:26,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2140113.3333333335, ans=0.0 2023-11-22 23:20:39,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2140180.0, ans=0.1 2023-11-22 23:21:00,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2140246.6666666665, ans=0.125 2023-11-22 23:21:06,748 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321050 2023-11-22 23:21:14,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. 
limit=15.0 2023-11-22 23:21:14,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-22 23:21:15,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2140313.3333333335, ans=0.1 2023-11-22 23:21:17,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.749e+01 8.024e+01 8.884e+01 9.515e+01 1.445e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-22 23:21:19,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2140380.0, ans=0.2 2023-11-22 23:21:20,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2140380.0, ans=0.0 2023-11-22 23:21:30,749 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8450, loss[loss=0.06417, simple_loss=0.07916, pruned_loss=0.01208, audio_tagging_loss=0.01251, over 15456.00 frames. ], tot_loss[loss=0.07027, simple_loss=0.09335, pruned_loss=0.01449, audio_tagging_loss=0.009106, over 3044256.36 frames. ], batch size: 61, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:21:32,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2140446.6666666665, ans=0.2 2023-11-22 23:21:37,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.99 vs. limit=15.0 2023-11-22 23:21:42,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2140513.3333333335, ans=0.2 2023-11-22 23:21:43,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2140513.3333333335, ans=0.1 2023-11-22 23:21:50,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2140513.3333333335, ans=0.0 2023-11-22 23:21:56,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2140580.0, ans=0.125 2023-11-22 23:21:58,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2023-11-22 23:22:06,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2140580.0, ans=0.2 2023-11-22 23:22:11,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321100 2023-11-22 23:22:11,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2140646.6666666665, ans=15.0 2023-11-22 23:22:36,212 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8500, loss[loss=0.04559, simple_loss=0.06187, pruned_loss=0.005903, audio_tagging_loss=0.008753, over 15080.00 frames. ], tot_loss[loss=0.07073, simple_loss=0.0939, pruned_loss=0.01473, audio_tagging_loss=0.009048, over 3041614.26 frames. 
], batch size: 57, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:22:50,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2140846.6666666665, ans=0.0
2023-11-22 23:23:05,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=22.5
2023-11-22 23:23:15,174 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321150
2023-11-22 23:23:27,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.243e+01 8.956e+01 9.576e+01 1.335e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-22 23:23:37,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0
2023-11-22 23:23:40,548 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8550, loss[loss=0.06954, simple_loss=0.09421, pruned_loss=0.01299, audio_tagging_loss=0.009441, over 14916.00 frames. ], tot_loss[loss=0.07078, simple_loss=0.0942, pruned_loss=0.01459, audio_tagging_loss=0.009084, over 3050334.43 frames. ], batch size: 55, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:23:49,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2141113.3333333335, ans=0.1
2023-11-22 23:24:01,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2023-11-22 23:24:03,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2141180.0, ans=0.07
2023-11-22 23:24:06,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0
2023-11-22 23:24:20,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321200
2023-11-22 23:24:32,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2141380.0, ans=0.125
2023-11-22 23:24:41,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2141380.0, ans=0.1
2023-11-22 23:24:44,482 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8600, loss[loss=0.06958, simple_loss=0.1033, pruned_loss=0.01065, audio_tagging_loss=0.007296, over 15161.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.09369, pruned_loss=0.01457, audio_tagging_loss=0.009085, over 3051898.65 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:25:01,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2141513.3333333335, ans=0.125
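The tot_loss[...] records above decompose consistently: to within rounding, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (batch 8600: 0.5 * 0.09369 + 0.01457 + 0.009085 ≈ 0.07051). A minimal re-check of that bookkeeping; the 0.5 and 1.0 weights are inferred from the logged numbers, not read out of train_asr.py:

```python
# Hypothetical re-check of the logged loss bookkeeping; the 0.5 weight on
# simple_loss and the implicit 1.0 weights on pruned_loss and
# audio_tagging_loss are assumptions inferred from this log.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float) -> float:
    """Combine per-batch components the way the tot_loss[...] lines report."""
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Batch 8600 above: tot_loss[loss=0.07051, simple_loss=0.09369,
# pruned_loss=0.01457, audio_tagging_loss=0.009085, ...]
assert abs(total_loss(0.09369, 0.01457, 0.009085) - 0.07051) < 5e-5
```

2023-11-22 23:25:17,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs.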
limit=15.0 2023-11-22 23:25:25,087 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321250 2023-11-22 23:25:27,743 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-22 23:25:36,061 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.395e+01 8.985e+01 9.646e+01 1.201e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-22 23:25:36,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2141713.3333333335, ans=0.0 2023-11-22 23:25:49,425 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8650, loss[loss=0.08634, simple_loss=0.1207, pruned_loss=0.01575, audio_tagging_loss=0.01022, over 16049.00 frames. ], tot_loss[loss=0.07064, simple_loss=0.09375, pruned_loss=0.01453, audio_tagging_loss=0.009238, over 3048895.99 frames. ], batch size: 60, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:26:06,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2141846.6666666665, ans=0.2 2023-11-22 23:26:08,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2023-11-22 23:26:28,835 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321300 2023-11-22 23:26:37,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2023-11-22 23:26:54,808 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8700, loss[loss=0.08627, simple_loss=0.119, pruned_loss=0.01735, audio_tagging_loss=0.00943, over 15505.00 frames. ], tot_loss[loss=0.07018, simple_loss=0.09274, pruned_loss=0.01449, audio_tagging_loss=0.009314, over 3046803.55 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:26:55,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2023-11-22 23:27:01,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. 
limit=22.5 2023-11-22 23:27:02,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2142113.3333333335, ans=0.125 2023-11-22 23:27:12,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2142180.0, ans=0.125 2023-11-22 23:27:12,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2142180.0, ans=0.0 2023-11-22 23:27:23,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2142246.6666666665, ans=0.05 2023-11-22 23:27:31,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2142246.6666666665, ans=0.0 2023-11-22 23:27:31,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2142246.6666666665, ans=0.125 2023-11-22 23:27:34,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321350 2023-11-22 23:27:42,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2142313.3333333335, ans=0.2 2023-11-22 23:27:45,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.902e+01 8.323e+01 8.903e+01 9.633e+01 3.250e+02, threshold=1.781e+02, percent-clipped=1.0 2023-11-22 23:27:53,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2142380.0, ans=0.125 2023-11-22 23:27:58,157 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8750, loss[loss=0.079, simple_loss=0.102, pruned_loss=0.01869, audio_tagging_loss=0.009323, over 16742.00 frames. ], tot_loss[loss=0.071, simple_loss=0.09378, pruned_loss=0.0147, audio_tagging_loss=0.009404, over 3049582.63 frames. ], batch size: 63, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:28:05,693 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-22 23:28:39,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321400 2023-11-22 23:28:43,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2023-11-22 23:28:44,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2142646.6666666665, ans=0.125 2023-11-22 23:29:03,169 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8800, loss[loss=0.07911, simple_loss=0.104, pruned_loss=0.02015, audio_tagging_loss=0.006959, over 14823.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.0945, pruned_loss=0.01479, audio_tagging_loss=0.009439, over 3055493.66 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:29:37,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.70 vs. 
limit=5.0
2023-11-22 23:29:44,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321450
2023-11-22 23:29:55,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.814e+01 8.244e+01 8.962e+01 9.489e+01 1.229e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-22 23:30:10,460 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8850, loss[loss=0.06194, simple_loss=0.08216, pruned_loss=0.01199, audio_tagging_loss=0.008868, over 15101.00 frames. ], tot_loss[loss=0.07119, simple_loss=0.09422, pruned_loss=0.01467, audio_tagging_loss=0.009422, over 3056331.95 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 32.0
2023-11-22 23:30:19,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2143113.3333333335, ans=0.0
2023-11-22 23:30:21,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=15.0
2023-11-22 23:30:21,727 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-22 23:30:25,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2143180.0, ans=0.0
2023-11-22 23:30:31,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=15.0
2023-11-22 23:30:34,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2143246.6666666665, ans=0.125
2023-11-22 23:30:50,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321500
2023-11-22 23:31:01,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2143380.0, ans=0.125
2023-11-22 23:31:14,526 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8900, loss[loss=0.0868, simple_loss=0.1225, pruned_loss=0.01942, audio_tagging_loss=0.00611, over 14718.00 frames. ], tot_loss[loss=0.07155, simple_loss=0.09489, pruned_loss=0.0148, audio_tagging_loss=0.009303, over 3058859.90 frames. ], batch size: 54, lr: 2.51e-03, grad_scale: 32.0
2023-11-22 23:31:19,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2143446.6666666665, ans=0.125
2023-11-22 23:31:23,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2143446.6666666665, ans=0.0
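The WARNING above is the utterance-length filter: a 100-frame cut keeps only 23 frames after convolutional subsampling, fewer than its 24 BPE tokens, so no transducer alignment exists and the cut is dropped. A sketch of the check, assuming the common ((T - 7) // 2 + 1) // 2 subsampling arithmetic, which reproduces the logged 100 -> 23; the exact rule lives in train_asr.py:1462:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2dSubsampling arithmetic: it reproduces the logged
    # 100 -> 23 mapping but is not copied from this repository.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """A transducer needs at least one encoder frame per output token."""
    return frames_after_subsampling(num_frames) >= num_tokens

# The excluded cut above: 100 frames -> 23 < 24 tokens, so it is dropped.
assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)
```

2023-11-22 23:31:29,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.06 vs. limit=10.0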
2023-11-22 23:31:29,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2143513.3333333335, ans=0.125
2023-11-22 23:31:42,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2143580.0, ans=0.125
2023-11-22 23:31:45,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2143580.0, ans=0.125
2023-11-22 23:31:55,123 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321550
2023-11-22 23:32:05,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0
2023-11-22 23:32:05,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.181e+01 8.758e+01 9.474e+01 1.266e+02, threshold=1.752e+02, percent-clipped=0.0
2023-11-22 23:32:18,168 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 8950, loss[loss=0.08138, simple_loss=0.1152, pruned_loss=0.01623, audio_tagging_loss=0.007566, over 14483.00 frames. ], tot_loss[loss=0.0716, simple_loss=0.09528, pruned_loss=0.01484, audio_tagging_loss=0.009125, over 3053942.32 frames. ], batch size: 54, lr: 2.51e-03, grad_scale: 32.0
2023-11-22 23:32:36,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2143846.6666666665, ans=0.125
2023-11-22 23:33:00,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321600
2023-11-22 23:33:00,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2143980.0, ans=0.0
2023-11-22 23:33:08,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2143980.0, ans=0.2
2023-11-22 23:33:17,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2144046.6666666665, ans=0.125
2023-11-22 23:33:26,996 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9000, loss[loss=0.05884, simple_loss=0.07202, pruned_loss=0.009721, audio_tagging_loss=0.01311, over 15502.00 frames. ], tot_loss[loss=0.07125, simple_loss=0.09456, pruned_loss=0.01486, audio_tagging_loss=0.009111, over 3051116.06 frames. ], batch size: 60, lr: 2.51e-03, grad_scale: 32.0
2023-11-22 23:33:26,997 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-22 23:34:08,299 INFO [train_asr.py:1253] (1/4) Epoch 27, validation: loss=0.05906, simple_loss=0.05129, pruned_loss=0.005052, audio_tagging_loss=0.02836, over 4681554.00 frames.
2023-11-22 23:34:08,300 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-22 23:34:09,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2144113.3333333335, ans=0.1
2023-11-22 23:34:15,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0
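At batch 9000 the trainer pauses for a validation sweep over the full held-out set (about 4.68M frames) and then logs the peak GPU allocation. A sketch of that pattern; the memory line matches torch.cuda.max_memory_allocated, while compute_loss below is a hypothetical stand-in for the real per-batch routine:

```python
import torch

def report_peak_memory(device: torch.device) -> None:
    # Mirrors the "Maximum memory allocated so far is ...MB" line:
    # peak bytes allocated on this device since startup (or the last reset).
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")

@torch.no_grad()
def run_validation(model, valid_loader, device) -> float:
    """Frame-weighted average loss over the whole validation set."""
    model.eval()
    total_loss = total_frames = 0.0
    for batch in valid_loader:
        loss, num_frames = compute_loss(model, batch)  # hypothetical helper
        total_loss += loss.item() * num_frames
        total_frames += num_frames
    model.train()
    report_peak_memory(device)
    return total_loss / total_frames
```

2023-11-22 23:34:35,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.93 vs.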
limit=22.5 2023-11-22 23:34:41,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2144246.6666666665, ans=0.1 2023-11-22 23:34:50,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321650 2023-11-22 23:35:01,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.419e+01 9.018e+01 9.927e+01 1.306e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-22 23:35:06,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2144380.0, ans=0.04949747468305833 2023-11-22 23:35:13,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2144446.6666666665, ans=0.125 2023-11-22 23:35:13,843 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9050, loss[loss=0.06115, simple_loss=0.08004, pruned_loss=0.01096, audio_tagging_loss=0.01017, over 15470.00 frames. ], tot_loss[loss=0.07129, simple_loss=0.09466, pruned_loss=0.01488, audio_tagging_loss=0.009081, over 3051204.09 frames. ], batch size: 60, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:35:19,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2144446.6666666665, ans=0.0 2023-11-22 23:35:30,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2144513.3333333335, ans=0.04949747468305833 2023-11-22 23:35:33,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2144513.3333333335, ans=0.1 2023-11-22 23:35:39,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2144580.0, ans=0.125 2023-11-22 23:35:47,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2144580.0, ans=0.2 2023-11-22 23:35:54,748 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321700 2023-11-22 23:36:17,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2023-11-22 23:36:19,461 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9100, loss[loss=0.06163, simple_loss=0.0812, pruned_loss=0.01064, audio_tagging_loss=0.01039, over 15760.00 frames. ], tot_loss[loss=0.07109, simple_loss=0.09438, pruned_loss=0.01479, audio_tagging_loss=0.009111, over 3048060.03 frames. 
], batch size: 60, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:36:19,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2144780.0, ans=0.5
2023-11-22 23:36:21,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2144780.0, ans=0.125
2023-11-22 23:36:58,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321750
2023-11-22 23:37:04,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2144980.0, ans=0.2
2023-11-22 23:37:11,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.202e+01 8.707e+01 9.347e+01 1.081e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-22 23:37:21,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2145113.3333333335, ans=0.05
2023-11-22 23:37:22,629 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9150, loss[loss=0.07995, simple_loss=0.1015, pruned_loss=0.01713, audio_tagging_loss=0.01205, over 15682.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.09489, pruned_loss=0.01497, audio_tagging_loss=0.00903, over 3049310.04 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:37:30,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2145113.3333333335, ans=0.125
2023-11-22 23:37:52,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2145246.6666666665, ans=0.0
2023-11-22 23:38:02,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321800
2023-11-22 23:38:03,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0
2023-11-22 23:38:21,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2145380.0, ans=0.025
2023-11-22 23:38:22,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0
2023-11-22 23:38:26,073 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9200, loss[loss=0.08271, simple_loss=0.1162, pruned_loss=0.018, audio_tagging_loss=0.006602, over 15817.00 frames. ], tot_loss[loss=0.07151, simple_loss=0.0953, pruned_loss=0.01496, audio_tagging_loss=0.008905, over 3054984.29 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0
2023-11-22 23:38:31,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2145446.6666666665, ans=0.125
2023-11-22 23:39:06,126 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321850
2023-11-22 23:39:18,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.155e+01 8.815e+01 9.566e+01 1.382e+02, threshold=1.763e+02, percent-clipped=0.0
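In every optim.py:476 line the threshold equals Clipping_scale times the logged median grad norm (here 2.0 * 8.815e+01 = 1.763e+02), so the clipping cutoff tracks a running median of recent gradient norms rather than a fixed constant, and percent-clipped reports how often it fired. A sketch of such an adaptive clipper, assuming a simple sliding window in place of whatever statistics optim.py actually maintains:

```python
from collections import deque
import statistics
import torch

class MedianGradClipper:
    """Clip to clipping_scale * median of recently observed grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # Total norm = 2-norm of the vector of per-parameter grad norms.
        norm = torch.linalg.vector_norm(
            torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        if norm > threshold:            # this is what percent-clipped counts
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```

2023-11-22 23:39:21,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs.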
limit=15.0 2023-11-22 23:39:23,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2145713.3333333335, ans=0.125 2023-11-22 23:39:23,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2023-11-22 23:39:24,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2023-11-22 23:39:25,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2145713.3333333335, ans=0.05 2023-11-22 23:39:30,950 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9250, loss[loss=0.05631, simple_loss=0.07577, pruned_loss=0.006693, audio_tagging_loss=0.01173, over 15666.00 frames. ], tot_loss[loss=0.07082, simple_loss=0.09437, pruned_loss=0.01471, audio_tagging_loss=0.00892, over 3047367.71 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:39:47,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2145846.6666666665, ans=0.1 2023-11-22 23:39:49,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-22 23:39:51,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2145846.6666666665, ans=0.2 2023-11-22 23:40:00,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2023-11-22 23:40:03,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2145913.3333333335, ans=0.5 2023-11-22 23:40:05,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2023-11-22 23:40:10,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321900 2023-11-22 23:40:17,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2023-11-22 23:40:29,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2146046.6666666665, ans=0.07 2023-11-22 23:40:34,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2023-11-22 23:40:35,080 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9300, loss[loss=0.05953, simple_loss=0.07382, pruned_loss=0.01169, audio_tagging_loss=0.01093, over 15561.00 frames. ], tot_loss[loss=0.0706, simple_loss=0.09394, pruned_loss=0.01456, audio_tagging_loss=0.009071, over 3049063.87 frames. 
], batch size: 58, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:40:40,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2146113.3333333335, ans=0.125 2023-11-22 23:40:57,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2146180.0, ans=0.125 2023-11-22 23:40:58,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2146246.6666666665, ans=0.0 2023-11-22 23:41:00,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2146246.6666666665, ans=0.0 2023-11-22 23:41:14,785 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 321950 2023-11-22 23:41:17,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2146313.3333333335, ans=0.125 2023-11-22 23:41:23,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2146313.3333333335, ans=0.0 2023-11-22 23:41:26,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.225e+01 8.828e+01 9.380e+01 1.148e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-22 23:41:29,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0 2023-11-22 23:41:33,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2146380.0, ans=0.0 2023-11-22 23:41:37,951 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9350, loss[loss=0.06953, simple_loss=0.09134, pruned_loss=0.01431, audio_tagging_loss=0.009545, over 15347.00 frames. ], tot_loss[loss=0.07039, simple_loss=0.0937, pruned_loss=0.01454, audio_tagging_loss=0.009, over 3056730.62 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:41:46,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2146446.6666666665, ans=0.125 2023-11-22 23:42:04,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2146580.0, ans=0.125 2023-11-22 23:42:11,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2023-11-22 23:42:17,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322000 2023-11-22 23:42:43,222 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9400, loss[loss=0.0394, simple_loss=0.04541, pruned_loss=0.003536, audio_tagging_loss=0.01316, over 13513.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.0932, pruned_loss=0.01437, audio_tagging_loss=0.009157, over 3063588.84 frames. 
], batch size: 53, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:43:00,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2146846.6666666665, ans=0.125 2023-11-22 23:43:03,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2146846.6666666665, ans=0.125 2023-11-22 23:43:17,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2146913.3333333335, ans=0.1 2023-11-22 23:43:22,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2146980.0, ans=0.125 2023-11-22 23:43:23,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322050 2023-11-22 23:43:26,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2146980.0, ans=0.1 2023-11-22 23:43:34,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2147046.6666666665, ans=0.125 2023-11-22 23:43:36,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.391e+01 8.949e+01 9.601e+01 1.196e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-22 23:43:37,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2147046.6666666665, ans=0.1 2023-11-22 23:43:46,450 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 23:43:49,030 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9450, loss[loss=0.0611, simple_loss=0.07821, pruned_loss=0.01237, audio_tagging_loss=0.009627, over 14968.00 frames. ], tot_loss[loss=0.07032, simple_loss=0.09352, pruned_loss=0.0143, audio_tagging_loss=0.00926, over 3058039.06 frames. 
], batch size: 57, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:43:59,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2147113.3333333335, ans=0.95 2023-11-22 23:43:59,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2147113.3333333335, ans=0.125 2023-11-22 23:44:00,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2147180.0, ans=0.125 2023-11-22 23:44:29,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322100 2023-11-22 23:44:31,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2147313.3333333335, ans=0.2 2023-11-22 23:44:52,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2147446.6666666665, ans=0.0 2023-11-22 23:44:53,037 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9500, loss[loss=0.06019, simple_loss=0.08414, pruned_loss=0.0104, audio_tagging_loss=0.007727, over 15293.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09264, pruned_loss=0.01419, audio_tagging_loss=0.009376, over 3051409.39 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:45:04,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2147446.6666666665, ans=0.0 2023-11-22 23:45:09,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2147513.3333333335, ans=0.125 2023-11-22 23:45:11,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2147513.3333333335, ans=0.0 2023-11-22 23:45:33,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322150 2023-11-22 23:45:44,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2147713.3333333335, ans=0.125 2023-11-22 23:45:45,466 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.349e+01 8.991e+01 9.634e+01 1.651e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-22 23:45:45,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2147713.3333333335, ans=0.125 2023-11-22 23:45:48,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2147713.3333333335, ans=22.5 2023-11-22 23:45:57,750 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9550, loss[loss=0.07456, simple_loss=0.1003, pruned_loss=0.01503, audio_tagging_loss=0.009358, over 15005.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.09269, pruned_loss=0.01427, audio_tagging_loss=0.009432, over 3045554.07 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:46:18,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.47 vs. 
limit=12.0 2023-11-22 23:46:37,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322200 2023-11-22 23:46:39,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2147980.0, ans=0.2 2023-11-22 23:46:42,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2147980.0, ans=0.07 2023-11-22 23:46:47,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2147980.0, ans=0.125 2023-11-22 23:46:50,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2148046.6666666665, ans=0.2 2023-11-22 23:46:54,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2148046.6666666665, ans=0.125 2023-11-22 23:46:55,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2148046.6666666665, ans=0.1 2023-11-22 23:46:56,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2148046.6666666665, ans=0.2 2023-11-22 23:47:03,439 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9600, loss[loss=0.04989, simple_loss=0.06695, pruned_loss=0.008429, audio_tagging_loss=0.00798, over 13825.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09312, pruned_loss=0.0144, audio_tagging_loss=0.00942, over 3049316.33 frames. ], batch size: 54, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:47:42,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2148313.3333333335, ans=0.0 2023-11-22 23:47:42,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2148313.3333333335, ans=0.0 2023-11-22 23:47:43,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322250 2023-11-22 23:47:53,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2148380.0, ans=0.125 2023-11-22 23:47:57,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.150e+01 8.727e+01 9.513e+01 1.269e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-22 23:47:58,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2148380.0, ans=0.1 2023-11-22 23:48:07,658 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9650, loss[loss=0.07733, simple_loss=0.1023, pruned_loss=0.01567, audio_tagging_loss=0.01052, over 14433.00 frames. ], tot_loss[loss=0.06991, simple_loss=0.09259, pruned_loss=0.01427, audio_tagging_loss=0.009346, over 3038156.24 frames. 
], batch size: 54, lr: 2.51e-03, grad_scale: 32.0
2023-11-22 23:48:12,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2148446.6666666665, ans=0.125
2023-11-22 23:48:19,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2148513.3333333335, ans=0.07
2023-11-22 23:48:30,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2148513.3333333335, ans=0.0
2023-11-22 23:48:47,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322300
2023-11-22 23:48:59,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2148713.3333333335, ans=0.125
2023-11-22 23:49:12,267 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9700, loss[loss=0.08366, simple_loss=0.1144, pruned_loss=0.01742, audio_tagging_loss=0.009025, over 14992.00 frames. ], tot_loss[loss=0.06961, simple_loss=0.09221, pruned_loss=0.01426, audio_tagging_loss=0.009244, over 3027776.05 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 16.0
2023-11-22 23:49:38,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0
2023-11-22 23:49:39,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2148913.3333333335, ans=0.1
2023-11-22 23:49:44,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2148913.3333333335, ans=0.125
2023-11-22 23:49:52,142 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322350
2023-11-22 23:50:07,997 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.743e+01 8.299e+01 9.052e+01 9.780e+01 1.137e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-22 23:50:08,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2149046.6666666665, ans=0.125
2023-11-22 23:50:14,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2149046.6666666665, ans=0.125
2023-11-22 23:50:16,575 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9750, loss[loss=0.1118, simple_loss=0.1497, pruned_loss=0.02957, audio_tagging_loss=0.007369, over 16093.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09083, pruned_loss=0.01412, audio_tagging_loss=0.009311, over 3020545.64 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 16.0
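grad_scale drops from 32.0 at batch 9650 to 16.0 at batch 9700 and grows back later, the usual dynamic loss-scaling pattern for fp16 training: halve after a batch produces non-finite gradients, re-grow after a stable stretch. A sketch using torch.cuda.amp.GradScaler; the growth settings are chosen to mimic the 16 <-> 32 oscillation seen here, not taken from the training script:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,        # the grad_scale values logged in this section
    growth_factor=2.0,      # 16 -> 32 after growth_interval clean steps
    backoff_factor=0.5,     # 32 -> 16 as soon as grads overflow
    growth_interval=400)    # assumed; the real cadence may differ

def fp16_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)             # hypothetical forward signature
    scaler.scale(loss).backward()       # backward on the scaled loss
    scaler.step(optimizer)              # skipped if grads are non-finite
    scaler.update()                     # halve on overflow, else grow
    return scaler.get_scale()           # the grad_scale value being logged
```

2023-11-22 23:50:18,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs.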
limit=15.0 2023-11-22 23:50:39,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2149180.0, ans=0.1 2023-11-22 23:50:42,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2149246.6666666665, ans=0.125 2023-11-22 23:50:54,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2149313.3333333335, ans=0.125 2023-11-22 23:50:56,595 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322400 2023-11-22 23:51:09,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2149380.0, ans=0.125 2023-11-22 23:51:20,649 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9800, loss[loss=0.07271, simple_loss=0.09914, pruned_loss=0.01719, audio_tagging_loss=0.00595, over 14547.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.09165, pruned_loss=0.01437, audio_tagging_loss=0.00923, over 3036990.14 frames. ], batch size: 53, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:51:53,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2149580.0, ans=0.07 2023-11-22 23:52:01,163 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322450 2023-11-22 23:52:10,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-11-22 23:52:15,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2149713.3333333335, ans=0.2 2023-11-22 23:52:16,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.583e+01 9.205e+01 9.862e+01 1.172e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-22 23:52:17,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-11-22 23:52:17,931 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 23:52:25,271 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9850, loss[loss=0.05793, simple_loss=0.07045, pruned_loss=0.01252, audio_tagging_loss=0.01019, over 16528.00 frames. ], tot_loss[loss=0.06964, simple_loss=0.09211, pruned_loss=0.01445, audio_tagging_loss=0.00913, over 3042903.66 frames. ], batch size: 64, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:52:31,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. 
limit=15.0 2023-11-22 23:52:41,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2149846.6666666665, ans=0.1 2023-11-22 23:53:01,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2149913.3333333335, ans=0.0 2023-11-22 23:53:06,110 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322500 2023-11-22 23:53:23,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2150046.6666666665, ans=0.125 2023-11-22 23:53:25,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2150046.6666666665, ans=0.09899494936611666 2023-11-22 23:53:31,655 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9900, loss[loss=0.07013, simple_loss=0.09377, pruned_loss=0.01516, audio_tagging_loss=0.008089, over 15020.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09227, pruned_loss=0.01438, audio_tagging_loss=0.008945, over 3041534.24 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:53:46,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2023-11-22 23:53:52,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-22 23:54:00,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2023-11-22 23:54:12,813 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322550 2023-11-22 23:54:27,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.162e+01 8.931e+01 9.625e+01 1.127e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-22 23:54:29,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2023-11-22 23:54:36,332 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 9950, loss[loss=0.06537, simple_loss=0.09018, pruned_loss=0.01038, audio_tagging_loss=0.009908, over 14985.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.09194, pruned_loss=0.01435, audio_tagging_loss=0.008887, over 3034530.77 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 16.0 2023-11-22 23:54:43,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2150446.6666666665, ans=0.125 2023-11-22 23:55:16,717 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322600 2023-11-22 23:55:20,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. 
limit=15.0 2023-11-22 23:55:25,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2150646.6666666665, ans=0.0 2023-11-22 23:55:29,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2150713.3333333335, ans=0.125 2023-11-22 23:55:33,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2150713.3333333335, ans=0.1 2023-11-22 23:55:41,018 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10000, loss[loss=0.0585, simple_loss=0.07846, pruned_loss=0.01067, audio_tagging_loss=0.00859, over 15544.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09186, pruned_loss=0.01439, audio_tagging_loss=0.008858, over 3038154.78 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:55:55,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2150846.6666666665, ans=0.125 2023-11-22 23:56:05,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2150846.6666666665, ans=0.0 2023-11-22 23:56:10,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2150913.3333333335, ans=0.0 2023-11-22 23:56:21,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322650 2023-11-22 23:56:24,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2150980.0, ans=0.1 2023-11-22 23:56:28,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2150980.0, ans=0.2 2023-11-22 23:56:38,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.648e+01 8.099e+01 8.716e+01 9.532e+01 1.465e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-22 23:56:42,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-22 23:56:47,924 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10050, loss[loss=0.06701, simple_loss=0.09539, pruned_loss=0.01142, audio_tagging_loss=0.00789, over 15499.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09135, pruned_loss=0.01423, audio_tagging_loss=0.009038, over 3043714.66 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:57:19,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2151246.6666666665, ans=0.2 2023-11-22 23:57:27,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322700 2023-11-22 23:57:31,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-11-22 23:57:52,598 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10100, loss[loss=0.07792, simple_loss=0.1091, pruned_loss=0.01361, audio_tagging_loss=0.009763, over 15102.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09198, pruned_loss=0.01437, audio_tagging_loss=0.009174, over 3045932.60 frames. 
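The tot_loss[...] fields above decompose arithmetically into a weighted sum of the three component losses. A minimal worked check against the batch 10100 running averages just above; the weights 0.5 (simple) and 1.0 (audio tagging) are inferred from the logged numbers, not read from train_asr.py:

```python
# Reconstructing the logged total from its components (batch 10100 above).
simple_loss = 0.09198
pruned_loss = 0.01437
audio_tagging_loss = 0.009174

loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.06953, matching tot_loss[loss=0.06953, ...]
```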
], batch size: 56, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:57:52,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2151446.6666666665, ans=0.0 2023-11-22 23:57:54,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2023-11-22 23:58:19,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2151580.0, ans=0.1 2023-11-22 23:58:33,891 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322750 2023-11-22 23:58:44,965 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-22 23:58:45,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2151713.3333333335, ans=0.0 2023-11-22 23:58:48,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 8.081e+01 8.819e+01 9.576e+01 1.489e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-22 23:58:48,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2151713.3333333335, ans=0.125 2023-11-22 23:58:57,823 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10150, loss[loss=0.07055, simple_loss=0.08946, pruned_loss=0.01401, audio_tagging_loss=0.01181, over 14412.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09238, pruned_loss=0.01448, audio_tagging_loss=0.009332, over 3042715.87 frames. ], batch size: 55, lr: 2.51e-03, grad_scale: 32.0 2023-11-22 23:58:59,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0 2023-11-22 23:59:28,828 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-22 23:59:38,954 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322800 2023-11-22 23:59:40,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2151980.0, ans=0.125 2023-11-22 23:59:41,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2151980.0, ans=0.125 2023-11-22 23:59:42,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2151980.0, ans=0.0 2023-11-22 23:59:50,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2152046.6666666665, ans=0.125 2023-11-23 00:00:04,818 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10200, loss[loss=0.06346, simple_loss=0.08318, pruned_loss=0.008341, audio_tagging_loss=0.01353, over 13824.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09225, pruned_loss=0.01426, audio_tagging_loss=0.009423, over 3041247.83 frames. ], batch size: 53, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:00:07,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2152113.3333333335, ans=0.04949747468305833 2023-11-23 00:00:22,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2023-11-23 00:00:26,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2152180.0, ans=0.09899494936611666 2023-11-23 00:00:27,264 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 00:00:43,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322850 2023-11-23 00:00:54,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2152313.3333333335, ans=0.2 2023-11-23 00:00:59,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.282e+01 8.917e+01 9.345e+01 1.171e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 00:01:07,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2152446.6666666665, ans=0.0 2023-11-23 00:01:08,754 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10250, loss[loss=0.05473, simple_loss=0.07123, pruned_loss=0.007984, audio_tagging_loss=0.01113, over 14103.00 frames. ], tot_loss[loss=0.06973, simple_loss=0.09177, pruned_loss=0.01435, audio_tagging_loss=0.0095, over 3043396.21 frames. 
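The recurring "Exclude cut" warnings are a length sanity check: after front-end subsampling, a 100-frame (1 s) cut keeps only 23 frames, too few to align its 24 BPE tokens under the transducer loss, so the cut is dropped. A sketch of that check, assuming the common ((T - 7) // 2 + 1) // 2 convolutional shrinkage, which reproduces the logged 100 -> 23; the helper name is illustrative:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Assumed front-end shrinkage: 100 input frames -> 23 output frames,
    # matching the "before subsampling / after subsampling" numbers above.
    t = ((num_frames - 7) // 2 + 1) // 2
    return t >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded from training
```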
], batch size: 56, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:01:15,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2152446.6666666665, ans=0.125 2023-11-23 00:01:21,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2152513.3333333335, ans=0.2 2023-11-23 00:01:23,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-11-23 00:01:24,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2152513.3333333335, ans=0.0 2023-11-23 00:01:26,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2152513.3333333335, ans=0.125 2023-11-23 00:01:31,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2152513.3333333335, ans=0.0 2023-11-23 00:01:33,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2152580.0, ans=0.1 2023-11-23 00:01:37,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2152580.0, ans=0.0 2023-11-23 00:01:39,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2023-11-23 00:01:49,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322900 2023-11-23 00:01:53,979 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 00:02:06,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2152713.3333333335, ans=0.0 2023-11-23 00:02:13,727 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10300, loss[loss=0.08384, simple_loss=0.1169, pruned_loss=0.01538, audio_tagging_loss=0.01002, over 15061.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.09225, pruned_loss=0.01444, audio_tagging_loss=0.00946, over 3043053.79 frames. 
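The ScheduledFloat lines report module hyperparameters (dropout, skip rates, balancer probabilities) whose current value `ans` is a function of `batch_count`. A minimal sketch of such a piecewise-linear schedule, with assumed breakpoints; this is illustrative, not the scaling.py implementation:

```python
class PiecewiseLinear:
    """A float that interpolates between (batch_count, value) breakpoints."""

    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

# Past the last breakpoint the value is flat, which is why these late-training
# lines keep logging the same ans (e.g. dropout_p=0.1) at batch_count ~2.15e6:
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))  # breakpoints assumed
print(dropout_p(2152446.67))  # 0.1
```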
], batch size: 58, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:02:14,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2152780.0, ans=0.0 2023-11-23 00:02:20,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2152780.0, ans=0.125 2023-11-23 00:02:21,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2152780.0, ans=0.0 2023-11-23 00:02:31,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2152846.6666666665, ans=0.1 2023-11-23 00:02:53,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 322950 2023-11-23 00:03:08,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2153046.6666666665, ans=10.0 2023-11-23 00:03:10,237 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.589e+01 8.288e+01 8.993e+01 9.603e+01 1.175e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-23 00:03:19,123 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10350, loss[loss=0.07359, simple_loss=0.09729, pruned_loss=0.01573, audio_tagging_loss=0.00922, over 16599.00 frames. ], tot_loss[loss=0.0703, simple_loss=0.09283, pruned_loss=0.01448, audio_tagging_loss=0.009408, over 3047166.57 frames. ], batch size: 62, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:03:24,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=12.0 2023-11-23 00:03:32,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2153180.0, ans=0.5 2023-11-23 00:03:34,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2153180.0, ans=0.1 2023-11-23 00:03:37,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2153180.0, ans=0.2 2023-11-23 00:03:39,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2153180.0, ans=0.1 2023-11-23 00:03:58,574 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323000 2023-11-23 00:04:24,481 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10400, loss[loss=0.08268, simple_loss=0.1081, pruned_loss=0.01944, audio_tagging_loss=0.009201, over 15499.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09223, pruned_loss=0.01427, audio_tagging_loss=0.009466, over 3045399.72 frames. 
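In the optim.py records, the five grad-norm values are the (min, 25%, median, 75%, max) of recently observed gradient norms, and the threshold tracks Clipping_scale times the median: 2.0 * 8.993e+01 ≈ 1.799e+02 in the record just above. A sketch of that diagnostic; the function name is illustrative:

```python
import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    qs = torch.quantile(recent_norms,
                        torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return qs, clipping_scale * qs[2]  # threshold = scale * median

norms = torch.tensor([65.89, 82.88, 89.93, 96.03, 117.5])
qs, threshold = clipping_stats(norms)
print(threshold)  # tensor(179.86), matching the logged 1.799e+02
```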
], batch size: 57, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:04:48,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2153580.0, ans=0.035 2023-11-23 00:04:49,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2153580.0, ans=0.125 2023-11-23 00:04:58,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2153580.0, ans=0.125 2023-11-23 00:05:03,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2153646.6666666665, ans=0.0 2023-11-23 00:05:05,643 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323050 2023-11-23 00:05:13,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0 2023-11-23 00:05:14,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2153646.6666666665, ans=0.2 2023-11-23 00:05:19,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2153713.3333333335, ans=0.2 2023-11-23 00:05:21,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.434e+01 9.021e+01 9.797e+01 1.156e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 00:05:28,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2153780.0, ans=0.2 2023-11-23 00:05:28,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2153780.0, ans=0.125 2023-11-23 00:05:29,017 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10450, loss[loss=0.05733, simple_loss=0.07923, pruned_loss=0.01059, audio_tagging_loss=0.00712, over 15684.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.09294, pruned_loss=0.01453, audio_tagging_loss=0.009384, over 3047683.10 frames. ], batch size: 60, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:05:34,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2153780.0, ans=0.1 2023-11-23 00:05:39,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2153780.0, ans=0.0 2023-11-23 00:05:43,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2153846.6666666665, ans=0.0 2023-11-23 00:06:03,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2153913.3333333335, ans=0.07 2023-11-23 00:06:09,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323100 2023-11-23 00:06:14,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2153980.0, ans=0.0 2023-11-23 00:06:16,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. 
limit=15.0 2023-11-23 00:06:20,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2154046.6666666665, ans=0.0 2023-11-23 00:06:20,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2154046.6666666665, ans=0.125 2023-11-23 00:06:34,663 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10500, loss[loss=0.05941, simple_loss=0.07175, pruned_loss=0.01094, audio_tagging_loss=0.01259, over 13643.00 frames. ], tot_loss[loss=0.06998, simple_loss=0.0924, pruned_loss=0.01439, audio_tagging_loss=0.009389, over 3039042.09 frames. ], batch size: 54, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:06:43,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2154113.3333333335, ans=0.0 2023-11-23 00:06:54,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2154180.0, ans=0.125 2023-11-23 00:07:14,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323150 2023-11-23 00:07:22,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2154313.3333333335, ans=0.0 2023-11-23 00:07:32,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.185e+01 8.836e+01 9.458e+01 1.353e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 00:07:35,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2154380.0, ans=0.0 2023-11-23 00:07:39,212 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10550, loss[loss=0.06979, simple_loss=0.09517, pruned_loss=0.01594, audio_tagging_loss=0.006262, over 14894.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09212, pruned_loss=0.01427, audio_tagging_loss=0.00919, over 3040152.30 frames. ], batch size: 58, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:08:19,461 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323200 2023-11-23 00:08:22,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2154646.6666666665, ans=0.125 2023-11-23 00:08:35,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2154713.3333333335, ans=0.2 2023-11-23 00:08:43,881 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10600, loss[loss=0.0583, simple_loss=0.07463, pruned_loss=0.0122, audio_tagging_loss=0.008793, over 14977.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09206, pruned_loss=0.01429, audio_tagging_loss=0.009169, over 3044128.62 frames. ], batch size: 60, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:08:51,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2154780.0, ans=0.2 2023-11-23 00:08:56,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2154846.6666666665, ans=0.5 2023-11-23 00:09:10,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. 
limit=15.0 2023-11-23 00:09:18,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2154913.3333333335, ans=0.125 2023-11-23 00:09:24,562 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323250 2023-11-23 00:09:27,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2023-11-23 00:09:33,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2154980.0, ans=0.125 2023-11-23 00:09:36,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-11-23 00:09:41,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.748e+01 7.982e+01 8.705e+01 9.514e+01 1.291e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-23 00:09:48,205 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10650, loss[loss=0.06983, simple_loss=0.09476, pruned_loss=0.01461, audio_tagging_loss=0.007844, over 15292.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09197, pruned_loss=0.01432, audio_tagging_loss=0.009198, over 3035402.43 frames. ], batch size: 59, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:09:57,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2155113.3333333335, ans=0.0 2023-11-23 00:09:57,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.65 vs. limit=15.0 2023-11-23 00:10:25,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2023-11-23 00:10:28,641 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323300 2023-11-23 00:10:28,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2155313.3333333335, ans=0.125 2023-11-23 00:10:40,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-23 00:10:41,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2155380.0, ans=0.2 2023-11-23 00:10:48,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2023-11-23 00:10:52,975 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10700, loss[loss=0.08482, simple_loss=0.111, pruned_loss=0.01947, audio_tagging_loss=0.009875, over 15482.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09281, pruned_loss=0.01438, audio_tagging_loss=0.00911, over 3036832.74 frames. 
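The Whitening lines compare a per-module statistic against a limit. A natural metric of this kind is mean(eig^2) / mean(eig)^2 of the activation covariance: it equals 1 for perfectly white (decorrelated, equal-variance) channels and grows as channels correlate. A sketch of the idea, not necessarily the exact scaling.py computation:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels); >= 1, == 1 iff cov is a multiple of I."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    # trace(C @ C) / C  over  (trace(C) / C)^2  ==  mean(eig^2) / mean(eig)^2
    return (cov @ cov).diagonal().mean() / cov.diagonal().mean() ** 2

x = torch.randn(2000, 384)  # nearly white input
print(whitening_metric(x))  # ~1.2, far below a limit like 15.0
```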
], batch size: 56, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:11:12,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2155513.3333333335, ans=0.07 2023-11-23 00:11:33,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323350 2023-11-23 00:11:50,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.154e+01 8.901e+01 9.766e+01 1.298e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-23 00:11:57,403 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10750, loss[loss=0.06678, simple_loss=0.09051, pruned_loss=0.01401, audio_tagging_loss=0.007518, over 15035.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.09269, pruned_loss=0.0143, audio_tagging_loss=0.009158, over 3041109.83 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:11:58,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2155780.0, ans=0.1 2023-11-23 00:12:01,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2155780.0, ans=0.0 2023-11-23 00:12:20,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2155846.6666666665, ans=0.2 2023-11-23 00:12:37,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323400 2023-11-23 00:12:45,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2155980.0, ans=0.0 2023-11-23 00:12:54,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-23 00:13:01,572 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10800, loss[loss=0.04509, simple_loss=0.05134, pruned_loss=0.008279, audio_tagging_loss=0.01114, over 14353.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.0924, pruned_loss=0.01408, audio_tagging_loss=0.009143, over 3044068.84 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:13:01,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2156113.3333333335, ans=0.2 2023-11-23 00:13:03,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2156113.3333333335, ans=0.125 2023-11-23 00:13:15,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2156180.0, ans=0.125 2023-11-23 00:13:29,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2156246.6666666665, ans=0.0 2023-11-23 00:13:37,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=22.5 2023-11-23 00:13:38,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2156246.6666666665, ans=0.1 2023-11-23 00:13:41,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323450 2023-11-23 00:13:49,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2156313.3333333335, ans=0.125 2023-11-23 00:13:52,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2156380.0, ans=0.125 2023-11-23 00:13:56,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2156380.0, ans=0.125 2023-11-23 00:14:00,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.240e+01 8.825e+01 9.425e+01 1.162e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-23 00:14:06,738 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10850, loss[loss=0.07902, simple_loss=0.1088, pruned_loss=0.01559, audio_tagging_loss=0.009019, over 14132.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.0928, pruned_loss=0.01427, audio_tagging_loss=0.009143, over 3042556.30 frames. ], batch size: 53, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:14:23,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2156513.3333333335, ans=0.125 2023-11-23 00:14:46,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323500 2023-11-23 00:14:51,078 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 00:14:53,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2156646.6666666665, ans=0.0 2023-11-23 00:15:04,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2156713.3333333335, ans=0.125 2023-11-23 00:15:06,849 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 00:15:10,530 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10900, loss[loss=0.07739, simple_loss=0.1137, pruned_loss=0.01308, audio_tagging_loss=0.007466, over 16153.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09299, pruned_loss=0.01436, audio_tagging_loss=0.009135, over 3043018.10 frames. ], batch size: 59, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:15:10,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2156780.0, ans=0.125 2023-11-23 00:15:17,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.69 vs. limit=5.0 2023-11-23 00:15:28,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. 
limit=10.0 2023-11-23 00:15:51,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323550 2023-11-23 00:16:10,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.063e+01 8.887e+01 9.492e+01 1.128e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-23 00:16:10,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2157046.6666666665, ans=0.05 2023-11-23 00:16:15,463 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 10950, loss[loss=0.06333, simple_loss=0.0765, pruned_loss=0.01381, audio_tagging_loss=0.01127, over 15259.00 frames. ], tot_loss[loss=0.07044, simple_loss=0.09364, pruned_loss=0.01447, audio_tagging_loss=0.009151, over 3044789.53 frames. ], batch size: 61, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:16:17,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2157113.3333333335, ans=0.125 2023-11-23 00:16:22,583 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 00:16:28,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2157180.0, ans=0.0 2023-11-23 00:16:55,481 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323600 2023-11-23 00:17:17,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2157380.0, ans=0.2 2023-11-23 00:17:20,239 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11000, loss[loss=0.08383, simple_loss=0.1131, pruned_loss=0.02033, audio_tagging_loss=0.006969, over 14644.00 frames. ], tot_loss[loss=0.07033, simple_loss=0.09356, pruned_loss=0.01444, audio_tagging_loss=0.009114, over 3043483.00 frames. ], batch size: 53, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:17:24,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2157446.6666666665, ans=0.2 2023-11-23 00:17:29,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2157446.6666666665, ans=0.09899494936611666 2023-11-23 00:17:30,164 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 00:17:37,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2157513.3333333335, ans=0.125 2023-11-23 00:17:41,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=22.5 2023-11-23 00:17:43,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.06 vs. 
limit=12.0 2023-11-23 00:17:46,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2157580.0, ans=0.125 2023-11-23 00:17:53,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2157580.0, ans=0.125 2023-11-23 00:17:59,834 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323650 2023-11-23 00:18:03,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2157646.6666666665, ans=0.0 2023-11-23 00:18:18,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.419e+01 9.027e+01 1.014e+02 1.259e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 00:18:24,009 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11050, loss[loss=0.06641, simple_loss=0.08636, pruned_loss=0.01355, audio_tagging_loss=0.009684, over 14594.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.09358, pruned_loss=0.0145, audio_tagging_loss=0.009137, over 3050333.20 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:18:37,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2157846.6666666665, ans=0.2 2023-11-23 00:18:40,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2157846.6666666665, ans=0.125 2023-11-23 00:18:52,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-23 00:19:04,519 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323700 2023-11-23 00:19:16,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-11-23 00:19:22,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2158046.6666666665, ans=0.2 2023-11-23 00:19:28,755 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11100, loss[loss=0.0596, simple_loss=0.07246, pruned_loss=0.01165, audio_tagging_loss=0.01172, over 13641.00 frames. ], tot_loss[loss=0.0702, simple_loss=0.09311, pruned_loss=0.01439, audio_tagging_loss=0.009265, over 3047413.51 frames. ], batch size: 53, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:19:42,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2023-11-23 00:19:56,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2158246.6666666665, ans=0.125 2023-11-23 00:20:04,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. 
limit=8.0 2023-11-23 00:20:07,624 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 00:20:08,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323750 2023-11-23 00:20:13,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2158313.3333333335, ans=0.125 2023-11-23 00:20:28,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.340e+01 9.049e+01 9.831e+01 1.206e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 00:20:29,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2158380.0, ans=0.2 2023-11-23 00:20:34,611 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11150, loss[loss=0.06362, simple_loss=0.08175, pruned_loss=0.0131, audio_tagging_loss=0.009643, over 15000.00 frames. ], tot_loss[loss=0.06992, simple_loss=0.09235, pruned_loss=0.0143, audio_tagging_loss=0.009439, over 3046200.74 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:20:36,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2158446.6666666665, ans=0.2 2023-11-23 00:20:48,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2158513.3333333335, ans=0.125 2023-11-23 00:21:04,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.42 vs. limit=10.0 2023-11-23 00:21:14,150 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323800 2023-11-23 00:21:24,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2158646.6666666665, ans=0.2 2023-11-23 00:21:24,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2158646.6666666665, ans=0.2 2023-11-23 00:21:29,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2158713.3333333335, ans=0.125 2023-11-23 00:21:39,494 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11200, loss[loss=0.06158, simple_loss=0.07662, pruned_loss=0.01289, audio_tagging_loss=0.01038, over 13893.00 frames. ], tot_loss[loss=0.07017, simple_loss=0.09249, pruned_loss=0.01437, audio_tagging_loss=0.009548, over 3050471.69 frames. 
], batch size: 53, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:21:49,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2158780.0, ans=0.125 2023-11-23 00:22:20,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323850 2023-11-23 00:22:21,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2158980.0, ans=0.5 2023-11-23 00:22:36,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2159046.6666666665, ans=0.0 2023-11-23 00:22:39,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.459e+01 8.191e+01 8.566e+01 9.520e+01 1.333e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-23 00:22:40,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.19 vs. limit=15.0 2023-11-23 00:22:44,582 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11250, loss[loss=0.08, simple_loss=0.1001, pruned_loss=0.02135, audio_tagging_loss=0.008618, over 16065.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09247, pruned_loss=0.01434, audio_tagging_loss=0.00943, over 3045923.18 frames. ], batch size: 58, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:23:25,467 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323900 2023-11-23 00:23:26,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2159313.3333333335, ans=0.2 2023-11-23 00:23:34,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2159313.3333333335, ans=0.1 2023-11-23 00:23:50,775 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11300, loss[loss=0.06473, simple_loss=0.08172, pruned_loss=0.01124, audio_tagging_loss=0.01264, over 13628.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.09293, pruned_loss=0.01434, audio_tagging_loss=0.009236, over 3045891.66 frames. ], batch size: 52, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:24:25,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2159580.0, ans=0.0 2023-11-23 00:24:29,403 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 323950 2023-11-23 00:24:41,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2159713.3333333335, ans=0.2 2023-11-23 00:24:46,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2159713.3333333335, ans=0.125 2023-11-23 00:24:51,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.358e+01 9.001e+01 9.797e+01 1.218e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-23 00:24:51,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2159713.3333333335, ans=0.125 2023-11-23 00:24:55,601 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11350, loss[loss=0.06833, simple_loss=0.09486, pruned_loss=0.01286, audio_tagging_loss=0.008036, over 15600.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09338, pruned_loss=0.01437, audio_tagging_loss=0.009174, over 3042884.08 frames. 
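The grad_scale field alternating between 16.0 and 32.0 is the dynamic fp16 loss scale: it is halved whenever a step produces inf/nan gradients and grows back after enough clean steps. A sketch using torch's stock scaler as an analogy (the training code has its own scaling logic); it assumes a CUDA device is available:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=2.51e-03)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(feats: torch.Tensor, targets: torch.Tensor) -> None:
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(feats), targets)
    scaler.scale(loss).backward()
    scaler.step(opt)  # skipped on overflow; the scale is then halved (32 -> 16)
    scaler.update()   # after enough clean steps the scale doubles again
```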
], batch size: 60, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:25:06,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2159780.0, ans=0.125 2023-11-23 00:25:08,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2159846.6666666665, ans=0.125 2023-11-23 00:25:35,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2159980.0, ans=0.125 2023-11-23 00:25:36,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324000 2023-11-23 00:26:03,569 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11400, loss[loss=0.06588, simple_loss=0.08879, pruned_loss=0.01323, audio_tagging_loss=0.008247, over 15469.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09362, pruned_loss=0.01438, audio_tagging_loss=0.009066, over 3046285.96 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:26:15,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2160180.0, ans=0.125 2023-11-23 00:26:28,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2160180.0, ans=0.125 2023-11-23 00:26:44,775 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324050 2023-11-23 00:26:47,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2160313.3333333335, ans=0.125 2023-11-23 00:26:56,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2160380.0, ans=0.125 2023-11-23 00:26:56,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-23 00:27:05,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.089e+01 8.744e+01 9.515e+01 1.376e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-23 00:27:08,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-23 00:27:09,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2160446.6666666665, ans=0.05 2023-11-23 00:27:10,218 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11450, loss[loss=0.09724, simple_loss=0.1274, pruned_loss=0.02666, audio_tagging_loss=0.006881, over 15021.00 frames. ], tot_loss[loss=0.0707, simple_loss=0.09421, pruned_loss=0.01454, audio_tagging_loss=0.009052, over 3043486.90 frames. ], batch size: 55, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:27:35,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2160580.0, ans=0.0 2023-11-23 00:27:38,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. 
limit=15.0 2023-11-23 00:27:50,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324100 2023-11-23 00:27:54,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2160646.6666666665, ans=0.125 2023-11-23 00:28:16,247 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11500, loss[loss=0.06744, simple_loss=0.09106, pruned_loss=0.01136, audio_tagging_loss=0.01055, over 15052.00 frames. ], tot_loss[loss=0.06983, simple_loss=0.09278, pruned_loss=0.01422, audio_tagging_loss=0.009221, over 3040472.26 frames. ], batch size: 55, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:28:17,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2160780.0, ans=0.1 2023-11-23 00:28:24,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2160780.0, ans=0.1 2023-11-23 00:28:57,833 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324150 2023-11-23 00:29:04,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2160980.0, ans=0.1 2023-11-23 00:29:18,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 7.992e+01 8.677e+01 9.342e+01 1.172e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-23 00:29:22,170 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11550, loss[loss=0.07309, simple_loss=0.101, pruned_loss=0.01383, audio_tagging_loss=0.008764, over 15334.00 frames. ], tot_loss[loss=0.06942, simple_loss=0.09211, pruned_loss=0.01415, audio_tagging_loss=0.009218, over 3039207.17 frames. ], batch size: 58, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:29:39,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2023-11-23 00:29:52,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2023-11-23 00:29:57,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2161246.6666666665, ans=0.05 2023-11-23 00:30:03,372 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 00:30:04,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324200 2023-11-23 00:30:12,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2161313.3333333335, ans=0.125 2023-11-23 00:30:26,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2161380.0, ans=0.125 2023-11-23 00:30:29,747 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11600, loss[loss=0.06542, simple_loss=0.09494, pruned_loss=0.01231, audio_tagging_loss=0.005648, over 15143.00 frames. 
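The many balancer entries (prob, min_positive, min_abs, ans=0.125 and similar) belong to modules that are the identity in the forward pass but nudge gradients so per-channel activation statistics stay in a healthy range, applying the correction only with the logged probability. A heavily simplified sketch keeping just the min_abs idea; the class name, strength, and mechanism details are illustrative, not the scaling.py code:

```python
import torch

class BalancerIdea(torch.autograd.Function):
    """Identity forward; backward adds a small push away from zero for
    channels whose mean |activation| is below min_abs."""

    @staticmethod
    def forward(ctx, x, min_abs=0.2, strength=1e-4):
        ctx.save_for_backward(x)
        ctx.min_abs, ctx.strength = min_abs, strength
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        too_small = x.abs().mean(dim=0, keepdim=True) < ctx.min_abs
        nudge = -ctx.strength * torch.sign(x) * too_small  # grow |x| if too small
        return grad_out + nudge, None, None

x = 0.01 * torch.randn(100, 64)  # under-active channels
x.requires_grad_()
y = BalancerIdea.apply(x)
y.sum().backward()  # x.grad now includes the per-channel correction
```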
], tot_loss[loss=0.06932, simple_loss=0.09198, pruned_loss=0.01414, audio_tagging_loss=0.009191, over 3041383.77 frames. ], batch size: 58, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:30:36,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2161446.6666666665, ans=0.125 2023-11-23 00:30:59,667 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 00:31:03,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2161580.0, ans=0.125 2023-11-23 00:31:10,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324250 2023-11-23 00:31:17,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2023-11-23 00:31:26,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2023-11-23 00:31:26,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2161713.3333333335, ans=0.125 2023-11-23 00:31:29,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2161713.3333333335, ans=0.0 2023-11-23 00:31:32,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.394e+01 8.884e+01 9.610e+01 1.253e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-23 00:31:36,390 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11650, loss[loss=0.06334, simple_loss=0.07839, pruned_loss=0.01527, audio_tagging_loss=0.008871, over 15213.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09297, pruned_loss=0.0143, audio_tagging_loss=0.009152, over 3043342.04 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:31:42,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.53 vs. limit=15.0 2023-11-23 00:32:04,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2161913.3333333335, ans=0.125 2023-11-23 00:32:04,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2023-11-23 00:32:08,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=15.0 2023-11-23 00:32:11,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2161913.3333333335, ans=0.0 2023-11-23 00:32:12,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2161913.3333333335, ans=0.125 2023-11-23 00:32:17,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324300 2023-11-23 00:32:17,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2161980.0, ans=0.1 2023-11-23 00:32:35,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2162046.6666666665, ans=0.125 2023-11-23 00:32:36,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2023-11-23 00:32:41,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2162113.3333333335, ans=0.125 2023-11-23 00:32:42,007 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11700, loss[loss=0.07206, simple_loss=0.09155, pruned_loss=0.01399, audio_tagging_loss=0.01229, over 15425.00 frames. ], tot_loss[loss=0.07078, simple_loss=0.09414, pruned_loss=0.01454, audio_tagging_loss=0.009169, over 3048388.29 frames. ], batch size: 59, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:32:44,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2162113.3333333335, ans=0.2 2023-11-23 00:33:06,703 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 00:33:09,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-23 00:33:20,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-23 00:33:23,486 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324350 2023-11-23 00:33:37,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2162380.0, ans=0.125 2023-11-23 00:33:40,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2023-11-23 00:33:43,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.270e+01 8.836e+01 9.259e+01 1.366e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 00:33:46,998 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11750, loss[loss=0.06006, simple_loss=0.0806, pruned_loss=0.01187, audio_tagging_loss=0.007893, over 16555.00 frames. ], tot_loss[loss=0.07015, simple_loss=0.09347, pruned_loss=0.0143, audio_tagging_loss=0.009112, over 3054928.30 frames. 
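The various *_skip_rate values (attention_skip_rate, conv_skip_rate, ff2/ff3_skip_rate, bypass.skip_rate) govern stochastic sub-module dropping: with the given probability, a layer's branch is skipped for the whole step and only the residual path survives; most are scheduled down to 0.0 by this stage of training. A minimal sketch of the mechanism, with illustrative names:

```python
import torch

def residual_with_skip(residual: torch.Tensor, branch_out: torch.Tensor,
                       skip_rate: float, training: bool) -> torch.Tensor:
    """Drop the branch with probability skip_rate during training."""
    if training and float(torch.rand(())) < skip_rate:
        return residual  # branch skipped this step
    return residual + branch_out

x, y = torch.randn(10, 256), torch.randn(10, 256)
# skip_rate has decayed to 0.0 here, so the branch is always applied:
print(residual_with_skip(x, y, skip_rate=0.0, training=True).equal(x + y))  # True
```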
], batch size: 62, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:33:52,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2162446.6666666665, ans=0.125 2023-11-23 00:33:53,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2162446.6666666665, ans=0.1 2023-11-23 00:34:01,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2162513.3333333335, ans=0.0 2023-11-23 00:34:15,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2162580.0, ans=0.125 2023-11-23 00:34:22,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2162580.0, ans=0.125 2023-11-23 00:34:28,617 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324400 2023-11-23 00:34:44,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.24 vs. limit=10.0 2023-11-23 00:34:54,177 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11800, loss[loss=0.07317, simple_loss=0.09719, pruned_loss=0.01531, audio_tagging_loss=0.00927, over 14739.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09201, pruned_loss=0.01412, audio_tagging_loss=0.009198, over 3053240.82 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:35:12,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2162846.6666666665, ans=0.0 2023-11-23 00:35:19,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2162913.3333333335, ans=0.125 2023-11-23 00:35:34,976 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324450 2023-11-23 00:35:53,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2023-11-23 00:35:57,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.225e+01 9.023e+01 9.525e+01 1.272e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 00:35:59,717 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11850, loss[loss=0.08093, simple_loss=0.1021, pruned_loss=0.02047, audio_tagging_loss=0.009399, over 14576.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.09277, pruned_loss=0.01424, audio_tagging_loss=0.009179, over 3051526.73 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:36:00,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2163113.3333333335, ans=0.125 2023-11-23 00:36:05,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.97 vs. 
limit=15.0 2023-11-23 00:36:06,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2163113.3333333335, ans=0.0 2023-11-23 00:36:11,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2163180.0, ans=0.1 2023-11-23 00:36:26,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2163246.6666666665, ans=0.125 2023-11-23 00:36:28,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2163246.6666666665, ans=0.1 2023-11-23 00:36:28,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2023-11-23 00:36:32,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-23 00:36:33,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2163246.6666666665, ans=0.025 2023-11-23 00:36:34,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2163246.6666666665, ans=0.125 2023-11-23 00:36:40,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324500 2023-11-23 00:37:04,525 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11900, loss[loss=0.05901, simple_loss=0.06508, pruned_loss=0.01432, audio_tagging_loss=0.01215, over 14289.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09217, pruned_loss=0.01411, audio_tagging_loss=0.009332, over 3045717.91 frames. ], batch size: 55, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:37:20,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-11-23 00:37:45,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324550 2023-11-23 00:38:03,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2163713.3333333335, ans=0.125 2023-11-23 00:38:08,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 8.374e+01 8.930e+01 9.704e+01 2.155e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-23 00:38:10,884 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 11950, loss[loss=0.05994, simple_loss=0.08089, pruned_loss=0.009533, audio_tagging_loss=0.009962, over 15006.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09168, pruned_loss=0.01391, audio_tagging_loss=0.009362, over 3037750.60 frames. 
], batch size: 59, lr: 2.50e-03, grad_scale: 16.0 2023-11-23 00:38:13,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2163780.0, ans=0.0 2023-11-23 00:38:36,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2163913.3333333335, ans=0.125 2023-11-23 00:38:49,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324600 2023-11-23 00:38:56,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2163980.0, ans=0.125 2023-11-23 00:39:02,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-23 00:39:03,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2164046.6666666665, ans=0.1 2023-11-23 00:39:14,347 INFO [train_asr.py:1221] (1/4) Epoch 27, batch 12000, loss[loss=0.05168, simple_loss=0.06126, pruned_loss=0.009587, audio_tagging_loss=0.01146, over 14656.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09138, pruned_loss=0.01393, audio_tagging_loss=0.009505, over 3040614.30 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 32.0 2023-11-23 00:39:14,348 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 00:39:57,328 INFO [train_asr.py:1253] (1/4) Epoch 27, validation: loss=0.05869, simple_loss=0.05138, pruned_loss=0.005099, audio_tagging_loss=0.0279, over 4681554.00 frames. 2023-11-23 00:39:57,329 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 00:41:02,699 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 0, loss[loss=0.09839, simple_loss=0.115, pruned_loss=0.02282, audio_tagging_loss=0.01808, over 15424.00 frames. ], tot_loss[loss=0.09839, simple_loss=0.115, pruned_loss=0.02282, audio_tagging_loss=0.01808, over 15424.00 frames. ], batch size: 57, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:41:02,700 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 00:41:39,757 INFO [train_asr.py:1253] (1/4) Epoch 28, validation: loss=0.0583, simple_loss=0.05139, pruned_loss=0.005161, audio_tagging_loss=0.02744, over 4681554.00 frames. 2023-11-23 00:41:39,758 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 00:41:46,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2164280.0, ans=0.125 2023-11-23 00:41:47,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324650 2023-11-23 00:41:55,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2164346.6666666665, ans=0.125 2023-11-23 00:41:55,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2164346.6666666665, ans=0.0 2023-11-23 00:42:09,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.610e+01 9.593e+01 1.048e+02 1.343e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-23 00:42:32,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. 
limit=6.0 2023-11-23 00:42:35,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2164546.6666666665, ans=0.1 2023-11-23 00:42:43,133 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 50, loss[loss=0.09236, simple_loss=0.116, pruned_loss=0.01701, audio_tagging_loss=0.01734, over 15778.00 frames. ], tot_loss[loss=0.07941, simple_loss=0.09272, pruned_loss=0.01541, audio_tagging_loss=0.01764, over 684204.71 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:42:43,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=22.5 2023-11-23 00:42:50,588 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324700 2023-11-23 00:42:53,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2023-11-23 00:42:56,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2164680.0, ans=0.125 2023-11-23 00:42:57,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-23 00:43:01,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0 2023-11-23 00:43:28,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2164813.3333333335, ans=0.0 2023-11-23 00:43:28,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2164813.3333333335, ans=0.125 2023-11-23 00:43:29,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2164813.3333333335, ans=0.125 2023-11-23 00:43:30,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2164813.3333333335, ans=0.1 2023-11-23 00:43:35,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-23 00:43:46,726 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 100, loss[loss=0.07853, simple_loss=0.09511, pruned_loss=0.01758, audio_tagging_loss=0.01339, over 14338.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.09334, pruned_loss=0.01522, audio_tagging_loss=0.01713, over 1213961.31 frames. 
], batch size: 57, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:43:54,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324750 2023-11-23 00:44:17,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 9.043e+01 9.747e+01 1.056e+02 1.211e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-23 00:44:18,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2165080.0, ans=0.2 2023-11-23 00:44:19,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2165080.0, ans=0.1 2023-11-23 00:44:21,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-23 00:44:28,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2023-11-23 00:44:38,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2165213.3333333335, ans=0.125 2023-11-23 00:44:49,961 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 150, loss[loss=0.07885, simple_loss=0.1018, pruned_loss=0.01568, audio_tagging_loss=0.01228, over 17151.00 frames. ], tot_loss[loss=0.07755, simple_loss=0.09487, pruned_loss=0.01505, audio_tagging_loss=0.01506, over 1620027.46 frames. ], batch size: 64, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:44:52,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2165280.0, ans=0.125 2023-11-23 00:44:57,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2023-11-23 00:44:58,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324800 2023-11-23 00:45:11,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2023-11-23 00:45:15,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2165413.3333333335, ans=0.125 2023-11-23 00:45:49,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2165546.6666666665, ans=0.0 2023-11-23 00:45:55,047 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 200, loss[loss=0.06337, simple_loss=0.08123, pruned_loss=0.01533, audio_tagging_loss=0.00742, over 15755.00 frames. ], tot_loss[loss=0.0768, simple_loss=0.09624, pruned_loss=0.01535, audio_tagging_loss=0.01333, over 1934516.66 frames. 
], batch size: 59, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:46:00,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2165613.3333333335, ans=0.1 2023-11-23 00:46:02,384 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324850 2023-11-23 00:46:16,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2165680.0, ans=0.1 2023-11-23 00:46:25,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.466e+01 9.149e+01 9.933e+01 1.935e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-23 00:46:50,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2165880.0, ans=0.125 2023-11-23 00:46:58,570 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 250, loss[loss=0.0599, simple_loss=0.07421, pruned_loss=0.01288, audio_tagging_loss=0.009917, over 14248.00 frames. ], tot_loss[loss=0.07524, simple_loss=0.09549, pruned_loss=0.01532, audio_tagging_loss=0.01217, over 2177922.54 frames. ], batch size: 54, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:47:06,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324900 2023-11-23 00:47:15,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2166013.3333333335, ans=0.05 2023-11-23 00:47:21,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-11-23 00:47:32,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2166080.0, ans=0.2 2023-11-23 00:47:41,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2166146.6666666665, ans=0.0 2023-11-23 00:47:43,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2166146.6666666665, ans=0.2 2023-11-23 00:47:56,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2166213.3333333335, ans=0.125 2023-11-23 00:48:03,097 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 300, loss[loss=0.08515, simple_loss=0.1139, pruned_loss=0.01823, audio_tagging_loss=0.009947, over 15123.00 frames. ], tot_loss[loss=0.07416, simple_loss=0.09545, pruned_loss=0.01515, audio_tagging_loss=0.01129, over 2367996.94 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:48:03,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2166280.0, ans=0.125 2023-11-23 00:48:11,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 324950 2023-11-23 00:48:12,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2166280.0, ans=0.125 2023-11-23 00:48:18,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.64 vs. 
limit=12.0 2023-11-23 00:48:34,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.323e+01 9.001e+01 1.002e+02 1.826e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-23 00:48:43,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2166480.0, ans=0.2 2023-11-23 00:48:45,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2166480.0, ans=0.0 2023-11-23 00:48:47,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2023-11-23 00:49:09,335 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 350, loss[loss=0.0691, simple_loss=0.09233, pruned_loss=0.01178, audio_tagging_loss=0.01115, over 15031.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.0961, pruned_loss=0.0151, audio_tagging_loss=0.01055, over 2515812.71 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 00:49:10,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2166613.3333333335, ans=0.125 2023-11-23 00:49:16,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325000 2023-11-23 00:49:18,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2166613.3333333335, ans=0.125 2023-11-23 00:49:53,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-23 00:50:01,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2166880.0, ans=0.5 2023-11-23 00:50:02,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2166880.0, ans=0.0 2023-11-23 00:50:12,987 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 400, loss[loss=0.05241, simple_loss=0.06342, pruned_loss=0.01039, audio_tagging_loss=0.01031, over 14281.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.09587, pruned_loss=0.01503, audio_tagging_loss=0.01016, over 2628276.57 frames. 
], batch size: 55, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:50:20,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325050 2023-11-23 00:50:23,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2166946.6666666665, ans=0.0 2023-11-23 00:50:26,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2167013.3333333335, ans=0.125 2023-11-23 00:50:37,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2167013.3333333335, ans=0.07 2023-11-23 00:50:43,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2167080.0, ans=0.2 2023-11-23 00:50:45,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.024e+01 8.175e+01 8.785e+01 9.638e+01 1.202e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-23 00:51:05,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2167213.3333333335, ans=0.125 2023-11-23 00:51:17,312 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 450, loss[loss=0.07171, simple_loss=0.0973, pruned_loss=0.01439, audio_tagging_loss=0.008661, over 15514.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.0941, pruned_loss=0.01476, audio_tagging_loss=0.00993, over 2721569.92 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:51:24,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2167280.0, ans=0.125 2023-11-23 00:51:25,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325100 2023-11-23 00:51:43,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2167413.3333333335, ans=0.125 2023-11-23 00:51:49,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-23 00:52:01,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2167480.0, ans=0.0 2023-11-23 00:52:18,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2167546.6666666665, ans=0.0 2023-11-23 00:52:20,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5 2023-11-23 00:52:23,274 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 500, loss[loss=0.05103, simple_loss=0.05828, pruned_loss=0.01025, audio_tagging_loss=0.01164, over 15674.00 frames. ], tot_loss[loss=0.07117, simple_loss=0.09364, pruned_loss=0.01458, audio_tagging_loss=0.009772, over 2788987.35 frames. 
], batch size: 61, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:52:30,634 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325150 2023-11-23 00:52:30,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2167613.3333333335, ans=0.125 2023-11-23 00:52:33,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2167613.3333333335, ans=0.0 2023-11-23 00:52:38,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2167680.0, ans=0.0 2023-11-23 00:52:54,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.145e+01 8.686e+01 9.449e+01 1.268e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-23 00:53:10,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2167813.3333333335, ans=0.125 2023-11-23 00:53:27,163 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 550, loss[loss=0.09226, simple_loss=0.1194, pruned_loss=0.02005, audio_tagging_loss=0.01252, over 15158.00 frames. ], tot_loss[loss=0.07129, simple_loss=0.09397, pruned_loss=0.01469, audio_tagging_loss=0.00961, over 2843631.24 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:53:34,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325200 2023-11-23 00:53:44,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2168013.3333333335, ans=0.125 2023-11-23 00:53:56,415 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 00:54:02,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2168080.0, ans=0.05 2023-11-23 00:54:11,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2168146.6666666665, ans=0.125 2023-11-23 00:54:28,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.19 vs. limit=15.0 2023-11-23 00:54:29,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2168213.3333333335, ans=0.0 2023-11-23 00:54:32,706 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 600, loss[loss=0.04216, simple_loss=0.05566, pruned_loss=0.006288, audio_tagging_loss=0.008047, over 16059.00 frames. ], tot_loss[loss=0.07052, simple_loss=0.09307, pruned_loss=0.01448, audio_tagging_loss=0.0095, over 2887080.12 frames. 
], batch size: 62, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:54:40,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325250 2023-11-23 00:54:53,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2168346.6666666665, ans=0.125 2023-11-23 00:55:02,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2168413.3333333335, ans=0.1 2023-11-23 00:55:04,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.706e+01 8.119e+01 8.698e+01 9.405e+01 1.451e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-23 00:55:08,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2168413.3333333335, ans=0.1 2023-11-23 00:55:18,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2168480.0, ans=0.125 2023-11-23 00:55:19,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=22.5 2023-11-23 00:55:23,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2168546.6666666665, ans=0.95 2023-11-23 00:55:32,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2023-11-23 00:55:36,582 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 650, loss[loss=0.07034, simple_loss=0.1009, pruned_loss=0.0133, audio_tagging_loss=0.006566, over 15505.00 frames. ], tot_loss[loss=0.07121, simple_loss=0.09422, pruned_loss=0.01467, audio_tagging_loss=0.009436, over 2934969.42 frames. ], batch size: 57, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:55:44,155 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325300 2023-11-23 00:55:52,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2168680.0, ans=0.0 2023-11-23 00:56:07,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2168746.6666666665, ans=0.125 2023-11-23 00:56:29,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2168880.0, ans=0.125 2023-11-23 00:56:34,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2168880.0, ans=0.07 2023-11-23 00:56:37,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2168880.0, ans=0.2 2023-11-23 00:56:40,458 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 700, loss[loss=0.06631, simple_loss=0.08311, pruned_loss=0.01479, audio_tagging_loss=0.009966, over 14516.00 frames. ], tot_loss[loss=0.07091, simple_loss=0.094, pruned_loss=0.01452, audio_tagging_loss=0.009391, over 2966492.32 frames. 
], batch size: 54, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:56:47,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325350 2023-11-23 00:56:58,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2169013.3333333335, ans=0.1 2023-11-23 00:57:07,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2169080.0, ans=0.125 2023-11-23 00:57:13,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.373e+01 9.027e+01 9.909e+01 1.159e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 00:57:17,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2169080.0, ans=22.5 2023-11-23 00:57:23,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2169146.6666666665, ans=0.125 2023-11-23 00:57:44,762 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 750, loss[loss=0.08711, simple_loss=0.1134, pruned_loss=0.01815, audio_tagging_loss=0.01225, over 15287.00 frames. ], tot_loss[loss=0.07106, simple_loss=0.09424, pruned_loss=0.01455, audio_tagging_loss=0.009394, over 2994358.57 frames. ], batch size: 58, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:57:48,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2169280.0, ans=0.1 2023-11-23 00:57:52,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325400 2023-11-23 00:58:10,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2023-11-23 00:58:11,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2169413.3333333335, ans=0.125 2023-11-23 00:58:26,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2169480.0, ans=0.125 2023-11-23 00:58:49,713 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 800, loss[loss=0.08514, simple_loss=0.1166, pruned_loss=0.01805, audio_tagging_loss=0.008803, over 14802.00 frames. ], tot_loss[loss=0.07114, simple_loss=0.09411, pruned_loss=0.01463, audio_tagging_loss=0.00945, over 3001695.48 frames. 
], batch size: 53, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 00:58:57,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325450 2023-11-23 00:59:02,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2169680.0, ans=10.0 2023-11-23 00:59:08,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2169680.0, ans=0.125 2023-11-23 00:59:10,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2169680.0, ans=0.125 2023-11-23 00:59:16,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2169746.6666666665, ans=0.0 2023-11-23 00:59:21,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.947e+01 8.109e+01 8.929e+01 9.816e+01 1.280e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-23 00:59:29,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2169813.3333333335, ans=0.125 2023-11-23 00:59:54,379 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 850, loss[loss=0.06779, simple_loss=0.09197, pruned_loss=0.01122, audio_tagging_loss=0.01059, over 16702.00 frames. ], tot_loss[loss=0.07143, simple_loss=0.09442, pruned_loss=0.01474, audio_tagging_loss=0.009479, over 3012317.49 frames. ], batch size: 61, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 00:59:57,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-23 01:00:01,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325500 2023-11-23 01:00:06,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2170013.3333333335, ans=0.0 2023-11-23 01:00:09,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2170013.3333333335, ans=0.125 2023-11-23 01:00:24,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2170080.0, ans=0.1 2023-11-23 01:00:36,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2170146.6666666665, ans=0.0 2023-11-23 01:00:45,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2170213.3333333335, ans=0.0 2023-11-23 01:00:57,813 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 900, loss[loss=0.05596, simple_loss=0.07419, pruned_loss=0.009852, audio_tagging_loss=0.009016, over 14375.00 frames. ], tot_loss[loss=0.07123, simple_loss=0.09386, pruned_loss=0.01477, audio_tagging_loss=0.009526, over 3018354.76 frames. 
], batch size: 55, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:01:03,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2170280.0, ans=0.125 2023-11-23 01:01:06,215 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325550 2023-11-23 01:01:15,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2170346.6666666665, ans=0.1 2023-11-23 01:01:20,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2170346.6666666665, ans=0.2 2023-11-23 01:01:32,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.700e+01 8.072e+01 8.678e+01 9.441e+01 1.146e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-23 01:01:54,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2170546.6666666665, ans=0.125 2023-11-23 01:02:00,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2170546.6666666665, ans=0.125 2023-11-23 01:02:02,703 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 950, loss[loss=0.04523, simple_loss=0.05872, pruned_loss=0.005991, audio_tagging_loss=0.009876, over 13806.00 frames. ], tot_loss[loss=0.07054, simple_loss=0.0931, pruned_loss=0.01459, audio_tagging_loss=0.009398, over 3025621.79 frames. ], batch size: 54, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:02:05,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2023-11-23 01:02:07,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2170613.3333333335, ans=0.1 2023-11-23 01:02:08,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2170613.3333333335, ans=0.125 2023-11-23 01:02:11,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325600 2023-11-23 01:02:32,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2170746.6666666665, ans=0.0 2023-11-23 01:02:37,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2170746.6666666665, ans=0.125 2023-11-23 01:02:48,209 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:02:49,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2170813.3333333335, ans=0.1 2023-11-23 01:02:49,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2170813.3333333335, ans=0.0 2023-11-23 01:02:49,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2023-11-23 01:03:08,279 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1000, loss[loss=0.07675, simple_loss=0.1042, pruned_loss=0.01452, audio_tagging_loss=0.01014, over 14879.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09267, pruned_loss=0.01438, audio_tagging_loss=0.009294, over 3030518.74 frames. 
], batch size: 55, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:03:15,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2170946.6666666665, ans=0.0 2023-11-23 01:03:16,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325650 2023-11-23 01:03:34,963 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 01:03:41,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.239e+01 8.945e+01 9.781e+01 1.255e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-23 01:03:49,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2171146.6666666665, ans=0.0 2023-11-23 01:03:50,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2023-11-23 01:03:53,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2023-11-23 01:04:11,922 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1050, loss[loss=0.08472, simple_loss=0.1167, pruned_loss=0.01955, audio_tagging_loss=0.006822, over 14972.00 frames. ], tot_loss[loss=0.0699, simple_loss=0.09256, pruned_loss=0.0144, audio_tagging_loss=0.009218, over 3032967.65 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:04:19,858 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325700 2023-11-23 01:04:21,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2171280.0, ans=0.125 2023-11-23 01:04:58,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2171480.0, ans=0.125 2023-11-23 01:05:07,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2171546.6666666665, ans=0.0 2023-11-23 01:05:15,390 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1100, loss[loss=0.07858, simple_loss=0.1145, pruned_loss=0.01405, audio_tagging_loss=0.007287, over 14582.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09324, pruned_loss=0.01455, audio_tagging_loss=0.009173, over 3029325.71 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:05:18,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2171613.3333333335, ans=0.125 2023-11-23 01:05:19,773 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 01:05:23,579 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325750 2023-11-23 01:05:31,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2171680.0, ans=0.125 2023-11-23 01:05:33,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2023-11-23 01:05:37,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2171680.0, ans=0.0 2023-11-23 01:05:37,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2023-11-23 01:05:48,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.331e+01 9.113e+01 9.844e+01 1.667e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-23 01:05:53,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2171813.3333333335, ans=0.125 2023-11-23 01:06:00,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-11-23 01:06:20,041 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1150, loss[loss=0.0636, simple_loss=0.08484, pruned_loss=0.01319, audio_tagging_loss=0.007984, over 14906.00 frames. ], tot_loss[loss=0.07036, simple_loss=0.09368, pruned_loss=0.0145, audio_tagging_loss=0.009017, over 3032162.60 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:06:27,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325800 2023-11-23 01:06:50,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2172080.0, ans=0.125 2023-11-23 01:06:53,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2172080.0, ans=0.0 2023-11-23 01:07:14,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2172213.3333333335, ans=0.125 2023-11-23 01:07:24,590 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1200, loss[loss=0.0811, simple_loss=0.1181, pruned_loss=0.01669, audio_tagging_loss=0.005356, over 15467.00 frames. ], tot_loss[loss=0.07049, simple_loss=0.09377, pruned_loss=0.01448, audio_tagging_loss=0.009122, over 3027018.73 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 01:07:31,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. 
limit=15.0 2023-11-23 01:07:32,252 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325850 2023-11-23 01:07:33,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2172280.0, ans=0.0 2023-11-23 01:07:55,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2172413.3333333335, ans=0.0 2023-11-23 01:07:57,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.430e+01 8.952e+01 9.650e+01 2.402e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-23 01:08:01,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2172480.0, ans=0.125 2023-11-23 01:08:29,005 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1250, loss[loss=0.09033, simple_loss=0.1221, pruned_loss=0.02174, audio_tagging_loss=0.007551, over 15653.00 frames. ], tot_loss[loss=0.06979, simple_loss=0.09284, pruned_loss=0.01424, audio_tagging_loss=0.009134, over 3036477.41 frames. ], batch size: 58, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 01:08:29,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2172613.3333333335, ans=0.125 2023-11-23 01:08:37,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325900 2023-11-23 01:08:52,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2172680.0, ans=0.1 2023-11-23 01:08:58,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2172746.6666666665, ans=0.125 2023-11-23 01:09:07,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0 2023-11-23 01:09:34,056 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1300, loss[loss=0.08708, simple_loss=0.121, pruned_loss=0.01897, audio_tagging_loss=0.007605, over 14760.00 frames. ], tot_loss[loss=0.06973, simple_loss=0.09279, pruned_loss=0.01425, audio_tagging_loss=0.009086, over 3032990.23 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:09:41,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 325950 2023-11-23 01:09:50,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2173013.3333333335, ans=0.125 2023-11-23 01:10:08,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.443e+01 8.136e+01 8.697e+01 9.484e+01 1.452e+02, threshold=1.739e+02, percent-clipped=1.0 2023-11-23 01:10:11,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2173146.6666666665, ans=0.1 2023-11-23 01:10:23,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2173146.6666666665, ans=0.0 2023-11-23 01:10:29,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2173213.3333333335, ans=0.0 2023-11-23 01:10:38,103 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1350, loss[loss=0.06729, simple_loss=0.08661, pruned_loss=0.01392, audio_tagging_loss=0.01007, over 16324.00 frames. 
], tot_loss[loss=0.06912, simple_loss=0.09164, pruned_loss=0.01405, audio_tagging_loss=0.009252, over 3035852.59 frames. ], batch size: 61, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:10:45,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326000 2023-11-23 01:10:58,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2173346.6666666665, ans=0.125 2023-11-23 01:11:03,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2173413.3333333335, ans=0.125 2023-11-23 01:11:07,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=12.0 2023-11-23 01:11:17,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.42 vs. limit=10.0 2023-11-23 01:11:25,646 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 01:11:34,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2173546.6666666665, ans=0.0 2023-11-23 01:11:42,778 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1400, loss[loss=0.08065, simple_loss=0.1039, pruned_loss=0.0208, audio_tagging_loss=0.007933, over 14010.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09187, pruned_loss=0.01422, audio_tagging_loss=0.009303, over 3036269.71 frames. ], batch size: 54, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:11:50,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326050 2023-11-23 01:11:53,432 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:11:54,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2173680.0, ans=0.0 2023-11-23 01:11:57,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. 
limit=15.0 2023-11-23 01:12:06,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2173680.0, ans=0.125 2023-11-23 01:12:09,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2173746.6666666665, ans=0.1 2023-11-23 01:12:16,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2173746.6666666665, ans=0.125 2023-11-23 01:12:17,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 8.070e+01 8.559e+01 9.321e+01 1.215e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-23 01:12:17,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2173746.6666666665, ans=0.125 2023-11-23 01:12:23,589 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:12:25,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2023-11-23 01:12:30,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2173813.3333333335, ans=0.0 2023-11-23 01:12:47,038 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1450, loss[loss=0.08025, simple_loss=0.1032, pruned_loss=0.01988, audio_tagging_loss=0.008785, over 13992.00 frames. ], tot_loss[loss=0.07027, simple_loss=0.09308, pruned_loss=0.01431, audio_tagging_loss=0.009417, over 3034659.49 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:12:48,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2173946.6666666665, ans=0.2 2023-11-23 01:12:54,550 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326100 2023-11-23 01:13:29,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2174146.6666666665, ans=0.05 2023-11-23 01:13:50,356 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1500, loss[loss=0.06534, simple_loss=0.08586, pruned_loss=0.01161, audio_tagging_loss=0.0108, over 14381.00 frames. ], tot_loss[loss=0.07002, simple_loss=0.09247, pruned_loss=0.0143, audio_tagging_loss=0.00948, over 3029227.88 frames. 
], batch size: 56, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:13:55,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2174280.0, ans=0.125 2023-11-23 01:13:57,960 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326150 2023-11-23 01:14:01,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2174346.6666666665, ans=0.125 2023-11-23 01:14:18,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2174413.3333333335, ans=0.04949747468305833 2023-11-23 01:14:26,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.046e+01 8.933e+01 9.681e+01 1.186e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-23 01:14:40,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2174480.0, ans=0.07 2023-11-23 01:14:42,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2174546.6666666665, ans=0.0 2023-11-23 01:14:53,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2023-11-23 01:14:55,305 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1550, loss[loss=0.04601, simple_loss=0.05158, pruned_loss=0.008261, audio_tagging_loss=0.01195, over 14554.00 frames. ], tot_loss[loss=0.06995, simple_loss=0.09217, pruned_loss=0.01438, audio_tagging_loss=0.009481, over 3033420.63 frames. ], batch size: 58, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:15:04,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326200 2023-11-23 01:15:04,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2174613.3333333335, ans=0.2 2023-11-23 01:15:13,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2174680.0, ans=0.125 2023-11-23 01:15:50,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2174880.0, ans=0.0 2023-11-23 01:15:52,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-11-23 01:15:54,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2174880.0, ans=0.1 2023-11-23 01:16:03,400 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1600, loss[loss=0.06624, simple_loss=0.08684, pruned_loss=0.0116, audio_tagging_loss=0.01122, over 16697.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.0925, pruned_loss=0.01457, audio_tagging_loss=0.00952, over 3038158.97 frames. 
], batch size: 65, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 01:16:05,093 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:16:10,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2174946.6666666665, ans=0.0 2023-11-23 01:16:11,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326250 2023-11-23 01:16:16,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2175013.3333333335, ans=0.125 2023-11-23 01:16:17,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2175013.3333333335, ans=0.1 2023-11-23 01:16:22,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2175013.3333333335, ans=0.1 2023-11-23 01:16:23,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2175013.3333333335, ans=0.07 2023-11-23 01:16:30,073 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:16:35,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.21 vs. limit=10.0 2023-11-23 01:16:37,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.240e+01 8.854e+01 9.452e+01 1.276e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-23 01:16:42,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-23 01:16:45,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-23 01:16:51,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2175146.6666666665, ans=0.0 2023-11-23 01:16:56,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2175213.3333333335, ans=0.0 2023-11-23 01:16:56,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2175213.3333333335, ans=0.125 2023-11-23 01:17:07,132 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1650, loss[loss=0.06158, simple_loss=0.07166, pruned_loss=0.01488, audio_tagging_loss=0.01087, over 13572.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.09197, pruned_loss=0.01455, audio_tagging_loss=0.0095, over 3042449.60 frames. 
], batch size: 54, lr: 2.45e-03, grad_scale: 32.0 2023-11-23 01:17:14,776 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326300 2023-11-23 01:17:17,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2175280.0, ans=0.0 2023-11-23 01:17:23,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2175346.6666666665, ans=0.125 2023-11-23 01:17:54,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2175480.0, ans=0.0 2023-11-23 01:18:06,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2175546.6666666665, ans=0.0 2023-11-23 01:18:11,454 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1700, loss[loss=0.09031, simple_loss=0.1264, pruned_loss=0.01919, audio_tagging_loss=0.007921, over 16377.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09155, pruned_loss=0.01441, audio_tagging_loss=0.009533, over 3044009.86 frames. ], batch size: 58, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:18:17,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2023-11-23 01:18:19,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326350 2023-11-23 01:18:35,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2175680.0, ans=0.0 2023-11-23 01:18:40,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2175746.6666666665, ans=0.125 2023-11-23 01:18:44,366 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:18:44,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-11-23 01:18:47,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.185e+01 8.813e+01 9.746e+01 1.184e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-23 01:18:57,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2175813.3333333335, ans=0.2 2023-11-23 01:19:09,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2175880.0, ans=0.0 2023-11-23 01:19:16,145 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1750, loss[loss=0.0531, simple_loss=0.06332, pruned_loss=0.01126, audio_tagging_loss=0.01019, over 14989.00 frames. ], tot_loss[loss=0.06998, simple_loss=0.09203, pruned_loss=0.01452, audio_tagging_loss=0.009454, over 3051355.89 frames. ], batch size: 57, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:19:23,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326400 2023-11-23 01:19:47,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2176080.0, ans=0.125 2023-11-23 01:19:52,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. 
limit=22.5 2023-11-23 01:20:19,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-23 01:20:20,581 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1800, loss[loss=0.07968, simple_loss=0.106, pruned_loss=0.0185, audio_tagging_loss=0.008195, over 14125.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.0928, pruned_loss=0.01456, audio_tagging_loss=0.009269, over 3050876.25 frames. ], batch size: 53, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:20:28,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326450 2023-11-23 01:20:34,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2176346.6666666665, ans=0.1 2023-11-23 01:20:57,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.178e+01 8.671e+01 9.411e+01 1.178e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-23 01:21:05,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2176480.0, ans=0.0 2023-11-23 01:21:25,128 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1850, loss[loss=0.06623, simple_loss=0.08586, pruned_loss=0.01261, audio_tagging_loss=0.01069, over 15574.00 frames. ], tot_loss[loss=0.0703, simple_loss=0.09301, pruned_loss=0.01458, audio_tagging_loss=0.009217, over 3052137.23 frames. ], batch size: 58, lr: 2.45e-03, grad_scale: 16.0 2023-11-23 01:21:33,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.44 vs. limit=10.0 2023-11-23 01:21:33,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326500 2023-11-23 01:22:01,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2176746.6666666665, ans=0.125 2023-11-23 01:22:05,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2176813.3333333335, ans=0.125 2023-11-23 01:22:07,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2176813.3333333335, ans=0.5 2023-11-23 01:22:21,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2176880.0, ans=0.0 2023-11-23 01:22:31,053 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1900, loss[loss=0.08851, simple_loss=0.1203, pruned_loss=0.02028, audio_tagging_loss=0.008088, over 14545.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09316, pruned_loss=0.01449, audio_tagging_loss=0.009156, over 3056245.88 frames. 
], batch size: 54, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:22:39,636 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326550 2023-11-23 01:22:59,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2177080.0, ans=0.2 2023-11-23 01:23:04,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2177080.0, ans=0.0 2023-11-23 01:23:06,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.801e+01 8.128e+01 8.928e+01 9.740e+01 2.315e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-23 01:23:20,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2177146.6666666665, ans=0.125 2023-11-23 01:23:26,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2177213.3333333335, ans=0.125 2023-11-23 01:23:36,022 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 1950, loss[loss=0.04975, simple_loss=0.05949, pruned_loss=0.008104, audio_tagging_loss=0.0119, over 15430.00 frames. ], tot_loss[loss=0.06937, simple_loss=0.09209, pruned_loss=0.01416, audio_tagging_loss=0.009171, over 3051820.16 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:23:43,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2023-11-23 01:23:43,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326600 2023-11-23 01:23:50,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2177346.6666666665, ans=0.1 2023-11-23 01:23:53,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2177346.6666666665, ans=0.125 2023-11-23 01:24:13,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2177413.3333333335, ans=0.1 2023-11-23 01:24:14,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2177480.0, ans=0.0 2023-11-23 01:24:37,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2177546.6666666665, ans=0.125 2023-11-23 01:24:38,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2177546.6666666665, ans=0.125 2023-11-23 01:24:39,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2177613.3333333335, ans=0.05 2023-11-23 01:24:40,692 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2000, loss[loss=0.06272, simple_loss=0.08642, pruned_loss=0.01055, audio_tagging_loss=0.008966, over 14925.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09168, pruned_loss=0.01413, audio_tagging_loss=0.009206, over 3048656.93 frames. 
], batch size: 56, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:24:48,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326650 2023-11-23 01:25:05,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2177680.0, ans=0.125 2023-11-23 01:25:08,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2177746.6666666665, ans=0.1 2023-11-23 01:25:17,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.041e+01 8.683e+01 9.382e+01 1.367e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-23 01:25:19,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2023-11-23 01:25:22,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2177813.3333333335, ans=0.125 2023-11-23 01:25:36,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2177880.0, ans=0.0 2023-11-23 01:25:40,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2177880.0, ans=0.0 2023-11-23 01:25:45,371 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2050, loss[loss=0.06904, simple_loss=0.08834, pruned_loss=0.01453, audio_tagging_loss=0.01033, over 15079.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.09207, pruned_loss=0.01407, audio_tagging_loss=0.009144, over 3047731.57 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:25:46,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2177946.6666666665, ans=0.1 2023-11-23 01:25:53,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326700 2023-11-23 01:26:03,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2178013.3333333335, ans=0.2 2023-11-23 01:26:07,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2178013.3333333335, ans=0.0 2023-11-23 01:26:10,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2178080.0, ans=0.0 2023-11-23 01:26:19,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2178080.0, ans=0.125 2023-11-23 01:26:19,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2178080.0, ans=0.125 2023-11-23 01:26:21,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2178080.0, ans=0.1 2023-11-23 01:26:21,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2178080.0, ans=0.1 2023-11-23 01:26:22,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2178146.6666666665, ans=0.125 2023-11-23 01:26:25,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2178146.6666666665, ans=0.125 2023-11-23 01:26:28,484 INFO 
[scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2178146.6666666665, ans=0.125 2023-11-23 01:26:39,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2023-11-23 01:26:46,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2178213.3333333335, ans=0.125 2023-11-23 01:26:49,578 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2100, loss[loss=0.0694, simple_loss=0.09684, pruned_loss=0.01299, audio_tagging_loss=0.007988, over 14653.00 frames. ], tot_loss[loss=0.06924, simple_loss=0.09223, pruned_loss=0.01401, audio_tagging_loss=0.009115, over 3047396.53 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:26:57,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326750 2023-11-23 01:27:25,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.249e+01 8.824e+01 9.661e+01 1.236e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-23 01:27:48,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2178546.6666666665, ans=0.125 2023-11-23 01:27:53,285 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2150, loss[loss=0.08227, simple_loss=0.1134, pruned_loss=0.01637, audio_tagging_loss=0.009205, over 15036.00 frames. ], tot_loss[loss=0.0689, simple_loss=0.09193, pruned_loss=0.01387, audio_tagging_loss=0.009068, over 3044728.74 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:28:00,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326800 2023-11-23 01:28:21,986 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:28:24,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2178746.6666666665, ans=10.0 2023-11-23 01:28:27,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2178746.6666666665, ans=0.125 2023-11-23 01:28:33,806 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 01:28:58,346 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2200, loss[loss=0.06452, simple_loss=0.09097, pruned_loss=0.01007, audio_tagging_loss=0.008968, over 15909.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09282, pruned_loss=0.01407, audio_tagging_loss=0.009026, over 3046585.56 frames. 
], batch size: 58, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:29:07,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326850 2023-11-23 01:29:07,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2178946.6666666665, ans=0.2 2023-11-23 01:29:07,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2178946.6666666665, ans=0.1 2023-11-23 01:29:08,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2023-11-23 01:29:10,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.78 vs. limit=5.0 2023-11-23 01:29:27,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2179080.0, ans=0.2 2023-11-23 01:29:34,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.071e+01 8.694e+01 9.748e+01 1.270e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-23 01:30:02,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2023-11-23 01:30:03,038 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2250, loss[loss=0.0908, simple_loss=0.1163, pruned_loss=0.02227, audio_tagging_loss=0.01041, over 15493.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09314, pruned_loss=0.01431, audio_tagging_loss=0.009116, over 3041435.74 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:30:03,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2179280.0, ans=0.0 2023-11-23 01:30:10,479 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326900 2023-11-23 01:30:11,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2179280.0, ans=0.125 2023-11-23 01:30:17,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2179346.6666666665, ans=0.0 2023-11-23 01:30:25,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2179346.6666666665, ans=0.95 2023-11-23 01:30:35,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0 2023-11-23 01:30:39,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2179480.0, ans=0.125 2023-11-23 01:30:44,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2179480.0, ans=0.1 2023-11-23 01:30:48,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2023-11-23 01:31:07,414 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2300, loss[loss=0.06653, simple_loss=0.08469, pruned_loss=0.01372, audio_tagging_loss=0.01047, over 16182.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09276, pruned_loss=0.01441, audio_tagging_loss=0.009168, over 3043092.14 frames. 
], batch size: 60, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:31:12,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2179613.3333333335, ans=0.125 2023-11-23 01:31:14,944 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 326950 2023-11-23 01:31:15,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-11-23 01:31:25,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-23 01:31:29,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2179680.0, ans=0.2 2023-11-23 01:31:44,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.554e+01 8.250e+01 8.860e+01 9.394e+01 1.286e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 01:31:51,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2179813.3333333335, ans=0.125 2023-11-23 01:31:51,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-11-23 01:32:05,568 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 01:32:07,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2179880.0, ans=0.5 2023-11-23 01:32:12,401 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2350, loss[loss=0.06797, simple_loss=0.08773, pruned_loss=0.01199, audio_tagging_loss=0.01212, over 14590.00 frames. ], tot_loss[loss=0.07019, simple_loss=0.09281, pruned_loss=0.01451, audio_tagging_loss=0.009278, over 3037575.02 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:32:16,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2179946.6666666665, ans=0.1 2023-11-23 01:32:19,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327000 2023-11-23 01:32:29,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2180013.3333333335, ans=0.125 2023-11-23 01:32:34,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2180013.3333333335, ans=0.1 2023-11-23 01:32:36,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2023-11-23 01:32:39,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.96 vs. 
limit=15.0 2023-11-23 01:32:53,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-23 01:33:01,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2180146.6666666665, ans=0.2 2023-11-23 01:33:05,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=8.0 2023-11-23 01:33:10,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=22.5 2023-11-23 01:33:17,548 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2400, loss[loss=0.06207, simple_loss=0.07867, pruned_loss=0.01303, audio_tagging_loss=0.009696, over 14074.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.09272, pruned_loss=0.01441, audio_tagging_loss=0.009369, over 3043202.58 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:33:24,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2180280.0, ans=0.125 2023-11-23 01:33:25,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327050 2023-11-23 01:33:27,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2180280.0, ans=0.0 2023-11-23 01:33:33,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2180346.6666666665, ans=0.0 2023-11-23 01:33:48,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2180413.3333333335, ans=0.1 2023-11-23 01:33:54,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.524e+01 9.128e+01 9.621e+01 1.327e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-23 01:33:54,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2180480.0, ans=0.125 2023-11-23 01:34:21,269 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2450, loss[loss=0.08344, simple_loss=0.1155, pruned_loss=0.01699, audio_tagging_loss=0.008691, over 15200.00 frames. ], tot_loss[loss=0.07044, simple_loss=0.09284, pruned_loss=0.01461, audio_tagging_loss=0.009408, over 3042932.61 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:34:27,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2180613.3333333335, ans=0.1 2023-11-23 01:34:28,736 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327100 2023-11-23 01:34:55,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2180746.6666666665, ans=0.125 2023-11-23 01:35:01,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. 
limit=10.0 2023-11-23 01:35:03,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2180813.3333333335, ans=0.125 2023-11-23 01:35:04,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2023-11-23 01:35:11,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2180880.0, ans=0.125 2023-11-23 01:35:17,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2180880.0, ans=0.125 2023-11-23 01:35:23,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2180880.0, ans=0.0 2023-11-23 01:35:25,320 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2500, loss[loss=0.0786, simple_loss=0.1024, pruned_loss=0.01844, audio_tagging_loss=0.008969, over 13919.00 frames. ], tot_loss[loss=0.0701, simple_loss=0.09247, pruned_loss=0.01446, audio_tagging_loss=0.009399, over 3049862.27 frames. ], batch size: 53, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:35:25,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2180946.6666666665, ans=10.0 2023-11-23 01:35:25,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2180946.6666666665, ans=0.0 2023-11-23 01:35:33,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327150 2023-11-23 01:35:58,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2181080.0, ans=0.0 2023-11-23 01:36:02,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.294e+01 8.910e+01 9.572e+01 1.200e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-23 01:36:04,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2023-11-23 01:36:09,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2181146.6666666665, ans=0.035 2023-11-23 01:36:14,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2181146.6666666665, ans=0.2 2023-11-23 01:36:16,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-23 01:36:23,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2181213.3333333335, ans=0.125 2023-11-23 01:36:30,899 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2550, loss[loss=0.07376, simple_loss=0.0936, pruned_loss=0.017, audio_tagging_loss=0.009965, over 14394.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09239, pruned_loss=0.01462, audio_tagging_loss=0.009272, over 3044151.88 frames. 
], batch size: 55, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:36:38,237 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327200 2023-11-23 01:36:54,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2181413.3333333335, ans=0.09899494936611666 2023-11-23 01:37:19,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=22.5 2023-11-23 01:37:25,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5 2023-11-23 01:37:28,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2181546.6666666665, ans=0.1 2023-11-23 01:37:30,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2181546.6666666665, ans=0.125 2023-11-23 01:37:31,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2181546.6666666665, ans=0.125 2023-11-23 01:37:35,094 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2600, loss[loss=0.07926, simple_loss=0.1009, pruned_loss=0.01766, audio_tagging_loss=0.01115, over 15027.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09131, pruned_loss=0.01433, audio_tagging_loss=0.009316, over 3042695.26 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:37:39,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2181613.3333333335, ans=0.125 2023-11-23 01:37:42,631 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327250 2023-11-23 01:37:50,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2181680.0, ans=0.125 2023-11-23 01:37:53,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2181680.0, ans=0.0 2023-11-23 01:38:13,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.227e+01 8.835e+01 9.553e+01 1.278e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 01:38:24,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2181813.3333333335, ans=0.0 2023-11-23 01:38:29,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2023-11-23 01:38:29,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2181880.0, ans=0.125 2023-11-23 01:38:39,219 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2650, loss[loss=0.07908, simple_loss=0.1042, pruned_loss=0.01794, audio_tagging_loss=0.009044, over 15392.00 frames. ], tot_loss[loss=0.07044, simple_loss=0.09336, pruned_loss=0.01459, audio_tagging_loss=0.009167, over 3033474.37 frames. 
], batch size: 57, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:38:47,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327300 2023-11-23 01:38:55,594 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.548e-03 2023-11-23 01:39:06,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.10 vs. limit=22.5 2023-11-23 01:39:23,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2182146.6666666665, ans=0.0 2023-11-23 01:39:29,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-23 01:39:35,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2182213.3333333335, ans=0.125 2023-11-23 01:39:39,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-23 01:39:40,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2182213.3333333335, ans=0.125 2023-11-23 01:39:44,313 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2700, loss[loss=0.07928, simple_loss=0.09747, pruned_loss=0.01927, audio_tagging_loss=0.01128, over 14937.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09279, pruned_loss=0.0144, audio_tagging_loss=0.009061, over 3037419.33 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:39:52,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327350 2023-11-23 01:39:57,454 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 01:39:57,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2182346.6666666665, ans=0.09899494936611666 2023-11-23 01:40:21,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.147e+01 8.099e+01 8.608e+01 9.208e+01 1.380e+02, threshold=1.722e+02, percent-clipped=0.0 2023-11-23 01:40:28,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2023-11-23 01:40:32,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2182480.0, ans=0.1 2023-11-23 01:40:48,628 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2750, loss[loss=0.07545, simple_loss=0.101, pruned_loss=0.01571, audio_tagging_loss=0.009264, over 15054.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09166, pruned_loss=0.01419, audio_tagging_loss=0.009157, over 3048335.09 frames. 
], batch size: 54, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:40:56,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327400 2023-11-23 01:41:24,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2182746.6666666665, ans=0.125 2023-11-23 01:41:31,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2182813.3333333335, ans=0.125 2023-11-23 01:41:31,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2182813.3333333335, ans=0.0 2023-11-23 01:41:43,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2182880.0, ans=0.0 2023-11-23 01:41:44,313 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 01:41:52,829 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2800, loss[loss=0.06884, simple_loss=0.08435, pruned_loss=0.01565, audio_tagging_loss=0.01102, over 15113.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09128, pruned_loss=0.0141, audio_tagging_loss=0.009125, over 3054223.05 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:42:00,745 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327450 2023-11-23 01:42:02,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2182946.6666666665, ans=0.5 2023-11-23 01:42:05,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2183013.3333333335, ans=0.2 2023-11-23 01:42:10,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2183013.3333333335, ans=0.125 2023-11-23 01:42:26,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2183080.0, ans=0.0 2023-11-23 01:42:31,835 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.504e+01 8.188e+01 8.710e+01 9.337e+01 1.091e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-23 01:42:39,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2183146.6666666665, ans=0.2 2023-11-23 01:42:58,030 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2850, loss[loss=0.05543, simple_loss=0.07558, pruned_loss=0.007664, audio_tagging_loss=0.009982, over 16095.00 frames. ], tot_loss[loss=0.06941, simple_loss=0.09202, pruned_loss=0.0143, audio_tagging_loss=0.009105, over 3053505.21 frames. 
], batch size: 60, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:43:06,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327500 2023-11-23 01:43:20,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2183346.6666666665, ans=0.2 2023-11-23 01:43:34,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2183480.0, ans=0.2 2023-11-23 01:43:35,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2183480.0, ans=0.125 2023-11-23 01:44:01,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5 2023-11-23 01:44:01,865 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2900, loss[loss=0.0636, simple_loss=0.07956, pruned_loss=0.01444, audio_tagging_loss=0.009381, over 14460.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.0925, pruned_loss=0.01431, audio_tagging_loss=0.008997, over 3049695.11 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:44:06,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2183613.3333333335, ans=0.125 2023-11-23 01:44:09,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327550 2023-11-23 01:44:26,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2183746.6666666665, ans=0.0 2023-11-23 01:44:26,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2183746.6666666665, ans=0.125 2023-11-23 01:44:37,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=12.0 2023-11-23 01:44:41,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.459e+01 9.104e+01 9.833e+01 1.241e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-23 01:44:47,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2183813.3333333335, ans=0.125 2023-11-23 01:44:47,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0 2023-11-23 01:44:53,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2183880.0, ans=0.125 2023-11-23 01:45:05,853 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 2950, loss[loss=0.07528, simple_loss=0.1012, pruned_loss=0.01592, audio_tagging_loss=0.00874, over 14930.00 frames. ], tot_loss[loss=0.07056, simple_loss=0.09379, pruned_loss=0.01463, audio_tagging_loss=0.00904, over 3050548.62 frames. 
], batch size: 56, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:45:13,312 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327600 2023-11-23 01:45:25,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2184013.3333333335, ans=0.1 2023-11-23 01:45:40,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2184080.0, ans=0.07 2023-11-23 01:45:41,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2184080.0, ans=0.0 2023-11-23 01:46:04,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2184213.3333333335, ans=0.125 2023-11-23 01:46:05,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2023-11-23 01:46:09,987 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3000, loss[loss=0.07931, simple_loss=0.1078, pruned_loss=0.01664, audio_tagging_loss=0.008772, over 15273.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.0944, pruned_loss=0.01483, audio_tagging_loss=0.009048, over 3048172.48 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:46:09,987 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 01:46:53,249 INFO [train_asr.py:1253] (1/4) Epoch 28, validation: loss=0.05807, simple_loss=0.05122, pruned_loss=0.005026, audio_tagging_loss=0.02744, over 4681554.00 frames. 2023-11-23 01:46:53,250 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 01:47:00,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327650 2023-11-23 01:47:32,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.143e+01 8.821e+01 9.621e+01 1.257e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-23 01:47:43,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2184546.6666666665, ans=0.0 2023-11-23 01:47:46,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2184546.6666666665, ans=0.125 2023-11-23 01:47:48,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2184546.6666666665, ans=0.125 2023-11-23 01:47:48,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2184546.6666666665, ans=0.125 2023-11-23 01:47:53,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2184546.6666666665, ans=0.125 2023-11-23 01:47:55,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2184613.3333333335, ans=0.2 2023-11-23 01:47:56,820 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3050, loss[loss=0.07572, simple_loss=0.1026, pruned_loss=0.01615, audio_tagging_loss=0.008256, over 16362.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.09419, pruned_loss=0.0147, audio_tagging_loss=0.009158, over 3045199.50 frames. 
], batch size: 60, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:48:04,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327700 2023-11-23 01:48:23,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2184746.6666666665, ans=0.1 2023-11-23 01:48:32,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=15.0 2023-11-23 01:48:35,729 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 01:48:38,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-23 01:48:50,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2184880.0, ans=0.0 2023-11-23 01:48:55,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2184880.0, ans=0.125 2023-11-23 01:48:57,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0 2023-11-23 01:49:01,417 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3100, loss[loss=0.06658, simple_loss=0.08559, pruned_loss=0.01148, audio_tagging_loss=0.0123, over 14476.00 frames. ], tot_loss[loss=0.07088, simple_loss=0.09407, pruned_loss=0.01461, audio_tagging_loss=0.009229, over 3041222.66 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:49:09,486 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327750 2023-11-23 01:49:10,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2184946.6666666665, ans=0.2 2023-11-23 01:49:18,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.59 vs. limit=22.5 2023-11-23 01:49:26,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. 
limit=22.5 2023-11-23 01:49:39,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 8.129e+01 8.806e+01 9.628e+01 1.604e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-23 01:49:50,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2185146.6666666665, ans=0.125 2023-11-23 01:49:54,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2185213.3333333335, ans=0.07 2023-11-23 01:49:58,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2185213.3333333335, ans=0.125 2023-11-23 01:50:05,792 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3150, loss[loss=0.06428, simple_loss=0.08494, pruned_loss=0.01103, audio_tagging_loss=0.01078, over 15039.00 frames. ], tot_loss[loss=0.07045, simple_loss=0.09354, pruned_loss=0.01443, audio_tagging_loss=0.009249, over 3045437.20 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 16.0 2023-11-23 01:50:12,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0 2023-11-23 01:50:13,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327800 2023-11-23 01:50:14,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2185280.0, ans=0.2 2023-11-23 01:51:05,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2185546.6666666665, ans=0.125 2023-11-23 01:51:09,986 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3200, loss[loss=0.06875, simple_loss=0.08957, pruned_loss=0.01705, audio_tagging_loss=0.006921, over 16331.00 frames. ], tot_loss[loss=0.07035, simple_loss=0.09317, pruned_loss=0.01439, audio_tagging_loss=0.009374, over 3040904.55 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:51:17,663 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327850 2023-11-23 01:51:22,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2185680.0, ans=10.0 2023-11-23 01:51:25,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0 2023-11-23 01:51:29,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2185680.0, ans=0.125 2023-11-23 01:51:49,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.167e+01 8.814e+01 9.617e+01 1.193e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-23 01:51:51,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2185813.3333333335, ans=0.125 2023-11-23 01:52:01,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2023-11-23 01:52:14,667 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3250, loss[loss=0.06732, simple_loss=0.08652, pruned_loss=0.01386, audio_tagging_loss=0.0102, over 16265.00 frames. ], tot_loss[loss=0.06952, simple_loss=0.09193, pruned_loss=0.0141, audio_tagging_loss=0.00945, over 3042888.36 frames. 
], batch size: 60, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:52:23,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327900 2023-11-23 01:52:23,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2185946.6666666665, ans=0.09899494936611666 2023-11-23 01:52:37,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2186013.3333333335, ans=0.0 2023-11-23 01:52:38,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0 2023-11-23 01:52:39,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2186080.0, ans=0.0 2023-11-23 01:52:42,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2023-11-23 01:52:45,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2186080.0, ans=10.0 2023-11-23 01:52:59,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2186146.6666666665, ans=0.0 2023-11-23 01:53:18,586 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3300, loss[loss=0.06081, simple_loss=0.08092, pruned_loss=0.008764, audio_tagging_loss=0.01159, over 14564.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09249, pruned_loss=0.01427, audio_tagging_loss=0.009517, over 3047919.50 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:53:22,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2186280.0, ans=0.125 2023-11-23 01:53:26,565 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 327950 2023-11-23 01:53:46,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2186413.3333333335, ans=0.125 2023-11-23 01:53:57,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.400e+01 9.018e+01 9.669e+01 1.285e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 01:53:58,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2186480.0, ans=0.125 2023-11-23 01:54:15,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2186546.6666666665, ans=15.0 2023-11-23 01:54:16,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2186546.6666666665, ans=0.125 2023-11-23 01:54:22,731 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3350, loss[loss=0.06995, simple_loss=0.08752, pruned_loss=0.01812, audio_tagging_loss=0.00807, over 16050.00 frames. ], tot_loss[loss=0.06997, simple_loss=0.09266, pruned_loss=0.01433, audio_tagging_loss=0.009309, over 3046958.98 frames. 
], batch size: 61, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:54:29,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2186613.3333333335, ans=0.0 2023-11-23 01:54:30,328 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328000 2023-11-23 01:55:19,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2186880.0, ans=0.125 2023-11-23 01:55:30,853 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3400, loss[loss=0.08171, simple_loss=0.1175, pruned_loss=0.01484, audio_tagging_loss=0.008144, over 16190.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09255, pruned_loss=0.01449, audio_tagging_loss=0.009195, over 3047341.30 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:55:39,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328050 2023-11-23 01:55:42,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2186946.6666666665, ans=0.0 2023-11-23 01:55:59,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2187080.0, ans=0.1 2023-11-23 01:56:09,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.501e+01 8.298e+01 8.922e+01 9.490e+01 1.312e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 01:56:35,851 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3450, loss[loss=0.05914, simple_loss=0.08198, pruned_loss=0.01121, audio_tagging_loss=0.006939, over 15329.00 frames. ], tot_loss[loss=0.07015, simple_loss=0.09298, pruned_loss=0.01458, audio_tagging_loss=0.009073, over 3045310.22 frames. ], batch size: 61, lr: 2.44e-03, grad_scale: 32.0 2023-11-23 01:56:43,036 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328100 2023-11-23 01:56:44,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.07 vs. limit=10.0 2023-11-23 01:56:55,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2187346.6666666665, ans=0.125 2023-11-23 01:57:00,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2187413.3333333335, ans=0.125 2023-11-23 01:57:00,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2187413.3333333335, ans=0.125 2023-11-23 01:57:05,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.86 vs. limit=22.5 2023-11-23 01:57:15,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2187480.0, ans=0.2 2023-11-23 01:57:28,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2187546.6666666665, ans=0.125 2023-11-23 01:57:39,720 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3500, loss[loss=0.05008, simple_loss=0.06468, pruned_loss=0.0078, audio_tagging_loss=0.009941, over 15525.00 frames. ], tot_loss[loss=0.06971, simple_loss=0.09244, pruned_loss=0.01441, audio_tagging_loss=0.009084, over 3046697.24 frames. 
2023-11-23 01:57:46,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2187613.3333333335, ans=0.0
2023-11-23 01:57:47,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328150
2023-11-23 01:57:52,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0
2023-11-23 01:57:59,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0
2023-11-23 01:58:13,409 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 01:58:18,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.297e+01 8.906e+01 9.538e+01 1.458e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-23 01:58:19,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2187813.3333333335, ans=0.125
2023-11-23 01:58:26,173 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 01:58:27,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2187813.3333333335, ans=0.2
2023-11-23 01:58:28,575 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 01:58:44,532 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3550, loss[loss=0.06904, simple_loss=0.07853, pruned_loss=0.01926, audio_tagging_loss=0.01052, over 15139.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09223, pruned_loss=0.01425, audio_tagging_loss=0.009131, over 3048875.00 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 01:58:53,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328200
2023-11-23 01:59:12,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0
2023-11-23 01:59:18,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2188080.0, ans=0.0
2023-11-23 01:59:19,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2188080.0, ans=0.07
2023-11-23 01:59:50,106 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3600, loss[loss=0.06392, simple_loss=0.07996, pruned_loss=0.01456, audio_tagging_loss=0.00938, over 14462.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09176, pruned_loss=0.01417, audio_tagging_loss=0.0091, over 3046824.98 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0
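
Note on the WARNING lines: these are AudioSet cuts carrying dummy placeholder transcripts. They are dropped because after the encoder frontend's roughly 4x subsampling the 100 input frames shrink to 23, fewer than the 24 BPE tokens, so the transducer loss would be undefined. Assuming the usual icefall convolution-frontend length formula, the arithmetic checks out:

    def frames_after_subsampling(t: int) -> int:
        # Assumed Conv2dSubsampling-style length formula for the Zipformer frontend.
        return ((t - 7) // 2 + 1) // 2

    t_in, num_tokens = 100, 24
    t_out = frames_after_subsampling(t_in)
    print(t_out, t_out < num_tokens)  # 23 True -> the cut is excluded
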
2023-11-23 01:59:50,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2188280.0, ans=0.125
2023-11-23 01:59:55,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2188280.0, ans=0.125
2023-11-23 01:59:55,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2188280.0, ans=0.1
2023-11-23 01:59:57,463 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328250
2023-11-23 02:00:02,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2188346.6666666665, ans=0.125
2023-11-23 02:00:14,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2188413.3333333335, ans=0.2
2023-11-23 02:00:20,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2188413.3333333335, ans=0.2
2023-11-23 02:00:28,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.210e+01 8.927e+01 9.741e+01 1.268e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-23 02:00:44,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2188546.6666666665, ans=0.0
2023-11-23 02:00:46,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0
2023-11-23 02:00:53,675 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3650, loss[loss=0.06238, simple_loss=0.08059, pruned_loss=0.01253, audio_tagging_loss=0.009551, over 14813.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09185, pruned_loss=0.01426, audio_tagging_loss=0.009113, over 3044430.19 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:01:00,891 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328300
2023-11-23 02:01:14,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2188680.0, ans=0.1
2023-11-23 02:01:26,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0
2023-11-23 02:01:31,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2188813.3333333335, ans=0.2
2023-11-23 02:01:39,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2188813.3333333335, ans=0.125
2023-11-23 02:01:45,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2188880.0, ans=0.1
2023-11-23 02:01:57,282 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3700, loss[loss=0.08452, simple_loss=0.1161, pruned_loss=0.01849, audio_tagging_loss=0.008, over 16191.00 frames. ], tot_loss[loss=0.07071, simple_loss=0.09378, pruned_loss=0.01473, audio_tagging_loss=0.009098, over 3051811.08 frames. ], batch size: 58, lr: 2.44e-03, grad_scale: 16.0
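
Note on grad_scale: this is the AMP loss scale (the run uses use_fp16=True). Between batches 3650 and 3700 it halves from 32.0 to 16.0, the usual dynamic-loss-scaling reaction to an overflowing fp16 gradient, and it is doubled back once enough consecutive steps succeed (it is 32.0 again by batch 4000). A minimal sketch of that dynamic; PyTorch's torch.cuda.amp.GradScaler implements the real version internally:

    class ToyLossScaler:
        """Sketch of AMP dynamic loss scaling: halve on overflow, double after
        growth_interval clean steps (GradScaler behaves similarly)."""

        def __init__(self, scale=32.0, growth_interval=2000):
            self.scale, self.growth_interval, self._good_steps = scale, growth_interval, 0

        def update(self, found_inf: bool):
            if found_inf:
                self.scale *= 0.5      # e.g. 32.0 -> 16.0, as between batches 3650 and 3700
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps >= self.growth_interval:
                    self.scale *= 2.0  # recovers 16.0 -> 32.0, as seen again by batch 4000
                    self._good_steps = 0
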
2023-11-23 02:01:57,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2188946.6666666665, ans=0.07
2023-11-23 02:02:05,411 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328350
2023-11-23 02:02:16,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2189013.3333333335, ans=0.125
2023-11-23 02:02:37,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.324e+01 8.915e+01 9.653e+01 1.223e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-23 02:02:39,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2189146.6666666665, ans=0.0
2023-11-23 02:02:57,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2189213.3333333335, ans=0.125
2023-11-23 02:03:03,381 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3750, loss[loss=0.05938, simple_loss=0.07914, pruned_loss=0.01124, audio_tagging_loss=0.008569, over 15370.00 frames. ], tot_loss[loss=0.07066, simple_loss=0.09379, pruned_loss=0.01464, audio_tagging_loss=0.009126, over 3056205.20 frames. ], batch size: 58, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:03:10,806 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328400
2023-11-23 02:03:28,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2189413.3333333335, ans=0.125
2023-11-23 02:03:48,605 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 02:03:49,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2189480.0, ans=0.0
2023-11-23 02:04:05,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2189613.3333333335, ans=0.125
2023-11-23 02:04:06,822 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3800, loss[loss=0.06256, simple_loss=0.08788, pruned_loss=0.008982, audio_tagging_loss=0.00964, over 15587.00 frames. ], tot_loss[loss=0.07053, simple_loss=0.09354, pruned_loss=0.01462, audio_tagging_loss=0.009139, over 3048006.41 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 16.0
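
Note on the scaling.py:213 lines: these trace ScheduledFloat values, module hyperparameters (dropout rates, skip probabilities, balancer limits) that are piecewise-linear functions of the global batch_count rather than constants. Something like the following captures the idea; it is a simplified stand-in for icefall's scaling.ScheduledFloat, not the class itself:

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch_count, e.g. (0, 0.3), (20000, 0.1):
        returns 0.3 at batch 0, decays linearly to 0.1 by batch 20000, then holds."""

        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    frac = (batch_count - x0) / (x1 - x0)
                    return y0 + frac * (y1 - y0)
            return pts[-1][1]

    # By batch_count ~2.19e6 most schedules are flat at their final value,
    # which is why the logged ans= fields (0.0, 0.1, 0.125, 0.2, ...) barely move.
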
2023-11-23 02:04:13,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2189613.3333333335, ans=0.125
2023-11-23 02:04:14,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328450
2023-11-23 02:04:24,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2189680.0, ans=0.125
2023-11-23 02:04:47,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.883e+01 8.455e+01 8.996e+01 9.813e+01 1.168e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-23 02:04:54,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2189813.3333333335, ans=0.2
2023-11-23 02:04:55,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2189813.3333333335, ans=0.0
2023-11-23 02:04:55,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2189813.3333333335, ans=0.125
2023-11-23 02:05:10,358 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3850, loss[loss=0.0631, simple_loss=0.07947, pruned_loss=0.01254, audio_tagging_loss=0.01082, over 15025.00 frames. ], tot_loss[loss=0.07056, simple_loss=0.09364, pruned_loss=0.0145, audio_tagging_loss=0.009249, over 3045911.38 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:05:19,171 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328500
2023-11-23 02:06:02,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2190213.3333333335, ans=0.1
2023-11-23 02:06:15,814 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3900, loss[loss=0.06938, simple_loss=0.08193, pruned_loss=0.01824, audio_tagging_loss=0.01018, over 14650.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09294, pruned_loss=0.01442, audio_tagging_loss=0.009325, over 3039771.74 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:06:22,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2190280.0, ans=0.1
2023-11-23 02:06:23,387 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328550
2023-11-23 02:06:53,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.746e+01 8.052e+01 8.828e+01 9.660e+01 1.313e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-23 02:07:18,493 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 3950, loss[loss=0.07315, simple_loss=0.1005, pruned_loss=0.01534, audio_tagging_loss=0.007576, over 16101.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09273, pruned_loss=0.01431, audio_tagging_loss=0.009419, over 3040940.25 frames. ], batch size: 61, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:07:23,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2190613.3333333335, ans=0.05
2023-11-23 02:07:25,968 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328600
2023-11-23 02:07:42,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5
2023-11-23 02:07:57,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2190813.3333333335, ans=0.1
2023-11-23 02:08:22,770 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4000, loss[loss=0.05985, simple_loss=0.06999, pruned_loss=0.01651, audio_tagging_loss=0.008348, over 15480.00 frames. ], tot_loss[loss=0.07086, simple_loss=0.0938, pruned_loss=0.01452, audio_tagging_loss=0.009442, over 3043588.79 frames. ], batch size: 58, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:08:30,415 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328650
2023-11-23 02:08:34,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2191013.3333333335, ans=0.125
2023-11-23 02:08:36,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2191013.3333333335, ans=0.125
2023-11-23 02:08:38,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2191013.3333333335, ans=0.125
2023-11-23 02:08:45,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0
2023-11-23 02:09:03,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.297e+01 8.975e+01 9.665e+01 1.273e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-23 02:09:10,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0
2023-11-23 02:09:18,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2191213.3333333335, ans=0.1
2023-11-23 02:09:19,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2191213.3333333335, ans=0.125
2023-11-23 02:09:21,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2191213.3333333335, ans=0.0
2023-11-23 02:09:28,245 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4050, loss[loss=0.05852, simple_loss=0.07285, pruned_loss=0.01091, audio_tagging_loss=0.01119, over 15682.00 frames. ], tot_loss[loss=0.07122, simple_loss=0.09419, pruned_loss=0.01468, audio_tagging_loss=0.009452, over 3042770.30 frames. ], batch size: 60, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:09:29,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2191280.0, ans=0.5
2023-11-23 02:09:32,515 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
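
Note on the scaling.py:1022 Whitening lines: each compares an anisotropy statistic of a module's output covariance against a limit; the auxiliary whitening penalty only engages while metric exceeds limit (21.74 vs. 22.5 above is close but still inactive). One plausible way to compute such a metric, assuming it measures how far the feature covariance is from isotropic (the exact statistic in scaling.py may differ):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # Assumed isotropy measure: d * trace(C @ C) / trace(C)**2 over the
        # feature covariance C; equals 1.0 for perfectly white features and
        # grows as the covariance concentrates in a few directions.
        x = x.reshape(-1, x.shape[-1])          # (frames, channels)
        x = x - x.mean(dim=0, keepdim=True)
        c = (x.T @ x) / x.shape[0]              # covariance, (d, d)
        d = c.shape[0]
        return (d * torch.trace(c @ c) / torch.trace(c) ** 2).item()

    # A penalty of this kind would be switched on only while
    # whitening_metric(activations) > limit, e.g. the logged 21.74 vs. limit=22.5.
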
2023-11-23 02:09:36,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328700
2023-11-23 02:09:50,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0
2023-11-23 02:09:53,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2191413.3333333335, ans=0.125
2023-11-23 02:09:56,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5
2023-11-23 02:10:01,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5
2023-11-23 02:10:32,648 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4100, loss[loss=0.06336, simple_loss=0.08283, pruned_loss=0.01501, audio_tagging_loss=0.006932, over 14778.00 frames. ], tot_loss[loss=0.07093, simple_loss=0.0941, pruned_loss=0.01449, audio_tagging_loss=0.009396, over 3045452.47 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:10:40,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328750
2023-11-23 02:11:00,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2191746.6666666665, ans=0.1
2023-11-23 02:11:01,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2191746.6666666665, ans=0.125
2023-11-23 02:11:05,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2191746.6666666665, ans=0.1
2023-11-23 02:11:13,906 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 8.198e+01 8.662e+01 9.320e+01 1.292e+02, threshold=1.732e+02, percent-clipped=0.0
2023-11-23 02:11:16,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2191813.3333333335, ans=0.07
2023-11-23 02:11:25,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2191880.0, ans=0.125
2023-11-23 02:11:26,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2191880.0, ans=0.125
2023-11-23 02:11:30,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2191880.0, ans=0.2
2023-11-23 02:11:36,018 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4150, loss[loss=0.05518, simple_loss=0.06686, pruned_loss=0.009662, audio_tagging_loss=0.01209, over 15666.00 frames. ], tot_loss[loss=0.07015, simple_loss=0.09305, pruned_loss=0.01434, audio_tagging_loss=0.009287, over 3048256.04 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:11:43,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328800
2023-11-23 02:12:07,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2192080.0, ans=0.0
2023-11-23 02:12:12,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2192080.0, ans=0.125
2023-11-23 02:12:22,720 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 02:12:40,178 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4200, loss[loss=0.04998, simple_loss=0.05942, pruned_loss=0.009376, audio_tagging_loss=0.0109, over 16458.00 frames. ], tot_loss[loss=0.07038, simple_loss=0.0934, pruned_loss=0.0144, audio_tagging_loss=0.009279, over 3050190.90 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:12:41,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2192280.0, ans=0.125
2023-11-23 02:12:48,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328850
2023-11-23 02:12:51,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=2192280.0, ans=15.0
2023-11-23 02:12:56,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2192346.6666666665, ans=0.0
2023-11-23 02:13:05,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2192413.3333333335, ans=0.1
2023-11-23 02:13:05,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0
2023-11-23 02:13:15,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5
2023-11-23 02:13:16,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2192413.3333333335, ans=0.05
2023-11-23 02:13:20,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.473e+01 9.086e+01 9.924e+01 1.385e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-23 02:13:31,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2192546.6666666665, ans=0.125
2023-11-23 02:13:32,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5
2023-11-23 02:13:36,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2192546.6666666665, ans=0.0
2023-11-23 02:13:44,961 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4250, loss[loss=0.05567, simple_loss=0.06645, pruned_loss=0.01132, audio_tagging_loss=0.01113, over 14323.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.0931, pruned_loss=0.0144, audio_tagging_loss=0.009188, over 3049362.74 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:13:52,415 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328900
2023-11-23 02:14:26,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2192813.3333333335, ans=0.125
2023-11-23 02:14:36,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2192880.0, ans=0.2
2023-11-23 02:14:41,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2192880.0, ans=10.0
2023-11-23 02:14:45,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2192880.0, ans=0.0
2023-11-23 02:14:48,994 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4300, loss[loss=0.05697, simple_loss=0.07524, pruned_loss=0.01105, audio_tagging_loss=0.008309, over 14834.00 frames. ], tot_loss[loss=0.07052, simple_loss=0.09382, pruned_loss=0.01447, audio_tagging_loss=0.00914, over 3046927.62 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:14:50,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2192946.6666666665, ans=0.125
2023-11-23 02:14:51,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2192946.6666666665, ans=0.0
2023-11-23 02:14:56,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 328950
2023-11-23 02:14:57,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.13 vs. limit=15.0
2023-11-23 02:14:58,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2192946.6666666665, ans=0.0
2023-11-23 02:15:01,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2193013.3333333335, ans=0.125
2023-11-23 02:15:06,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2193013.3333333335, ans=0.125
2023-11-23 02:15:06,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2193013.3333333335, ans=0.125
2023-11-23 02:15:23,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2193080.0, ans=0.125
2023-11-23 02:15:30,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.649e+01 8.452e+01 8.978e+01 9.927e+01 1.263e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-23 02:15:33,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2193146.6666666665, ans=0.125
2023-11-23 02:15:53,681 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4350, loss[loss=0.07883, simple_loss=0.1131, pruned_loss=0.01444, audio_tagging_loss=0.007827, over 15194.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.0947, pruned_loss=0.01463, audio_tagging_loss=0.009033, over 3047476.36 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 16.0
2023-11-23 02:15:54,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-23 02:16:02,573 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329000
2023-11-23 02:16:17,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0
2023-11-23 02:16:35,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.09 vs. limit=22.5
2023-11-23 02:16:58,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2193613.3333333335, ans=0.125
2023-11-23 02:16:58,990 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4400, loss[loss=0.07759, simple_loss=0.109, pruned_loss=0.01453, audio_tagging_loss=0.008557, over 14050.00 frames. ], tot_loss[loss=0.0715, simple_loss=0.0955, pruned_loss=0.01473, audio_tagging_loss=0.009016, over 3051394.27 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:16:59,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2193613.3333333335, ans=0.125
2023-11-23 02:17:06,996 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329050
2023-11-23 02:17:07,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2193613.3333333335, ans=0.2
2023-11-23 02:17:07,167 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 02:17:17,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2193680.0, ans=0.1
2023-11-23 02:17:19,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2193680.0, ans=0.0
2023-11-23 02:17:40,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.171e+01 8.943e+01 9.862e+01 1.169e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-23 02:17:48,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0
2023-11-23 02:17:48,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0
2023-11-23 02:18:03,068 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4450, loss[loss=0.05706, simple_loss=0.07873, pruned_loss=0.01103, audio_tagging_loss=0.006663, over 14905.00 frames. ], tot_loss[loss=0.07124, simple_loss=0.09527, pruned_loss=0.01469, audio_tagging_loss=0.00892, over 3054849.55 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:18:08,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2193946.6666666665, ans=0.0
2023-11-23 02:18:10,659 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329100
2023-11-23 02:18:31,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2194080.0, ans=15.0
2023-11-23 02:18:55,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2194213.3333333335, ans=0.1
2023-11-23 02:19:05,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2194213.3333333335, ans=0.125
2023-11-23 02:19:05,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2194213.3333333335, ans=0.125
2023-11-23 02:19:07,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. limit=6.0
2023-11-23 02:19:07,662 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4500, loss[loss=0.08192, simple_loss=0.1141, pruned_loss=0.01958, audio_tagging_loss=0.00531, over 15173.00 frames. ], tot_loss[loss=0.07082, simple_loss=0.09467, pruned_loss=0.01456, audio_tagging_loss=0.008925, over 3053430.54 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:19:11,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2194280.0, ans=0.125
2023-11-23 02:19:16,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329150
2023-11-23 02:19:23,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2194346.6666666665, ans=0.0
2023-11-23 02:19:37,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2194413.3333333335, ans=0.125
2023-11-23 02:19:43,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0
2023-11-23 02:19:48,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.364e+01 9.070e+01 9.690e+01 1.146e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-23 02:19:55,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2194480.0, ans=0.125
2023-11-23 02:20:02,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2194546.6666666665, ans=0.125
2023-11-23 02:20:10,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0
2023-11-23 02:20:13,041 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4550, loss[loss=0.0769, simple_loss=0.1083, pruned_loss=0.01402, audio_tagging_loss=0.008741, over 15466.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.09405, pruned_loss=0.01437, audio_tagging_loss=0.00903, over 3056729.71 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0
2023-11-23 02:20:20,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329200
2023-11-23 02:20:43,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2194746.6666666665, ans=0.1
2023-11-23 02:20:58,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2194813.3333333335, ans=0.2
2023-11-23 02:21:01,787 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 02:21:15,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2194946.6666666665, ans=0.125
2023-11-23 02:21:16,485 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4600, loss[loss=0.048, simple_loss=0.06279, pruned_loss=0.008411, audio_tagging_loss=0.008195, over 13837.00 frames. ], tot_loss[loss=0.07029, simple_loss=0.09351, pruned_loss=0.01438, audio_tagging_loss=0.009147, over 3054834.06 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 32.0
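
Note on the learning rate: it ticks down from 2.44e-03 to 2.43e-03 around batch 4600. With base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 this is consistent with icefall's Eden schedule, which decays smoothly in both the batch and the epoch dimension; the sketch below reproduces the logged value, though the exact epoch fraction used internally is an assumption:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden-style schedule: power-law decay in batches and in epochs.
        batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
        epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Around the logged region (~328,000 updates into training, epoch term ~27):
    print(f"{eden_lr(0.045, 328_000, 27.0):.2e}")  # 2.44e-03, matching the log
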
2023-11-23 02:21:24,553 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329250
2023-11-23 02:21:43,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=22.5
2023-11-23 02:21:50,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2195080.0, ans=0.025
2023-11-23 02:21:58,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.216e+01 8.878e+01 9.478e+01 1.349e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-23 02:21:59,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=12.0
2023-11-23 02:22:01,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0
2023-11-23 02:22:04,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0
2023-11-23 02:22:04,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2195146.6666666665, ans=0.125
2023-11-23 02:22:21,461 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4650, loss[loss=0.05325, simple_loss=0.07261, pruned_loss=0.008487, audio_tagging_loss=0.008456, over 15384.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09267, pruned_loss=0.01416, audio_tagging_loss=0.009197, over 3052422.67 frames. ], batch size: 60, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:22:29,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329300
2023-11-23 02:22:37,005 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-23 02:23:02,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2195480.0, ans=0.1
2023-11-23 02:23:27,290 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4700, loss[loss=0.07419, simple_loss=0.09651, pruned_loss=0.01479, audio_tagging_loss=0.01114, over 15457.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.09214, pruned_loss=0.01399, audio_tagging_loss=0.009281, over 3048308.22 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 16.0
2023-11-23 02:23:34,813 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329350
2023-11-23 02:23:50,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2195680.0, ans=0.125
2023-11-23 02:24:09,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.732e+01 8.291e+01 9.016e+01 9.707e+01 1.177e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-23 02:24:27,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2195880.0, ans=0.07
2023-11-23 02:24:31,716 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4750, loss[loss=0.1025, simple_loss=0.1271, pruned_loss=0.0268, audio_tagging_loss=0.01211, over 16391.00 frames. ], tot_loss[loss=0.06947, simple_loss=0.09206, pruned_loss=0.01409, audio_tagging_loss=0.009346, over 3048039.51 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 16.0
2023-11-23 02:24:37,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2195946.6666666665, ans=0.05
2023-11-23 02:24:39,005 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329400
2023-11-23 02:24:47,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2196013.3333333335, ans=0.125
2023-11-23 02:24:51,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0
2023-11-23 02:24:55,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2196013.3333333335, ans=0.0
2023-11-23 02:25:14,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2196146.6666666665, ans=0.125
2023-11-23 02:25:37,116 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4800, loss[loss=0.04765, simple_loss=0.05986, pruned_loss=0.006653, audio_tagging_loss=0.01107, over 15187.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.09177, pruned_loss=0.01405, audio_tagging_loss=0.009489, over 3051033.06 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:25:41,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2196280.0, ans=0.125
2023-11-23 02:25:44,671 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329450
2023-11-23 02:25:59,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2196346.6666666665, ans=0.125
2023-11-23 02:26:05,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2196413.3333333335, ans=0.125
2023-11-23 02:26:16,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0
2023-11-23 02:26:19,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0
2023-11-23 02:26:19,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.219e+01 8.831e+01 9.548e+01 1.229e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-23 02:26:28,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2196546.6666666665, ans=0.2
2023-11-23 02:26:28,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=22.5
2023-11-23 02:26:30,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2196546.6666666665, ans=0.2
2023-11-23 02:26:34,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5
2023-11-23 02:26:43,272 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4850, loss[loss=0.06217, simple_loss=0.0883, pruned_loss=0.009974, audio_tagging_loss=0.008047, over 16125.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.09209, pruned_loss=0.014, audio_tagging_loss=0.009519, over 3053891.09 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:26:48,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2196613.3333333335, ans=0.125
2023-11-23 02:26:50,768 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329500
2023-11-23 02:26:55,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2196680.0, ans=0.2
2023-11-23 02:26:58,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2196680.0, ans=0.125
2023-11-23 02:27:07,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2196746.6666666665, ans=0.125
2023-11-23 02:27:10,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2196746.6666666665, ans=0.125
2023-11-23 02:27:33,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0
2023-11-23 02:27:47,941 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4900, loss[loss=0.05041, simple_loss=0.06549, pruned_loss=0.008536, audio_tagging_loss=0.009128, over 16884.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.09212, pruned_loss=0.01405, audio_tagging_loss=0.009452, over 3055831.55 frames. ], batch size: 65, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:27:55,526 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329550
2023-11-23 02:28:05,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0
2023-11-23 02:28:18,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2197080.0, ans=0.2
2023-11-23 02:28:22,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2197080.0, ans=0.0
2023-11-23 02:28:31,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.170e+01 8.747e+01 9.254e+01 1.120e+02, threshold=1.749e+02, percent-clipped=0.0
2023-11-23 02:28:32,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2197146.6666666665, ans=0.125
2023-11-23 02:28:51,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2197280.0, ans=0.125
2023-11-23 02:28:52,710 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 4950, loss[loss=0.0827, simple_loss=0.1173, pruned_loss=0.01646, audio_tagging_loss=0.007609, over 15087.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09208, pruned_loss=0.0139, audio_tagging_loss=0.009313, over 3049693.54 frames. ], batch size: 55, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:29:00,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329600
2023-11-23 02:29:09,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2197346.6666666665, ans=0.05
2023-11-23 02:29:25,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0
2023-11-23 02:29:26,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2197413.3333333335, ans=0.0
2023-11-23 02:29:27,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0
2023-11-23 02:29:36,637 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-23 02:29:41,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2197480.0, ans=0.125
2023-11-23 02:29:47,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.39 vs. limit=10.0
2023-11-23 02:29:59,497 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5000, loss[loss=0.07311, simple_loss=0.09046, pruned_loss=0.01527, audio_tagging_loss=0.01261, over 15860.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09264, pruned_loss=0.01428, audio_tagging_loss=0.00922, over 3048261.06 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:30:07,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329650
2023-11-23 02:30:41,658 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.477e+01 9.069e+01 9.699e+01 1.131e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-23 02:30:50,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2197880.0, ans=0.2
2023-11-23 02:30:51,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2197880.0, ans=0.5
2023-11-23 02:30:57,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2197880.0, ans=0.0
2023-11-23 02:31:04,432 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5050, loss[loss=0.071, simple_loss=0.09937, pruned_loss=0.01509, audio_tagging_loss=0.006232, over 16139.00 frames. ], tot_loss[loss=0.06965, simple_loss=0.09252, pruned_loss=0.01427, audio_tagging_loss=0.00913, over 3048141.78 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:31:11,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329700
2023-11-23 02:31:16,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0
2023-11-23 02:31:18,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2198013.3333333335, ans=0.0
2023-11-23 02:31:19,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2198013.3333333335, ans=0.0
2023-11-23 02:31:30,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2198080.0, ans=0.125
2023-11-23 02:32:08,930 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5100, loss[loss=0.08352, simple_loss=0.111, pruned_loss=0.0203, audio_tagging_loss=0.007738, over 14952.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09272, pruned_loss=0.01435, audio_tagging_loss=0.009097, over 3042260.74 frames. ], batch size: 55, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:32:16,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329750
2023-11-23 02:32:24,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2198346.6666666665, ans=0.125
2023-11-23 02:32:30,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2198346.6666666665, ans=0.0
2023-11-23 02:32:37,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0
2023-11-23 02:32:38,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.10 vs. limit=12.0
2023-11-23 02:32:41,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2198413.3333333335, ans=0.04949747468305833
2023-11-23 02:32:51,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.231e+01 8.134e+01 8.796e+01 9.528e+01 1.100e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-23 02:33:04,689 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 02:33:13,775 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5150, loss[loss=0.07988, simple_loss=0.1097, pruned_loss=0.01616, audio_tagging_loss=0.00888, over 14974.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09277, pruned_loss=0.01448, audio_tagging_loss=0.00907, over 3047266.76 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:33:23,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329800
2023-11-23 02:33:36,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2198680.0, ans=0.1
2023-11-23 02:33:51,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2198746.6666666665, ans=0.125
2023-11-23 02:33:54,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2198813.3333333335, ans=0.07
2023-11-23 02:33:55,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2198813.3333333335, ans=0.04949747468305833
2023-11-23 02:34:08,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2198880.0, ans=0.0
2023-11-23 02:34:20,439 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5200, loss[loss=0.07759, simple_loss=0.1049, pruned_loss=0.01696, audio_tagging_loss=0.008181, over 14614.00 frames. ], tot_loss[loss=0.06991, simple_loss=0.09305, pruned_loss=0.01437, audio_tagging_loss=0.009019, over 3047642.09 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 32.0
2023-11-23 02:34:27,858 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329850
2023-11-23 02:34:29,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2198946.6666666665, ans=0.1
2023-11-23 02:34:30,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2198946.6666666665, ans=0.0
2023-11-23 02:35:05,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.387e+01 9.079e+01 9.668e+01 1.776e+02, threshold=1.816e+02, percent-clipped=1.0
2023-11-23 02:35:23,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2199213.3333333335, ans=0.125
2023-11-23 02:35:25,317 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5250, loss[loss=0.07181, simple_loss=0.08737, pruned_loss=0.01969, audio_tagging_loss=0.008435, over 14165.00 frames. ], tot_loss[loss=0.07064, simple_loss=0.09401, pruned_loss=0.01472, audio_tagging_loss=0.008908, over 3042131.50 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0
2023-11-23 02:35:26,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2199280.0, ans=0.0
2023-11-23 02:35:32,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=12.0
2023-11-23 02:35:32,866 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329900
2023-11-23 02:36:18,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2199546.6666666665, ans=0.125
2023-11-23 02:36:30,314 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5300, loss[loss=0.07613, simple_loss=0.115, pruned_loss=0.01224, audio_tagging_loss=0.006383, over 14834.00 frames. ], tot_loss[loss=0.07025, simple_loss=0.09366, pruned_loss=0.01455, audio_tagging_loss=0.00887, over 3044519.58 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 16.0
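
Note on percent-clipped: the 02:35:05 optim.py line is the only one in this stretch with a nonzero value; the largest gradient norm in its window (1.776e+02) pushed past the 1.816e+02 threshold, and 1.0 percent of the updates since the previous optim.py log were clipped. Reading it as a fraction of clipped steps per logging window, which is an assumption about the exact bookkeeping:

    def percent_clipped(grad_norms, threshold: float) -> float:
        # Assumed bookkeeping: fraction of steps in the window whose global
        # grad norm exceeded the clipping threshold, reported as a percentage.
        clipped = sum(1 for g in grad_norms if g > threshold)
        return 100.0 * clipped / len(grad_norms)

    # E.g. 2 clipped steps out of 200 between two optim.py log lines -> 1.0,
    # matching the logged percent-clipped=1.0 at 02:35:05.
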
], batch size: 53, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:36:33,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2199613.3333333335, ans=0.1 2023-11-23 02:36:39,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 329950 2023-11-23 02:37:11,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2199813.3333333335, ans=0.2 2023-11-23 02:37:11,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2199813.3333333335, ans=0.2 2023-11-23 02:37:14,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.403e+01 8.848e+01 9.438e+01 1.127e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-23 02:37:33,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2199880.0, ans=0.125 2023-11-23 02:37:33,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2199880.0, ans=0.125 2023-11-23 02:37:37,488 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5350, loss[loss=0.06542, simple_loss=0.07931, pruned_loss=0.01307, audio_tagging_loss=0.0127, over 14392.00 frames. ], tot_loss[loss=0.07027, simple_loss=0.09356, pruned_loss=0.01455, audio_tagging_loss=0.008935, over 3036934.06 frames. ], batch size: 54, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:37:39,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2199946.6666666665, ans=0.05 2023-11-23 02:37:44,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330000 2023-11-23 02:38:07,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2200080.0, ans=0.125 2023-11-23 02:38:13,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2200080.0, ans=0.0 2023-11-23 02:38:14,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2200146.6666666665, ans=0.0 2023-11-23 02:38:36,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2200213.3333333335, ans=0.125 2023-11-23 02:38:37,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2200213.3333333335, ans=0.2 2023-11-23 02:38:42,203 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5400, loss[loss=0.06583, simple_loss=0.09609, pruned_loss=0.009001, audio_tagging_loss=0.008778, over 14976.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.09411, pruned_loss=0.01461, audio_tagging_loss=0.008987, over 3047746.98 frames. 
], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:38:46,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2200280.0, ans=0.09899494936611666 2023-11-23 02:38:46,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2200280.0, ans=0.0 2023-11-23 02:38:49,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330050 2023-11-23 02:38:49,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2200280.0, ans=0.0 2023-11-23 02:39:07,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2200413.3333333335, ans=0.05 2023-11-23 02:39:18,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-23 02:39:20,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2023-11-23 02:39:27,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.068e+01 8.843e+01 9.502e+01 1.253e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 02:39:27,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2200480.0, ans=0.2 2023-11-23 02:39:42,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2200546.6666666665, ans=0.05 2023-11-23 02:39:42,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2200546.6666666665, ans=0.035 2023-11-23 02:39:44,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2200546.6666666665, ans=0.125 2023-11-23 02:39:47,555 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5450, loss[loss=0.04845, simple_loss=0.06399, pruned_loss=0.006094, audio_tagging_loss=0.01036, over 14378.00 frames. ], tot_loss[loss=0.07045, simple_loss=0.09352, pruned_loss=0.01456, audio_tagging_loss=0.009133, over 3045667.98 frames. ], batch size: 55, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:39:47,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2200613.3333333335, ans=0.2 2023-11-23 02:39:49,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2200613.3333333335, ans=0.2 2023-11-23 02:39:56,131 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330100 2023-11-23 02:39:56,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2200613.3333333335, ans=0.0 2023-11-23 02:40:06,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2200680.0, ans=0.125 2023-11-23 02:40:27,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.63 vs. 
limit=22.5 2023-11-23 02:40:35,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2200813.3333333335, ans=0.07 2023-11-23 02:40:40,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2023-11-23 02:40:43,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2023-11-23 02:40:45,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2200880.0, ans=0.0 2023-11-23 02:40:53,879 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5500, loss[loss=0.06781, simple_loss=0.09158, pruned_loss=0.009548, audio_tagging_loss=0.01248, over 15338.00 frames. ], tot_loss[loss=0.07062, simple_loss=0.09344, pruned_loss=0.01468, audio_tagging_loss=0.009229, over 3048582.07 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:41:01,914 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330150 2023-11-23 02:41:15,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2201013.3333333335, ans=0.2 2023-11-23 02:41:24,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2201080.0, ans=0.0 2023-11-23 02:41:29,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2201080.0, ans=0.125 2023-11-23 02:41:37,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.324e+01 8.941e+01 9.835e+01 1.232e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-23 02:41:54,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2201213.3333333335, ans=0.125 2023-11-23 02:41:57,979 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5550, loss[loss=0.07525, simple_loss=0.0964, pruned_loss=0.01876, audio_tagging_loss=0.008294, over 15221.00 frames. ], tot_loss[loss=0.07045, simple_loss=0.09315, pruned_loss=0.01455, audio_tagging_loss=0.009324, over 3043050.67 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:42:05,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330200 2023-11-23 02:42:10,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2201346.6666666665, ans=0.1 2023-11-23 02:42:24,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2023-11-23 02:42:32,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2023-11-23 02:42:36,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=12.0 2023-11-23 02:42:45,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2201480.0, ans=0.0 2023-11-23 02:43:00,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. 
limit=15.0 2023-11-23 02:43:02,320 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5600, loss[loss=0.08236, simple_loss=0.118, pruned_loss=0.01618, audio_tagging_loss=0.007163, over 15642.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.09317, pruned_loss=0.01455, audio_tagging_loss=0.009326, over 3041998.60 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:43:10,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330250 2023-11-23 02:43:11,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2201613.3333333335, ans=0.0 2023-11-23 02:43:13,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2201613.3333333335, ans=0.125 2023-11-23 02:43:20,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2201680.0, ans=0.95 2023-11-23 02:43:47,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.606e+01 8.200e+01 8.820e+01 9.406e+01 1.283e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-23 02:43:49,088 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 02:43:50,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2201813.3333333335, ans=0.1 2023-11-23 02:44:08,219 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5650, loss[loss=0.05762, simple_loss=0.07527, pruned_loss=0.01098, audio_tagging_loss=0.009008, over 15262.00 frames. ], tot_loss[loss=0.0702, simple_loss=0.09258, pruned_loss=0.01444, audio_tagging_loss=0.009469, over 3045521.55 frames. 
], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:44:08,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2201946.6666666665, ans=0.125 2023-11-23 02:44:15,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330300 2023-11-23 02:44:31,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2202013.3333333335, ans=0.125 2023-11-23 02:44:31,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2202013.3333333335, ans=0.0 2023-11-23 02:44:32,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2202080.0, ans=0.125 2023-11-23 02:44:35,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2202080.0, ans=0.125 2023-11-23 02:44:43,647 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 02:44:53,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2202146.6666666665, ans=0.125 2023-11-23 02:44:57,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2023-11-23 02:45:11,943 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5700, loss[loss=0.07842, simple_loss=0.1136, pruned_loss=0.01339, audio_tagging_loss=0.00823, over 15602.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.09266, pruned_loss=0.01442, audio_tagging_loss=0.009459, over 3045262.94 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:45:20,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330350 2023-11-23 02:45:47,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2202413.3333333335, ans=0.5 2023-11-23 02:45:57,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.129e+01 8.046e+01 8.620e+01 9.394e+01 1.276e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-23 02:46:10,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-23 02:46:16,710 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5750, loss[loss=0.0664, simple_loss=0.08831, pruned_loss=0.01532, audio_tagging_loss=0.006921, over 15563.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09242, pruned_loss=0.01444, audio_tagging_loss=0.009292, over 3044832.60 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:46:24,310 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330400 2023-11-23 02:46:28,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2202680.0, ans=0.125 2023-11-23 02:46:39,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2202680.0, ans=0.125 2023-11-23 02:46:42,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.13 vs. 
limit=22.5 2023-11-23 02:46:44,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.21 vs. limit=10.0 2023-11-23 02:46:48,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2202746.6666666665, ans=0.2 2023-11-23 02:47:01,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2202813.3333333335, ans=0.125 2023-11-23 02:47:18,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2202880.0, ans=0.125 2023-11-23 02:47:22,027 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5800, loss[loss=0.06064, simple_loss=0.08596, pruned_loss=0.01031, audio_tagging_loss=0.007343, over 16643.00 frames. ], tot_loss[loss=0.06977, simple_loss=0.09265, pruned_loss=0.0143, audio_tagging_loss=0.009142, over 3047555.83 frames. ], batch size: 61, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:47:29,582 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330450 2023-11-23 02:47:33,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2203013.3333333335, ans=0.1 2023-11-23 02:47:35,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2023-11-23 02:47:42,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2023-11-23 02:47:46,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2203080.0, ans=0.0 2023-11-23 02:47:47,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2203080.0, ans=0.125 2023-11-23 02:48:00,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2203146.6666666665, ans=0.125 2023-11-23 02:48:02,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2023-11-23 02:48:06,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.335e+01 8.418e+01 9.019e+01 9.630e+01 1.229e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 02:48:15,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2203213.3333333335, ans=0.0 2023-11-23 02:48:19,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2023-11-23 02:48:19,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2023-11-23 02:48:21,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2023-11-23 02:48:26,307 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5850, loss[loss=0.08212, simple_loss=0.1061, pruned_loss=0.02134, audio_tagging_loss=0.007743, over 14243.00 frames. 
], tot_loss[loss=0.0695, simple_loss=0.09207, pruned_loss=0.01426, audio_tagging_loss=0.009207, over 3040915.24 frames. ], batch size: 52, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:48:33,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330500 2023-11-23 02:48:53,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2203413.3333333335, ans=0.025 2023-11-23 02:49:14,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2203480.0, ans=0.0 2023-11-23 02:49:30,668 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5900, loss[loss=0.05935, simple_loss=0.07402, pruned_loss=0.007542, audio_tagging_loss=0.0148, over 14794.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09242, pruned_loss=0.01438, audio_tagging_loss=0.009181, over 3042542.74 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:49:38,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330550 2023-11-23 02:50:15,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.273e+01 9.005e+01 9.559e+01 1.170e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 02:50:26,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2203880.0, ans=0.0 2023-11-23 02:50:30,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2203880.0, ans=0.125 2023-11-23 02:50:35,416 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 5950, loss[loss=0.05427, simple_loss=0.07651, pruned_loss=0.009175, audio_tagging_loss=0.006837, over 15100.00 frames. ], tot_loss[loss=0.06979, simple_loss=0.09248, pruned_loss=0.01441, audio_tagging_loss=0.009134, over 3047998.49 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 02:50:41,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2203946.6666666665, ans=0.125 2023-11-23 02:50:43,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330600 2023-11-23 02:50:55,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2204013.3333333335, ans=0.2 2023-11-23 02:50:55,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2204013.3333333335, ans=0.1 2023-11-23 02:50:57,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-11-23 02:50:58,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2204013.3333333335, ans=0.0 2023-11-23 02:51:05,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2204080.0, ans=0.2 2023-11-23 02:51:13,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.74 vs. 
limit=15.0 2023-11-23 02:51:37,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=2204213.3333333335, ans=0.02 2023-11-23 02:51:40,707 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6000, loss[loss=0.06639, simple_loss=0.08188, pruned_loss=0.01557, audio_tagging_loss=0.00988, over 15421.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.0919, pruned_loss=0.01435, audio_tagging_loss=0.009214, over 3043265.48 frames. ], batch size: 60, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:51:40,708 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 02:52:24,758 INFO [train_asr.py:1253] (1/4) Epoch 28, validation: loss=0.05863, simple_loss=0.05128, pruned_loss=0.0051, audio_tagging_loss=0.02789, over 4681554.00 frames. 2023-11-23 02:52:24,759 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 02:52:26,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5 2023-11-23 02:52:33,033 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330650 2023-11-23 02:52:33,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2204280.0, ans=0.0 2023-11-23 02:53:10,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.180e+01 8.677e+01 9.605e+01 1.280e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-23 02:53:11,353 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 02:53:22,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=2204546.6666666665, ans=0.1 2023-11-23 02:53:30,845 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6050, loss[loss=0.07428, simple_loss=0.0948, pruned_loss=0.01589, audio_tagging_loss=0.01099, over 14877.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09122, pruned_loss=0.01414, audio_tagging_loss=0.009233, over 3046479.54 frames. 
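
The validation entry above decomposes cleanly: the reported total is reproduced by loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (0.5 x 0.05128 + 0.0051 + 0.02789 = 0.05863), and the same weighting reproduces the tot_loss entries throughout this section. The weights are inferred from the numbers, not stated in the log:

    # Inferred combination of the logged loss components.
    def total_loss(simple: float, pruned: float, audio_tagging: float) -> float:
        return 0.5 * simple + pruned + audio_tagging  # weights inferred from the log

    assert abs(total_loss(0.05128, 0.0051, 0.02789) - 0.05863) < 1e-6    # validation entry above
    assert abs(total_loss(0.09356, 0.01455, 0.008935) - 0.07027) < 1e-5  # batch 5350 tot_loss
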
], batch size: 56, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:53:38,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330700 2023-11-23 02:53:42,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2204680.0, ans=0.0 2023-11-23 02:53:58,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2204746.6666666665, ans=0.0 2023-11-23 02:54:14,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2204813.3333333335, ans=0.1 2023-11-23 02:54:17,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2204813.3333333335, ans=0.125 2023-11-23 02:54:20,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2204813.3333333335, ans=0.1 2023-11-23 02:54:27,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2204880.0, ans=0.0 2023-11-23 02:54:32,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2204880.0, ans=0.125 2023-11-23 02:54:35,063 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6100, loss[loss=0.08088, simple_loss=0.11, pruned_loss=0.01822, audio_tagging_loss=0.007684, over 15570.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09196, pruned_loss=0.01414, audio_tagging_loss=0.009169, over 3047589.67 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:54:35,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.93 vs. limit=22.5 2023-11-23 02:54:40,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2204946.6666666665, ans=0.2 2023-11-23 02:54:41,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2204946.6666666665, ans=0.125 2023-11-23 02:54:42,607 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330750 2023-11-23 02:55:19,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2205146.6666666665, ans=0.125 2023-11-23 02:55:21,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.895e+01 7.997e+01 8.605e+01 9.181e+01 1.192e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-23 02:55:26,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2205213.3333333335, ans=0.0 2023-11-23 02:55:33,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2205213.3333333335, ans=0.1 2023-11-23 02:55:39,749 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6150, loss[loss=0.06992, simple_loss=0.0876, pruned_loss=0.0137, audio_tagging_loss=0.01242, over 15039.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.0914, pruned_loss=0.01399, audio_tagging_loss=0.009172, over 3054075.41 frames. 
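
The grad_scale column in these entries moves only between 16.0 and 32.0 (32.0 at batch 5600, back to 16.0 at batch 5650, 32.0 again from batch 6000), the signature of dynamic fp16 loss scaling that doubles the scale after a run of overflow-free steps and halves it on overflow. A sketch using PyTorch's stock scaler; the parameter values below are illustrative, not taken from the recipe:

    import torch

    # Illustrative dynamic loss scaling consistent with the grad_scale column.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # the scale seen in most entries above
        growth_factor=2.0,     # 16.0 -> 32.0, as around batch 6000
        backoff_factor=0.5,    # 32.0 -> 16.0, as around batch 5650
        growth_interval=2000,  # illustrative; the actual interval is not logged
    )
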
], batch size: 56, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:55:48,242 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330800 2023-11-23 02:55:55,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-23 02:56:22,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2205480.0, ans=0.1 2023-11-23 02:56:24,018 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 02:56:27,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2205480.0, ans=10.0 2023-11-23 02:56:44,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2205613.3333333335, ans=0.025 2023-11-23 02:56:45,207 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6200, loss[loss=0.08162, simple_loss=0.109, pruned_loss=0.01532, audio_tagging_loss=0.0118, over 16082.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09121, pruned_loss=0.01401, audio_tagging_loss=0.009189, over 3061109.92 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:56:53,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330850 2023-11-23 02:57:05,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2205680.0, ans=0.0 2023-11-23 02:57:12,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2205746.6666666665, ans=0.125 2023-11-23 02:57:17,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2023-11-23 02:57:21,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2205746.6666666665, ans=0.07 2023-11-23 02:57:30,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.609e+01 8.249e+01 8.951e+01 9.781e+01 1.178e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-23 02:57:44,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2205880.0, ans=0.125 2023-11-23 02:57:49,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2205946.6666666665, ans=0.125 2023-11-23 02:57:50,228 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6250, loss[loss=0.057, simple_loss=0.0746, pruned_loss=0.009769, audio_tagging_loss=0.009931, over 15943.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09165, pruned_loss=0.01402, audio_tagging_loss=0.009236, over 3062275.73 frames. ], batch size: 60, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:57:52,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.82 vs. 
limit=15.0 2023-11-23 02:57:57,663 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330900 2023-11-23 02:58:00,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2205946.6666666665, ans=10.0 2023-11-23 02:58:04,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2206013.3333333335, ans=0.1 2023-11-23 02:58:17,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2206080.0, ans=0.125 2023-11-23 02:58:27,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2206080.0, ans=0.5 2023-11-23 02:58:37,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2206146.6666666665, ans=0.0 2023-11-23 02:58:54,275 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6300, loss[loss=0.09563, simple_loss=0.1273, pruned_loss=0.02289, audio_tagging_loss=0.009079, over 14273.00 frames. ], tot_loss[loss=0.06965, simple_loss=0.09209, pruned_loss=0.0143, audio_tagging_loss=0.009315, over 3053803.17 frames. ], batch size: 52, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 02:59:01,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 330950 2023-11-23 02:59:10,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2206346.6666666665, ans=0.035 2023-11-23 02:59:14,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2206346.6666666665, ans=0.125 2023-11-23 02:59:21,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2206413.3333333335, ans=0.125 2023-11-23 02:59:29,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2206413.3333333335, ans=0.125 2023-11-23 02:59:40,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 8.187e+01 8.713e+01 9.379e+01 1.255e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-23 02:59:41,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2206480.0, ans=0.125 2023-11-23 02:59:59,632 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6350, loss[loss=0.05335, simple_loss=0.06631, pruned_loss=0.009365, audio_tagging_loss=0.01083, over 14328.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09226, pruned_loss=0.01444, audio_tagging_loss=0.009523, over 3056870.90 frames. 
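
Each scaling.py:213 entry prints the current value ("ans") of one named ScheduledFloat, a regulariser scheduled against batch_count: dropout probabilities, bypass scale floors, skip rates and similar knobs throughout the encoder stacks. A minimal sketch of a piecewise-linear schedule of that kind; the breakpoints below are illustrative, not the recipe's actual schedule:

    # Illustrative piecewise-linear schedule over batch_count, clamped at the ends.
    def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # A value decaying from 0.3 to 0.1 over the first 20k batches would have
    # long since settled at 0.1 by the batch_counts logged above.
    print(scheduled_float(2206080.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1
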
], batch size: 55, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 03:00:08,862 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331000 2023-11-23 03:00:11,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2206613.3333333335, ans=0.125 2023-11-23 03:00:15,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2206680.0, ans=0.2 2023-11-23 03:00:29,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2206746.6666666665, ans=0.5 2023-11-23 03:00:37,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2023-11-23 03:01:01,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2206880.0, ans=0.0 2023-11-23 03:01:05,856 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6400, loss[loss=0.06767, simple_loss=0.09331, pruned_loss=0.01137, audio_tagging_loss=0.009639, over 14681.00 frames. ], tot_loss[loss=0.0705, simple_loss=0.09287, pruned_loss=0.0146, audio_tagging_loss=0.009474, over 3046917.49 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 03:01:08,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2206946.6666666665, ans=0.025 2023-11-23 03:01:09,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2206946.6666666665, ans=0.125 2023-11-23 03:01:13,434 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331050 2023-11-23 03:01:17,418 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:01:25,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2023-11-23 03:01:25,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2207013.3333333335, ans=0.125 2023-11-23 03:01:37,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2207080.0, ans=0.125 2023-11-23 03:01:50,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2207146.6666666665, ans=0.07 2023-11-23 03:01:52,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.045e+01 8.704e+01 9.425e+01 1.351e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-23 03:02:09,957 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6450, loss[loss=0.08459, simple_loss=0.1208, pruned_loss=0.01742, audio_tagging_loss=0.006746, over 16052.00 frames. ], tot_loss[loss=0.07031, simple_loss=0.0924, pruned_loss=0.01452, audio_tagging_loss=0.009586, over 3040324.60 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:02:14,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. 
limit=15.0 2023-11-23 03:02:17,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331100 2023-11-23 03:02:22,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=12.0 2023-11-23 03:02:45,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-11-23 03:03:15,313 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6500, loss[loss=0.06678, simple_loss=0.09458, pruned_loss=0.01061, audio_tagging_loss=0.008881, over 15645.00 frames. ], tot_loss[loss=0.07058, simple_loss=0.09266, pruned_loss=0.01466, audio_tagging_loss=0.009584, over 3044462.04 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:03:23,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331150 2023-11-23 03:03:24,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2023-11-23 03:03:43,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2207746.6666666665, ans=0.125 2023-11-23 03:03:53,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-11-23 03:04:01,370 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.379e+01 8.162e+01 8.865e+01 9.552e+01 1.149e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-23 03:04:20,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2207946.6666666665, ans=0.125 2023-11-23 03:04:21,225 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6550, loss[loss=0.05051, simple_loss=0.06647, pruned_loss=0.00788, audio_tagging_loss=0.009397, over 14604.00 frames. ], tot_loss[loss=0.07016, simple_loss=0.09257, pruned_loss=0.0145, audio_tagging_loss=0.009375, over 3044452.07 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:04:22,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2207946.6666666665, ans=0.125 2023-11-23 03:04:22,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2207946.6666666665, ans=0.125 2023-11-23 03:04:28,971 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331200 2023-11-23 03:04:35,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2208013.3333333335, ans=0.125 2023-11-23 03:04:46,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. 
limit=22.5 2023-11-23 03:04:48,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2208080.0, ans=0.1 2023-11-23 03:05:00,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2208146.6666666665, ans=0.125 2023-11-23 03:05:14,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2208213.3333333335, ans=0.95 2023-11-23 03:05:20,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2208213.3333333335, ans=0.125 2023-11-23 03:05:25,297 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6600, loss[loss=0.06931, simple_loss=0.08343, pruned_loss=0.01633, audio_tagging_loss=0.01126, over 14034.00 frames. ], tot_loss[loss=0.07073, simple_loss=0.09358, pruned_loss=0.01473, audio_tagging_loss=0.009211, over 3049350.23 frames. ], batch size: 54, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:05:27,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2208280.0, ans=15.0 2023-11-23 03:05:32,760 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331250 2023-11-23 03:05:36,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2208346.6666666665, ans=0.125 2023-11-23 03:05:50,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2208413.3333333335, ans=0.125 2023-11-23 03:05:58,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2208413.3333333335, ans=0.125 2023-11-23 03:06:12,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.516e+01 8.179e+01 8.913e+01 9.440e+01 1.641e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 03:06:27,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2208546.6666666665, ans=0.025 2023-11-23 03:06:29,557 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6650, loss[loss=0.08362, simple_loss=0.1251, pruned_loss=0.01652, audio_tagging_loss=0.004534, over 14827.00 frames. ], tot_loss[loss=0.07076, simple_loss=0.09382, pruned_loss=0.01471, audio_tagging_loss=0.009138, over 3048486.02 frames. ], batch size: 55, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:06:37,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331300 2023-11-23 03:06:44,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2208680.0, ans=0.05 2023-11-23 03:07:18,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. 
limit=22.5 2023-11-23 03:07:20,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2208880.0, ans=0.2 2023-11-23 03:07:29,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2208880.0, ans=0.2 2023-11-23 03:07:32,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2208880.0, ans=0.125 2023-11-23 03:07:34,628 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6700, loss[loss=0.06984, simple_loss=0.08934, pruned_loss=0.01716, audio_tagging_loss=0.008012, over 14541.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09292, pruned_loss=0.01439, audio_tagging_loss=0.009162, over 3049436.88 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:07:43,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331350 2023-11-23 03:08:04,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2209080.0, ans=0.125 2023-11-23 03:08:22,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.251e+01 8.844e+01 9.540e+01 1.359e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 03:08:32,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. limit=10.0 2023-11-23 03:08:40,172 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6750, loss[loss=0.06943, simple_loss=0.08943, pruned_loss=0.01536, audio_tagging_loss=0.009355, over 15432.00 frames. ], tot_loss[loss=0.06973, simple_loss=0.09231, pruned_loss=0.01433, audio_tagging_loss=0.009253, over 3042046.58 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:08:47,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331400 2023-11-23 03:08:54,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=22.5 2023-11-23 03:09:11,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2209413.3333333335, ans=10.0 2023-11-23 03:09:27,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2209480.0, ans=0.5 2023-11-23 03:09:28,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.53 vs. limit=10.0 2023-11-23 03:09:32,741 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:09:45,059 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6800, loss[loss=0.05545, simple_loss=0.07041, pruned_loss=0.009513, audio_tagging_loss=0.01073, over 14478.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09238, pruned_loss=0.01422, audio_tagging_loss=0.009177, over 3039319.35 frames. 
], batch size: 53, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 03:09:49,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2209613.3333333335, ans=0.125 2023-11-23 03:09:53,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331450 2023-11-23 03:10:00,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2209680.0, ans=0.2 2023-11-23 03:10:03,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2209680.0, ans=0.125 2023-11-23 03:10:05,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2209680.0, ans=0.125 2023-11-23 03:10:10,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2209746.6666666665, ans=0.05 2023-11-23 03:10:15,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2209746.6666666665, ans=0.0 2023-11-23 03:10:31,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.090e+01 8.741e+01 9.523e+01 1.182e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-23 03:10:34,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2209813.3333333335, ans=0.125 2023-11-23 03:10:51,113 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6850, loss[loss=0.05139, simple_loss=0.05967, pruned_loss=0.008885, audio_tagging_loss=0.01267, over 14747.00 frames. ], tot_loss[loss=0.06948, simple_loss=0.0922, pruned_loss=0.01418, audio_tagging_loss=0.009201, over 3037782.72 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 03:10:58,591 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331500 2023-11-23 03:11:18,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2210080.0, ans=0.1 2023-11-23 03:11:23,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2210080.0, ans=0.125 2023-11-23 03:11:56,072 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6900, loss[loss=0.06967, simple_loss=0.08828, pruned_loss=0.01471, audio_tagging_loss=0.01082, over 14977.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09187, pruned_loss=0.01403, audio_tagging_loss=0.009115, over 3037941.93 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:12:00,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2210280.0, ans=0.125 2023-11-23 03:12:03,596 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331550 2023-11-23 03:12:31,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2210413.3333333335, ans=0.125 2023-11-23 03:12:43,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.867e+01 8.094e+01 8.690e+01 9.402e+01 1.184e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-23 03:12:44,875 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 03:12:53,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2210546.6666666665, ans=0.1 2023-11-23 03:12:54,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2210546.6666666665, ans=0.125 2023-11-23 03:13:00,461 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 6950, loss[loss=0.06007, simple_loss=0.08507, pruned_loss=0.009643, audio_tagging_loss=0.007894, over 15281.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09273, pruned_loss=0.01408, audio_tagging_loss=0.00901, over 3038365.85 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:13:07,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331600 2023-11-23 03:13:12,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2210680.0, ans=0.125 2023-11-23 03:13:27,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2210746.6666666665, ans=0.0 2023-11-23 03:13:36,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2210746.6666666665, ans=0.125 2023-11-23 03:13:46,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-11-23 03:13:54,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2210880.0, ans=0.125 2023-11-23 03:13:59,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2210880.0, ans=0.1 2023-11-23 03:14:06,050 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7000, loss[loss=0.072, simple_loss=0.09408, pruned_loss=0.01679, audio_tagging_loss=0.008174, over 14803.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09323, pruned_loss=0.01431, audio_tagging_loss=0.009011, over 3038999.06 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:14:14,220 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331650 2023-11-23 03:14:25,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2023-11-23 03:14:53,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.294e+01 8.963e+01 9.673e+01 1.557e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-23 03:14:59,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=12.0 2023-11-23 03:15:02,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2211213.3333333335, ans=0.0 2023-11-23 03:15:10,761 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7050, loss[loss=0.07588, simple_loss=0.102, pruned_loss=0.01509, audio_tagging_loss=0.009813, over 15874.00 frames. 
], tot_loss[loss=0.06993, simple_loss=0.09304, pruned_loss=0.01435, audio_tagging_loss=0.009053, over 3032328.72 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:15:16,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2211280.0, ans=0.125 2023-11-23 03:15:16,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2211280.0, ans=0.07 2023-11-23 03:15:18,650 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331700 2023-11-23 03:15:33,631 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:15:38,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2211413.3333333335, ans=0.1 2023-11-23 03:16:04,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0 2023-11-23 03:16:07,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2211546.6666666665, ans=0.125 2023-11-23 03:16:14,846 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7100, loss[loss=0.08267, simple_loss=0.1122, pruned_loss=0.01749, audio_tagging_loss=0.00906, over 15634.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09317, pruned_loss=0.01425, audio_tagging_loss=0.009121, over 3039643.24 frames. ], batch size: 55, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:16:22,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331750 2023-11-23 03:16:56,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2211813.3333333335, ans=0.2 2023-11-23 03:17:02,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.441e+01 9.062e+01 9.663e+01 1.297e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-23 03:17:02,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2211813.3333333335, ans=0.125 2023-11-23 03:17:03,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2023-11-23 03:17:19,044 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7150, loss[loss=0.06723, simple_loss=0.09139, pruned_loss=0.01302, audio_tagging_loss=0.008519, over 15115.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.0934, pruned_loss=0.01428, audio_tagging_loss=0.009145, over 3045569.35 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 16.0 2023-11-23 03:17:26,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331800 2023-11-23 03:17:40,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2212013.3333333335, ans=0.125 2023-11-23 03:17:45,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2212080.0, ans=0.1 2023-11-23 03:17:48,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.87 vs. 
limit=15.0 2023-11-23 03:18:16,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2212213.3333333335, ans=0.1 2023-11-23 03:18:20,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2212213.3333333335, ans=0.125 2023-11-23 03:18:22,447 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7200, loss[loss=0.06605, simple_loss=0.08775, pruned_loss=0.01552, audio_tagging_loss=0.006657, over 14573.00 frames. ], tot_loss[loss=0.06987, simple_loss=0.09288, pruned_loss=0.01419, audio_tagging_loss=0.009246, over 3048619.55 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 03:18:24,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-23 03:18:29,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331850 2023-11-23 03:18:47,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2212413.3333333335, ans=0.125 2023-11-23 03:18:50,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2212413.3333333335, ans=0.125 2023-11-23 03:18:54,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2212413.3333333335, ans=0.95 2023-11-23 03:19:06,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2212480.0, ans=0.0 2023-11-23 03:19:08,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.113e+01 8.957e+01 9.713e+01 1.213e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 03:19:13,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2212546.6666666665, ans=0.125 2023-11-23 03:19:21,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2023-11-23 03:19:22,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2212546.6666666665, ans=0.125 2023-11-23 03:19:24,862 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7250, loss[loss=0.05813, simple_loss=0.08426, pruned_loss=0.007238, audio_tagging_loss=0.00876, over 15182.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09354, pruned_loss=0.01429, audio_tagging_loss=0.009203, over 3047203.94 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 32.0 2023-11-23 03:19:32,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331900 2023-11-23 03:19:58,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2212746.6666666665, ans=0.125 2023-11-23 03:20:28,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2212946.6666666665, ans=0.0 2023-11-23 03:20:29,101 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7300, loss[loss=0.09166, simple_loss=0.1245, pruned_loss=0.02325, audio_tagging_loss=0.006174, over 14312.00 frames. 
], tot_loss[loss=0.0701, simple_loss=0.09321, pruned_loss=0.01422, audio_tagging_loss=0.009268, over 3046782.70 frames. ], batch size: 52, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:20:30,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2212946.6666666665, ans=0.0 2023-11-23 03:20:30,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2212946.6666666665, ans=0.04949747468305833 2023-11-23 03:20:36,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 331950 2023-11-23 03:20:42,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2213013.3333333335, ans=0.09899494936611666 2023-11-23 03:21:09,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2213146.6666666665, ans=10.0 2023-11-23 03:21:15,975 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 8.147e+01 8.723e+01 9.402e+01 1.286e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-23 03:21:17,570 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:21:30,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2213213.3333333335, ans=0.2 2023-11-23 03:21:33,049 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7350, loss[loss=0.06162, simple_loss=0.07768, pruned_loss=0.01341, audio_tagging_loss=0.009369, over 15966.00 frames. ], tot_loss[loss=0.06997, simple_loss=0.09291, pruned_loss=0.01431, audio_tagging_loss=0.009206, over 3043862.73 frames. ], batch size: 60, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:21:34,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2213280.0, ans=10.0 2023-11-23 03:21:35,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2213280.0, ans=0.125 2023-11-23 03:21:39,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2213280.0, ans=0.2 2023-11-23 03:21:40,367 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332000 2023-11-23 03:22:00,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2213413.3333333335, ans=0.0 2023-11-23 03:22:32,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2213546.6666666665, ans=0.125 2023-11-23 03:22:39,809 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7400, loss[loss=0.0849, simple_loss=0.1126, pruned_loss=0.01966, audio_tagging_loss=0.008964, over 16283.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.09237, pruned_loss=0.01419, audio_tagging_loss=0.009204, over 3040590.00 frames. 
], batch size: 62, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:22:47,294 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332050 2023-11-23 03:22:52,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2213680.0, ans=0.125 2023-11-23 03:22:54,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2213680.0, ans=0.0 2023-11-23 03:22:55,853 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:22:57,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-23 03:23:03,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2213680.0, ans=0.2 2023-11-23 03:23:13,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2213746.6666666665, ans=0.125 2023-11-23 03:23:28,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.451e+01 8.955e+01 9.673e+01 1.193e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 03:23:44,629 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7450, loss[loss=0.0597, simple_loss=0.07579, pruned_loss=0.0117, audio_tagging_loss=0.01011, over 15153.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09223, pruned_loss=0.01427, audio_tagging_loss=0.009119, over 3044774.99 frames. ], batch size: 59, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:23:50,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2213946.6666666665, ans=0.0 2023-11-23 03:23:52,467 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332100 2023-11-23 03:24:21,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2214146.6666666665, ans=0.1 2023-11-23 03:24:24,873 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:24:27,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-11-23 03:24:34,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2214213.3333333335, ans=0.125 2023-11-23 03:24:49,007 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7500, loss[loss=0.0709, simple_loss=0.09594, pruned_loss=0.01553, audio_tagging_loss=0.007392, over 15890.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09233, pruned_loss=0.01413, audio_tagging_loss=0.009067, over 3051650.93 frames. 
], batch size: 59, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:24:56,679 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332150 2023-11-23 03:25:03,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2214346.6666666665, ans=0.125 2023-11-23 03:25:20,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2214413.3333333335, ans=0.125 2023-11-23 03:25:26,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2214480.0, ans=0.125 2023-11-23 03:25:37,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.184e+01 8.938e+01 9.763e+01 1.268e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-23 03:25:40,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2214546.6666666665, ans=0.125 2023-11-23 03:25:52,771 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7550, loss[loss=0.07386, simple_loss=0.09944, pruned_loss=0.01567, audio_tagging_loss=0.008472, over 15939.00 frames. ], tot_loss[loss=0.06964, simple_loss=0.09289, pruned_loss=0.01422, audio_tagging_loss=0.008983, over 3052751.52 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:25:54,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.23 vs. limit=15.0 2023-11-23 03:26:00,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332200 2023-11-23 03:26:03,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2023-11-23 03:26:06,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2023-11-23 03:26:07,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2214680.0, ans=0.0 2023-11-23 03:26:09,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2214680.0, ans=0.2 2023-11-23 03:26:55,767 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7600, loss[loss=0.0477, simple_loss=0.0639, pruned_loss=0.008968, audio_tagging_loss=0.006782, over 14287.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09249, pruned_loss=0.0143, audio_tagging_loss=0.00899, over 3052895.59 frames. 
], batch size: 55, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:26:56,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2214946.6666666665, ans=0.125 2023-11-23 03:27:04,402 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332250 2023-11-23 03:27:04,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2214946.6666666665, ans=0.0 2023-11-23 03:27:10,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2215013.3333333335, ans=0.0 2023-11-23 03:27:15,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2215013.3333333335, ans=0.2 2023-11-23 03:27:20,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2215013.3333333335, ans=0.1 2023-11-23 03:27:21,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2215080.0, ans=0.2 2023-11-23 03:27:26,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2215080.0, ans=0.125 2023-11-23 03:27:28,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2215080.0, ans=0.125 2023-11-23 03:27:30,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2215080.0, ans=0.125 2023-11-23 03:27:31,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2023-11-23 03:27:35,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2023-11-23 03:27:44,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.025e+01 8.532e+01 9.208e+01 1.132e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-23 03:28:01,618 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7650, loss[loss=0.07543, simple_loss=0.107, pruned_loss=0.01491, audio_tagging_loss=0.007019, over 14196.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09236, pruned_loss=0.01437, audio_tagging_loss=0.009101, over 3047570.65 frames. ], batch size: 53, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:28:09,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332300 2023-11-23 03:28:14,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2023-11-23 03:28:34,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2215413.3333333335, ans=0.125 2023-11-23 03:28:56,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2215546.6666666665, ans=0.5 2023-11-23 03:29:05,103 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7700, loss[loss=0.06293, simple_loss=0.08409, pruned_loss=0.01158, audio_tagging_loss=0.009299, over 16615.00 frames. 
], tot_loss[loss=0.06958, simple_loss=0.09242, pruned_loss=0.01429, audio_tagging_loss=0.009081, over 3045177.71 frames. ], batch size: 64, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:29:11,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2215613.3333333335, ans=0.125 2023-11-23 03:29:12,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332350 2023-11-23 03:29:24,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2023-11-23 03:29:52,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2215813.3333333335, ans=0.125 2023-11-23 03:29:55,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.060e+01 8.592e+01 9.630e+01 1.129e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-23 03:29:57,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2215880.0, ans=0.125 2023-11-23 03:30:01,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=22.5 2023-11-23 03:30:08,621 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7750, loss[loss=0.06945, simple_loss=0.08873, pruned_loss=0.01438, audio_tagging_loss=0.0107, over 15137.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09254, pruned_loss=0.01431, audio_tagging_loss=0.009071, over 3041303.43 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:30:16,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332400 2023-11-23 03:30:19,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2215946.6666666665, ans=0.0 2023-11-23 03:31:14,407 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7800, loss[loss=0.06388, simple_loss=0.08195, pruned_loss=0.01393, audio_tagging_loss=0.008972, over 14602.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09182, pruned_loss=0.01422, audio_tagging_loss=0.009146, over 3038992.34 frames. ], batch size: 55, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:31:18,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2216280.0, ans=0.07 2023-11-23 03:31:21,753 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332450 2023-11-23 03:32:04,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.076e+01 8.334e+01 8.987e+01 9.736e+01 1.242e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 03:32:18,082 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7850, loss[loss=0.1032, simple_loss=0.1395, pruned_loss=0.02721, audio_tagging_loss=0.006276, over 16551.00 frames. ], tot_loss[loss=0.06982, simple_loss=0.0926, pruned_loss=0.01432, audio_tagging_loss=0.009195, over 3045059.79 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:32:18,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2216613.3333333335, ans=0.5 2023-11-23 03:32:22,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.13 vs. 
limit=22.5 2023-11-23 03:32:25,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332500 2023-11-23 03:32:27,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2216613.3333333335, ans=0.0 2023-11-23 03:33:00,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2216813.3333333335, ans=0.0 2023-11-23 03:33:16,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2216880.0, ans=0.1 2023-11-23 03:33:17,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2216880.0, ans=0.125 2023-11-23 03:33:21,802 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7900, loss[loss=0.06964, simple_loss=0.08987, pruned_loss=0.01439, audio_tagging_loss=0.01032, over 15495.00 frames. ], tot_loss[loss=0.06983, simple_loss=0.09231, pruned_loss=0.01437, audio_tagging_loss=0.009308, over 3048972.95 frames. ], batch size: 59, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:33:25,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2216946.6666666665, ans=0.1 2023-11-23 03:33:29,907 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332550 2023-11-23 03:33:41,082 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.504e-03 2023-11-23 03:33:47,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2217080.0, ans=0.2 2023-11-23 03:34:08,080 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:34:08,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2217146.6666666665, ans=0.1 2023-11-23 03:34:11,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.317e+01 8.969e+01 9.609e+01 1.150e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-23 03:34:26,048 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 7950, loss[loss=0.05399, simple_loss=0.06996, pruned_loss=0.00742, audio_tagging_loss=0.01159, over 16583.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09356, pruned_loss=0.01451, audio_tagging_loss=0.009338, over 3054251.29 frames. ], batch size: 62, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:34:33,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332600 2023-11-23 03:34:40,827 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 03:34:55,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2217413.3333333335, ans=0.0 2023-11-23 03:35:00,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.47 vs. 
limit=15.0 2023-11-23 03:35:02,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2217480.0, ans=10.0 2023-11-23 03:35:03,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2217480.0, ans=0.125 2023-11-23 03:35:15,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2217480.0, ans=0.2 2023-11-23 03:35:16,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2217546.6666666665, ans=0.125 2023-11-23 03:35:16,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2217546.6666666665, ans=0.0 2023-11-23 03:35:26,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-23 03:35:30,610 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8000, loss[loss=0.06551, simple_loss=0.08552, pruned_loss=0.01324, audio_tagging_loss=0.009511, over 14951.00 frames. ], tot_loss[loss=0.07036, simple_loss=0.09309, pruned_loss=0.0144, audio_tagging_loss=0.009411, over 3049068.32 frames. ], batch size: 55, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:35:37,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332650 2023-11-23 03:35:42,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2217680.0, ans=0.125 2023-11-23 03:35:55,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2217746.6666666665, ans=0.1 2023-11-23 03:36:01,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2217746.6666666665, ans=0.125 2023-11-23 03:36:20,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.081e+01 8.720e+01 9.413e+01 1.910e+02, threshold=1.744e+02, percent-clipped=1.0 2023-11-23 03:36:24,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2023-11-23 03:36:27,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2217880.0, ans=0.2 2023-11-23 03:36:33,474 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8050, loss[loss=0.06298, simple_loss=0.08544, pruned_loss=0.01021, audio_tagging_loss=0.01005, over 14442.00 frames. ], tot_loss[loss=0.07035, simple_loss=0.09295, pruned_loss=0.01442, audio_tagging_loss=0.009456, over 3052345.12 frames. ], batch size: 54, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:36:40,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332700 2023-11-23 03:37:11,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2023-11-23 03:37:37,078 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8100, loss[loss=0.07206, simple_loss=0.08634, pruned_loss=0.01652, audio_tagging_loss=0.01237, over 14755.00 frames. ], tot_loss[loss=0.07037, simple_loss=0.0931, pruned_loss=0.01438, audio_tagging_loss=0.00944, over 3047047.27 frames. 
], batch size: 55, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:37:45,481 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332750 2023-11-23 03:38:06,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2218413.3333333335, ans=0.0 2023-11-23 03:38:18,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0 2023-11-23 03:38:25,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2218480.0, ans=0.1 2023-11-23 03:38:27,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.654e+01 8.342e+01 8.867e+01 9.602e+01 1.203e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-23 03:38:40,751 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8150, loss[loss=0.08696, simple_loss=0.1132, pruned_loss=0.02136, audio_tagging_loss=0.009027, over 15261.00 frames. ], tot_loss[loss=0.07064, simple_loss=0.09359, pruned_loss=0.01455, audio_tagging_loss=0.009296, over 3047744.98 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:38:48,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332800 2023-11-23 03:39:05,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-23 03:39:12,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2218746.6666666665, ans=0.2 2023-11-23 03:39:40,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2218880.0, ans=0.125 2023-11-23 03:39:45,649 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8200, loss[loss=0.09244, simple_loss=0.1305, pruned_loss=0.02135, audio_tagging_loss=0.005816, over 15213.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09305, pruned_loss=0.01438, audio_tagging_loss=0.00918, over 3046616.16 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:39:45,681 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 03:39:49,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2218946.6666666665, ans=0.0 2023-11-23 03:39:52,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2218946.6666666665, ans=0.2 2023-11-23 03:39:53,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332850 2023-11-23 03:40:00,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2219013.3333333335, ans=0.125 2023-11-23 03:40:11,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.01 vs. 
limit=15.0 2023-11-23 03:40:15,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2219080.0, ans=0.07 2023-11-23 03:40:16,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. limit=10.0 2023-11-23 03:40:35,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2219213.3333333335, ans=0.125 2023-11-23 03:40:37,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.334e+01 8.200e+01 8.804e+01 9.476e+01 1.277e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-23 03:40:47,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2219213.3333333335, ans=0.1 2023-11-23 03:40:49,906 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8250, loss[loss=0.06557, simple_loss=0.08837, pruned_loss=0.01282, audio_tagging_loss=0.008573, over 15234.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09321, pruned_loss=0.01431, audio_tagging_loss=0.009127, over 3045959.16 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:40:58,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332900 2023-11-23 03:41:21,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2219413.3333333335, ans=0.0 2023-11-23 03:41:21,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2219413.3333333335, ans=0.125 2023-11-23 03:41:24,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=15.0 2023-11-23 03:41:41,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2219546.6666666665, ans=0.125 2023-11-23 03:41:54,266 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8300, loss[loss=0.06426, simple_loss=0.081, pruned_loss=0.012, audio_tagging_loss=0.01176, over 14206.00 frames. ], tot_loss[loss=0.06947, simple_loss=0.09211, pruned_loss=0.01426, audio_tagging_loss=0.009159, over 3046399.37 frames. ], batch size: 53, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:41:55,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2219613.3333333335, ans=0.2 2023-11-23 03:41:56,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2219613.3333333335, ans=0.07 2023-11-23 03:41:56,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2219613.3333333335, ans=0.0 2023-11-23 03:42:01,588 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 332950 2023-11-23 03:42:01,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2219613.3333333335, ans=0.0 2023-11-23 03:42:08,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.66 vs. 
limit=15.0 2023-11-23 03:42:19,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2219746.6666666665, ans=0.0 2023-11-23 03:42:22,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2219746.6666666665, ans=0.125 2023-11-23 03:42:38,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2219813.3333333335, ans=0.0 2023-11-23 03:42:46,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.213e+01 8.531e+01 9.088e+01 9.532e+01 2.273e+02, threshold=1.818e+02, percent-clipped=2.0 2023-11-23 03:42:47,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2219880.0, ans=0.125 2023-11-23 03:42:54,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2219880.0, ans=0.0 2023-11-23 03:42:57,834 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8350, loss[loss=0.05979, simple_loss=0.08839, pruned_loss=0.00882, audio_tagging_loss=0.006781, over 15802.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09235, pruned_loss=0.01429, audio_tagging_loss=0.009034, over 3046951.70 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 8.0 2023-11-23 03:43:05,972 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333000 2023-11-23 03:43:13,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0 2023-11-23 03:43:22,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2220013.3333333335, ans=0.2 2023-11-23 03:43:28,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2220080.0, ans=0.0 2023-11-23 03:43:30,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2220080.0, ans=0.125 2023-11-23 03:43:46,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2023-11-23 03:44:03,101 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8400, loss[loss=0.05371, simple_loss=0.06086, pruned_loss=0.0114, audio_tagging_loss=0.01188, over 13944.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09201, pruned_loss=0.01407, audio_tagging_loss=0.009069, over 3048559.19 frames. 
], batch size: 58, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:44:10,400 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333050 2023-11-23 03:44:15,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2220346.6666666665, ans=0.125 2023-11-23 03:44:27,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2220413.3333333335, ans=0.5 2023-11-23 03:44:37,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2220413.3333333335, ans=0.025 2023-11-23 03:44:55,758 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 7.977e+01 8.728e+01 9.553e+01 1.214e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-23 03:45:07,854 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8450, loss[loss=0.07658, simple_loss=0.1029, pruned_loss=0.01804, audio_tagging_loss=0.007112, over 16131.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09143, pruned_loss=0.01398, audio_tagging_loss=0.009155, over 3043252.10 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:45:15,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333100 2023-11-23 03:45:44,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2220746.6666666665, ans=0.2 2023-11-23 03:45:45,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-23 03:46:12,215 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8500, loss[loss=0.06165, simple_loss=0.06717, pruned_loss=0.01535, audio_tagging_loss=0.01271, over 14161.00 frames. ], tot_loss[loss=0.06909, simple_loss=0.09189, pruned_loss=0.01404, audio_tagging_loss=0.009098, over 3045068.00 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:46:19,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333150 2023-11-23 03:46:19,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2220946.6666666665, ans=0.0 2023-11-23 03:46:27,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2221013.3333333335, ans=0.1 2023-11-23 03:47:03,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2221213.3333333335, ans=0.125 2023-11-23 03:47:03,976 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.153e+01 8.814e+01 9.449e+01 1.976e+02, threshold=1.763e+02, percent-clipped=1.0 2023-11-23 03:47:15,495 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8550, loss[loss=0.06251, simple_loss=0.07815, pruned_loss=0.01336, audio_tagging_loss=0.01007, over 16366.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09233, pruned_loss=0.01414, audio_tagging_loss=0.009182, over 3046541.75 frames. 
], batch size: 63, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:47:20,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2221280.0, ans=0.125 2023-11-23 03:47:23,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333200 2023-11-23 03:47:49,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2221413.3333333335, ans=0.0 2023-11-23 03:47:57,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2221480.0, ans=0.1 2023-11-23 03:48:17,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2221546.6666666665, ans=0.1 2023-11-23 03:48:19,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2221613.3333333335, ans=0.0 2023-11-23 03:48:20,343 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8600, loss[loss=0.05774, simple_loss=0.07587, pruned_loss=0.009995, audio_tagging_loss=0.00981, over 15733.00 frames. ], tot_loss[loss=0.06952, simple_loss=0.09225, pruned_loss=0.01421, audio_tagging_loss=0.009187, over 3043409.59 frames. ], batch size: 60, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:48:20,787 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:48:27,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333250 2023-11-23 03:48:28,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2023-11-23 03:48:32,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=15.0 2023-11-23 03:48:36,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2023-11-23 03:48:38,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2221680.0, ans=0.125 2023-11-23 03:48:38,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2221680.0, ans=0.125 2023-11-23 03:49:12,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.558e+01 9.096e+01 9.715e+01 2.705e+02, threshold=1.819e+02, percent-clipped=1.0 2023-11-23 03:49:18,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2221880.0, ans=0.125 2023-11-23 03:49:23,116 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8650, loss[loss=0.05725, simple_loss=0.06471, pruned_loss=0.0122, audio_tagging_loss=0.0127, over 14862.00 frames. ], tot_loss[loss=0.07002, simple_loss=0.09263, pruned_loss=0.0145, audio_tagging_loss=0.009206, over 3045554.68 frames. 
], batch size: 59, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:49:30,607 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333300 2023-11-23 03:49:42,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2222013.3333333335, ans=0.125 2023-11-23 03:50:03,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-11-23 03:50:05,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2222146.6666666665, ans=0.125 2023-11-23 03:50:25,864 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8700, loss[loss=0.07691, simple_loss=0.1072, pruned_loss=0.01312, audio_tagging_loss=0.0102, over 15052.00 frames. ], tot_loss[loss=0.0708, simple_loss=0.09345, pruned_loss=0.01474, audio_tagging_loss=0.009329, over 3051540.21 frames. ], batch size: 55, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:50:27,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2222280.0, ans=0.1 2023-11-23 03:50:29,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2222280.0, ans=0.1 2023-11-23 03:50:34,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333350 2023-11-23 03:50:39,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2222346.6666666665, ans=0.125 2023-11-23 03:50:52,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2222413.3333333335, ans=0.0 2023-11-23 03:50:59,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2222413.3333333335, ans=0.2 2023-11-23 03:50:59,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2023-11-23 03:51:17,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.537e+01 9.054e+01 9.922e+01 1.181e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-23 03:51:20,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=12.0 2023-11-23 03:51:30,112 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8750, loss[loss=0.0586, simple_loss=0.08084, pruned_loss=0.01018, audio_tagging_loss=0.007997, over 14592.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.09318, pruned_loss=0.01451, audio_tagging_loss=0.009358, over 3048111.80 frames. 
], batch size: 55, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:51:37,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333400 2023-11-23 03:51:56,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=2222746.6666666665, ans=12.0 2023-11-23 03:52:10,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2222813.3333333335, ans=0.0 2023-11-23 03:52:10,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2222813.3333333335, ans=0.1 2023-11-23 03:52:23,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5 2023-11-23 03:52:33,773 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8800, loss[loss=0.08968, simple_loss=0.1207, pruned_loss=0.01907, audio_tagging_loss=0.01027, over 15628.00 frames. ], tot_loss[loss=0.07067, simple_loss=0.09338, pruned_loss=0.01462, audio_tagging_loss=0.009359, over 3047620.89 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:52:37,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2222946.6666666665, ans=0.0 2023-11-23 03:52:41,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333450 2023-11-23 03:53:10,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2223080.0, ans=0.2 2023-11-23 03:53:15,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2223146.6666666665, ans=0.125 2023-11-23 03:53:16,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2223146.6666666665, ans=0.1 2023-11-23 03:53:26,097 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.235e+01 8.890e+01 9.571e+01 2.057e+02, threshold=1.778e+02, percent-clipped=1.0 2023-11-23 03:53:26,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2223213.3333333335, ans=0.0 2023-11-23 03:53:37,217 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8850, loss[loss=0.08979, simple_loss=0.1235, pruned_loss=0.0221, audio_tagging_loss=0.00596, over 15260.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09348, pruned_loss=0.01459, audio_tagging_loss=0.009381, over 3050458.02 frames. ], batch size: 55, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:53:45,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333500 2023-11-23 03:53:48,744 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 03:54:04,318 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 03:54:26,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2223480.0, ans=0.125 2023-11-23 03:54:26,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2023-11-23 03:54:32,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2223546.6666666665, ans=0.125 2023-11-23 03:54:40,977 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8900, loss[loss=0.05907, simple_loss=0.07797, pruned_loss=0.01089, audio_tagging_loss=0.009188, over 15603.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.09399, pruned_loss=0.01461, audio_tagging_loss=0.009239, over 3049127.26 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:54:50,371 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333550 2023-11-23 03:55:00,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2223680.0, ans=0.0 2023-11-23 03:55:24,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2223813.3333333335, ans=0.125 2023-11-23 03:55:34,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.471e+01 9.014e+01 9.931e+01 1.268e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-23 03:55:39,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2223880.0, ans=0.2 2023-11-23 03:55:46,438 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 8950, loss[loss=0.07225, simple_loss=0.1005, pruned_loss=0.01352, audio_tagging_loss=0.008488, over 15162.00 frames. ], tot_loss[loss=0.07029, simple_loss=0.09331, pruned_loss=0.01444, audio_tagging_loss=0.009189, over 3048087.09 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:55:53,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333600 2023-11-23 03:56:50,582 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9000, loss[loss=0.08114, simple_loss=0.1101, pruned_loss=0.01555, audio_tagging_loss=0.01056, over 16016.00 frames. ], tot_loss[loss=0.07007, simple_loss=0.09329, pruned_loss=0.01422, audio_tagging_loss=0.009207, over 3047760.21 frames. ], batch size: 63, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 03:56:50,584 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 03:57:11,624 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6075, 4.3532, 3.7567, 4.2405], device='cuda:1') 2023-11-23 03:57:33,759 INFO [train_asr.py:1253] (1/4) Epoch 28, validation: loss=0.05919, simple_loss=0.05113, pruned_loss=0.004978, audio_tagging_loss=0.02865, over 4681554.00 frames. 
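
Three of the quantities in the records above can be cross-checked by hand, which is worth doing when auditing a long run like this one. The recurring "Exclude cut ... Number of frames (before subsampling): 100. Number of frames (after subsampling): 23" warnings are consistent with two successive 2x frame-rate reductions of the form T -> ((T - 7) // 2 + 1) // 2, since ((100 - 7) // 2 + 1) // 2 = 23; the dummy transcript tokenizes to 24 BPE tokens, so 23 encoder frames are too few to align against 24 tokens, which plausibly explains why these one-second placeholder cuts are dropped at train_asr.py:1462. In the [optim.py:476] records, the five grad-norm figures read as min/25%/median/75%/max, and the logged threshold always equals Clipping_scale times the median (2.0 * 8.839e+01 = 1.768e+02 in the record above). Every tot_loss likewise decomposes as 0.5 * simple_loss + pruned_loss + audio_tagging_loss (0.5 * 0.09329 + 0.01422 + 0.009207 = 0.07007 for batch 9000). The Python sketch below merely replays this arithmetic; the helper names are hypothetical, and the formulas are inferred from the log itself rather than taken from train_asr.py or optim.py.

    # Sanity checks inferred from the log records above.
    # Hypothetical helper names; only the arithmetic comes from the log.

    def frames_after_subsampling(num_frames: int) -> int:
        # ~4x reduction implied by "before subsampling: 100 ... after subsampling: 23".
        return ((num_frames - 7) // 2 + 1) // 2

    def clip_threshold(median_grad_norm: float, clipping_scale: float = 2.0) -> float:
        # Threshold as logged: clipping_scale * median grad-norm.
        return clipping_scale * median_grad_norm

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float) -> float:
        # Combination matching every tot_loss record in this section.
        return 0.5 * simple_loss + pruned_loss + audio_tagging_loss

    assert frames_after_subsampling(100) == 23               # the excluded 1 s cuts
    assert abs(clip_threshold(8.839e+01) - 1.768e+02) < 0.5  # optim.py record above
    assert abs(combined_loss(0.09329, 0.01422, 0.009207) - 0.07007) < 5e-5  # batch 9000

If these relations hold, any cut whose subsampled frame count falls below its token count cannot be aligned by the transducer loss, which is presumably the filter firing in the WARNING lines; the separate audio_tagging_loss term in every loss record shows an auxiliary tagging objective being optimized alongside the ASR loss throughout this section.
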
2023-11-23 03:57:33,760 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 03:57:41,870 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333650 2023-11-23 03:57:43,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2224280.0, ans=0.125 2023-11-23 03:57:48,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-23 03:57:55,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2023-11-23 03:58:09,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-23 03:58:28,164 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.337e+01 8.839e+01 9.721e+01 1.226e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-23 03:58:37,955 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9050, loss[loss=0.07916, simple_loss=0.1094, pruned_loss=0.01544, audio_tagging_loss=0.009002, over 15277.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09419, pruned_loss=0.01434, audio_tagging_loss=0.00904, over 3050142.02 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:58:38,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2224613.3333333335, ans=0.1 2023-11-23 03:58:45,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333700 2023-11-23 03:58:52,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2224680.0, ans=0.0 2023-11-23 03:59:11,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2224746.6666666665, ans=0.0 2023-11-23 03:59:19,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2023-11-23 03:59:37,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2224880.0, ans=22.5 2023-11-23 03:59:41,503 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9100, loss[loss=0.05708, simple_loss=0.06926, pruned_loss=0.01196, audio_tagging_loss=0.01049, over 14889.00 frames. ], tot_loss[loss=0.07007, simple_loss=0.09374, pruned_loss=0.01424, audio_tagging_loss=0.008961, over 3048217.00 frames. 
], batch size: 55, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 03:59:49,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333750 2023-11-23 03:59:53,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2225013.3333333335, ans=0.1 2023-11-23 04:00:02,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2225013.3333333335, ans=0.0 2023-11-23 04:00:04,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2225013.3333333335, ans=0.1 2023-11-23 04:00:30,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2225146.6666666665, ans=0.2 2023-11-23 04:00:34,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.276e+01 8.928e+01 9.613e+01 1.240e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-23 04:00:42,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2225213.3333333335, ans=0.125 2023-11-23 04:00:46,268 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9150, loss[loss=0.06864, simple_loss=0.08838, pruned_loss=0.0169, audio_tagging_loss=0.007547, over 14410.00 frames. ], tot_loss[loss=0.06963, simple_loss=0.093, pruned_loss=0.01419, audio_tagging_loss=0.008938, over 3049842.33 frames. ], batch size: 53, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:00:54,071 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333800 2023-11-23 04:01:01,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2225346.6666666665, ans=0.125 2023-11-23 04:01:13,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2225413.3333333335, ans=0.125 2023-11-23 04:01:16,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2225413.3333333335, ans=0.125 2023-11-23 04:01:20,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0 2023-11-23 04:01:28,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2225480.0, ans=0.0 2023-11-23 04:01:41,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2225546.6666666665, ans=0.0 2023-11-23 04:01:50,721 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9200, loss[loss=0.07759, simple_loss=0.1084, pruned_loss=0.01753, audio_tagging_loss=0.005865, over 15135.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.09384, pruned_loss=0.01459, audio_tagging_loss=0.008885, over 3051915.15 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:01:58,098 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333850 2023-11-23 04:02:33,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2225813.3333333335, ans=0.0 2023-11-23 04:02:44,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.69 vs. 
limit=15.0 2023-11-23 04:02:44,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.331e+01 8.795e+01 9.315e+01 1.180e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-23 04:02:53,980 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9250, loss[loss=0.05382, simple_loss=0.0739, pruned_loss=0.008737, audio_tagging_loss=0.008136, over 16726.00 frames. ], tot_loss[loss=0.06971, simple_loss=0.09293, pruned_loss=0.01434, audio_tagging_loss=0.008909, over 3052568.43 frames. ], batch size: 65, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:03:01,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333900 2023-11-23 04:03:01,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2225946.6666666665, ans=0.125 2023-11-23 04:03:04,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2023-11-23 04:03:24,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2226080.0, ans=0.125 2023-11-23 04:03:36,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2023-11-23 04:03:43,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2226146.6666666665, ans=0.125 2023-11-23 04:03:58,069 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9300, loss[loss=0.07018, simple_loss=0.09802, pruned_loss=0.01397, audio_tagging_loss=0.007192, over 15178.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09329, pruned_loss=0.01431, audio_tagging_loss=0.008889, over 3059441.76 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:04:00,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2226280.0, ans=0.125 2023-11-23 04:04:00,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2023-11-23 04:04:06,704 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 333950 2023-11-23 04:04:39,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0 2023-11-23 04:04:43,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2226480.0, ans=0.2 2023-11-23 04:04:52,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.259e+01 8.847e+01 9.648e+01 1.420e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 04:05:02,496 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9350, loss[loss=0.07965, simple_loss=0.1058, pruned_loss=0.01833, audio_tagging_loss=0.008409, over 15895.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09324, pruned_loss=0.01425, audio_tagging_loss=0.008974, over 3057292.27 frames. 
], batch size: 58, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:05:08,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2226613.3333333335, ans=0.0 2023-11-23 04:05:10,429 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334000 2023-11-23 04:05:45,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2226813.3333333335, ans=0.0 2023-11-23 04:05:48,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2226813.3333333335, ans=0.0 2023-11-23 04:05:55,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2226880.0, ans=0.025 2023-11-23 04:05:56,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2226880.0, ans=0.0 2023-11-23 04:05:58,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2226880.0, ans=0.125 2023-11-23 04:05:58,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2226880.0, ans=0.0 2023-11-23 04:06:06,800 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9400, loss[loss=0.06966, simple_loss=0.08713, pruned_loss=0.01354, audio_tagging_loss=0.01255, over 15387.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09314, pruned_loss=0.01424, audio_tagging_loss=0.009133, over 3060979.55 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:06:14,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334050 2023-11-23 04:06:19,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2227013.3333333335, ans=0.1 2023-11-23 04:06:25,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2227013.3333333335, ans=0.2 2023-11-23 04:06:39,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2227080.0, ans=0.125 2023-11-23 04:06:53,233 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:06:53,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2227146.6666666665, ans=0.125 2023-11-23 04:07:00,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.264e+01 9.005e+01 9.544e+01 1.394e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 04:07:07,473 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:07:10,611 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9450, loss[loss=0.06273, simple_loss=0.08748, pruned_loss=0.009543, audio_tagging_loss=0.009451, over 14878.00 frames. 
], tot_loss[loss=0.07026, simple_loss=0.09341, pruned_loss=0.0144, audio_tagging_loss=0.009157, over 3058144.20 frames. ], batch size: 55, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:07:18,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334100 2023-11-23 04:07:27,577 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:08:11,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2227546.6666666665, ans=0.1 2023-11-23 04:08:15,168 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9500, loss[loss=0.04866, simple_loss=0.06704, pruned_loss=0.004568, audio_tagging_loss=0.01057, over 14794.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.09359, pruned_loss=0.01445, audio_tagging_loss=0.009218, over 3044951.19 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:08:17,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2227613.3333333335, ans=0.1 2023-11-23 04:08:22,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334150 2023-11-23 04:08:33,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2227680.0, ans=0.0 2023-11-23 04:08:42,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2227746.6666666665, ans=0.2 2023-11-23 04:08:44,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2227746.6666666665, ans=0.125 2023-11-23 04:08:51,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2227746.6666666665, ans=0.125 2023-11-23 04:09:03,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0 2023-11-23 04:09:09,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.198e+01 8.793e+01 9.435e+01 1.623e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-23 04:09:11,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2227880.0, ans=0.125 2023-11-23 04:09:19,592 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9550, loss[loss=0.06492, simple_loss=0.09655, pruned_loss=0.006773, audio_tagging_loss=0.009868, over 14750.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09386, pruned_loss=0.01448, audio_tagging_loss=0.009313, over 3047808.41 frames. ], batch size: 54, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:09:20,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.46 vs. 
limit=12.0 2023-11-23 04:09:26,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334200 2023-11-23 04:09:50,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2228080.0, ans=0.125 2023-11-23 04:10:05,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2228146.6666666665, ans=0.0 2023-11-23 04:10:23,892 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9600, loss[loss=0.05487, simple_loss=0.07634, pruned_loss=0.006817, audio_tagging_loss=0.009882, over 15618.00 frames. ], tot_loss[loss=0.07083, simple_loss=0.09412, pruned_loss=0.01442, audio_tagging_loss=0.009356, over 3046946.03 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:10:31,903 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334250 2023-11-23 04:10:47,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2228346.6666666665, ans=0.125 2023-11-23 04:10:47,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-23 04:10:58,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2228413.3333333335, ans=0.0 2023-11-23 04:11:17,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.563e+01 8.386e+01 9.005e+01 9.855e+01 1.264e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 04:11:23,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2228546.6666666665, ans=0.2 2023-11-23 04:11:28,314 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9650, loss[loss=0.06884, simple_loss=0.09247, pruned_loss=0.0136, audio_tagging_loss=0.009013, over 15198.00 frames. ], tot_loss[loss=0.071, simple_loss=0.09439, pruned_loss=0.01451, audio_tagging_loss=0.009288, over 3051787.57 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:11:30,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2228613.3333333335, ans=0.0 2023-11-23 04:11:35,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334300 2023-11-23 04:11:35,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2228613.3333333335, ans=0.2 2023-11-23 04:11:39,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2228680.0, ans=0.0 2023-11-23 04:11:40,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2228680.0, ans=0.0 2023-11-23 04:11:40,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2228680.0, ans=0.125 2023-11-23 04:11:46,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2228680.0, ans=0.0 2023-11-23 04:11:58,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.23 vs. 
limit=22.5 2023-11-23 04:12:25,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2228880.0, ans=0.1 2023-11-23 04:12:31,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2228946.6666666665, ans=0.0 2023-11-23 04:12:31,994 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9700, loss[loss=0.1048, simple_loss=0.1431, pruned_loss=0.02522, audio_tagging_loss=0.008027, over 16253.00 frames. ], tot_loss[loss=0.07109, simple_loss=0.09482, pruned_loss=0.01454, audio_tagging_loss=0.00914, over 3053103.35 frames. ], batch size: 55, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:12:33,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2228946.6666666665, ans=0.125 2023-11-23 04:12:37,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=12.0 2023-11-23 04:12:39,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334350 2023-11-23 04:13:00,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-11-23 04:13:22,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2229213.3333333335, ans=0.95 2023-11-23 04:13:23,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2229213.3333333335, ans=0.125 2023-11-23 04:13:27,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.051e+01 8.815e+01 9.310e+01 1.256e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-23 04:13:36,184 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9750, loss[loss=0.06304, simple_loss=0.08137, pruned_loss=0.01381, audio_tagging_loss=0.008548, over 14939.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.09435, pruned_loss=0.01442, audio_tagging_loss=0.009052, over 3047084.51 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:13:41,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2023-11-23 04:13:44,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334400 2023-11-23 04:13:59,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2229346.6666666665, ans=0.035 2023-11-23 04:14:14,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2229480.0, ans=0.0 2023-11-23 04:14:41,478 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9800, loss[loss=0.07456, simple_loss=0.09965, pruned_loss=0.01553, audio_tagging_loss=0.009202, over 16078.00 frames. ], tot_loss[loss=0.07062, simple_loss=0.09415, pruned_loss=0.01454, audio_tagging_loss=0.009006, over 3043467.08 frames. 
], batch size: 60, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:14:45,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2229613.3333333335, ans=0.07 2023-11-23 04:14:48,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334450 2023-11-23 04:15:00,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2229680.0, ans=0.125 2023-11-23 04:15:06,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0 2023-11-23 04:15:20,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2229813.3333333335, ans=0.95 2023-11-23 04:15:20,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2229813.3333333335, ans=0.1 2023-11-23 04:15:23,820 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:15:29,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2229813.3333333335, ans=0.2 2023-11-23 04:15:36,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.531e+01 9.197e+01 9.757e+01 1.250e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-23 04:15:37,526 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:15:42,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2229880.0, ans=0.2 2023-11-23 04:15:45,035 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9850, loss[loss=0.07124, simple_loss=0.08895, pruned_loss=0.01366, audio_tagging_loss=0.01311, over 14873.00 frames. ], tot_loss[loss=0.07032, simple_loss=0.09353, pruned_loss=0.01455, audio_tagging_loss=0.009005, over 3039559.73 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:15:52,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334500 2023-11-23 04:16:35,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2230213.3333333335, ans=0.125 2023-11-23 04:16:48,577 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9900, loss[loss=0.08209, simple_loss=0.1148, pruned_loss=0.0197, audio_tagging_loss=0.004976, over 15109.00 frames. ], tot_loss[loss=0.06979, simple_loss=0.09294, pruned_loss=0.01426, audio_tagging_loss=0.009066, over 3040488.77 frames. 
], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:16:49,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2230280.0, ans=0.0 2023-11-23 04:16:54,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2230280.0, ans=0.125 2023-11-23 04:16:57,351 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334550 2023-11-23 04:16:58,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2230280.0, ans=0.0 2023-11-23 04:17:08,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2230346.6666666665, ans=0.125 2023-11-23 04:17:10,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-11-23 04:17:35,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2023-11-23 04:17:44,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 8.266e+01 8.858e+01 9.665e+01 1.141e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 04:17:53,600 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 9950, loss[loss=0.06191, simple_loss=0.08691, pruned_loss=0.01262, audio_tagging_loss=0.005828, over 13904.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09228, pruned_loss=0.01427, audio_tagging_loss=0.009071, over 3036495.47 frames. ], batch size: 54, lr: 2.42e-03, grad_scale: 16.0 2023-11-23 04:18:00,919 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334600 2023-11-23 04:18:34,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2230813.3333333335, ans=0.125 2023-11-23 04:18:35,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-23 04:18:37,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2230813.3333333335, ans=0.2 2023-11-23 04:18:45,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2230880.0, ans=0.125 2023-11-23 04:18:47,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2230880.0, ans=0.125 2023-11-23 04:18:57,017 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10000, loss[loss=0.07779, simple_loss=0.1067, pruned_loss=0.01827, audio_tagging_loss=0.006172, over 14550.00 frames. ], tot_loss[loss=0.0696, simple_loss=0.09253, pruned_loss=0.01444, audio_tagging_loss=0.008896, over 3037476.09 frames. 
], batch size: 54, lr: 2.42e-03, grad_scale: 32.0 2023-11-23 04:19:04,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334650 2023-11-23 04:19:29,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2231080.0, ans=0.2 2023-11-23 04:19:38,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2231146.6666666665, ans=0.2 2023-11-23 04:19:41,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2231146.6666666665, ans=0.125 2023-11-23 04:19:51,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2231213.3333333335, ans=0.0 2023-11-23 04:19:51,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.134e+01 8.342e+01 8.814e+01 9.358e+01 1.133e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-23 04:19:58,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2231213.3333333335, ans=0.125 2023-11-23 04:20:00,549 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10050, loss[loss=0.07336, simple_loss=0.09349, pruned_loss=0.01704, audio_tagging_loss=0.009581, over 15013.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09255, pruned_loss=0.01429, audio_tagging_loss=0.008974, over 3047036.59 frames. ], batch size: 55, lr: 2.41e-03, grad_scale: 32.0 2023-11-23 04:20:08,034 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334700 2023-11-23 04:20:36,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-11-23 04:21:06,037 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10100, loss[loss=0.09058, simple_loss=0.1269, pruned_loss=0.01838, audio_tagging_loss=0.00876, over 15862.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09165, pruned_loss=0.01406, audio_tagging_loss=0.00912, over 3053270.74 frames. ], batch size: 56, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:21:13,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334750 2023-11-23 04:21:25,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-23 04:21:26,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2231680.0, ans=0.125 2023-11-23 04:21:28,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2231680.0, ans=0.0 2023-11-23 04:21:39,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2231746.6666666665, ans=0.0 2023-11-23 04:21:40,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2231746.6666666665, ans=0.125 2023-11-23 04:21:50,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-23 04:21:57,917 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:22:02,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.395e+01 8.740e+01 9.651e+01 1.218e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-23 04:22:04,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2023-11-23 04:22:10,052 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10150, loss[loss=0.07737, simple_loss=0.1054, pruned_loss=0.01715, audio_tagging_loss=0.007499, over 16666.00 frames. ], tot_loss[loss=0.06939, simple_loss=0.09219, pruned_loss=0.01419, audio_tagging_loss=0.009111, over 3054420.14 frames. ], batch size: 63, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:22:10,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2231946.6666666665, ans=0.0 2023-11-23 04:22:13,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2231946.6666666665, ans=0.125 2023-11-23 04:22:17,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334800 2023-11-23 04:22:32,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2232013.3333333335, ans=0.125 2023-11-23 04:22:39,037 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:22:51,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2232146.6666666665, ans=0.0 2023-11-23 04:22:57,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2232146.6666666665, ans=0.125 2023-11-23 04:23:13,267 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10200, loss[loss=0.06926, simple_loss=0.0854, pruned_loss=0.01466, audio_tagging_loss=0.01191, over 15971.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09172, pruned_loss=0.01408, audio_tagging_loss=0.009231, over 3052575.87 frames. 
], batch size: 61, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:23:13,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2232280.0, ans=0.125 2023-11-23 04:23:18,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2232280.0, ans=0.0 2023-11-23 04:23:20,567 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334850 2023-11-23 04:23:30,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2232346.6666666665, ans=0.015 2023-11-23 04:23:30,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-23 04:23:37,525 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:23:42,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5 2023-11-23 04:23:43,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2232413.3333333335, ans=0.125 2023-11-23 04:23:49,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2232413.3333333335, ans=0.05 2023-11-23 04:23:51,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2023-11-23 04:23:55,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2232480.0, ans=0.0 2023-11-23 04:24:07,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2232546.6666666665, ans=0.09899494936611666 2023-11-23 04:24:09,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.292e+01 8.929e+01 9.861e+01 1.255e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-23 04:24:17,274 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10250, loss[loss=0.08926, simple_loss=0.1089, pruned_loss=0.0242, audio_tagging_loss=0.01063, over 15056.00 frames. ], tot_loss[loss=0.06963, simple_loss=0.09228, pruned_loss=0.01418, audio_tagging_loss=0.009306, over 3050637.81 frames. ], batch size: 58, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:24:25,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334900 2023-11-23 04:24:26,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0 2023-11-23 04:24:48,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.89 vs. 
limit=12.0 2023-11-23 04:24:53,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2232746.6666666665, ans=0.95 2023-11-23 04:25:00,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2232813.3333333335, ans=0.125 2023-11-23 04:25:04,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2232813.3333333335, ans=0.015 2023-11-23 04:25:09,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2232880.0, ans=0.125 2023-11-23 04:25:14,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2232880.0, ans=0.125 2023-11-23 04:25:22,427 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10300, loss[loss=0.06955, simple_loss=0.08961, pruned_loss=0.01333, audio_tagging_loss=0.01142, over 16269.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09231, pruned_loss=0.01424, audio_tagging_loss=0.009328, over 3049358.48 frames. ], batch size: 61, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:25:29,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 334950 2023-11-23 04:26:06,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2233146.6666666665, ans=0.125 2023-11-23 04:26:08,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2233146.6666666665, ans=0.125 2023-11-23 04:26:19,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.243e+01 9.019e+01 9.717e+01 1.166e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 04:26:25,955 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10350, loss[loss=0.05173, simple_loss=0.06907, pruned_loss=0.007087, audio_tagging_loss=0.01011, over 15596.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09282, pruned_loss=0.01437, audio_tagging_loss=0.009359, over 3053435.73 frames. 
], batch size: 60, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:26:27,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2233280.0, ans=0.0 2023-11-23 04:26:30,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2233280.0, ans=0.95 2023-11-23 04:26:33,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335000 2023-11-23 04:26:37,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2233346.6666666665, ans=0.125 2023-11-23 04:26:51,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2233413.3333333335, ans=0.0 2023-11-23 04:27:09,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2233480.0, ans=0.125 2023-11-23 04:27:13,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2233480.0, ans=0.1 2023-11-23 04:27:21,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2233546.6666666665, ans=0.125 2023-11-23 04:27:30,462 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10400, loss[loss=0.07411, simple_loss=0.1071, pruned_loss=0.01446, audio_tagging_loss=0.006116, over 14339.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.09303, pruned_loss=0.01431, audio_tagging_loss=0.00945, over 3047542.72 frames. ], batch size: 52, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:27:30,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2233613.3333333335, ans=0.04949747468305833 2023-11-23 04:27:39,126 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335050 2023-11-23 04:27:44,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2233680.0, ans=0.05 2023-11-23 04:27:58,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2023-11-23 04:28:08,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2233813.3333333335, ans=0.125 2023-11-23 04:28:10,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2233813.3333333335, ans=0.125 2023-11-23 04:28:13,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2233813.3333333335, ans=0.2 2023-11-23 04:28:28,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2023-11-23 04:28:30,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.606e+01 8.149e+01 8.861e+01 9.460e+01 1.224e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 04:28:35,411 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10450, loss[loss=0.05565, simple_loss=0.07891, pruned_loss=0.007545, audio_tagging_loss=0.00865, over 15163.00 frames. 
], tot_loss[loss=0.07033, simple_loss=0.09308, pruned_loss=0.01436, audio_tagging_loss=0.009421, over 3039037.17 frames. ], batch size: 54, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:28:40,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2233946.6666666665, ans=0.5 2023-11-23 04:28:43,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335100 2023-11-23 04:29:08,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2023-11-23 04:29:19,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2234146.6666666665, ans=0.125 2023-11-23 04:29:39,651 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10500, loss[loss=0.06703, simple_loss=0.09268, pruned_loss=0.01346, audio_tagging_loss=0.007224, over 14284.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09176, pruned_loss=0.01399, audio_tagging_loss=0.009271, over 3039196.03 frames. ], batch size: 52, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:29:47,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335150 2023-11-23 04:29:54,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2234346.6666666665, ans=0.1 2023-11-23 04:29:57,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2234346.6666666665, ans=0.1 2023-11-23 04:30:03,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2234346.6666666665, ans=0.0 2023-11-23 04:30:25,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2234480.0, ans=0.1 2023-11-23 04:30:34,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2234546.6666666665, ans=0.125 2023-11-23 04:30:36,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-23 04:30:37,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.079e+01 8.755e+01 9.423e+01 1.196e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-23 04:30:42,940 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10550, loss[loss=0.06288, simple_loss=0.08591, pruned_loss=0.008095, audio_tagging_loss=0.01183, over 14335.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.0915, pruned_loss=0.01393, audio_tagging_loss=0.009198, over 3034528.70 frames. 
], batch size: 55, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:30:51,463 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335200 2023-11-23 04:30:54,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2234613.3333333335, ans=0.0 2023-11-23 04:31:04,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2234680.0, ans=0.2 2023-11-23 04:31:27,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2234813.3333333335, ans=0.2 2023-11-23 04:31:45,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2023-11-23 04:31:47,908 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10600, loss[loss=0.07131, simple_loss=0.08483, pruned_loss=0.01797, audio_tagging_loss=0.01092, over 15091.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09124, pruned_loss=0.01382, audio_tagging_loss=0.009197, over 3028391.26 frames. ], batch size: 58, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:31:55,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335250 2023-11-23 04:32:01,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2235013.3333333335, ans=0.125 2023-11-23 04:32:24,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-23 04:32:25,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2235146.6666666665, ans=0.0 2023-11-23 04:32:41,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2235213.3333333335, ans=0.125 2023-11-23 04:32:46,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.415e+01 8.292e+01 8.923e+01 9.577e+01 1.197e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-23 04:32:51,942 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10650, loss[loss=0.06493, simple_loss=0.08909, pruned_loss=0.01474, audio_tagging_loss=0.005649, over 14213.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09217, pruned_loss=0.01404, audio_tagging_loss=0.009141, over 3027522.64 frames. 
], batch size: 53, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:32:52,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2235280.0, ans=0.1 2023-11-23 04:32:58,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2235280.0, ans=0.125 2023-11-23 04:32:59,227 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335300 2023-11-23 04:33:05,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2235346.6666666665, ans=0.0 2023-11-23 04:33:14,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2235346.6666666665, ans=0.125 2023-11-23 04:33:14,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2235346.6666666665, ans=0.125 2023-11-23 04:33:30,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-23 04:33:34,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2235480.0, ans=0.125 2023-11-23 04:33:54,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2235613.3333333335, ans=0.0 2023-11-23 04:33:55,242 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10700, loss[loss=0.05776, simple_loss=0.07632, pruned_loss=0.012, audio_tagging_loss=0.0076, over 14610.00 frames. ], tot_loss[loss=0.06948, simple_loss=0.09246, pruned_loss=0.01416, audio_tagging_loss=0.009089, over 3024741.59 frames. ], batch size: 57, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:34:03,123 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335350 2023-11-23 04:34:04,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2235613.3333333335, ans=0.0 2023-11-23 04:34:26,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2023-11-23 04:34:35,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2235813.3333333335, ans=0.125 2023-11-23 04:34:40,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2235813.3333333335, ans=0.0 2023-11-23 04:34:54,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.303e+01 8.893e+01 9.663e+01 2.175e+02, threshold=1.779e+02, percent-clipped=1.0 2023-11-23 04:34:58,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-23 04:35:00,007 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10750, loss[loss=0.07303, simple_loss=0.1014, pruned_loss=0.0144, audio_tagging_loss=0.007927, over 14861.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09277, pruned_loss=0.01423, audio_tagging_loss=0.00898, over 3024416.93 frames. 
], batch size: 54, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:35:07,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335400 2023-11-23 04:35:12,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2236013.3333333335, ans=0.125 2023-11-23 04:35:25,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2023-11-23 04:35:40,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2236146.6666666665, ans=0.125 2023-11-23 04:35:45,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2236146.6666666665, ans=0.05 2023-11-23 04:36:01,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2236213.3333333335, ans=0.125 2023-11-23 04:36:01,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2236213.3333333335, ans=0.0 2023-11-23 04:36:03,427 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10800, loss[loss=0.06836, simple_loss=0.08951, pruned_loss=0.01386, audio_tagging_loss=0.009742, over 15335.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09208, pruned_loss=0.01413, audio_tagging_loss=0.009088, over 3027135.64 frames. ], batch size: 57, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:36:05,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2236280.0, ans=0.125 2023-11-23 04:36:11,598 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335450 2023-11-23 04:36:22,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2236346.6666666665, ans=0.05 2023-11-23 04:36:29,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2236413.3333333335, ans=0.125 2023-11-23 04:36:40,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2236413.3333333335, ans=0.2 2023-11-23 04:36:51,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2236480.0, ans=0.1 2023-11-23 04:36:53,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2236546.6666666665, ans=0.0 2023-11-23 04:37:02,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.072e+01 8.725e+01 9.352e+01 1.267e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-23 04:37:03,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2236546.6666666665, ans=0.1 2023-11-23 04:37:07,367 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10850, loss[loss=0.06765, simple_loss=0.0927, pruned_loss=0.01279, audio_tagging_loss=0.008506, over 14547.00 frames. ], tot_loss[loss=0.0696, simple_loss=0.09256, pruned_loss=0.01423, audio_tagging_loss=0.009083, over 3033328.53 frames. 
], batch size: 54, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:37:15,400 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335500 2023-11-23 04:37:39,639 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:37:53,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2236813.3333333335, ans=0.125 2023-11-23 04:38:05,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0 2023-11-23 04:38:07,673 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:38:11,724 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10900, loss[loss=0.05009, simple_loss=0.06618, pruned_loss=0.007578, audio_tagging_loss=0.009425, over 15276.00 frames. ], tot_loss[loss=0.06983, simple_loss=0.09286, pruned_loss=0.01436, audio_tagging_loss=0.009038, over 3043301.38 frames. ], batch size: 59, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:38:14,460 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:38:19,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335550 2023-11-23 04:38:39,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2237080.0, ans=0.1 2023-11-23 04:38:50,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2237146.6666666665, ans=0.125 2023-11-23 04:38:52,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2237146.6666666665, ans=0.1 2023-11-23 04:39:10,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.435e+01 8.359e+01 8.884e+01 9.760e+01 1.616e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-23 04:39:14,987 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 10950, loss[loss=0.05827, simple_loss=0.07492, pruned_loss=0.01023, audio_tagging_loss=0.01058, over 13681.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09239, pruned_loss=0.01421, audio_tagging_loss=0.009143, over 3040203.15 frames. ], batch size: 55, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:39:22,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335600 2023-11-23 04:39:22,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2237280.0, ans=0.0 2023-11-23 04:39:36,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2237346.6666666665, ans=0.0 2023-11-23 04:40:05,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2237546.6666666665, ans=0.0 2023-11-23 04:40:11,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.69 vs. 
limit=12.0 2023-11-23 04:40:19,229 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11000, loss[loss=0.09689, simple_loss=0.1314, pruned_loss=0.02499, audio_tagging_loss=0.006218, over 15150.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09223, pruned_loss=0.01411, audio_tagging_loss=0.009286, over 3040067.84 frames. ], batch size: 56, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:40:26,591 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335650 2023-11-23 04:40:28,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2237613.3333333335, ans=0.0 2023-11-23 04:40:29,651 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:40:49,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2237746.6666666665, ans=0.0 2023-11-23 04:40:55,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2237746.6666666665, ans=0.025 2023-11-23 04:41:14,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2237880.0, ans=0.125 2023-11-23 04:41:19,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.268e+01 8.870e+01 9.647e+01 1.227e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-23 04:41:22,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2237946.6666666665, ans=0.125 2023-11-23 04:41:23,736 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11050, loss[loss=0.07678, simple_loss=0.1044, pruned_loss=0.01619, audio_tagging_loss=0.008367, over 16593.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09223, pruned_loss=0.01418, audio_tagging_loss=0.009368, over 3042871.54 frames. ], batch size: 60, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:41:31,659 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335700 2023-11-23 04:41:34,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2237946.6666666665, ans=0.2 2023-11-23 04:41:52,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2238080.0, ans=0.125 2023-11-23 04:42:27,701 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11100, loss[loss=0.05919, simple_loss=0.06749, pruned_loss=0.01442, audio_tagging_loss=0.01103, over 16002.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.09175, pruned_loss=0.01408, audio_tagging_loss=0.009474, over 3045226.06 frames. 
], batch size: 60, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:42:35,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335750 2023-11-23 04:42:40,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2238346.6666666665, ans=0.0 2023-11-23 04:42:55,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2238413.3333333335, ans=0.0 2023-11-23 04:43:05,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=12.0 2023-11-23 04:43:19,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2023-11-23 04:43:23,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2238546.6666666665, ans=0.125 2023-11-23 04:43:24,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=15.0 2023-11-23 04:43:27,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.341e+01 8.890e+01 9.757e+01 1.222e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-23 04:43:30,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2238613.3333333335, ans=0.2 2023-11-23 04:43:31,099 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11150, loss[loss=0.06681, simple_loss=0.09221, pruned_loss=0.00974, audio_tagging_loss=0.01096, over 15651.00 frames. ], tot_loss[loss=0.06961, simple_loss=0.09187, pruned_loss=0.01408, audio_tagging_loss=0.009588, over 3043176.79 frames. ], batch size: 59, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 04:43:34,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2023-11-23 04:43:39,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335800 2023-11-23 04:44:01,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2238746.6666666665, ans=0.0 2023-11-23 04:44:08,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2238746.6666666665, ans=0.1 2023-11-23 04:44:22,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2238880.0, ans=0.125 2023-11-23 04:44:35,988 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11200, loss[loss=0.08531, simple_loss=0.1139, pruned_loss=0.02063, audio_tagging_loss=0.007739, over 15869.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09116, pruned_loss=0.01393, audio_tagging_loss=0.009641, over 3041569.05 frames. ], batch size: 56, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:44:40,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2238946.6666666665, ans=0.2 2023-11-23 04:44:42,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.09 vs. 
limit=22.5 2023-11-23 04:44:44,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335850 2023-11-23 04:45:05,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2239080.0, ans=0.0 2023-11-23 04:45:07,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2239080.0, ans=0.1 2023-11-23 04:45:11,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2239080.0, ans=0.125 2023-11-23 04:45:16,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2239146.6666666665, ans=0.125 2023-11-23 04:45:17,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2239146.6666666665, ans=0.04949747468305833 2023-11-23 04:45:30,540 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:45:30,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=2239213.3333333335, ans=0.02 2023-11-23 04:45:36,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.304e+01 9.043e+01 9.933e+01 1.318e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 04:45:37,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2239213.3333333335, ans=0.0 2023-11-23 04:45:40,648 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11250, loss[loss=0.07341, simple_loss=0.09752, pruned_loss=0.01569, audio_tagging_loss=0.008957, over 15368.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09144, pruned_loss=0.01404, audio_tagging_loss=0.009506, over 3051520.99 frames. ], batch size: 58, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:45:48,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335900 2023-11-23 04:46:16,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2023-11-23 04:46:21,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2023-11-23 04:46:37,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-23 04:46:44,931 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11300, loss[loss=0.08123, simple_loss=0.1054, pruned_loss=0.01953, audio_tagging_loss=0.009013, over 14866.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.09183, pruned_loss=0.01415, audio_tagging_loss=0.009293, over 3048094.47 frames. 
], batch size: 55, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:46:52,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 335950 2023-11-23 04:47:41,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2239880.0, ans=0.04949747468305833 2023-11-23 04:47:43,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.333e+01 8.937e+01 9.779e+01 1.201e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-23 04:47:47,555 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11350, loss[loss=0.07235, simple_loss=0.102, pruned_loss=0.01322, audio_tagging_loss=0.008123, over 15896.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.0929, pruned_loss=0.01426, audio_tagging_loss=0.009168, over 3051082.18 frames. ], batch size: 59, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:47:56,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336000 2023-11-23 04:48:14,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2240013.3333333335, ans=0.125 2023-11-23 04:48:45,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2240213.3333333335, ans=0.125 2023-11-23 04:48:46,945 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:48:56,654 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11400, loss[loss=0.08405, simple_loss=0.1131, pruned_loss=0.01764, audio_tagging_loss=0.009857, over 14748.00 frames. ], tot_loss[loss=0.06964, simple_loss=0.09287, pruned_loss=0.0141, audio_tagging_loss=0.009107, over 3045811.00 frames. ], batch size: 54, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:48:58,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2240280.0, ans=0.1 2023-11-23 04:49:03,967 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336050 2023-11-23 04:49:18,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2240346.6666666665, ans=0.125 2023-11-23 04:49:27,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2240413.3333333335, ans=0.0 2023-11-23 04:49:46,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2240546.6666666665, ans=0.1 2023-11-23 04:49:55,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2240546.6666666665, ans=0.125 2023-11-23 04:49:56,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.358e+01 8.216e+01 8.830e+01 9.675e+01 1.432e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-23 04:50:00,122 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11450, loss[loss=0.07633, simple_loss=0.1027, pruned_loss=0.01716, audio_tagging_loss=0.007834, over 15670.00 frames. ], tot_loss[loss=0.06907, simple_loss=0.09205, pruned_loss=0.01394, audio_tagging_loss=0.0091, over 3043755.02 frames. 
], batch size: 56, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:50:07,506 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336100 2023-11-23 04:51:02,790 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11500, loss[loss=0.08067, simple_loss=0.1113, pruned_loss=0.01973, audio_tagging_loss=0.005297, over 15502.00 frames. ], tot_loss[loss=0.06916, simple_loss=0.09203, pruned_loss=0.01409, audio_tagging_loss=0.009049, over 3045756.75 frames. ], batch size: 57, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:51:04,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.59 vs. limit=10.0 2023-11-23 04:51:09,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2240946.6666666665, ans=0.0 2023-11-23 04:51:11,527 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336150 2023-11-23 04:51:12,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-23 04:51:25,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2241013.3333333335, ans=0.0 2023-11-23 04:51:37,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2241080.0, ans=0.1 2023-11-23 04:51:37,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.50 vs. limit=15.0 2023-11-23 04:51:38,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2241080.0, ans=0.0 2023-11-23 04:52:01,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2241213.3333333335, ans=0.2 2023-11-23 04:52:03,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2241213.3333333335, ans=0.125 2023-11-23 04:52:04,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.510e+01 8.416e+01 8.974e+01 9.531e+01 1.779e+02, threshold=1.795e+02, percent-clipped=1.0 2023-11-23 04:52:08,717 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11550, loss[loss=0.05659, simple_loss=0.06283, pruned_loss=0.01272, audio_tagging_loss=0.01246, over 15317.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09197, pruned_loss=0.01412, audio_tagging_loss=0.009048, over 3046964.41 frames. ], batch size: 60, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:52:09,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2241280.0, ans=0.0 2023-11-23 04:52:13,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2241280.0, ans=0.125 2023-11-23 04:52:16,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336200 2023-11-23 04:52:20,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2241346.6666666665, ans=0.125 2023-11-23 04:52:27,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2023-11-23 04:52:28,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2241346.6666666665, ans=0.0 2023-11-23 04:52:41,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2241413.3333333335, ans=0.0 2023-11-23 04:52:46,495 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 04:53:04,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2023-11-23 04:53:06,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2241546.6666666665, ans=0.1 2023-11-23 04:53:06,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2241546.6666666665, ans=0.125 2023-11-23 04:53:12,010 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11600, loss[loss=0.0645, simple_loss=0.08945, pruned_loss=0.0114, audio_tagging_loss=0.008377, over 15643.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09249, pruned_loss=0.01423, audio_tagging_loss=0.009027, over 3042238.02 frames. ], batch size: 58, lr: 2.41e-03, grad_scale: 32.0 2023-11-23 04:53:19,479 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336250 2023-11-23 04:53:20,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2241613.3333333335, ans=0.0 2023-11-23 04:53:54,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2241813.3333333335, ans=0.0 2023-11-23 04:54:05,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2241880.0, ans=0.125 2023-11-23 04:54:12,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.276e+01 8.228e+01 9.009e+01 9.457e+01 1.103e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-23 04:54:13,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2241880.0, ans=0.0 2023-11-23 04:54:15,856 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11650, loss[loss=0.06801, simple_loss=0.08872, pruned_loss=0.01516, audio_tagging_loss=0.008492, over 16340.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.0927, pruned_loss=0.01424, audio_tagging_loss=0.009001, over 3052723.87 frames. 
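[Annotation] The WARNING above (train_asr.py:1462) drops an AudioSet placeholder cut: after the frontend's roughly 4x subsampling, the 100-frame cut keeps only 23 frames, fewer than its 24 BPE tokens, so the transducer loss has no valid alignment for it. A sketch of that validity check; the exact subsampling arithmetic ((T - 7) // 4) is an assumption for illustration, not copied from train_asr.py.

def is_valid_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
    # Assumed convolutional front-end: lose ~7 frames, then subsample by 4.
    frames_after = (num_frames - 7) // subsampling_factor
    # The pruned transducer needs at least as many output frames as tokens
    # for an alignment to exist at all.
    return frames_after >= num_tokens

print(is_valid_cut(100, 24))  # False -> the cut is excluded from training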
], batch size: 62, lr: 2.41e-03, grad_scale: 32.0 2023-11-23 04:54:20,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2241946.6666666665, ans=0.125 2023-11-23 04:54:23,836 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336300 2023-11-23 04:54:31,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2242013.3333333335, ans=0.0 2023-11-23 04:54:32,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5 2023-11-23 04:54:39,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-11-23 04:54:44,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2242080.0, ans=0.0 2023-11-23 04:54:53,131 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 04:55:02,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2242146.6666666665, ans=0.125 2023-11-23 04:55:06,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2242213.3333333335, ans=0.125 2023-11-23 04:55:20,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2242280.0, ans=0.0 2023-11-23 04:55:21,576 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11700, loss[loss=0.04915, simple_loss=0.06795, pruned_loss=0.007926, audio_tagging_loss=0.007254, over 14698.00 frames. ], tot_loss[loss=0.06975, simple_loss=0.09276, pruned_loss=0.01433, audio_tagging_loss=0.009042, over 3051859.73 frames. ], batch size: 59, lr: 2.41e-03, grad_scale: 32.0 2023-11-23 04:55:29,446 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336350 2023-11-23 04:55:32,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2242280.0, ans=0.1 2023-11-23 04:55:33,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2242346.6666666665, ans=0.125 2023-11-23 04:55:35,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2242346.6666666665, ans=0.2 2023-11-23 04:55:39,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2242346.6666666665, ans=0.0 2023-11-23 04:55:44,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2023-11-23 04:55:51,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2242413.3333333335, ans=0.0 2023-11-23 04:55:58,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.12 vs. 
limit=22.5 2023-11-23 04:56:13,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2023-11-23 04:56:21,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.295e+01 8.847e+01 9.516e+01 1.261e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 04:56:25,732 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11750, loss[loss=0.07289, simple_loss=0.09397, pruned_loss=0.01824, audio_tagging_loss=0.007667, over 14904.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09261, pruned_loss=0.01433, audio_tagging_loss=0.009053, over 3049016.22 frames. ], batch size: 55, lr: 2.41e-03, grad_scale: 32.0 2023-11-23 04:56:33,236 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336400 2023-11-23 04:56:34,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2242613.3333333335, ans=0.0 2023-11-23 04:56:40,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2023-11-23 04:56:42,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2242680.0, ans=0.0 2023-11-23 04:56:51,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2023-11-23 04:57:08,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-23 04:57:13,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2242813.3333333335, ans=0.125 2023-11-23 04:57:17,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2242880.0, ans=0.5 2023-11-23 04:57:18,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2242880.0, ans=0.125 2023-11-23 04:57:29,433 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11800, loss[loss=0.05893, simple_loss=0.07159, pruned_loss=0.01125, audio_tagging_loss=0.01189, over 15734.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09196, pruned_loss=0.01433, audio_tagging_loss=0.009152, over 3048884.13 frames. ], batch size: 63, lr: 2.41e-03, grad_scale: 32.0 2023-11-23 04:57:30,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2242946.6666666665, ans=0.125 2023-11-23 04:57:33,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2242946.6666666665, ans=0.0 2023-11-23 04:57:36,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336450 2023-11-23 04:57:38,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2242946.6666666665, ans=0.2 2023-11-23 04:57:44,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.77 vs. 
limit=15.0 2023-11-23 04:57:47,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-11-23 04:57:56,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.39 vs. limit=10.0 2023-11-23 04:58:31,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.315e+01 8.837e+01 9.445e+01 1.101e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 04:58:33,796 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11850, loss[loss=0.07091, simple_loss=0.09605, pruned_loss=0.01454, audio_tagging_loss=0.008348, over 15216.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.093, pruned_loss=0.01468, audio_tagging_loss=0.009284, over 3047767.43 frames. ], batch size: 56, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:58:38,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2243280.0, ans=0.04949747468305833 2023-11-23 04:58:39,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2243280.0, ans=0.0 2023-11-23 04:58:41,612 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336500 2023-11-23 04:58:51,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2023-11-23 04:59:12,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2243480.0, ans=0.0 2023-11-23 04:59:16,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2243480.0, ans=0.2 2023-11-23 04:59:17,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2243480.0, ans=0.1 2023-11-23 04:59:38,819 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11900, loss[loss=0.1089, simple_loss=0.1417, pruned_loss=0.0239, audio_tagging_loss=0.01415, over 15251.00 frames. ], tot_loss[loss=0.07036, simple_loss=0.09304, pruned_loss=0.01452, audio_tagging_loss=0.009325, over 3048533.84 frames. 
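[Annotation] Each train_asr.py:1221 line prints the combined loss next to its components, and the numbers are consistent with a weighted sum in which the simple (linear-join) transducer loss enters at 0.5 while the pruned and audio-tagging terms enter at 1.0. The weights here are inferred from the printed values, not read out of the code; checking against the batch 11850 line above:

simple_loss, pruned_loss, audio_tagging_loss = 0.09605, 0.01454, 0.008348
loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 5))  # 0.07091, the logged per-batch loss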
], batch size: 55, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 04:59:42,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2243613.3333333335, ans=0.125 2023-11-23 04:59:46,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336550 2023-11-23 04:59:58,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2243680.0, ans=0.1 2023-11-23 04:59:59,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2243680.0, ans=0.04949747468305833 2023-11-23 05:00:09,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2243746.6666666665, ans=0.125 2023-11-23 05:00:11,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2243746.6666666665, ans=0.1 2023-11-23 05:00:17,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2243813.3333333335, ans=0.1 2023-11-23 05:00:18,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2243813.3333333335, ans=0.0 2023-11-23 05:00:40,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.380e+01 8.877e+01 9.570e+01 1.262e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-23 05:00:41,884 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 11950, loss[loss=0.0586, simple_loss=0.07472, pruned_loss=0.01018, audio_tagging_loss=0.01106, over 14605.00 frames. ], tot_loss[loss=0.07077, simple_loss=0.09338, pruned_loss=0.01466, audio_tagging_loss=0.009425, over 3051854.80 frames. ], batch size: 56, lr: 2.41e-03, grad_scale: 8.0 2023-11-23 05:00:49,028 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336600 2023-11-23 05:00:55,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2244013.3333333335, ans=0.0 2023-11-23 05:01:15,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2244080.0, ans=0.025 2023-11-23 05:01:29,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.73 vs. limit=6.0 2023-11-23 05:01:32,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2244213.3333333335, ans=0.1 2023-11-23 05:01:36,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2244213.3333333335, ans=0.0 2023-11-23 05:01:36,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2244213.3333333335, ans=0.125 2023-11-23 05:01:43,498 INFO [train_asr.py:1221] (1/4) Epoch 28, batch 12000, loss[loss=0.06564, simple_loss=0.08757, pruned_loss=0.01293, audio_tagging_loss=0.008924, over 14422.00 frames. ], tot_loss[loss=0.07072, simple_loss=0.09324, pruned_loss=0.01462, audio_tagging_loss=0.009482, over 3046639.31 frames. 
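[Annotation] A few lines below, the validation block prints an attn_weights_entropy tensor with one value per attention head (zipformer.py:1873). A sketch of such a diagnostic — the entropy of each head's attention distribution, averaged over query positions; the shapes and the function name are assumptions.

import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len); each row is a softmax output.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, query_len)
    return ent.mean(dim=-1)  # one entropy value per head

weights = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(weights))  # near log(50) ~ 3.9 for diffuse heads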
], batch size: 55, lr: 2.41e-03, grad_scale: 16.0 2023-11-23 05:01:43,499 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 05:02:07,788 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9926, 3.7031, 3.3274, 3.6356], device='cuda:1') 2023-11-23 05:02:27,080 INFO [train_asr.py:1253] (1/4) Epoch 28, validation: loss=0.05897, simple_loss=0.05124, pruned_loss=0.005128, audio_tagging_loss=0.02822, over 4681554.00 frames. 2023-11-23 05:02:27,081 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 05:02:33,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2244280.0, ans=0.2 2023-11-23 05:02:34,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336650 2023-11-23 05:02:40,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2244346.6666666665, ans=0.125 2023-11-23 05:02:40,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2244346.6666666665, ans=0.125 2023-11-23 05:03:30,860 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 0, loss[loss=0.07799, simple_loss=0.09038, pruned_loss=0.01414, audio_tagging_loss=0.01866, over 15225.00 frames. ], tot_loss[loss=0.07799, simple_loss=0.09038, pruned_loss=0.01414, audio_tagging_loss=0.01866, over 15225.00 frames. ], batch size: 57, lr: 2.37e-03, grad_scale: 32.0 2023-11-23 05:03:30,861 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 05:04:08,411 INFO [train_asr.py:1253] (1/4) Epoch 29, validation: loss=0.05816, simple_loss=0.05122, pruned_loss=0.005095, audio_tagging_loss=0.02745, over 4681554.00 frames. 2023-11-23 05:04:08,412 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 05:04:18,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2023-11-23 05:04:40,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.622e+01 8.521e+01 9.209e+01 1.026e+02 2.736e+02, threshold=1.842e+02, percent-clipped=1.0 2023-11-23 05:04:49,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336700 2023-11-23 05:05:02,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2244706.6666666665, ans=0.95 2023-11-23 05:05:12,058 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 50, loss[loss=0.07301, simple_loss=0.07496, pruned_loss=0.0152, audio_tagging_loss=0.02032, over 14819.00 frames. ], tot_loss[loss=0.079, simple_loss=0.09271, pruned_loss=0.01477, audio_tagging_loss=0.01787, over 682099.17 frames. 
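[Annotation] The trainer periodically switches into a no-grad validation pass ("Computing validation loss" above) and reports the peak CUDA memory seen so far (25607MB). A sketch of that flow; the helper below is a stand-in for train_asr.py's actual routine, with the interval and the loss_fn contract assumed for illustration.

import torch

def maybe_validate(model, valid_loader, loss_fn, batch_idx: int,
                   valid_interval: int = 3000):
    if batch_idx % valid_interval != 0:
        return None
    model.eval()
    total, frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, n = loss_fn(model, batch)  # (summed loss, num frames)
            total, frames = total + float(loss), frames + n
    model.train()
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: loss={total / frames:.5f}; "
          f"Maximum memory allocated so far is {mb}MB")
    return total / frames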
], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:05:13,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2244773.3333333335, ans=0.1 2023-11-23 05:05:45,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2244906.6666666665, ans=0.0 2023-11-23 05:05:51,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2244973.3333333335, ans=0.125 2023-11-23 05:05:54,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336750 2023-11-23 05:06:13,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.94 vs. limit=22.5 2023-11-23 05:06:18,175 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 100, loss[loss=0.09085, simple_loss=0.1116, pruned_loss=0.02119, audio_tagging_loss=0.01385, over 15380.00 frames. ], tot_loss[loss=0.07707, simple_loss=0.09108, pruned_loss=0.01448, audio_tagging_loss=0.01705, over 1207272.18 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:06:44,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2245240.0, ans=0.2 2023-11-23 05:06:49,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 8.985e+01 9.652e+01 1.024e+02 1.304e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-23 05:06:52,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2245240.0, ans=0.125 2023-11-23 05:06:59,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336800 2023-11-23 05:07:22,220 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 150, loss[loss=0.06824, simple_loss=0.09381, pruned_loss=0.0123, audio_tagging_loss=0.009037, over 15450.00 frames. ], tot_loss[loss=0.07562, simple_loss=0.09206, pruned_loss=0.01448, audio_tagging_loss=0.0151, over 1613193.45 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:07:29,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2245440.0, ans=0.1 2023-11-23 05:07:53,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2245573.3333333335, ans=0.2 2023-11-23 05:07:55,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2245573.3333333335, ans=0.0 2023-11-23 05:08:04,394 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336850 2023-11-23 05:08:27,336 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 200, loss[loss=0.06464, simple_loss=0.08353, pruned_loss=0.01237, audio_tagging_loss=0.01051, over 14941.00 frames. ], tot_loss[loss=0.074, simple_loss=0.09266, pruned_loss=0.01442, audio_tagging_loss=0.01324, over 1919754.41 frames. 
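[Annotation] Note the printed lr stepping down from 2.41e-03 to 2.36e-03 as epoch 28 rolls over into epoch 29: the schedule decays with the epoch count as well as with the batch count. Below is a sketch of a schedule with that shape, in the spirit of icefall's Eden scheduler; the exponents and the two time constants are assumptions for illustration.

def eden_like_lr(base_lr: float, batch: int, epoch: float,
                 lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Smooth inverse-power decay in both the batch and the epoch count;
    # roughly flat until each quantity passes its time constant.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor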
], batch size: 55, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:08:37,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2245773.3333333335, ans=0.125 2023-11-23 05:08:56,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2245906.6666666665, ans=0.1 2023-11-23 05:09:00,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.383e+01 9.139e+01 9.827e+01 1.313e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-23 05:09:08,674 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336900 2023-11-23 05:09:32,715 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 250, loss[loss=0.08124, simple_loss=0.09377, pruned_loss=0.01829, audio_tagging_loss=0.01606, over 14619.00 frames. ], tot_loss[loss=0.07353, simple_loss=0.09383, pruned_loss=0.01455, audio_tagging_loss=0.01206, over 2170317.44 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:09:35,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=2246106.6666666665, ans=12.0 2023-11-23 05:09:36,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2246106.6666666665, ans=10.0 2023-11-23 05:09:39,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2246106.6666666665, ans=0.125 2023-11-23 05:10:12,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 336950 2023-11-23 05:10:36,205 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 300, loss[loss=0.07587, simple_loss=0.104, pruned_loss=0.01644, audio_tagging_loss=0.007444, over 14586.00 frames. ], tot_loss[loss=0.07282, simple_loss=0.09444, pruned_loss=0.01459, audio_tagging_loss=0.01101, over 2373945.54 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:10:36,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2246440.0, ans=0.0 2023-11-23 05:10:37,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2246440.0, ans=0.5 2023-11-23 05:10:40,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2246440.0, ans=0.0 2023-11-23 05:10:55,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2246506.6666666665, ans=0.125 2023-11-23 05:11:10,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.941e+01 8.241e+01 8.912e+01 9.854e+01 1.344e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-23 05:11:17,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337000 2023-11-23 05:11:23,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2246640.0, ans=0.1 2023-11-23 05:11:40,974 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 350, loss[loss=0.07284, simple_loss=0.09396, pruned_loss=0.01368, audio_tagging_loss=0.01218, over 14898.00 frames. ], tot_loss[loss=0.07194, simple_loss=0.09376, pruned_loss=0.01453, audio_tagging_loss=0.01053, over 2524904.52 frames. 
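[Annotation] The grad_scale printed with each loss line moves between 8.0, 16.0, and 32.0 in this stretch of the log: the scale is halved after a step whose fp16 gradients overflow and doubled again after a long run of clean steps. That matches the standard dynamic loss-scaling policy of torch.cuda.amp.GradScaler; a minimal training-step sketch (init_scale chosen arbitrarily):

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,      # arbitrary starting point for illustration
    growth_factor=2.0,    # double after `growth_interval` clean steps
    backoff_factor=0.5,   # halve immediately on inf/nan gradients
    growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if gradients overflowed
    scaler.update()         # grow or back off the scale
    return loss.detach(), scaler.get_scale()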
], batch size: 54, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:11:46,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2246773.3333333335, ans=0.125 2023-11-23 05:11:48,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2246773.3333333335, ans=0.1 2023-11-23 05:11:52,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2246840.0, ans=0.125 2023-11-23 05:12:22,206 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337050 2023-11-23 05:12:28,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2246973.3333333335, ans=0.0 2023-11-23 05:12:29,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2246973.3333333335, ans=0.125 2023-11-23 05:12:34,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2023-11-23 05:12:39,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2247040.0, ans=0.125 2023-11-23 05:12:42,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2247040.0, ans=0.2 2023-11-23 05:12:45,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2247106.6666666665, ans=0.2 2023-11-23 05:12:46,370 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 400, loss[loss=0.06959, simple_loss=0.0801, pruned_loss=0.01679, audio_tagging_loss=0.01275, over 15044.00 frames. ], tot_loss[loss=0.07149, simple_loss=0.09369, pruned_loss=0.01449, audio_tagging_loss=0.01015, over 2643808.58 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:12:51,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2247106.6666666665, ans=0.0 2023-11-23 05:13:10,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2247240.0, ans=0.0 2023-11-23 05:13:18,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.176e+01 8.720e+01 9.339e+01 1.453e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-23 05:13:20,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=12.0 2023-11-23 05:13:26,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337100 2023-11-23 05:13:37,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2247373.3333333335, ans=0.0 2023-11-23 05:13:50,273 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 450, loss[loss=0.05861, simple_loss=0.07977, pruned_loss=0.01243, audio_tagging_loss=0.006299, over 15470.00 frames. ], tot_loss[loss=0.07136, simple_loss=0.09377, pruned_loss=0.01461, audio_tagging_loss=0.009864, over 2731707.25 frames. 
], batch size: 56, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:13:50,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=2247440.0, ans=12.0 2023-11-23 05:13:56,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2023-11-23 05:13:58,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2247440.0, ans=0.0 2023-11-23 05:14:22,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2247573.3333333335, ans=0.0 2023-11-23 05:14:30,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-23 05:14:31,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337150 2023-11-23 05:14:46,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2247706.6666666665, ans=0.2 2023-11-23 05:14:52,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2247773.3333333335, ans=0.125 2023-11-23 05:14:53,597 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 500, loss[loss=0.06423, simple_loss=0.08086, pruned_loss=0.01307, audio_tagging_loss=0.01073, over 15612.00 frames. ], tot_loss[loss=0.07111, simple_loss=0.09306, pruned_loss=0.01479, audio_tagging_loss=0.009785, over 2800664.16 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:15:26,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.484e+01 9.126e+01 9.707e+01 1.686e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-23 05:15:29,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2247906.6666666665, ans=0.05 2023-11-23 05:15:33,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.86 vs. limit=10.0 2023-11-23 05:15:34,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337200 2023-11-23 05:15:56,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2248106.6666666665, ans=0.125 2023-11-23 05:15:57,888 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 550, loss[loss=0.08191, simple_loss=0.1057, pruned_loss=0.01696, audio_tagging_loss=0.01209, over 15933.00 frames. ], tot_loss[loss=0.07125, simple_loss=0.09386, pruned_loss=0.01477, audio_tagging_loss=0.009547, over 2854332.13 frames. 
], batch size: 60, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:15:58,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2248106.6666666665, ans=0.0 2023-11-23 05:16:20,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2248173.3333333335, ans=0.125 2023-11-23 05:16:38,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337250 2023-11-23 05:16:53,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2248373.3333333335, ans=0.1 2023-11-23 05:17:01,983 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 600, loss[loss=0.07586, simple_loss=0.09946, pruned_loss=0.018, audio_tagging_loss=0.008129, over 15245.00 frames. ], tot_loss[loss=0.07117, simple_loss=0.09377, pruned_loss=0.01476, audio_tagging_loss=0.009523, over 2898288.91 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:17:09,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2248440.0, ans=0.125 2023-11-23 05:17:20,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2248506.6666666665, ans=0.2 2023-11-23 05:17:24,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2023-11-23 05:17:34,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.545e+01 8.479e+01 8.808e+01 9.540e+01 1.178e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-23 05:17:42,767 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337300 2023-11-23 05:17:45,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2248640.0, ans=0.125 2023-11-23 05:18:05,057 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 650, loss[loss=0.0558, simple_loss=0.0776, pruned_loss=0.007281, audio_tagging_loss=0.009721, over 16627.00 frames. ], tot_loss[loss=0.07064, simple_loss=0.0932, pruned_loss=0.01455, audio_tagging_loss=0.009489, over 2928162.35 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:18:27,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2248840.0, ans=10.0 2023-11-23 05:18:46,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337350 2023-11-23 05:18:48,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2248973.3333333335, ans=0.125 2023-11-23 05:18:58,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2249040.0, ans=0.125 2023-11-23 05:19:09,382 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 700, loss[loss=0.05629, simple_loss=0.07362, pruned_loss=0.01152, audio_tagging_loss=0.007959, over 14811.00 frames. ], tot_loss[loss=0.07064, simple_loss=0.09337, pruned_loss=0.01454, audio_tagging_loss=0.009408, over 2950800.47 frames. 
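[Annotation] The scaling.py:1022 lines compare a per-module "whitening" metric against a limit; the metric measures how anisotropic the module's feature covariance is, and the module only intervenes when the limit is exceeded. The function below is an illustrative stand-in (mean squared eigenvalue over squared mean eigenvalue, which is 1.0 for a perfectly white covariance), not icefall's exact formula.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (frames, channels); channels are split into contiguous groups.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups)
    metrics = []
    for g in range(num_groups):
        xg = x[:, g, :]
        cov = (xg.T @ xg) / n
        eigs = torch.linalg.eigvalsh(cov)
        # >= 1.0 by Cauchy-Schwarz; equals 1.0 iff all eigenvalues agree.
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return sum(metrics) / len(metrics)

print(whitening_metric(torch.randn(1000, 256)))  # slightly above 1.0 (noise)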
], batch size: 55, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:19:37,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2249240.0, ans=0.0 2023-11-23 05:19:38,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-23 05:19:44,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.167e+01 8.621e+01 9.555e+01 1.511e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-23 05:19:51,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337400 2023-11-23 05:19:56,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2249306.6666666665, ans=0.1 2023-11-23 05:19:57,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2249306.6666666665, ans=0.2 2023-11-23 05:20:00,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2249373.3333333335, ans=0.2 2023-11-23 05:20:15,750 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 750, loss[loss=0.07038, simple_loss=0.09313, pruned_loss=0.01276, audio_tagging_loss=0.01105, over 15245.00 frames. ], tot_loss[loss=0.07085, simple_loss=0.09362, pruned_loss=0.0146, audio_tagging_loss=0.009433, over 2973005.08 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:20:40,993 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 05:20:54,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2249640.0, ans=0.1 2023-11-23 05:20:57,671 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337450 2023-11-23 05:21:09,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2249706.6666666665, ans=0.125 2023-11-23 05:21:10,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2249706.6666666665, ans=0.0 2023-11-23 05:21:15,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2249706.6666666665, ans=0.0 2023-11-23 05:21:19,959 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 800, loss[loss=0.07706, simple_loss=0.09672, pruned_loss=0.01887, audio_tagging_loss=0.009828, over 15236.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09422, pruned_loss=0.01471, audio_tagging_loss=0.009345, over 2987002.44 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:21:37,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2249840.0, ans=0.125 2023-11-23 05:21:47,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. 
limit=15.0 2023-11-23 05:21:51,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2249906.6666666665, ans=0.2 2023-11-23 05:21:55,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.531e+01 8.991e+01 9.709e+01 1.279e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-23 05:22:01,735 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337500 2023-11-23 05:22:06,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-23 05:22:07,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.80 vs. limit=10.0 2023-11-23 05:22:07,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2023-11-23 05:22:21,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2250040.0, ans=0.1 2023-11-23 05:22:24,182 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 850, loss[loss=0.07559, simple_loss=0.09482, pruned_loss=0.01503, audio_tagging_loss=0.01315, over 15471.00 frames. ], tot_loss[loss=0.07131, simple_loss=0.09449, pruned_loss=0.01464, audio_tagging_loss=0.00943, over 3001718.87 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:22:51,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2023-11-23 05:22:52,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2250240.0, ans=0.2 2023-11-23 05:23:06,274 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337550 2023-11-23 05:23:30,308 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 900, loss[loss=0.08195, simple_loss=0.1167, pruned_loss=0.0161, audio_tagging_loss=0.007527, over 15629.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09445, pruned_loss=0.01461, audio_tagging_loss=0.009488, over 3013738.19 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:23:40,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2250440.0, ans=0.125 2023-11-23 05:24:02,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.56 vs. limit=22.5 2023-11-23 05:24:03,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.195e+01 8.718e+01 9.591e+01 1.345e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-23 05:24:04,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2250573.3333333335, ans=0.0 2023-11-23 05:24:11,540 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337600 2023-11-23 05:24:27,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. 
limit=15.0 2023-11-23 05:24:32,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2250706.6666666665, ans=0.125 2023-11-23 05:24:34,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2250773.3333333335, ans=0.025 2023-11-23 05:24:35,015 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 950, loss[loss=0.08161, simple_loss=0.1196, pruned_loss=0.0128, audio_tagging_loss=0.009019, over 16481.00 frames. ], tot_loss[loss=0.07101, simple_loss=0.09437, pruned_loss=0.01448, audio_tagging_loss=0.009347, over 3018642.16 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:24:39,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2250773.3333333335, ans=0.0 2023-11-23 05:24:40,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2023-11-23 05:24:43,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2250773.3333333335, ans=0.0 2023-11-23 05:24:48,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2023-11-23 05:25:17,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337650 2023-11-23 05:25:39,745 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1000, loss[loss=0.06162, simple_loss=0.08756, pruned_loss=0.008942, audio_tagging_loss=0.008898, over 15863.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.09342, pruned_loss=0.01424, audio_tagging_loss=0.009272, over 3028991.12 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:25:42,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2251106.6666666665, ans=0.0 2023-11-23 05:25:48,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2251106.6666666665, ans=0.2 2023-11-23 05:26:07,692 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 05:26:09,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2251240.0, ans=0.125 2023-11-23 05:26:15,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 8.282e+01 9.010e+01 9.966e+01 1.225e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-23 05:26:18,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2251306.6666666665, ans=0.1 2023-11-23 05:26:21,312 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337700 2023-11-23 05:26:22,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2251306.6666666665, ans=0.2 2023-11-23 05:26:25,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2251306.6666666665, ans=0.2 2023-11-23 05:26:44,930 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1050, loss[loss=0.06467, simple_loss=0.09185, pruned_loss=0.0119, audio_tagging_loss=0.006848, over 14758.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09309, pruned_loss=0.01429, audio_tagging_loss=0.009109, over 3038332.57 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:26:50,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2251440.0, ans=0.0 2023-11-23 05:26:52,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2251440.0, ans=0.125 2023-11-23 05:27:02,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2251506.6666666665, ans=0.125 2023-11-23 05:27:19,782 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 05:27:24,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0 2023-11-23 05:27:25,256 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 05:27:26,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337750 2023-11-23 05:27:26,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2251640.0, ans=0.2 2023-11-23 05:27:33,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2251640.0, ans=0.125 2023-11-23 05:27:37,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2251706.6666666665, ans=0.0 2023-11-23 05:27:49,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=12.0 2023-11-23 05:27:50,154 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1100, loss[loss=0.06155, simple_loss=0.08393, pruned_loss=0.01049, audio_tagging_loss=0.009102, over 14483.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09303, pruned_loss=0.01428, audio_tagging_loss=0.00902, over 3043805.54 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:27:52,669 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 05:27:55,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2251773.3333333335, ans=0.0 2023-11-23 05:27:56,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2251773.3333333335, ans=0.125 2023-11-23 05:28:25,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.248e+01 8.961e+01 9.560e+01 1.246e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-23 05:28:31,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337800 2023-11-23 05:28:54,949 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1150, loss[loss=0.07059, simple_loss=0.1, pruned_loss=0.01312, audio_tagging_loss=0.007445, over 15808.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09275, pruned_loss=0.01428, audio_tagging_loss=0.009067, over 3049946.43 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:29:36,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337850 2023-11-23 05:29:43,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2252306.6666666665, ans=0.0 2023-11-23 05:29:46,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=2252373.3333333335, ans=0.1 2023-11-23 05:29:56,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2252373.3333333335, ans=0.125 2023-11-23 05:30:00,345 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1200, loss[loss=0.06794, simple_loss=0.08937, pruned_loss=0.014, audio_tagging_loss=0.009262, over 16822.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09245, pruned_loss=0.01423, audio_tagging_loss=0.009088, over 3047507.10 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:30:09,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2252440.0, ans=0.125 2023-11-23 05:30:26,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2023-11-23 05:30:27,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-23 05:30:35,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.804e+01 8.344e+01 9.032e+01 9.683e+01 1.496e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-23 05:30:40,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337900 2023-11-23 05:31:03,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-11-23 05:31:04,490 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1250, loss[loss=0.05677, simple_loss=0.07263, pruned_loss=0.01147, audio_tagging_loss=0.008983, over 14108.00 frames. 
], tot_loss[loss=0.06942, simple_loss=0.09243, pruned_loss=0.01418, audio_tagging_loss=0.009026, over 3043887.11 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:31:05,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2252773.3333333335, ans=0.125 2023-11-23 05:31:28,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2023-11-23 05:31:45,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 337950 2023-11-23 05:31:59,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2253040.0, ans=0.125 2023-11-23 05:32:04,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2253040.0, ans=0.04949747468305833 2023-11-23 05:32:07,639 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1300, loss[loss=0.09703, simple_loss=0.127, pruned_loss=0.0236, audio_tagging_loss=0.009912, over 15535.00 frames. ], tot_loss[loss=0.06932, simple_loss=0.09248, pruned_loss=0.01408, audio_tagging_loss=0.009004, over 3046770.44 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:32:14,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2023-11-23 05:32:25,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2023-11-23 05:32:44,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.370e+01 8.150e+01 8.858e+01 9.270e+01 1.252e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 05:32:44,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2253240.0, ans=0.0 2023-11-23 05:32:49,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338000 2023-11-23 05:32:55,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2253306.6666666665, ans=0.125 2023-11-23 05:33:08,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2253373.3333333335, ans=0.125 2023-11-23 05:33:13,804 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1350, loss[loss=0.06772, simple_loss=0.09364, pruned_loss=0.01198, audio_tagging_loss=0.008911, over 15015.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.09196, pruned_loss=0.01402, audio_tagging_loss=0.008991, over 3038561.59 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:33:26,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2253506.6666666665, ans=0.125 2023-11-23 05:33:43,622 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 05:33:53,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338050 2023-11-23 05:33:59,463 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
2023-11-23 05:34:17,316 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1400, loss[loss=0.06841, simple_loss=0.09004, pruned_loss=0.0149, audio_tagging_loss=0.008494, over 15525.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09207, pruned_loss=0.01395, audio_tagging_loss=0.009019, over 3044282.55 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0
2023-11-23 05:34:29,370 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-23 05:34:35,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2253840.0, ans=0.0
2023-11-23 05:34:53,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.459e+01 9.010e+01 9.793e+01 1.707e+02, threshold=1.802e+02, percent-clipped=0.0
2023-11-23 05:34:58,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338100
2023-11-23 05:35:11,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2254040.0, ans=0.125
2023-11-23 05:35:12,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2254040.0, ans=0.125
2023-11-23 05:35:21,273 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1450, loss[loss=0.09632, simple_loss=0.1318, pruned_loss=0.02199, audio_tagging_loss=0.008437, over 15138.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.09282, pruned_loss=0.01407, audio_tagging_loss=0.009077, over 3050485.26 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 16.0
2023-11-23 05:36:01,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338150
2023-11-23 05:36:08,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2254306.6666666665, ans=0.0
2023-11-23 05:36:11,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2254373.3333333335, ans=0.1
2023-11-23 05:36:24,823 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1500, loss[loss=0.06521, simple_loss=0.0802, pruned_loss=0.01384, audio_tagging_loss=0.01127, over 15003.00 frames. ], tot_loss[loss=0.06982, simple_loss=0.09289, pruned_loss=0.01421, audio_tagging_loss=0.00916, over 3052080.18 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 16.0
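The loss fields in these train_asr.py entries are consistent with a fixed weighting of the pruned-transducer terms plus the audio-tagging term. For the batch 1500 entry above, 0.5 * 0.09289 + 0.01421 + 1.0 * 0.00916 = 0.06982, matching the reported loss. The 0.5 and 1.0 weights below are inferred from that arithmetic and should be treated as assumptions (such scales can also be warm-up dependent); the sketch is not the training code itself.

```python
def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Weighted combination inferred from the logged values; the scales are
    # assumptions checked against the log, not read out of train_asr.py.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 1500 above: loss=0.06982, simple_loss=0.09289,
# pruned_loss=0.01421, audio_tagging_loss=0.00916.
print(total_loss(0.09289, 0.01421, 0.00916))  # ~0.069815, i.e. the logged 0.06982
```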
2023-11-23 05:36:30,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2254440.0, ans=0.0
2023-11-23 05:36:39,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2254506.6666666665, ans=0.125
2023-11-23 05:36:40,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2254506.6666666665, ans=0.0
2023-11-23 05:36:54,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2254573.3333333335, ans=0.125
2023-11-23 05:37:00,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.437e+01 9.139e+01 9.780e+01 1.863e+02, threshold=1.828e+02, percent-clipped=1.0
2023-11-23 05:37:05,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338200
2023-11-23 05:37:06,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2254640.0, ans=0.0
2023-11-23 05:37:24,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2254706.6666666665, ans=0.0
2023-11-23 05:37:28,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2254773.3333333335, ans=0.025
2023-11-23 05:37:29,281 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1550, loss[loss=0.07381, simple_loss=0.09437, pruned_loss=0.01312, audio_tagging_loss=0.01352, over 16132.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09298, pruned_loss=0.01416, audio_tagging_loss=0.009287, over 3055146.27 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0
2023-11-23 05:37:32,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0
2023-11-23 05:37:38,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2254773.3333333335, ans=0.0
2023-11-23 05:37:39,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2254773.3333333335, ans=0.125
2023-11-23 05:37:41,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2254840.0, ans=0.0
2023-11-23 05:37:48,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2254840.0, ans=0.125
2023-11-23 05:37:49,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2254840.0, ans=0.125
2023-11-23 05:38:07,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2254973.3333333335, ans=0.2
2023-11-23 05:38:10,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338250
2023-11-23 05:38:32,700 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1600, loss[loss=0.0535, simple_loss=0.07355, pruned_loss=0.006565, audio_tagging_loss=0.01017, over 17119.00 frames. ], tot_loss[loss=0.07027, simple_loss=0.09311, pruned_loss=0.01433, audio_tagging_loss=0.009389, over 3055085.33 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 32.0
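In every optim.py "Clipping_scale" entry in this log, the reported threshold equals clipping_scale times the median grad-norm quartile: for the entry above, 2.0 * 9.139e+01 = 1.828e+02, and percent-clipped=1.0 because the maximum norm 1.863e+02 exceeded that threshold. A hedged sketch of that bookkeeping follows; the window size, class name, and clipping mechanics are assumptions, not optim.py's actual implementation.

```python
import torch

class GradNormClipperSketch:
    """Tracks recent gradient norms; threshold = clipping_scale * median."""

    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []

    def update(self, grad_norm):
        # Keep a sliding window of recent total gradient norms.
        self.norms = (self.norms + [float(grad_norm)])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        clipped = grad_norm > threshold
        return threshold, clipped

clipper = GradNormClipperSketch()
for norm in [70.70, 82.48, 89.61, 95.60, 124.6]:  # quartiles from an entry above
    threshold, clipped = clipper.update(norm)
print(round(threshold, 2))  # 179.22 = 2.0 * 89.61, cf. threshold=1.792e+02 in the log
```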
2023-11-23 05:39:09,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.790e+01 8.320e+01 8.917e+01 9.601e+01 1.213e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-23 05:39:10,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2255306.6666666665, ans=0.125
2023-11-23 05:39:13,002 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338300
2023-11-23 05:39:22,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2255373.3333333335, ans=0.015
2023-11-23 05:39:35,854 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1650, loss[loss=0.05645, simple_loss=0.07446, pruned_loss=0.009959, audio_tagging_loss=0.009263, over 15061.00 frames. ], tot_loss[loss=0.07045, simple_loss=0.09338, pruned_loss=0.01446, audio_tagging_loss=0.009303, over 3057425.12 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0
2023-11-23 05:39:58,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2255506.6666666665, ans=0.0
2023-11-23 05:40:01,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2255573.3333333335, ans=0.2
2023-11-23 05:40:15,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338350
2023-11-23 05:40:37,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2255706.6666666665, ans=0.125
2023-11-23 05:40:39,287 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1700, loss[loss=0.07095, simple_loss=0.09054, pruned_loss=0.0153, audio_tagging_loss=0.01037, over 15831.00 frames. ], tot_loss[loss=0.07018, simple_loss=0.0928, pruned_loss=0.01439, audio_tagging_loss=0.009387, over 3051697.09 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0
2023-11-23 05:40:43,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=2255773.3333333335, ans=12.0
2023-11-23 05:40:53,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs.
limit=15.0 2023-11-23 05:41:00,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2255840.0, ans=0.125 2023-11-23 05:41:02,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2255906.6666666665, ans=0.0 2023-11-23 05:41:16,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.211e+01 9.004e+01 9.824e+01 1.268e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 05:41:18,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2255973.3333333335, ans=0.0 2023-11-23 05:41:19,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338400 2023-11-23 05:41:26,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2255973.3333333335, ans=0.125 2023-11-23 05:41:35,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2256040.0, ans=0.2 2023-11-23 05:41:42,298 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1750, loss[loss=0.07156, simple_loss=0.1021, pruned_loss=0.01368, audio_tagging_loss=0.006851, over 15607.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09407, pruned_loss=0.01436, audio_tagging_loss=0.009239, over 3059232.32 frames. ], batch size: 59, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:41:45,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2256106.6666666665, ans=0.125 2023-11-23 05:41:49,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2256106.6666666665, ans=0.0 2023-11-23 05:41:54,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2256173.3333333335, ans=0.125 2023-11-23 05:41:54,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2256173.3333333335, ans=0.0 2023-11-23 05:41:59,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2256173.3333333335, ans=0.125 2023-11-23 05:42:22,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338450 2023-11-23 05:42:45,355 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1800, loss[loss=0.06779, simple_loss=0.09568, pruned_loss=0.0112, audio_tagging_loss=0.008755, over 14808.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09313, pruned_loss=0.01431, audio_tagging_loss=0.00923, over 3048737.06 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:42:48,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2256440.0, ans=15.0 2023-11-23 05:42:57,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. 
limit=15.0 2023-11-23 05:43:21,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.187e+01 9.056e+01 9.555e+01 1.230e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-23 05:43:25,398 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338500 2023-11-23 05:43:30,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2256640.0, ans=0.2 2023-11-23 05:43:41,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2256706.6666666665, ans=0.125 2023-11-23 05:43:48,538 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1850, loss[loss=0.0809, simple_loss=0.1211, pruned_loss=0.01405, audio_tagging_loss=0.006289, over 15300.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09346, pruned_loss=0.01436, audio_tagging_loss=0.009116, over 3048044.56 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:43:48,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2256773.3333333335, ans=0.0 2023-11-23 05:44:24,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2256906.6666666665, ans=0.0 2023-11-23 05:44:29,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338550 2023-11-23 05:44:51,061 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1900, loss[loss=0.07117, simple_loss=0.08623, pruned_loss=0.01761, audio_tagging_loss=0.01045, over 15122.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.09266, pruned_loss=0.01414, audio_tagging_loss=0.009149, over 3048899.62 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:44:51,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2257106.6666666665, ans=0.2 2023-11-23 05:45:28,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.593e+01 9.216e+01 1.001e+02 1.158e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-23 05:45:29,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2257306.6666666665, ans=0.125 2023-11-23 05:45:32,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338600 2023-11-23 05:45:53,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2023-11-23 05:45:54,438 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 1950, loss[loss=0.08619, simple_loss=0.1197, pruned_loss=0.01951, audio_tagging_loss=0.006829, over 14570.00 frames. ], tot_loss[loss=0.06945, simple_loss=0.09252, pruned_loss=0.01412, audio_tagging_loss=0.009062, over 3047839.26 frames. 
], batch size: 53, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:46:05,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2257440.0, ans=0.125 2023-11-23 05:46:22,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2257573.3333333335, ans=0.015 2023-11-23 05:46:32,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2257640.0, ans=0.0 2023-11-23 05:46:34,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338650 2023-11-23 05:46:41,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2257640.0, ans=0.0 2023-11-23 05:46:55,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2257706.6666666665, ans=0.2 2023-11-23 05:46:57,979 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2000, loss[loss=0.05655, simple_loss=0.07936, pruned_loss=0.009435, audio_tagging_loss=0.007429, over 16064.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09206, pruned_loss=0.01407, audio_tagging_loss=0.009187, over 3052599.01 frames. ], batch size: 62, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:47:00,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2257773.3333333335, ans=0.1 2023-11-23 05:47:10,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2257840.0, ans=0.2 2023-11-23 05:47:12,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. 
limit=15.0 2023-11-23 05:47:14,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2257840.0, ans=0.125 2023-11-23 05:47:27,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2257906.6666666665, ans=0.125 2023-11-23 05:47:31,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2257906.6666666665, ans=0.125 2023-11-23 05:47:31,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2257906.6666666665, ans=0.0 2023-11-23 05:47:33,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.322e+01 9.117e+01 1.012e+02 1.277e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-23 05:47:37,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338700 2023-11-23 05:47:46,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2257973.3333333335, ans=0.0 2023-11-23 05:47:50,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2258040.0, ans=0.04949747468305833 2023-11-23 05:47:52,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2258040.0, ans=0.125 2023-11-23 05:47:58,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2258040.0, ans=0.0 2023-11-23 05:47:58,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2258040.0, ans=0.04949747468305833 2023-11-23 05:48:00,698 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2050, loss[loss=0.06276, simple_loss=0.08906, pruned_loss=0.008842, audio_tagging_loss=0.009386, over 15030.00 frames. ], tot_loss[loss=0.06924, simple_loss=0.09215, pruned_loss=0.01412, audio_tagging_loss=0.009045, over 3044824.68 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:48:10,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2258106.6666666665, ans=0.07 2023-11-23 05:48:17,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2258173.3333333335, ans=0.125 2023-11-23 05:48:17,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2258173.3333333335, ans=0.125 2023-11-23 05:48:18,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-11-23 05:48:23,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2258173.3333333335, ans=0.2 2023-11-23 05:48:24,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2258240.0, ans=0.125 2023-11-23 05:48:29,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. 
limit=10.0 2023-11-23 05:48:41,290 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338750 2023-11-23 05:48:48,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2258306.6666666665, ans=0.125 2023-11-23 05:49:03,143 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2100, loss[loss=0.09206, simple_loss=0.1178, pruned_loss=0.02396, audio_tagging_loss=0.009189, over 14743.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09151, pruned_loss=0.01405, audio_tagging_loss=0.009126, over 3042946.96 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:49:09,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2258440.0, ans=0.0 2023-11-23 05:49:19,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2258506.6666666665, ans=0.125 2023-11-23 05:49:20,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2258506.6666666665, ans=0.2 2023-11-23 05:49:39,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2258573.3333333335, ans=0.125 2023-11-23 05:49:40,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.396e+01 9.168e+01 1.020e+02 1.328e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-23 05:49:41,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2258640.0, ans=0.125 2023-11-23 05:49:43,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2258640.0, ans=0.0 2023-11-23 05:49:43,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0 2023-11-23 05:49:43,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338800 2023-11-23 05:49:54,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2258706.6666666665, ans=0.0 2023-11-23 05:50:01,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2258706.6666666665, ans=0.2 2023-11-23 05:50:07,655 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2150, loss[loss=0.07461, simple_loss=0.09658, pruned_loss=0.0148, audio_tagging_loss=0.01152, over 15466.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09109, pruned_loss=0.01388, audio_tagging_loss=0.009107, over 3044312.40 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:50:17,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=12.0 2023-11-23 05:50:35,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2258906.6666666665, ans=0.125 2023-11-23 05:50:43,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2258906.6666666665, ans=0.125 2023-11-23 05:50:45,280 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 05:50:47,805 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338850 2023-11-23 05:50:52,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2258973.3333333335, ans=0.1 2023-11-23 05:50:59,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2259040.0, ans=0.125 2023-11-23 05:51:11,715 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2200, loss[loss=0.06672, simple_loss=0.09159, pruned_loss=0.009794, audio_tagging_loss=0.01113, over 15439.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09168, pruned_loss=0.01428, audio_tagging_loss=0.009164, over 3046071.19 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:51:28,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5 2023-11-23 05:51:50,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.527e+01 9.201e+01 9.842e+01 1.279e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-23 05:51:53,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338900 2023-11-23 05:51:59,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2259306.6666666665, ans=0.0 2023-11-23 05:52:09,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2259373.3333333335, ans=0.125 2023-11-23 05:52:16,091 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2250, loss[loss=0.05755, simple_loss=0.07828, pruned_loss=0.0093, audio_tagging_loss=0.009109, over 15354.00 frames. ], tot_loss[loss=0.07024, simple_loss=0.0929, pruned_loss=0.01463, audio_tagging_loss=0.009158, over 3047723.51 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:52:16,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2259440.0, ans=0.125 2023-11-23 05:52:23,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2259440.0, ans=0.125 2023-11-23 05:52:33,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2259506.6666666665, ans=0.0 2023-11-23 05:52:58,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 338950 2023-11-23 05:53:14,923 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 05:53:22,022 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2300, loss[loss=0.08232, simple_loss=0.1014, pruned_loss=0.02156, audio_tagging_loss=0.01004, over 15526.00 frames. ], tot_loss[loss=0.07061, simple_loss=0.09358, pruned_loss=0.0146, audio_tagging_loss=0.009215, over 3055678.87 frames. 
], batch size: 56, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:53:34,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2259840.0, ans=10.0 2023-11-23 05:53:40,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2259840.0, ans=0.125 2023-11-23 05:53:41,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-23 05:54:00,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.330e+01 8.807e+01 9.666e+01 1.881e+02, threshold=1.761e+02, percent-clipped=1.0 2023-11-23 05:54:02,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339000 2023-11-23 05:54:19,382 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 05:54:25,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2023-11-23 05:54:27,448 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2350, loss[loss=0.06993, simple_loss=0.08184, pruned_loss=0.01666, audio_tagging_loss=0.01235, over 14356.00 frames. ], tot_loss[loss=0.07076, simple_loss=0.09399, pruned_loss=0.0145, audio_tagging_loss=0.009258, over 3056441.47 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:54:30,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2260106.6666666665, ans=0.125 2023-11-23 05:54:45,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2023-11-23 05:54:51,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2260240.0, ans=0.0 2023-11-23 05:54:59,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2260240.0, ans=0.125 2023-11-23 05:55:03,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2260240.0, ans=0.2 2023-11-23 05:55:08,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339050 2023-11-23 05:55:08,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2260306.6666666665, ans=0.125 2023-11-23 05:55:10,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2260306.6666666665, ans=0.0 2023-11-23 05:55:30,914 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2400, loss[loss=0.07339, simple_loss=0.09817, pruned_loss=0.01478, audio_tagging_loss=0.009522, over 14209.00 frames. ], tot_loss[loss=0.07066, simple_loss=0.0939, pruned_loss=0.01445, audio_tagging_loss=0.00927, over 3046334.90 frames. 
], batch size: 54, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:55:43,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2260506.6666666665, ans=0.0 2023-11-23 05:55:51,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2260506.6666666665, ans=0.125 2023-11-23 05:56:09,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.287e+01 8.924e+01 9.833e+01 1.260e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-23 05:56:12,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339100 2023-11-23 05:56:12,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-23 05:56:29,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2260706.6666666665, ans=0.0 2023-11-23 05:56:34,912 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2450, loss[loss=0.07146, simple_loss=0.09827, pruned_loss=0.01258, audio_tagging_loss=0.009751, over 14670.00 frames. ], tot_loss[loss=0.07008, simple_loss=0.09299, pruned_loss=0.01418, audio_tagging_loss=0.009408, over 3052825.12 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 05:56:51,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2260840.0, ans=0.125 2023-11-23 05:57:15,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339150 2023-11-23 05:57:31,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=12.0 2023-11-23 05:57:38,393 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2500, loss[loss=0.07966, simple_loss=0.1068, pruned_loss=0.01712, audio_tagging_loss=0.009125, over 15041.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.0936, pruned_loss=0.0144, audio_tagging_loss=0.009317, over 3054929.77 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:58:02,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2261240.0, ans=0.0 2023-11-23 05:58:18,161 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.151e+01 8.943e+01 9.911e+01 1.178e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-23 05:58:19,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339200 2023-11-23 05:58:19,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2261306.6666666665, ans=0.2 2023-11-23 05:58:42,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-11-23 05:58:42,696 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2550, loss[loss=0.07796, simple_loss=0.1102, pruned_loss=0.01371, audio_tagging_loss=0.009151, over 15715.00 frames. ], tot_loss[loss=0.07041, simple_loss=0.09377, pruned_loss=0.01435, audio_tagging_loss=0.009175, over 3056294.35 frames. 
], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:58:56,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2261506.6666666665, ans=0.125 2023-11-23 05:59:15,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2261573.3333333335, ans=0.125 2023-11-23 05:59:23,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339250 2023-11-23 05:59:45,919 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2600, loss[loss=0.05205, simple_loss=0.06459, pruned_loss=0.009554, audio_tagging_loss=0.0102, over 15618.00 frames. ], tot_loss[loss=0.06982, simple_loss=0.09292, pruned_loss=0.01427, audio_tagging_loss=0.009086, over 3050082.18 frames. ], batch size: 59, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 05:59:49,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2261773.3333333335, ans=0.2 2023-11-23 05:59:53,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2261773.3333333335, ans=0.2 2023-11-23 06:00:02,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2261840.0, ans=0.125 2023-11-23 06:00:17,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2261906.6666666665, ans=0.125 2023-11-23 06:00:19,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2023-11-23 06:00:25,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.329e+01 8.953e+01 9.650e+01 1.435e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 06:00:27,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339300 2023-11-23 06:00:33,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2261973.3333333335, ans=0.125 2023-11-23 06:00:50,463 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2650, loss[loss=0.08637, simple_loss=0.1116, pruned_loss=0.02207, audio_tagging_loss=0.008517, over 16406.00 frames. ], tot_loss[loss=0.0699, simple_loss=0.09316, pruned_loss=0.01432, audio_tagging_loss=0.009, over 3051775.71 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 06:00:59,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2262106.6666666665, ans=0.125 2023-11-23 06:01:19,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2023-11-23 06:01:22,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2262240.0, ans=0.95 2023-11-23 06:01:31,646 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339350 2023-11-23 06:01:32,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. 
limit=15.0 2023-11-23 06:01:53,644 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2700, loss[loss=0.07895, simple_loss=0.1158, pruned_loss=0.01663, audio_tagging_loss=0.004441, over 15281.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.09298, pruned_loss=0.01427, audio_tagging_loss=0.008988, over 3050102.43 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 06:02:31,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2262640.0, ans=0.125 2023-11-23 06:02:33,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.633e+01 8.218e+01 8.946e+01 9.636e+01 1.335e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-23 06:02:35,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339400 2023-11-23 06:02:58,419 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2750, loss[loss=0.07216, simple_loss=0.1013, pruned_loss=0.01441, audio_tagging_loss=0.007093, over 14670.00 frames. ], tot_loss[loss=0.06942, simple_loss=0.09236, pruned_loss=0.0143, audio_tagging_loss=0.008941, over 3045331.71 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 06:03:24,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2262906.6666666665, ans=0.0 2023-11-23 06:03:35,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2023-11-23 06:03:39,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339450 2023-11-23 06:03:52,608 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 06:03:53,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-23 06:03:59,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2263040.0, ans=0.0 2023-11-23 06:04:03,032 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2800, loss[loss=0.04424, simple_loss=0.0578, pruned_loss=0.003772, audio_tagging_loss=0.01157, over 15078.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09152, pruned_loss=0.01412, audio_tagging_loss=0.008996, over 3045880.60 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 32.0 2023-11-23 06:04:36,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. 
limit=6.0 2023-11-23 06:04:43,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2263306.6666666665, ans=0.125 2023-11-23 06:04:43,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.600e+01 8.163e+01 8.713e+01 9.405e+01 1.279e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-23 06:04:44,131 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339500 2023-11-23 06:04:59,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2263373.3333333335, ans=0.1 2023-11-23 06:05:06,456 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2850, loss[loss=0.06825, simple_loss=0.08378, pruned_loss=0.01444, audio_tagging_loss=0.01192, over 14526.00 frames. ], tot_loss[loss=0.06836, simple_loss=0.09063, pruned_loss=0.014, audio_tagging_loss=0.009048, over 3043952.47 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 06:05:09,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2263440.0, ans=0.1 2023-11-23 06:05:17,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2263506.6666666665, ans=0.125 2023-11-23 06:05:40,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2263573.3333333335, ans=0.2 2023-11-23 06:05:48,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339550 2023-11-23 06:06:11,462 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2900, loss[loss=0.06476, simple_loss=0.08724, pruned_loss=0.01507, audio_tagging_loss=0.006067, over 14030.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09103, pruned_loss=0.01391, audio_tagging_loss=0.009018, over 3048523.03 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 16.0 2023-11-23 06:06:29,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2263840.0, ans=0.1 2023-11-23 06:06:37,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2263906.6666666665, ans=0.0 2023-11-23 06:06:52,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.150e+01 8.750e+01 9.470e+01 1.345e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-23 06:06:53,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339600 2023-11-23 06:06:53,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2263973.3333333335, ans=0.1 2023-11-23 06:06:57,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2263973.3333333335, ans=0.125 2023-11-23 06:07:15,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2264040.0, ans=0.1 2023-11-23 06:07:17,750 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 2950, loss[loss=0.05585, simple_loss=0.06964, pruned_loss=0.008751, audio_tagging_loss=0.01228, over 14628.00 frames. ], tot_loss[loss=0.06865, simple_loss=0.09139, pruned_loss=0.01394, audio_tagging_loss=0.009018, over 3047731.56 frames. 
], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:07:18,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0
2023-11-23 06:07:54,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2264306.6666666665, ans=0.125
2023-11-23 06:07:59,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339650
2023-11-23 06:08:22,222 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3000, loss[loss=0.05301, simple_loss=0.06688, pruned_loss=0.009768, audio_tagging_loss=0.009803, over 14662.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.09194, pruned_loss=0.01404, audio_tagging_loss=0.009086, over 3051715.01 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:08:22,222 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-23 06:08:46,512 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6086, 3.5939, 3.8643, 3.4784], device='cuda:1')
2023-11-23 06:08:47,366 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5421, 4.4157, 3.9809, 4.2857], device='cuda:1')
2023-11-23 06:08:47,754 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8454, 5.7599, 5.6532, 5.5468], device='cuda:1')
2023-11-23 06:09:05,311 INFO [train_asr.py:1253] (1/4) Epoch 29, validation: loss=0.05823, simple_loss=0.05127, pruned_loss=0.005185, audio_tagging_loss=0.02741, over 4681554.00 frames.
2023-11-23 06:09:05,312 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-23 06:09:11,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2264440.0, ans=0.125
2023-11-23 06:09:25,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0
2023-11-23 06:09:38,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2264573.3333333335, ans=0.0
2023-11-23 06:09:38,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0
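During the validation pass above, zipformer.py also logs one attn_weights_entropy tensor per self-attention module, with one value per head: entropies collapsing toward zero mean a head attends to a single frame, while large values mean near-uniform attention. A hedged sketch of such a diagnostic follows; the tensor layout and the averaging over batch and time are assumptions, not zipformer.py's exact computation.

```python
import torch

def attn_weights_entropy(attn_weights):
    # attn_weights: (num_heads, batch, tgt_len, src_len); rows sum to 1.
    p = attn_weights.clamp(min=1e-20)      # avoid log(0)
    entropy = -(p * p.log()).sum(dim=-1)   # (num_heads, batch, tgt_len), nats
    return entropy.mean(dim=(1, 2))        # one averaged value per head

weights = torch.softmax(torch.randn(4, 2, 10, 10), dim=-1)
print(attn_weights_entropy(weights))  # 4 per-head values, like the tensors above
```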
2023-11-23 06:09:45,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2264640.0, ans=0.0
2023-11-23 06:09:46,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.432e+01 8.978e+01 9.876e+01 1.396e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-23 06:09:46,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339700
2023-11-23 06:09:50,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2264640.0, ans=0.125
2023-11-23 06:09:54,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2264640.0, ans=0.1
2023-11-23 06:10:07,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2264706.6666666665, ans=0.125
2023-11-23 06:10:10,832 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3050, loss[loss=0.06123, simple_loss=0.07943, pruned_loss=0.01116, audio_tagging_loss=0.01035, over 15288.00 frames. ], tot_loss[loss=0.06965, simple_loss=0.09258, pruned_loss=0.01424, audio_tagging_loss=0.009122, over 3057179.33 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:10:16,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2264773.3333333335, ans=0.05
2023-11-23 06:10:29,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2264840.0, ans=0.125
2023-11-23 06:10:33,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2264840.0, ans=0.125
2023-11-23 06:10:36,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2264906.6666666665, ans=0.125
2023-11-23 06:10:40,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0
2023-11-23 06:10:46,322 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-23 06:10:52,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339750 2023-11-23 06:10:55,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2264973.3333333335, ans=0.125 2023-11-23 06:11:01,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2265040.0, ans=0.2 2023-11-23 06:11:03,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2265040.0, ans=0.04949747468305833 2023-11-23 06:11:03,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2265040.0, ans=0.125 2023-11-23 06:11:14,611 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3100, loss[loss=0.06887, simple_loss=0.0877, pruned_loss=0.01495, audio_tagging_loss=0.01007, over 14976.00 frames. ], tot_loss[loss=0.06971, simple_loss=0.0928, pruned_loss=0.0141, audio_tagging_loss=0.009213, over 3052777.51 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 8.0 2023-11-23 06:11:23,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2023-11-23 06:11:24,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2265106.6666666665, ans=0.0 2023-11-23 06:11:49,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.73 vs. limit=10.0 2023-11-23 06:11:55,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339800 2023-11-23 06:11:56,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.232e+01 8.826e+01 9.441e+01 1.388e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-23 06:12:07,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2265373.3333333335, ans=0.0 2023-11-23 06:12:18,057 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3150, loss[loss=0.07, simple_loss=0.08724, pruned_loss=0.01547, audio_tagging_loss=0.01091, over 16293.00 frames. ], tot_loss[loss=0.06984, simple_loss=0.09283, pruned_loss=0.01424, audio_tagging_loss=0.009184, over 3052151.64 frames. ], batch size: 60, lr: 2.35e-03, grad_scale: 8.0 2023-11-23 06:12:24,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2265440.0, ans=0.125 2023-11-23 06:12:59,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339850 2023-11-23 06:13:07,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2265640.0, ans=0.125 2023-11-23 06:13:11,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2265706.6666666665, ans=0.125 2023-11-23 06:13:20,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2265706.6666666665, ans=0.125 2023-11-23 06:13:23,908 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3200, loss[loss=0.06598, simple_loss=0.09735, pruned_loss=0.009952, audio_tagging_loss=0.007355, over 16093.00 frames. 
], tot_loss[loss=0.07045, simple_loss=0.09355, pruned_loss=0.01428, audio_tagging_loss=0.009398, over 3048779.72 frames. ], batch size: 62, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 06:13:43,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2265840.0, ans=0.1 2023-11-23 06:14:04,475 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339900 2023-11-23 06:14:06,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.473e+01 8.333e+01 9.166e+01 9.819e+01 1.242e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-23 06:14:19,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2266040.0, ans=0.07 2023-11-23 06:14:26,809 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3250, loss[loss=0.05994, simple_loss=0.07016, pruned_loss=0.01145, audio_tagging_loss=0.01342, over 15791.00 frames. ], tot_loss[loss=0.07025, simple_loss=0.09303, pruned_loss=0.01418, audio_tagging_loss=0.009559, over 3051478.27 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 06:14:28,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2266106.6666666665, ans=0.125 2023-11-23 06:14:33,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2266106.6666666665, ans=0.125 2023-11-23 06:14:34,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2023-11-23 06:14:38,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2266173.3333333335, ans=0.2 2023-11-23 06:14:42,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2266173.3333333335, ans=0.5 2023-11-23 06:14:46,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2266173.3333333335, ans=0.125 2023-11-23 06:14:48,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2266173.3333333335, ans=0.0 2023-11-23 06:15:08,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 339950 2023-11-23 06:15:11,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2266306.6666666665, ans=0.1 2023-11-23 06:15:21,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-23 06:15:30,519 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3300, loss[loss=0.06993, simple_loss=0.08013, pruned_loss=0.01643, audio_tagging_loss=0.01343, over 15421.00 frames. ], tot_loss[loss=0.06961, simple_loss=0.09193, pruned_loss=0.01401, audio_tagging_loss=0.009633, over 3043681.26 frames. 
], batch size: 60, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 06:15:33,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2266440.0, ans=0.125 2023-11-23 06:15:33,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2266440.0, ans=0.0 2023-11-23 06:15:37,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2266440.0, ans=0.04949747468305833 2023-11-23 06:15:52,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2266506.6666666665, ans=0.125 2023-11-23 06:16:06,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2266573.3333333335, ans=0.0 2023-11-23 06:16:11,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2266640.0, ans=0.2 2023-11-23 06:16:12,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340000 2023-11-23 06:16:13,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.298e+01 8.900e+01 9.803e+01 1.322e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-23 06:16:13,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2266640.0, ans=0.1 2023-11-23 06:16:18,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2266640.0, ans=0.0 2023-11-23 06:16:37,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2266706.6666666665, ans=0.125 2023-11-23 06:16:40,081 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3350, loss[loss=0.06284, simple_loss=0.07053, pruned_loss=0.01557, audio_tagging_loss=0.012, over 14143.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.09202, pruned_loss=0.01397, audio_tagging_loss=0.009454, over 3050989.85 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 06:16:40,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2266773.3333333335, ans=0.1 2023-11-23 06:16:54,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2266840.0, ans=0.125 2023-11-23 06:17:02,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2266840.0, ans=0.05 2023-11-23 06:17:03,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2266906.6666666665, ans=0.0 2023-11-23 06:17:12,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2266906.6666666665, ans=0.125 2023-11-23 06:17:16,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. 
2023-11-23 06:17:20,209 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340050
2023-11-23 06:17:23,503 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-23 06:17:38,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2267040.0, ans=0.125
2023-11-23 06:17:43,964 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3400, loss[loss=0.06484, simple_loss=0.08482, pruned_loss=0.01155, audio_tagging_loss=0.01088, over 15396.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.0915, pruned_loss=0.01377, audio_tagging_loss=0.009398, over 3055780.20 frames. ], batch size: 58, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:18:13,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2267240.0, ans=0.125
2023-11-23 06:18:15,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0
2023-11-23 06:18:16,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2267240.0, ans=0.0
2023-11-23 06:18:25,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340100
2023-11-23 06:18:26,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.359e+01 8.997e+01 9.733e+01 1.260e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-23 06:18:30,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2267306.6666666665, ans=0.5
2023-11-23 06:18:33,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2267306.6666666665, ans=0.0
2023-11-23 06:18:47,377 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3450, loss[loss=0.07852, simple_loss=0.1062, pruned_loss=0.01862, audio_tagging_loss=0.006791, over 15123.00 frames. ], tot_loss[loss=0.06877, simple_loss=0.09135, pruned_loss=0.01383, audio_tagging_loss=0.009268, over 3054755.35 frames. ], batch size: 55, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:18:57,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2267440.0, ans=0.07
2023-11-23 06:19:01,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.38 vs. limit=22.5
2023-11-23 06:19:05,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2267506.6666666665, ans=0.125
2023-11-23 06:19:14,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2267573.3333333335, ans=0.0
2023-11-23 06:19:29,717 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340150
2023-11-23 06:19:53,629 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3500, loss[loss=0.08631, simple_loss=0.1223, pruned_loss=0.01691, audio_tagging_loss=0.008253, over 16311.00 frames. ], tot_loss[loss=0.0689, simple_loss=0.09168, pruned_loss=0.01386, audio_tagging_loss=0.009201, over 3056321.38 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:20:04,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2267773.3333333335, ans=0.09899494936611666
2023-11-23 06:20:11,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2267840.0, ans=0.125
2023-11-23 06:20:12,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2267840.0, ans=0.125
2023-11-23 06:20:25,087 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 06:20:32,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2267973.3333333335, ans=0.125
2023-11-23 06:20:34,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340200
2023-11-23 06:20:35,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.175e+01 8.873e+01 9.452e+01 1.244e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-23 06:20:58,519 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3550, loss[loss=0.07536, simple_loss=0.09293, pruned_loss=0.01972, audio_tagging_loss=0.00918, over 15630.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09073, pruned_loss=0.01374, audio_tagging_loss=0.009187, over 3047300.33 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:21:04,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2268106.6666666665, ans=0.125
2023-11-23 06:21:13,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2268173.3333333335, ans=0.125
2023-11-23 06:21:14,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2268173.3333333335, ans=0.125
2023-11-23 06:21:37,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2268306.6666666665, ans=0.125
2023-11-23 06:21:39,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340250
2023-11-23 06:21:41,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2268306.6666666665, ans=0.125
2023-11-23 06:21:55,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2268373.3333333335, ans=0.125
2023-11-23 06:22:01,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2268440.0, ans=0.2
2023-11-23 06:22:01,960 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3600, loss[loss=0.06611, simple_loss=0.09008, pruned_loss=0.01045, audio_tagging_loss=0.01062, over 15903.00 frames. ], tot_loss[loss=0.06868, simple_loss=0.09161, pruned_loss=0.01373, audio_tagging_loss=0.009146, over 3050861.89 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 32.0
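Note: the "Exclude cut" WARNING above (100 frames before subsampling, 23 after, 24 tokens) reflects a sanity filter that drops utterances a transducer cannot align: after the front end's roughly 4x subsampling there must be at least one encoder frame per BPE token. A minimal sketch of such a check (hypothetical helper; the frame formula below reproduces the logged 100 -> 23):

    import sentencepiece as spm

    def should_exclude(num_frames: int, text: str, sp: spm.SentencePieceProcessor) -> bool:
        # Encoder frames after the convolutional front end's ~4x subsampling:
        # 100 raw fbank frames -> ((100 - 7) // 2 + 1) // 2 = 23.
        T = ((num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(text, out_type=str)
        # A transducer alignment needs T >= number of tokens; here 23 < 24,
        # so the 1-second AudioSet clip with placeholder text is skipped.
        return T < len(tokens)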
2023-11-23 06:22:03,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2268440.0, ans=0.1
2023-11-23 06:22:26,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2268506.6666666665, ans=0.1
2023-11-23 06:22:29,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2268573.3333333335, ans=0.125
2023-11-23 06:22:43,591 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340300
2023-11-23 06:22:44,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.233e+01 8.921e+01 9.911e+01 1.235e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-23 06:22:59,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2268706.6666666665, ans=0.0
2023-11-23 06:23:02,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2268706.6666666665, ans=0.0
2023-11-23 06:23:07,321 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3650, loss[loss=0.0793, simple_loss=0.1046, pruned_loss=0.01908, audio_tagging_loss=0.00793, over 14880.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09197, pruned_loss=0.01405, audio_tagging_loss=0.009136, over 3048189.81 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 32.0
2023-11-23 06:23:20,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0
2023-11-23 06:23:34,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2268906.6666666665, ans=0.2
2023-11-23 06:23:38,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2268906.6666666665, ans=0.0
2023-11-23 06:23:40,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2268906.6666666665, ans=0.1
2023-11-23 06:23:45,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2268973.3333333335, ans=0.125
2023-11-23 06:23:46,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2268973.3333333335, ans=0.5
2023-11-23 06:23:47,639 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340350
2023-11-23 06:24:09,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0
2023-11-23 06:24:11,244 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3700, loss[loss=0.05742, simple_loss=0.07323, pruned_loss=0.009606, audio_tagging_loss=0.0112, over 16231.00 frames. ], tot_loss[loss=0.06939, simple_loss=0.09218, pruned_loss=0.01413, audio_tagging_loss=0.009167, over 3045941.71 frames. ], batch size: 60, lr: 2.35e-03, grad_scale: 32.0
2023-11-23 06:24:30,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2269173.3333333335, ans=0.125
2023-11-23 06:24:51,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2269306.6666666665, ans=0.0
2023-11-23 06:24:52,588 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340400
2023-11-23 06:24:55,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.367e+01 8.831e+01 9.767e+01 1.153e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-23 06:25:02,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2269373.3333333335, ans=0.125
2023-11-23 06:25:04,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0
2023-11-23 06:25:09,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0
2023-11-23 06:25:16,077 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3750, loss[loss=0.06312, simple_loss=0.08991, pruned_loss=0.01044, audio_tagging_loss=0.007718, over 15334.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09263, pruned_loss=0.01426, audio_tagging_loss=0.009211, over 3049499.88 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:25:25,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2269440.0, ans=0.035
2023-11-23 06:25:57,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340450
2023-11-23 06:25:59,023 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 06:26:13,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2269706.6666666665, ans=0.125
2023-11-23 06:26:14,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2269706.6666666665, ans=0.025
2023-11-23 06:26:20,467 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3800, loss[loss=0.08662, simple_loss=0.1174, pruned_loss=0.01837, audio_tagging_loss=0.009554, over 15335.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09221, pruned_loss=0.01417, audio_tagging_loss=0.009212, over 3050815.71 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:26:21,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2269773.3333333335, ans=0.0
2023-11-23 06:27:01,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5
2023-11-23 06:27:01,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340500
2023-11-23 06:27:04,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.380e+01 8.994e+01 9.826e+01 1.163e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-23 06:27:17,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2270040.0, ans=0.0
2023-11-23 06:27:18,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2270040.0, ans=0.125
2023-11-23 06:27:26,035 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3850, loss[loss=0.06667, simple_loss=0.08659, pruned_loss=0.01292, audio_tagging_loss=0.01046, over 14910.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09246, pruned_loss=0.01417, audio_tagging_loss=0.009293, over 3054503.00 frames. ], batch size: 55, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:27:28,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2270106.6666666665, ans=0.0
2023-11-23 06:27:48,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2270173.3333333335, ans=0.125
2023-11-23 06:27:49,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2270173.3333333335, ans=0.125
2023-11-23 06:27:53,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=2270240.0, ans=12.0
2023-11-23 06:27:57,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2270240.0, ans=0.0
2023-11-23 06:28:08,176 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340550
2023-11-23 06:28:12,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2270306.6666666665, ans=0.125
2023-11-23 06:28:14,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2270306.6666666665, ans=0.125
2023-11-23 06:28:31,082 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3900, loss[loss=0.07235, simple_loss=0.09311, pruned_loss=0.01729, audio_tagging_loss=0.008497, over 15679.00 frames. ], tot_loss[loss=0.0705, simple_loss=0.09364, pruned_loss=0.01449, audio_tagging_loss=0.009194, over 3052252.04 frames. ], batch size: 58, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:28:47,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2270506.6666666665, ans=0.1
2023-11-23 06:28:55,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2270573.3333333335, ans=0.0
2023-11-23 06:29:06,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2270573.3333333335, ans=0.0
2023-11-23 06:29:11,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340600
2023-11-23 06:29:14,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 8.394e+01 8.868e+01 9.858e+01 1.292e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-23 06:29:26,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2270706.6666666665, ans=0.05
2023-11-23 06:29:35,326 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 3950, loss[loss=0.08884, simple_loss=0.1237, pruned_loss=0.01799, audio_tagging_loss=0.009016, over 15047.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.0927, pruned_loss=0.01443, audio_tagging_loss=0.009266, over 3049669.47 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:29:38,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2270773.3333333335, ans=0.1
2023-11-23 06:29:48,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0
2023-11-23 06:29:58,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-11-23 06:29:59,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0
2023-11-23 06:30:15,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2270973.3333333335, ans=0.0
2023-11-23 06:30:16,662 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340650
2023-11-23 06:30:26,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2271040.0, ans=0.0
2023-11-23 06:30:29,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2271040.0, ans=0.125
2023-11-23 06:30:40,564 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4000, loss[loss=0.0548, simple_loss=0.0698, pruned_loss=0.008736, audio_tagging_loss=0.01116, over 14424.00 frames. ], tot_loss[loss=0.07044, simple_loss=0.09334, pruned_loss=0.01446, audio_tagging_loss=0.00931, over 3059648.61 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
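Note on the optim.py:476 records: in each of them the reported threshold equals Clipping_scale times the logged median grad-norm (the middle of the five quartile values), e.g. 2.0 * 8.868e+01 = 1.774e+02 in the record above, and percent-clipped=0.0 means no batch in the window exceeded it. A sketch of median-relative clipping in that spirit (an illustration inferred from the logged arithmetic, not the repo's actual optimizer code):

    import torch

    def clip_by_median(params, norm_history, clipping_scale=2.0):
        # The threshold tracks the model's recent gradient scale instead of a constant.
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads]))
        norm_history.append(total_norm.item())
        threshold = clipping_scale * sorted(norm_history)[len(norm_history) // 2]
        if total_norm > threshold:  # counted by the percent-clipped statistic
            for g in grads:
                g.mul_(threshold / total_norm)
        return total_norm, threshold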
2023-11-23 06:31:15,111 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-23 06:31:22,150 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340700
2023-11-23 06:31:25,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.305e+01 8.425e+01 8.951e+01 9.689e+01 1.650e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-23 06:31:28,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2271306.6666666665, ans=0.125
2023-11-23 06:31:36,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2271373.3333333335, ans=0.0
2023-11-23 06:31:42,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2271373.3333333335, ans=0.1
2023-11-23 06:31:44,145 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4050, loss[loss=0.08272, simple_loss=0.1147, pruned_loss=0.01917, audio_tagging_loss=0.006215, over 13535.00 frames. ], tot_loss[loss=0.07073, simple_loss=0.09361, pruned_loss=0.01454, audio_tagging_loss=0.009383, over 3061882.81 frames. ], batch size: 52, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:31:45,423 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 06:31:48,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2271440.0, ans=0.2
2023-11-23 06:31:53,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2271440.0, ans=0.125
2023-11-23 06:32:26,191 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340750
2023-11-23 06:32:48,637 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4100, loss[loss=0.06392, simple_loss=0.09295, pruned_loss=0.007574, audio_tagging_loss=0.009873, over 15390.00 frames. ], tot_loss[loss=0.07032, simple_loss=0.09341, pruned_loss=0.01427, audio_tagging_loss=0.009344, over 3061678.23 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:33:00,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2271773.3333333335, ans=0.0
2023-11-23 06:33:00,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0
2023-11-23 06:33:05,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2271840.0, ans=0.125
2023-11-23 06:33:30,032 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340800
2023-11-23 06:33:34,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.442e+01 8.896e+01 9.495e+01 1.100e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-23 06:33:37,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2271973.3333333335, ans=0.0
2023-11-23 06:33:39,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2272040.0, ans=0.125
2023-11-23 06:33:54,748 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4150, loss[loss=0.06204, simple_loss=0.08158, pruned_loss=0.01022, audio_tagging_loss=0.01103, over 14520.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09338, pruned_loss=0.01414, audio_tagging_loss=0.009217, over 3056033.69 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:33:55,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2272106.6666666665, ans=10.0
2023-11-23 06:34:16,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2272173.3333333335, ans=0.1
2023-11-23 06:34:18,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.05 vs. limit=22.5
2023-11-23 06:34:32,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5
2023-11-23 06:34:36,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340850
2023-11-23 06:34:38,711 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 06:34:46,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0
2023-11-23 06:34:47,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=22.5
2023-11-23 06:34:57,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2272440.0, ans=0.07
2023-11-23 06:34:58,255 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4200, loss[loss=0.06103, simple_loss=0.08418, pruned_loss=0.01156, audio_tagging_loss=0.007382, over 14809.00 frames. ], tot_loss[loss=0.06907, simple_loss=0.09174, pruned_loss=0.01395, audio_tagging_loss=0.009246, over 3053396.38 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:34:59,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2272440.0, ans=0.125
2023-11-23 06:35:04,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.10 vs. limit=10.0
2023-11-23 06:35:05,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0
2023-11-23 06:35:35,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2272573.3333333335, ans=0.05
2023-11-23 06:35:40,229 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340900
2023-11-23 06:35:43,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.311e+01 8.855e+01 9.584e+01 1.424e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-23 06:35:51,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2272706.6666666665, ans=0.125
2023-11-23 06:35:57,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2272706.6666666665, ans=0.125
2023-11-23 06:36:02,081 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4250, loss[loss=0.09386, simple_loss=0.1203, pruned_loss=0.0238, audio_tagging_loss=0.009921, over 14703.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09078, pruned_loss=0.01379, audio_tagging_loss=0.009259, over 3052163.56 frames. ], batch size: 55, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:36:43,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 340950
2023-11-23 06:36:46,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2272973.3333333335, ans=0.125
2023-11-23 06:36:51,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2272973.3333333335, ans=0.1
2023-11-23 06:36:51,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2272973.3333333335, ans=0.09899494936611666
2023-11-23 06:36:55,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2273040.0, ans=0.2
2023-11-23 06:37:01,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2273040.0, ans=0.1
2023-11-23 06:37:07,811 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4300, loss[loss=0.08236, simple_loss=0.1122, pruned_loss=0.01933, audio_tagging_loss=0.00693, over 15043.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09186, pruned_loss=0.01409, audio_tagging_loss=0.009108, over 3047563.23 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:37:49,253 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341000
2023-11-23 06:37:49,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2273306.6666666665, ans=0.0
2023-11-23 06:37:53,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.312e+01 8.993e+01 9.823e+01 1.635e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-23 06:37:54,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2273306.6666666665, ans=0.125
2023-11-23 06:38:00,773 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-23 06:38:06,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5
2023-11-23 06:38:11,581 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4350, loss[loss=0.09116, simple_loss=0.121, pruned_loss=0.02196, audio_tagging_loss=0.00868, over 15072.00 frames. ], tot_loss[loss=0.06976, simple_loss=0.09284, pruned_loss=0.01431, audio_tagging_loss=0.009033, over 3047341.95 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:38:20,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2273440.0, ans=0.125
2023-11-23 06:38:42,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2273573.3333333335, ans=0.5
2023-11-23 06:38:52,375 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341050
2023-11-23 06:39:02,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2273706.6666666665, ans=0.0
2023-11-23 06:39:07,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2273706.6666666665, ans=0.04949747468305833
2023-11-23 06:39:11,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2273706.6666666665, ans=0.125
2023-11-23 06:39:14,627 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4400, loss[loss=0.08284, simple_loss=0.1116, pruned_loss=0.01936, audio_tagging_loss=0.007691, over 15725.00 frames. ], tot_loss[loss=0.07002, simple_loss=0.09332, pruned_loss=0.01436, audio_tagging_loss=0.009008, over 3053043.76 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 32.0
2023-11-23 06:39:19,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0
2023-11-23 06:39:33,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2273840.0, ans=0.125
2023-11-23 06:39:55,476 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341100
2023-11-23 06:39:58,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0
2023-11-23 06:40:00,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.327e+01 8.740e+01 9.683e+01 1.205e+02, threshold=1.748e+02, percent-clipped=0.0
2023-11-23 06:40:14,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0
2023-11-23 06:40:19,116 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4450, loss[loss=0.06914, simple_loss=0.08882, pruned_loss=0.01344, audio_tagging_loss=0.01129, over 13901.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09343, pruned_loss=0.01425, audio_tagging_loss=0.00899, over 3048850.11 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:40:46,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2274240.0, ans=0.0
2023-11-23 06:40:57,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2274306.6666666665, ans=0.125
2023-11-23 06:40:58,326 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341150
2023-11-23 06:41:02,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2274306.6666666665, ans=0.125
2023-11-23 06:41:22,097 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4500, loss[loss=0.05936, simple_loss=0.07535, pruned_loss=0.008267, audio_tagging_loss=0.01342, over 14973.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09241, pruned_loss=0.01409, audio_tagging_loss=0.008983, over 3046615.98 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:41:23,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2274440.0, ans=0.1
2023-11-23 06:41:27,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2274440.0, ans=0.125
2023-11-23 06:41:29,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2274440.0, ans=0.1
2023-11-23 06:41:38,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2274506.6666666665, ans=0.125
2023-11-23 06:41:41,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2274506.6666666665, ans=0.025
2023-11-23 06:41:58,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2274573.3333333335, ans=0.0
2023-11-23 06:42:03,437 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341200
2023-11-23 06:42:08,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 8.193e+01 8.685e+01 9.597e+01 1.191e+02, threshold=1.737e+02, percent-clipped=0.0
2023-11-23 06:42:12,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2274706.6666666665, ans=0.2
2023-11-23 06:42:20,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2274706.6666666665, ans=0.0
2023-11-23 06:42:22,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2274706.6666666665, ans=0.0
2023-11-23 06:42:25,801 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4550, loss[loss=0.07535, simple_loss=0.09571, pruned_loss=0.01718, audio_tagging_loss=0.01031, over 13708.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09204, pruned_loss=0.01388, audio_tagging_loss=0.009036, over 3046702.58 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 16.0
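Note on grad_scale: the value printed with each tot_loss (16.0 -> 32.0 -> 16.0 across batches 3600-3750, and again around batches 4400 and 4800) is the fp16 dynamic loss scale: it doubles after a run of overflow-free steps and drops back when an inf/nan gradient is hit. The stock PyTorch pattern that produces this kind of trace (settings illustrative; the training script manages its own scaler, which may differ):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)
    for batch in loader:  # model, loader, optimizer assumed defined elsewhere
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped if the scaled gradients overflowed
        scaler.update()         # x2 after enough clean steps, /2 on overflow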
2023-11-23 06:42:45,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2274840.0, ans=0.1
2023-11-23 06:42:50,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2274906.6666666665, ans=0.02
2023-11-23 06:43:03,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2274973.3333333335, ans=0.1
2023-11-23 06:43:06,617 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341250
2023-11-23 06:43:11,434 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 06:43:26,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0
2023-11-23 06:43:28,948 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4600, loss[loss=0.0387, simple_loss=0.03907, pruned_loss=0.007767, audio_tagging_loss=0.01139, over 16504.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09091, pruned_loss=0.01383, audio_tagging_loss=0.00917, over 3046661.75 frames. ], batch size: 66, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:43:53,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0
2023-11-23 06:44:06,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2275306.6666666665, ans=0.125
2023-11-23 06:44:08,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341300
2023-11-23 06:44:09,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=12.0
2023-11-23 06:44:13,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.895e+01 8.306e+01 8.944e+01 9.725e+01 1.153e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-23 06:44:23,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2275373.3333333335, ans=0.125
2023-11-23 06:44:29,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0
2023-11-23 06:44:32,728 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4650, loss[loss=0.06634, simple_loss=0.09389, pruned_loss=0.01223, audio_tagging_loss=0.007164, over 15273.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09148, pruned_loss=0.01401, audio_tagging_loss=0.00921, over 3043248.96 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:44:33,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0
2023-11-23 06:44:42,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2275440.0, ans=0.125
2023-11-23 06:44:58,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2275573.3333333335, ans=0.0
2023-11-23 06:45:00,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2275573.3333333335, ans=0.2
2023-11-23 06:45:07,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2275573.3333333335, ans=0.125
2023-11-23 06:45:08,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2275573.3333333335, ans=0.125
2023-11-23 06:45:13,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341350
2023-11-23 06:45:16,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2275640.0, ans=0.0
2023-11-23 06:45:36,205 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4700, loss[loss=0.07545, simple_loss=0.1049, pruned_loss=0.01315, audio_tagging_loss=0.009846, over 16746.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09133, pruned_loss=0.01412, audio_tagging_loss=0.009336, over 3047941.32 frames. ], batch size: 60, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:46:18,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341400
2023-11-23 06:46:23,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.437e+01 9.019e+01 9.512e+01 1.168e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-23 06:46:23,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2275973.3333333335, ans=0.2
2023-11-23 06:46:27,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0
2023-11-23 06:46:40,569 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4750, loss[loss=0.07105, simple_loss=0.08612, pruned_loss=0.01698, audio_tagging_loss=0.01101, over 13814.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09172, pruned_loss=0.01407, audio_tagging_loss=0.009399, over 3047282.84 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:47:18,924 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 06:47:22,547 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341450
2023-11-23 06:47:35,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2276373.3333333335, ans=0.07
2023-11-23 06:47:46,386 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4800, loss[loss=0.07216, simple_loss=0.09802, pruned_loss=0.01405, audio_tagging_loss=0.009095, over 15615.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.09271, pruned_loss=0.01426, audio_tagging_loss=0.009414, over 3047290.68 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 32.0
2023-11-23 06:48:04,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2276506.6666666665, ans=0.1
2023-11-23 06:48:14,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5
2023-11-23 06:48:15,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2276573.3333333335, ans=0.125
2023-11-23 06:48:26,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2276640.0, ans=0.125
2023-11-23 06:48:27,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341500
2023-11-23 06:48:33,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 8.380e+01 8.789e+01 9.333e+01 1.267e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-23 06:48:38,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2276706.6666666665, ans=0.2
2023-11-23 06:48:50,498 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4850, loss[loss=0.07132, simple_loss=0.08905, pruned_loss=0.01786, audio_tagging_loss=0.00893, over 14672.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09206, pruned_loss=0.0142, audio_tagging_loss=0.009469, over 3047208.61 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:49:01,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2276840.0, ans=0.0
2023-11-23 06:49:13,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2276840.0, ans=0.0
2023-11-23 06:49:19,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2276906.6666666665, ans=0.0
2023-11-23 06:49:32,079 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341550
2023-11-23 06:49:35,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2276973.3333333335, ans=0.125
2023-11-23 06:49:38,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2276973.3333333335, ans=0.125
2023-11-23 06:49:53,995 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4900, loss[loss=0.05533, simple_loss=0.06762, pruned_loss=0.01414, audio_tagging_loss=0.007383, over 13819.00 frames. ], tot_loss[loss=0.07012, simple_loss=0.09284, pruned_loss=0.01432, audio_tagging_loss=0.009381, over 3040751.97 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:50:35,582 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341600
2023-11-23 06:50:38,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.33 vs. limit=10.0
2023-11-23 06:50:41,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.445e+01 8.899e+01 9.661e+01 1.338e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-23 06:50:47,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2277373.3333333335, ans=0.0
2023-11-23 06:50:59,326 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 4950, loss[loss=0.05763, simple_loss=0.07212, pruned_loss=0.01266, audio_tagging_loss=0.008911, over 14886.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09233, pruned_loss=0.01429, audio_tagging_loss=0.009252, over 3043294.47 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:51:19,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2277506.6666666665, ans=0.0
2023-11-23 06:51:40,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341650
2023-11-23 06:52:03,289 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5000, loss[loss=0.05942, simple_loss=0.07655, pruned_loss=0.01281, audio_tagging_loss=0.008335, over 15182.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09155, pruned_loss=0.01404, audio_tagging_loss=0.009148, over 3043591.16 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:52:07,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0
2023-11-23 06:52:12,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2277773.3333333335, ans=0.125
2023-11-23 06:52:14,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2277840.0, ans=0.125
2023-11-23 06:52:36,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2277906.6666666665, ans=0.2
2023-11-23 06:52:41,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2277973.3333333335, ans=0.125
2023-11-23 06:52:42,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2277973.3333333335, ans=0.125
2023-11-23 06:52:43,631 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341700
2023-11-23 06:52:50,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.065e+01 8.894e+01 1.001e+02 1.433e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-23 06:52:53,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2278040.0, ans=0.0
2023-11-23 06:52:54,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2278040.0, ans=0.125
2023-11-23 06:53:06,446 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5050, loss[loss=0.08319, simple_loss=0.1204, pruned_loss=0.01731, audio_tagging_loss=0.005663, over 14952.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09166, pruned_loss=0.01397, audio_tagging_loss=0.009044, over 3043594.61 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:53:21,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2278173.3333333335, ans=0.025
2023-11-23 06:53:41,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2278240.0, ans=0.1
2023-11-23 06:53:47,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341750
2023-11-23 06:53:56,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0
2023-11-23 06:54:06,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2278373.3333333335, ans=0.125
2023-11-23 06:54:10,989 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5100, loss[loss=0.07426, simple_loss=0.1006, pruned_loss=0.01398, audio_tagging_loss=0.01, over 16252.00 frames. ], tot_loss[loss=0.06877, simple_loss=0.09147, pruned_loss=0.01394, audio_tagging_loss=0.009096, over 3046388.11 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:54:22,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2278506.6666666665, ans=0.95
2023-11-23 06:54:35,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2278573.3333333335, ans=0.125
2023-11-23 06:54:42,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2278573.3333333335, ans=0.125
2023-11-23 06:54:47,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2278640.0, ans=0.125
2023-11-23 06:54:52,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341800
2023-11-23 06:54:59,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 7.874e+01 8.463e+01 9.175e+01 1.215e+02, threshold=1.693e+02, percent-clipped=0.0
2023-11-23 06:55:00,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2278640.0, ans=0.0
2023-11-23 06:55:15,517 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5150, loss[loss=0.07387, simple_loss=0.09588, pruned_loss=0.01666, audio_tagging_loss=0.009271, over 14481.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.0916, pruned_loss=0.01396, audio_tagging_loss=0.008975, over 3039877.54 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:55:23,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2278773.3333333335, ans=0.125
2023-11-23 06:55:57,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341850
2023-11-23 06:56:14,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2279040.0, ans=0.0
2023-11-23 06:56:15,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2279040.0, ans=0.2
2023-11-23 06:56:20,312 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5200, loss[loss=0.0723, simple_loss=0.1037, pruned_loss=0.01343, audio_tagging_loss=0.007019, over 15522.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09237, pruned_loss=0.0139, audio_tagging_loss=0.009086, over 3044524.95 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 32.0
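Note on the scaling.py:213 records: each names a Zipformer sub-module hyperparameter (dropout_p, skip rates, balancer limits) whose current value ("ans") is a function of batch_count; by batch_count ~2.27e6 they have settled at their end-of-schedule values. A simplified piecewise-linear stand-in for such a schedule (breakpoint numbers below are made up for illustration):

    def scheduled_float(batch_count, points):
        # points: [(batch, value), ...] sorted by batch; linear interpolation
        # between breakpoints, clamped to the end values.
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches:
    scheduled_float(2265840.0, [(0.0, 0.3), (20000.0, 0.1)])  # -> 0.1, as logged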
2023-11-23 06:56:28,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2279106.6666666665, ans=0.125
2023-11-23 06:56:41,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2279173.3333333335, ans=0.125
2023-11-23 06:56:58,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2279306.6666666665, ans=0.2
2023-11-23 06:57:00,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.59 vs. limit=10.0
2023-11-23 06:57:01,379 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341900
2023-11-23 06:57:08,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0
2023-11-23 06:57:09,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.516e+01 8.360e+01 8.790e+01 9.384e+01 1.181e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-23 06:57:25,757 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5250, loss[loss=0.07309, simple_loss=0.1042, pruned_loss=0.01362, audio_tagging_loss=0.007356, over 14168.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09338, pruned_loss=0.01416, audio_tagging_loss=0.008958, over 3039385.41 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:57:37,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2279506.6666666665, ans=0.2
2023-11-23 06:57:42,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=12.0
2023-11-23 06:57:44,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2279506.6666666665, ans=0.125
2023-11-23 06:57:47,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2279506.6666666665, ans=0.0
2023-11-23 06:57:50,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2279573.3333333335, ans=0.125
2023-11-23 06:58:06,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 341950
2023-11-23 06:58:29,709 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5300, loss[loss=0.05047, simple_loss=0.06363, pruned_loss=0.01071, audio_tagging_loss=0.007944, over 16136.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.0931, pruned_loss=0.01409, audio_tagging_loss=0.00905, over 3044071.40 frames. ], batch size: 64, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:58:29,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2279773.3333333335, ans=0.125
2023-11-23 06:58:33,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2279773.3333333335, ans=0.125
2023-11-23 06:58:34,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2279773.3333333335, ans=0.0
2023-11-23 06:58:37,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2279773.3333333335, ans=0.2
2023-11-23 06:59:10,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2279973.3333333335, ans=0.2
2023-11-23 06:59:10,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2279973.3333333335, ans=0.125
2023-11-23 06:59:11,264 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342000
2023-11-23 06:59:12,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2279973.3333333335, ans=0.0
2023-11-23 06:59:18,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.762e+01 8.363e+01 8.739e+01 9.478e+01 1.273e+02, threshold=1.748e+02, percent-clipped=0.0
2023-11-23 06:59:33,708 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5350, loss[loss=0.06706, simple_loss=0.09352, pruned_loss=0.01125, audio_tagging_loss=0.009055, over 14899.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.0929, pruned_loss=0.01383, audio_tagging_loss=0.00895, over 3042532.98 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 06:59:33,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2280106.6666666665, ans=0.125
2023-11-23 07:00:09,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2280240.0, ans=0.125
2023-11-23 07:00:09,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2280240.0, ans=0.1
2023-11-23 07:00:15,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342050
2023-11-23 07:00:28,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2280373.3333333335, ans=0.5
2023-11-23 07:00:38,805 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5400, loss[loss=0.04929, simple_loss=0.06013, pruned_loss=0.009177, audio_tagging_loss=0.01005, over 14277.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09381, pruned_loss=0.01407, audio_tagging_loss=0.008911, over 3049531.27 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0
2023-11-23 07:00:41,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0
limit=6.0 2023-11-23 07:00:50,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2280506.6666666665, ans=0.1 2023-11-23 07:00:56,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2023-11-23 07:01:00,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2280506.6666666665, ans=0.125 2023-11-23 07:01:10,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2280573.3333333335, ans=0.125 2023-11-23 07:01:11,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2280573.3333333335, ans=0.125 2023-11-23 07:01:18,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2280640.0, ans=0.1 2023-11-23 07:01:19,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342100 2023-11-23 07:01:27,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.934e+01 8.345e+01 9.025e+01 9.492e+01 1.163e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 07:01:43,278 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5450, loss[loss=0.07892, simple_loss=0.1037, pruned_loss=0.01785, audio_tagging_loss=0.009235, over 14486.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09402, pruned_loss=0.01414, audio_tagging_loss=0.008981, over 3045609.11 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 07:01:43,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2280773.3333333335, ans=0.125 2023-11-23 07:01:51,067 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:01:57,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-11-23 07:01:58,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2280840.0, ans=0.0 2023-11-23 07:02:17,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2280906.6666666665, ans=0.2 2023-11-23 07:02:24,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342150 2023-11-23 07:02:28,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2280973.3333333335, ans=0.125 2023-11-23 07:02:33,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. 
limit=15.0 2023-11-23 07:02:37,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2281040.0, ans=0.025 2023-11-23 07:02:37,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2281040.0, ans=0.0 2023-11-23 07:02:38,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2281040.0, ans=0.0 2023-11-23 07:02:46,736 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5500, loss[loss=0.0603, simple_loss=0.07759, pruned_loss=0.01262, audio_tagging_loss=0.00888, over 14731.00 frames. ], tot_loss[loss=0.07016, simple_loss=0.09404, pruned_loss=0.01413, audio_tagging_loss=0.009003, over 3046072.26 frames. ], batch size: 58, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 07:03:01,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2281173.3333333335, ans=0.0 2023-11-23 07:03:02,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2023-11-23 07:03:27,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2281306.6666666665, ans=0.0 2023-11-23 07:03:28,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342200 2023-11-23 07:03:35,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.28 vs. limit=12.0 2023-11-23 07:03:35,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.376e+01 8.901e+01 9.455e+01 1.073e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-23 07:03:37,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-23 07:03:43,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2023-11-23 07:03:49,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-23 07:03:50,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2281440.0, ans=0.1 2023-11-23 07:03:51,684 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5550, loss[loss=0.08132, simple_loss=0.1067, pruned_loss=0.01779, audio_tagging_loss=0.01018, over 16150.00 frames. ], tot_loss[loss=0.0703, simple_loss=0.09398, pruned_loss=0.01414, audio_tagging_loss=0.009159, over 3054041.99 frames. 
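In each "Clipping_scale=2.0, grad-norm quartiles ..." record the threshold equals twice the logged median: just above, 2.0 * 8.901e+01 = 1.780e+02. That is consistent with clipping at clipping_scale times a running median of recent gradient norms; a sketch under that assumption, using a simple sliding window (the optimizer may track its statistics differently):

    import statistics
    from collections import deque

    class MedianGradClipper:
        """Toy scheme: clip at clipping_scale * median of recent grad norms."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def threshold(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            return self.clipping_scale * statistics.median(self.norms)

    clipper = MedianGradClipper()
    for norm in (69.25, 83.76, 89.01, 94.55, 107.3):  # quartiles logged above
        t = clipper.threshold(norm)
    print(f"threshold={t:.1f}")  # 178.0, i.e. 2 * median, matching the record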
], batch size: 57, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 07:04:03,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2281506.6666666665, ans=0.0 2023-11-23 07:04:23,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2281573.3333333335, ans=0.0 2023-11-23 07:04:31,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342250 2023-11-23 07:04:55,551 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5600, loss[loss=0.07777, simple_loss=0.1089, pruned_loss=0.01569, audio_tagging_loss=0.007646, over 13925.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.09387, pruned_loss=0.01415, audio_tagging_loss=0.009194, over 3048075.66 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 32.0 2023-11-23 07:05:01,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-11-23 07:05:14,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2281840.0, ans=0.0 2023-11-23 07:05:17,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2281840.0, ans=0.1 2023-11-23 07:05:36,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342300 2023-11-23 07:05:38,720 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 07:05:43,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.528e+01 8.225e+01 8.879e+01 9.451e+01 1.680e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-23 07:05:57,940 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5650, loss[loss=0.09628, simple_loss=0.1341, pruned_loss=0.02114, audio_tagging_loss=0.008074, over 15423.00 frames. ], tot_loss[loss=0.07116, simple_loss=0.09474, pruned_loss=0.01451, audio_tagging_loss=0.009279, over 3051043.90 frames. ], batch size: 55, lr: 2.35e-03, grad_scale: 32.0 2023-11-23 07:06:00,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2282106.6666666665, ans=0.125 2023-11-23 07:06:10,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2282173.3333333335, ans=0.125 2023-11-23 07:06:13,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2282173.3333333335, ans=0.125 2023-11-23 07:06:17,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. 
limit=15.0 2023-11-23 07:06:19,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2282173.3333333335, ans=0.04949747468305833 2023-11-23 07:06:21,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2282173.3333333335, ans=0.125 2023-11-23 07:06:23,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2282240.0, ans=0.125 2023-11-23 07:06:25,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2282240.0, ans=0.07 2023-11-23 07:06:27,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2282240.0, ans=0.125 2023-11-23 07:06:29,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2023-11-23 07:06:35,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-23 07:06:39,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342350 2023-11-23 07:06:39,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2023-11-23 07:06:45,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2282306.6666666665, ans=0.125 2023-11-23 07:07:01,215 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5700, loss[loss=0.06133, simple_loss=0.0814, pruned_loss=0.01272, audio_tagging_loss=0.007911, over 14388.00 frames. ], tot_loss[loss=0.07069, simple_loss=0.09398, pruned_loss=0.01442, audio_tagging_loss=0.009279, over 3053957.58 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 07:07:07,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2282440.0, ans=0.0 2023-11-23 07:07:24,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2282506.6666666665, ans=0.125 2023-11-23 07:07:27,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2282573.3333333335, ans=0.125 2023-11-23 07:07:41,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342400 2023-11-23 07:07:50,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.447e+01 8.964e+01 9.780e+01 1.182e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-23 07:07:56,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2282706.6666666665, ans=0.125 2023-11-23 07:08:06,322 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5750, loss[loss=0.04041, simple_loss=0.05036, pruned_loss=0.005711, audio_tagging_loss=0.009514, over 13406.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09299, pruned_loss=0.01437, audio_tagging_loss=0.009133, over 3051854.41 frames. 
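The WARNING a few records back (Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav) drops 1-second AudioSet clips whose placeholder transcript outruns the subsampled features: 100 feature frames shrink to ((100 - 7) // 2 + 1) // 2 = 23 under a two-stage stride-2 front end, fewer than the 24 BPE tokens, so the transducer cannot align the cut. A sketch of that filter; both the length formula and the frames >= tokens criterion are inferred from the numbers in the warning rather than quoted from the code:

    def subsampled_length(num_frames: int) -> int:
        # Assumed Conv2dSubsampling-style output length: two stride-2 stages.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one output frame per token.
        return subsampled_length(num_frames) >= num_tokens

    assert subsampled_length(100) == 23   # "before": 100, "after": 23
    assert not keep_cut(100, 24)          # 23 frames < 24 tokens: excluded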
], batch size: 53, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 07:08:06,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2282773.3333333335, ans=0.125 2023-11-23 07:08:17,634 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:08:29,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2282906.6666666665, ans=0.125 2023-11-23 07:08:42,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2282906.6666666665, ans=0.0 2023-11-23 07:08:48,028 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342450 2023-11-23 07:09:09,963 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5800, loss[loss=0.06726, simple_loss=0.09208, pruned_loss=0.01457, audio_tagging_loss=0.006652, over 15234.00 frames. ], tot_loss[loss=0.06973, simple_loss=0.09256, pruned_loss=0.01438, audio_tagging_loss=0.009065, over 3045258.85 frames. ], batch size: 58, lr: 2.35e-03, grad_scale: 16.0 2023-11-23 07:09:15,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2283106.6666666665, ans=0.1 2023-11-23 07:09:15,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2023-11-23 07:09:35,338 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:09:51,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342500 2023-11-23 07:09:53,318 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:09:55,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2023-11-23 07:09:58,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=22.5 2023-11-23 07:09:59,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5 2023-11-23 07:10:00,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.596e+01 8.186e+01 8.952e+01 9.751e+01 1.304e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-23 07:10:10,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2283373.3333333335, ans=0.1 2023-11-23 07:10:14,073 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5850, loss[loss=0.06616, simple_loss=0.09414, pruned_loss=0.01039, audio_tagging_loss=0.0087, over 13353.00 frames. ], tot_loss[loss=0.07006, simple_loss=0.09339, pruned_loss=0.01445, audio_tagging_loss=0.008918, over 3043783.81 frames. 
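The learning rate in these records decays from 2.35e-03 to 2.34e-03 as the global batch index passes roughly 342500, consistent with an Eden-style schedule that discounts in both batch count and epoch, using the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. The closed form below is assumed, but it reproduces the logged value to three digits:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch / lr_batches) ** 2 + 1.0) ** -0.25
        epoch_factor = ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Near "Current batch idx: 342450" above, with 28 epochs completed:
    print(f"{eden_lr(0.045, 342450, 28):.2e}")  # ~2.35e-03, as logged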
], batch size: 53, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:10:14,249 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:10:24,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2283440.0, ans=0.1 2023-11-23 07:10:25,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=22.5 2023-11-23 07:10:25,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2283440.0, ans=0.2 2023-11-23 07:10:49,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2283573.3333333335, ans=10.0 2023-11-23 07:10:53,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2283640.0, ans=0.0 2023-11-23 07:10:55,920 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342550 2023-11-23 07:10:58,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2283640.0, ans=0.1 2023-11-23 07:11:20,070 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5900, loss[loss=0.05217, simple_loss=0.05507, pruned_loss=0.01073, audio_tagging_loss=0.0139, over 14306.00 frames. ], tot_loss[loss=0.06973, simple_loss=0.09295, pruned_loss=0.01425, audio_tagging_loss=0.009006, over 3046451.08 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:11:37,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2283840.0, ans=0.1 2023-11-23 07:11:40,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-11-23 07:11:59,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2283973.3333333335, ans=0.125 2023-11-23 07:12:01,260 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342600 2023-11-23 07:12:10,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.390e+01 8.881e+01 9.843e+01 1.391e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-23 07:12:17,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2284040.0, ans=0.125 2023-11-23 07:12:24,205 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 5950, loss[loss=0.08131, simple_loss=0.1093, pruned_loss=0.0149, audio_tagging_loss=0.01177, over 15232.00 frames. ], tot_loss[loss=0.06987, simple_loss=0.09291, pruned_loss=0.01439, audio_tagging_loss=0.009029, over 3052560.30 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:12:27,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs. 
limit=15.0 2023-11-23 07:12:28,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2284106.6666666665, ans=0.0 2023-11-23 07:13:05,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342650 2023-11-23 07:13:14,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2284373.3333333335, ans=0.125 2023-11-23 07:13:15,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2284373.3333333335, ans=0.125 2023-11-23 07:13:27,488 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6000, loss[loss=0.06776, simple_loss=0.0929, pruned_loss=0.01378, audio_tagging_loss=0.007535, over 14613.00 frames. ], tot_loss[loss=0.06984, simple_loss=0.09336, pruned_loss=0.01423, audio_tagging_loss=0.008925, over 3050161.59 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:13:27,489 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 07:14:01,113 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4643, 3.7650, 4.2955, 3.4331], device='cuda:1') 2023-11-23 07:14:10,547 INFO [train_asr.py:1253] (1/4) Epoch 29, validation: loss=0.05847, simple_loss=0.05123, pruned_loss=0.005068, audio_tagging_loss=0.02778, over 4681554.00 frames. 2023-11-23 07:14:10,547 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 07:14:51,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342700 2023-11-23 07:14:55,063 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 07:15:00,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.161e+01 8.758e+01 9.491e+01 1.345e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-23 07:15:03,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=12.0 2023-11-23 07:15:07,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2023-11-23 07:15:14,043 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6050, loss[loss=0.08416, simple_loss=0.1065, pruned_loss=0.01924, audio_tagging_loss=0.01168, over 15921.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09386, pruned_loss=0.01426, audio_tagging_loss=0.008921, over 3050883.24 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:15:15,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. 
limit=6.0 2023-11-23 07:15:55,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342750 2023-11-23 07:16:08,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2285040.0, ans=10.0 2023-11-23 07:16:17,498 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6100, loss[loss=0.04768, simple_loss=0.0518, pruned_loss=0.009939, audio_tagging_loss=0.01184, over 15419.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.09279, pruned_loss=0.01415, audio_tagging_loss=0.009039, over 3045162.70 frames. ], batch size: 60, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:16:26,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2285106.6666666665, ans=0.0 2023-11-23 07:16:28,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2285106.6666666665, ans=0.0 2023-11-23 07:16:31,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2285173.3333333335, ans=0.125 2023-11-23 07:16:38,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2285173.3333333335, ans=0.2 2023-11-23 07:16:39,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2285173.3333333335, ans=0.125 2023-11-23 07:16:59,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342800 2023-11-23 07:16:59,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2023-11-23 07:17:08,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 8.337e+01 9.020e+01 9.718e+01 1.256e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 07:17:23,747 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6150, loss[loss=0.07406, simple_loss=0.1012, pruned_loss=0.01438, audio_tagging_loss=0.009072, over 14489.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09165, pruned_loss=0.01393, audio_tagging_loss=0.009109, over 3049689.36 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:17:27,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2285440.0, ans=0.125 2023-11-23 07:17:36,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2285506.6666666665, ans=0.125 2023-11-23 07:17:44,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2285506.6666666665, ans=0.2 2023-11-23 07:17:50,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2285573.3333333335, ans=0.1 2023-11-23 07:17:50,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2285573.3333333335, ans=0.95 2023-11-23 07:17:54,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2285573.3333333335, ans=15.0 2023-11-23 07:17:54,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. 
limit=6.0 2023-11-23 07:18:05,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342850 2023-11-23 07:18:06,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2285640.0, ans=0.1 2023-11-23 07:18:26,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2285706.6666666665, ans=0.125 2023-11-23 07:18:28,759 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6200, loss[loss=0.06288, simple_loss=0.08556, pruned_loss=0.01016, audio_tagging_loss=0.009944, over 15672.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09135, pruned_loss=0.01391, audio_tagging_loss=0.009284, over 3046744.69 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:18:30,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2285773.3333333335, ans=0.0 2023-11-23 07:18:38,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2285773.3333333335, ans=0.1 2023-11-23 07:18:41,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2285840.0, ans=0.1 2023-11-23 07:18:48,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2285840.0, ans=6.0 2023-11-23 07:18:50,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2285840.0, ans=0.0 2023-11-23 07:19:09,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342900 2023-11-23 07:19:15,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2285973.3333333335, ans=0.04949747468305833 2023-11-23 07:19:16,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2285973.3333333335, ans=0.1 2023-11-23 07:19:19,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.276e+01 8.850e+01 9.566e+01 1.179e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-23 07:19:31,986 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6250, loss[loss=0.07259, simple_loss=0.1023, pruned_loss=0.01307, audio_tagging_loss=0.008386, over 14892.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.0909, pruned_loss=0.01373, audio_tagging_loss=0.009387, over 3043738.44 frames. 
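During the validation pass above, the log prints attn_weights_entropy = tensor([4.4643, 3.7650, 4.2955, 3.4331]) for one self-attention module: one entropy value in nats per attention head, a standard diagnostic for how diffuse the attention distributions are. A sketch of such a computation; the (heads, batch, queries, keys) layout is a guess, but any layout with a normalized last axis works the same way:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """Mean entropy in nats of attention weights, one value per head.

        attn: (num_heads, batch, num_queries, num_keys), rows summing to 1.
        """
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, batch, q)
        return ent.mean(dim=(1, 2))

    # Uniform attention over 100 keys has entropy log(100) ~ 4.61 nats,
    # the same scale as the logged values 3.43..4.46.
    uniform = torch.full((4, 2, 10, 100), 0.01)
    print(attn_weights_entropy(uniform))  # ~4.61 for each of the 4 heads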
], batch size: 56, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:19:34,627 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:19:37,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2286106.6666666665, ans=0.0 2023-11-23 07:19:37,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2286106.6666666665, ans=0.125 2023-11-23 07:19:46,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2286173.3333333335, ans=0.2 2023-11-23 07:20:00,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2286240.0, ans=0.0 2023-11-23 07:20:05,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2286240.0, ans=0.125 2023-11-23 07:20:14,102 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 342950 2023-11-23 07:20:37,768 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6300, loss[loss=0.07897, simple_loss=0.1086, pruned_loss=0.01593, audio_tagging_loss=0.008733, over 16051.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09216, pruned_loss=0.01392, audio_tagging_loss=0.0094, over 3046305.20 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:21:18,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343000 2023-11-23 07:21:27,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2286640.0, ans=0.125 2023-11-23 07:21:29,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.227e+01 8.680e+01 9.495e+01 1.255e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-23 07:21:34,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2286706.6666666665, ans=0.2 2023-11-23 07:21:41,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2023-11-23 07:21:42,430 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6350, loss[loss=0.05961, simple_loss=0.07693, pruned_loss=0.009664, audio_tagging_loss=0.01148, over 15184.00 frames. ], tot_loss[loss=0.06967, simple_loss=0.09248, pruned_loss=0.01402, audio_tagging_loss=0.009408, over 3051630.43 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:22:20,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2286973.3333333335, ans=0.125 2023-11-23 07:22:20,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2286973.3333333335, ans=0.125 2023-11-23 07:22:23,560 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343050 2023-11-23 07:22:42,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2023-11-23 07:22:43,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.43 vs. 
limit=10.0 2023-11-23 07:22:46,254 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6400, loss[loss=0.06276, simple_loss=0.07811, pruned_loss=0.01543, audio_tagging_loss=0.008271, over 15005.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09242, pruned_loss=0.01398, audio_tagging_loss=0.009503, over 3048830.95 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:22:52,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2287106.6666666665, ans=0.2 2023-11-23 07:22:53,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2287106.6666666665, ans=0.125 2023-11-23 07:23:05,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2287173.3333333335, ans=0.2 2023-11-23 07:23:06,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2287173.3333333335, ans=0.125 2023-11-23 07:23:13,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2287240.0, ans=0.0 2023-11-23 07:23:21,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0 2023-11-23 07:23:27,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2287306.6666666665, ans=0.0 2023-11-23 07:23:28,275 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343100 2023-11-23 07:23:37,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.257e+01 8.871e+01 9.376e+01 1.262e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-23 07:23:51,284 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6450, loss[loss=0.08663, simple_loss=0.1108, pruned_loss=0.02037, audio_tagging_loss=0.01087, over 16674.00 frames. ], tot_loss[loss=0.07042, simple_loss=0.09364, pruned_loss=0.01414, audio_tagging_loss=0.009459, over 3047563.50 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:24:13,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2023-11-23 07:24:32,188 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343150 2023-11-23 07:24:42,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2287706.6666666665, ans=0.0 2023-11-23 07:24:55,832 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6500, loss[loss=0.05461, simple_loss=0.0698, pruned_loss=0.009555, audio_tagging_loss=0.01015, over 17792.00 frames. ], tot_loss[loss=0.06998, simple_loss=0.09321, pruned_loss=0.01404, audio_tagging_loss=0.009337, over 3049830.94 frames. ], batch size: 68, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:24:57,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2287773.3333333335, ans=0.125 2023-11-23 07:25:05,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. 
limit=15.0 2023-11-23 07:25:34,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=2287973.3333333335, ans=12.0 2023-11-23 07:25:37,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343200 2023-11-23 07:25:37,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2287973.3333333335, ans=0.0 2023-11-23 07:25:47,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.151e+01 8.781e+01 9.682e+01 1.216e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-23 07:25:51,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2288040.0, ans=0.2 2023-11-23 07:25:56,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2288040.0, ans=0.0 2023-11-23 07:26:00,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2288106.6666666665, ans=15.0 2023-11-23 07:26:00,746 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6550, loss[loss=0.0681, simple_loss=0.09098, pruned_loss=0.01346, audio_tagging_loss=0.009151, over 15562.00 frames. ], tot_loss[loss=0.0699, simple_loss=0.09318, pruned_loss=0.01414, audio_tagging_loss=0.009166, over 3051118.53 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:26:03,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2288106.6666666665, ans=0.125 2023-11-23 07:26:34,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2288240.0, ans=10.0 2023-11-23 07:26:42,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343250 2023-11-23 07:26:47,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2288306.6666666665, ans=0.125 2023-11-23 07:27:05,398 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6600, loss[loss=0.06493, simple_loss=0.08423, pruned_loss=0.01399, audio_tagging_loss=0.008823, over 14820.00 frames. ], tot_loss[loss=0.06947, simple_loss=0.09254, pruned_loss=0.01408, audio_tagging_loss=0.009121, over 3041611.76 frames. 
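Most lines in this log are ScheduledFloat reports: hyperparameters (balancer probs, skip rates, dropout, whitening limits) annealed as a function of batch_count, and by batch_count ~ 2.29e6 nearly all of them sit at their final values (probs at 0.125, skip rates at 0.0, dropout at 0.1). A sketch of a piecewise-linear schedule of that kind; the breakpoints below are hypothetical:

    import bisect

    class PiecewiseLinearSchedule:
        """A float that ramps between (batch_count, value) breakpoints."""
        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical skip-rate schedule: 0.5 at the start, 0.0 from batch 20k on.
    skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (4000.0, 0.25), (20000.0, 0.0))
    print(skip_rate(2279106.67))  # 0.0, like the *_skip_rate values above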
], batch size: 55, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:27:37,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2288573.3333333335, ans=0.0 2023-11-23 07:27:46,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343300 2023-11-23 07:27:59,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 8.340e+01 9.003e+01 9.613e+01 1.234e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 07:28:01,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2288706.6666666665, ans=0.2 2023-11-23 07:28:01,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2288706.6666666665, ans=0.0 2023-11-23 07:28:04,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2288706.6666666665, ans=0.125 2023-11-23 07:28:11,238 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6650, loss[loss=0.06254, simple_loss=0.06914, pruned_loss=0.01277, audio_tagging_loss=0.0152, over 14389.00 frames. ], tot_loss[loss=0.07016, simple_loss=0.09362, pruned_loss=0.0143, audio_tagging_loss=0.009051, over 3049254.41 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:28:16,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2288773.3333333335, ans=0.125 2023-11-23 07:28:52,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343350 2023-11-23 07:29:14,915 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6700, loss[loss=0.06901, simple_loss=0.08887, pruned_loss=0.0146, audio_tagging_loss=0.009973, over 14489.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.09344, pruned_loss=0.01437, audio_tagging_loss=0.009036, over 3039905.53 frames. ], batch size: 54, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:29:56,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343400 2023-11-23 07:30:07,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.148e+01 8.906e+01 9.681e+01 1.677e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-23 07:30:11,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2289373.3333333335, ans=0.2 2023-11-23 07:30:19,376 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6750, loss[loss=0.06128, simple_loss=0.08664, pruned_loss=0.01075, audio_tagging_loss=0.007205, over 14089.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09254, pruned_loss=0.01415, audio_tagging_loss=0.009121, over 3041716.56 frames. 
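grad_scale flips between 32.0 and 16.0 within this stretch (32.0 at batch 6600, 16.0 by batch 6650), the signature of dynamic fp16 loss scaling: the scale is halved when a step overflows and grows back after a run of clean steps. The generic PyTorch mechanism looks like the sketch below; the recipe's own scaler may differ in detail:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,     # matches the values seen in these records
        growth_factor=2.0,   # double after `growth_interval` clean steps
        backoff_factor=0.5,  # halve on overflow: 32.0 -> 16.0
        growth_interval=2000,
    )

    def train_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads overflowed
        scaler.update()          # adjusts the scale, e.g. 32.0 <-> 16.0
        return loss.detach(), scaler.get_scale()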
], batch size: 55, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:30:24,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2289440.0, ans=0.125 2023-11-23 07:30:31,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2289440.0, ans=0.0 2023-11-23 07:31:00,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343450 2023-11-23 07:31:19,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2289706.6666666665, ans=0.1 2023-11-23 07:31:24,894 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6800, loss[loss=0.07383, simple_loss=0.1019, pruned_loss=0.01422, audio_tagging_loss=0.008685, over 15547.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09262, pruned_loss=0.01406, audio_tagging_loss=0.009137, over 3035684.00 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:31:31,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2289773.3333333335, ans=0.125 2023-11-23 07:31:36,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5 2023-11-23 07:31:37,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2289840.0, ans=0.125 2023-11-23 07:31:39,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2289840.0, ans=0.125 2023-11-23 07:32:06,720 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343500 2023-11-23 07:32:17,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.151e+01 8.874e+01 9.532e+01 1.488e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-23 07:32:24,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2290040.0, ans=0.2 2023-11-23 07:32:26,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2290040.0, ans=0.125 2023-11-23 07:32:28,852 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6850, loss[loss=0.07569, simple_loss=0.1054, pruned_loss=0.01658, audio_tagging_loss=0.006395, over 15416.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09232, pruned_loss=0.01399, audio_tagging_loss=0.009116, over 3037897.00 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:32:49,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-23 07:33:11,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343550 2023-11-23 07:33:11,377 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:33:32,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-23 07:33:33,152 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6900, loss[loss=0.07359, simple_loss=0.09922, pruned_loss=0.01458, audio_tagging_loss=0.009403, over 14819.00 frames. 
], tot_loss[loss=0.06957, simple_loss=0.09288, pruned_loss=0.01406, audio_tagging_loss=0.00907, over 3040895.18 frames. ], batch size: 54, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:33:34,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2290440.0, ans=0.1 2023-11-23 07:33:38,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-23 07:33:51,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2290506.6666666665, ans=0.125 2023-11-23 07:34:14,552 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343600 2023-11-23 07:34:18,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2290640.0, ans=0.125 2023-11-23 07:34:22,074 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 07:34:23,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2290706.6666666665, ans=0.125 2023-11-23 07:34:26,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.208e+01 8.939e+01 9.660e+01 1.242e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-23 07:34:39,259 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 6950, loss[loss=0.08443, simple_loss=0.107, pruned_loss=0.01868, audio_tagging_loss=0.01224, over 15356.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09321, pruned_loss=0.01433, audio_tagging_loss=0.009025, over 3043727.96 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:34:48,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2290773.3333333335, ans=0.125 2023-11-23 07:35:19,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343650 2023-11-23 07:35:20,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2290973.3333333335, ans=0.07 2023-11-23 07:35:21,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2290973.3333333335, ans=0.125 2023-11-23 07:35:28,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2290973.3333333335, ans=0.125 2023-11-23 07:35:34,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2291040.0, ans=0.0 2023-11-23 07:35:42,388 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7000, loss[loss=0.07476, simple_loss=0.1092, pruned_loss=0.0126, audio_tagging_loss=0.007578, over 15818.00 frames. ], tot_loss[loss=0.07015, simple_loss=0.09333, pruned_loss=0.01446, audio_tagging_loss=0.009022, over 3041659.90 frames. 
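Each Whitening line compares a metric against a limit (above: metric=6.67 vs. limit=15.0 for feed_forward2.out_whiten); the metric is 1.0 when the channel covariance is a multiple of the identity, grows toward num_channels as channels become correlated, and the module only intervenes once the limit is exceeded. One definition with exactly those properties, as a sketch (an assumed formula consistent with the logged behaviour, not quoted from scaling.py):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels) -> scalar; 1.0 for white features."""
        x = x - x.mean(dim=0)
        d = x.shape[1]
        cov = (x.t() @ x) / x.shape[0]          # channel covariance, (d, d)
        num = d * (cov @ cov).diagonal().sum()  # d * trace(C^2)
        den = cov.diagonal().sum() ** 2         # trace(C)^2
        return num / den                        # 1.0 iff C is a multiple of I

    white = torch.randn(10000, 256)
    print(whitening_metric(white))            # ~1.0: well below limit=15.0
    collapsed = white[:, :1].repeat(1, 256)   # all channels identical
    print(whitening_metric(collapsed))        # ~256.0: would trigger a penalty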
], batch size: 60, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:35:46,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2291106.6666666665, ans=10.0 2023-11-23 07:35:48,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2291106.6666666665, ans=0.0 2023-11-23 07:35:48,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2291106.6666666665, ans=0.125 2023-11-23 07:36:04,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2291173.3333333335, ans=0.125 2023-11-23 07:36:09,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2291240.0, ans=0.025 2023-11-23 07:36:12,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2291240.0, ans=0.2 2023-11-23 07:36:12,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2291240.0, ans=0.125 2023-11-23 07:36:13,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2291240.0, ans=0.0 2023-11-23 07:36:24,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343700 2023-11-23 07:36:34,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 8.255e+01 8.925e+01 9.735e+01 1.377e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-23 07:36:42,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2291373.3333333335, ans=0.2 2023-11-23 07:36:43,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2291373.3333333335, ans=0.0 2023-11-23 07:36:45,889 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7050, loss[loss=0.0712, simple_loss=0.09467, pruned_loss=0.0138, audio_tagging_loss=0.01006, over 15386.00 frames. ], tot_loss[loss=0.06982, simple_loss=0.09266, pruned_loss=0.01438, audio_tagging_loss=0.009112, over 3038240.14 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:36:47,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2291440.0, ans=0.2 2023-11-23 07:36:54,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2291440.0, ans=0.0 2023-11-23 07:36:58,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0 2023-11-23 07:37:27,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343750 2023-11-23 07:37:52,034 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7100, loss[loss=0.06848, simple_loss=0.0882, pruned_loss=0.01396, audio_tagging_loss=0.01042, over 15360.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.09274, pruned_loss=0.0142, audio_tagging_loss=0.009166, over 3044096.23 frames. 
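Each batch record carries two loss blocks: loss[... over ~15k frames] for the current batch and tot_loss[... over ~3.0M frames] for a decayed running average. The steady frame total near 3.04M is what a decay constant of 200 batches at ~15.2k frames per batch would give, so the sketch below assumes that 200-batch constant:

    class LossTracker:
        """Decayed, frame-weighted loss average, printed as tot_loss[...]."""
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0   # decayed sum of per-frame losses
            self.frames = 0.0     # decayed frame count

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        def tot_loss(self) -> float:
            return self.loss_sum / self.frames

    tracker = LossTracker()
    for _ in range(2000):                 # many ~15.2k-frame batches
        tracker.update(batch_loss=0.07, batch_frames=15200.0)
    print(round(tracker.frames))          # ~3.04e6, as in "over 3044096.23 frames."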
], batch size: 58, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:38:03,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2291840.0, ans=0.0 2023-11-23 07:38:10,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2291840.0, ans=0.125 2023-11-23 07:38:27,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2291906.6666666665, ans=0.125 2023-11-23 07:38:28,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2291973.3333333335, ans=0.0 2023-11-23 07:38:32,199 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343800 2023-11-23 07:38:45,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.319e+01 8.862e+01 9.684e+01 1.135e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 07:38:54,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2292040.0, ans=0.125 2023-11-23 07:38:56,642 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7150, loss[loss=0.06626, simple_loss=0.08252, pruned_loss=0.0135, audio_tagging_loss=0.0115, over 15481.00 frames. ], tot_loss[loss=0.07007, simple_loss=0.09324, pruned_loss=0.01432, audio_tagging_loss=0.009129, over 3046213.29 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:39:05,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2292106.6666666665, ans=0.125 2023-11-23 07:39:12,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2292173.3333333335, ans=0.125 2023-11-23 07:39:21,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2023-11-23 07:39:38,198 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343850 2023-11-23 07:39:38,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-11-23 07:39:42,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0 2023-11-23 07:39:45,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2292306.6666666665, ans=0.125 2023-11-23 07:39:46,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2292373.3333333335, ans=0.1 2023-11-23 07:40:00,100 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7200, loss[loss=0.05839, simple_loss=0.07842, pruned_loss=0.01142, audio_tagging_loss=0.007762, over 15099.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09298, pruned_loss=0.01426, audio_tagging_loss=0.009135, over 3040912.44 frames. 
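The balancer entries (balancer1.prob, min_positive, max_abs, min_abs and so on) name per-channel constraints on activation statistics: with probability prob, the module checks the fraction of positive values and the mean absolute value per channel and pushes gradients back toward the configured range. The real module edits gradients directly in the backward pass; as a simplified stand-in, the same constraint can be written as an auxiliary penalty (defaults mirror values seen in these records):

    import torch

    def balancer_penalty(x: torch.Tensor,
                         min_positive: float = 0.05, max_positive: float = 0.95,
                         min_abs: float = 0.2, max_abs: float = 10.0) -> torch.Tensor:
        """x: (num_frames, num_channels). Positive when channel statistics
        drift outside the configured ranges."""
        frac_pos = (x > 0).float().mean(dim=0)  # fraction positive per channel
        mean_abs = x.abs().mean(dim=0)          # mean |activation| per channel
        pen = ((min_positive - frac_pos).clamp(min=0)
               + (frac_pos - max_positive).clamp(min=0)
               + (min_abs - mean_abs).clamp(min=0)
               + (mean_abs - max_abs).clamp(min=0))
        return pen.sum()

    x = 0.01 * torch.randn(1000, 256)   # activations far below min_abs=0.2
    print(balancer_penalty(x))          # clearly positive: would be pushed up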
], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:40:06,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2292440.0, ans=0.125 2023-11-23 07:40:22,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2292506.6666666665, ans=0.125 2023-11-23 07:40:36,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2292573.3333333335, ans=0.125 2023-11-23 07:40:41,596 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343900 2023-11-23 07:40:52,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.883e+01 8.353e+01 8.830e+01 9.779e+01 1.772e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-23 07:41:05,593 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7250, loss[loss=0.07896, simple_loss=0.1026, pruned_loss=0.01931, audio_tagging_loss=0.008356, over 14392.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09306, pruned_loss=0.01432, audio_tagging_loss=0.009193, over 3040036.26 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:41:15,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2292773.3333333335, ans=0.0 2023-11-23 07:41:40,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2292906.6666666665, ans=0.125 2023-11-23 07:41:45,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 343950 2023-11-23 07:41:50,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2292973.3333333335, ans=0.1 2023-11-23 07:42:00,082 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:42:05,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2293040.0, ans=0.125 2023-11-23 07:42:05,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2293040.0, ans=0.0 2023-11-23 07:42:10,330 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7300, loss[loss=0.06936, simple_loss=0.09862, pruned_loss=0.01279, audio_tagging_loss=0.007265, over 16016.00 frames. ], tot_loss[loss=0.06911, simple_loss=0.0917, pruned_loss=0.01407, audio_tagging_loss=0.009191, over 3030768.45 frames. 
], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:42:11,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2293106.6666666665, ans=0.125 2023-11-23 07:42:16,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2293106.6666666665, ans=0.125 2023-11-23 07:42:27,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2293173.3333333335, ans=0.125 2023-11-23 07:42:36,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2293240.0, ans=0.125 2023-11-23 07:42:47,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2293306.6666666665, ans=0.0 2023-11-23 07:42:51,462 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344000 2023-11-23 07:43:06,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.353e+01 8.994e+01 9.658e+01 1.142e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-23 07:43:17,906 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7350, loss[loss=0.0646, simple_loss=0.08719, pruned_loss=0.01247, audio_tagging_loss=0.008534, over 14605.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.09185, pruned_loss=0.01418, audio_tagging_loss=0.009143, over 3031061.05 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:43:42,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2293506.6666666665, ans=0.125 2023-11-23 07:43:59,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5 2023-11-23 07:44:00,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344050 2023-11-23 07:44:07,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2293640.0, ans=0.07 2023-11-23 07:44:18,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2293706.6666666665, ans=0.1 2023-11-23 07:44:18,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2293706.6666666665, ans=0.1 2023-11-23 07:44:23,604 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7400, loss[loss=0.06187, simple_loss=0.08303, pruned_loss=0.009686, audio_tagging_loss=0.01067, over 15954.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09119, pruned_loss=0.01386, audio_tagging_loss=0.00912, over 3033728.05 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:44:39,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2293840.0, ans=0.125 2023-11-23 07:44:49,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=15.0 2023-11-23 07:44:57,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2293906.6666666665, ans=0.2 2023-11-23 07:45:04,039 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344100 2023-11-23 07:45:11,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.81 vs. limit=22.5 2023-11-23 07:45:15,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2294040.0, ans=0.125 2023-11-23 07:45:17,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.613e+01 8.228e+01 8.684e+01 9.636e+01 1.290e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-23 07:45:28,112 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7450, loss[loss=0.06854, simple_loss=0.08398, pruned_loss=0.01465, audio_tagging_loss=0.0119, over 16413.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09052, pruned_loss=0.01365, audio_tagging_loss=0.009107, over 3039225.98 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:45:34,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2294106.6666666665, ans=0.0 2023-11-23 07:45:36,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2294106.6666666665, ans=0.125 2023-11-23 07:45:36,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2294106.6666666665, ans=0.0 2023-11-23 07:45:54,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2294240.0, ans=0.125 2023-11-23 07:46:03,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2294240.0, ans=0.125 2023-11-23 07:46:09,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344150 2023-11-23 07:46:13,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2294306.6666666665, ans=0.0 2023-11-23 07:46:15,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2294306.6666666665, ans=0.125 2023-11-23 07:46:26,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2023-11-23 07:46:31,662 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7500, loss[loss=0.07101, simple_loss=0.09105, pruned_loss=0.01605, audio_tagging_loss=0.009434, over 14574.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09111, pruned_loss=0.01372, audio_tagging_loss=0.00905, over 3044423.08 frames. 
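The optim.py clipping records above print five quantiles (min/25%/median/75%/max) of recent gradient norms next to the active threshold, and the threshold tracks Clipping_scale times the median: 2.0 * 8.684e+01 ≈ 1.737e+02 in the record just above, with percent-clipped reporting how often the threshold was exceeded. A sketch of that scheme, assuming the threshold is simply the scaled running median (the window size and in-place rescaling are illustrative):

```python
import torch
from collections import deque

class MedianGradClipper:
    """Assumed behaviour: clip to clipping_scale * median of recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent global gradient norms

    def clip_(self, parameters) -> float:
        grads = [p.grad.detach() for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        history = torch.tensor(list(self.norms))
        # the five quantiles as printed in the log lines above
        q = torch.quantile(history, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()   # Clipping_scale * median
        if norm > threshold:                   # rescale gradients down to threshold
            for g in grads:
                g.mul_(threshold / norm)
        return norm
```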
], batch size: 55, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:46:39,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2294440.0, ans=0.125 2023-11-23 07:46:40,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2294440.0, ans=0.125 2023-11-23 07:47:10,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2294640.0, ans=0.0 2023-11-23 07:47:13,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344200 2023-11-23 07:47:25,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.286e+01 8.811e+01 9.468e+01 1.775e+02, threshold=1.762e+02, percent-clipped=1.0 2023-11-23 07:47:27,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2294706.6666666665, ans=0.125 2023-11-23 07:47:27,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2294706.6666666665, ans=0.125 2023-11-23 07:47:33,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2294706.6666666665, ans=0.0 2023-11-23 07:47:35,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2294773.3333333335, ans=0.2 2023-11-23 07:47:36,010 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7550, loss[loss=0.06539, simple_loss=0.08562, pruned_loss=0.01188, audio_tagging_loss=0.01071, over 15571.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09108, pruned_loss=0.01375, audio_tagging_loss=0.009051, over 3047652.42 frames. ], batch size: 60, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:47:39,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2294773.3333333335, ans=0.1 2023-11-23 07:47:39,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2294773.3333333335, ans=0.05 2023-11-23 07:48:11,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2294906.6666666665, ans=0.0 2023-11-23 07:48:17,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344250 2023-11-23 07:48:21,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2023-11-23 07:48:37,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2295040.0, ans=0.125 2023-11-23 07:48:41,141 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7600, loss[loss=0.07688, simple_loss=0.0993, pruned_loss=0.01868, audio_tagging_loss=0.008543, over 15273.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09108, pruned_loss=0.01386, audio_tagging_loss=0.009074, over 3041560.96 frames. 
], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:48:43,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2295106.6666666665, ans=0.1 2023-11-23 07:48:47,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2295106.6666666665, ans=0.125 2023-11-23 07:48:54,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2295173.3333333335, ans=0.07 2023-11-23 07:48:56,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2295173.3333333335, ans=0.125 2023-11-23 07:49:00,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2295173.3333333335, ans=0.025 2023-11-23 07:49:12,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2295240.0, ans=0.1 2023-11-23 07:49:22,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344300 2023-11-23 07:49:34,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2295373.3333333335, ans=0.2 2023-11-23 07:49:35,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.274e+01 8.656e+01 9.484e+01 1.340e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-23 07:49:40,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2295373.3333333335, ans=10.0 2023-11-23 07:49:40,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2295373.3333333335, ans=0.125 2023-11-23 07:49:44,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2295440.0, ans=0.0 2023-11-23 07:49:45,374 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7650, loss[loss=0.07367, simple_loss=0.09925, pruned_loss=0.01555, audio_tagging_loss=0.008498, over 15154.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09099, pruned_loss=0.01375, audio_tagging_loss=0.009076, over 3044742.11 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:49:45,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2295440.0, ans=0.0 2023-11-23 07:50:26,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344350 2023-11-23 07:50:26,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2295640.0, ans=0.1 2023-11-23 07:50:28,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2295640.0, ans=0.125 2023-11-23 07:50:44,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2295706.6666666665, ans=0.125 2023-11-23 07:50:48,878 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7700, loss[loss=0.09416, simple_loss=0.1348, pruned_loss=0.02013, audio_tagging_loss=0.00661, over 16634.00 frames. 
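Most of the scaling.py records above are ScheduledFloat values: named hyperparameters (balancer probabilities, skip rates, dropout rates) whose current value (`ans=`) is reported against the global `batch_count`. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints and clamping at the ends; the breakpoints below are illustrative, not taken from the repo:

```python
from bisect import bisect_right

class ScheduledFloat:
    """Sketch: a float hyperparameter scheduled on the global batch count."""

    def __init__(self, *points):
        # points: e.g. (0, 0.3), (20000, 0.0) -- illustrative values
        self.xs, self.ys = zip(*sorted(points))

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate that decays from 0.3 to 0.0 over the first 20k batches;
# far past the last breakpoint it stays clamped, as the ans=0.0 records show
skip_rate = ScheduledFloat((0, 0.3), (20000, 0.0))
assert skip_rate(2292773) == 0.0
```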
], tot_loss[loss=0.06944, simple_loss=0.09253, pruned_loss=0.0141, audio_tagging_loss=0.009067, over 3044926.99 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:51:00,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2295773.3333333335, ans=0.1 2023-11-23 07:51:10,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2023-11-23 07:51:12,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2295840.0, ans=0.07 2023-11-23 07:51:23,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2295906.6666666665, ans=0.0 2023-11-23 07:51:29,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2295973.3333333335, ans=0.0 2023-11-23 07:51:30,212 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344400 2023-11-23 07:51:44,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.349e+01 8.922e+01 9.569e+01 1.181e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 07:51:54,180 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7750, loss[loss=0.09406, simple_loss=0.1325, pruned_loss=0.02158, audio_tagging_loss=0.006213, over 15584.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09165, pruned_loss=0.01387, audio_tagging_loss=0.009235, over 3040364.46 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:51:54,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2296106.6666666665, ans=0.0 2023-11-23 07:51:59,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2296106.6666666665, ans=0.125 2023-11-23 07:52:01,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2296106.6666666665, ans=0.0 2023-11-23 07:52:09,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2296173.3333333335, ans=0.125 2023-11-23 07:52:18,055 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 07:52:20,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2296240.0, ans=0.125 2023-11-23 07:52:33,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2023-11-23 07:52:35,127 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344450 2023-11-23 07:52:42,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. 
limit=15.0 2023-11-23 07:52:43,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2296306.6666666665, ans=0.125 2023-11-23 07:52:52,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2296373.3333333335, ans=0.125 2023-11-23 07:52:57,580 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7800, loss[loss=0.05639, simple_loss=0.07558, pruned_loss=0.009775, audio_tagging_loss=0.008827, over 15039.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09081, pruned_loss=0.01375, audio_tagging_loss=0.009214, over 3032270.30 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:52:57,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2296440.0, ans=0.07 2023-11-23 07:53:08,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2296440.0, ans=15.0 2023-11-23 07:53:24,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.89 vs. limit=22.5 2023-11-23 07:53:33,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2296573.3333333335, ans=0.1 2023-11-23 07:53:39,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344500 2023-11-23 07:53:43,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.96 vs. limit=15.0 2023-11-23 07:53:45,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2296640.0, ans=0.125 2023-11-23 07:53:47,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2023-11-23 07:53:49,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2296706.6666666665, ans=0.125 2023-11-23 07:53:53,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.147e+01 8.634e+01 9.399e+01 1.178e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-23 07:54:01,872 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7850, loss[loss=0.0519, simple_loss=0.06522, pruned_loss=0.009922, audio_tagging_loss=0.00937, over 15488.00 frames. ], tot_loss[loss=0.0689, simple_loss=0.09161, pruned_loss=0.01389, audio_tagging_loss=0.009207, over 3028156.25 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:54:11,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2296773.3333333335, ans=0.0 2023-11-23 07:54:17,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2296840.0, ans=0.0 2023-11-23 07:54:28,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2296906.6666666665, ans=0.125 2023-11-23 07:54:42,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. 
limit=15.0 2023-11-23 07:54:43,724 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344550 2023-11-23 07:54:45,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2296973.3333333335, ans=0.125 2023-11-23 07:55:07,318 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7900, loss[loss=0.08053, simple_loss=0.1061, pruned_loss=0.01898, audio_tagging_loss=0.008473, over 15414.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09226, pruned_loss=0.01394, audio_tagging_loss=0.009227, over 3034212.08 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:55:14,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=22.5 2023-11-23 07:55:22,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-23 07:55:39,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-23 07:55:40,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-23 07:55:44,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2297306.6666666665, ans=0.025 2023-11-23 07:55:45,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2297306.6666666665, ans=0.125 2023-11-23 07:55:47,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2297306.6666666665, ans=0.2 2023-11-23 07:55:48,151 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344600 2023-11-23 07:55:55,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2297306.6666666665, ans=0.0 2023-11-23 07:56:02,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.370e+01 9.081e+01 9.662e+01 1.223e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-23 07:56:11,595 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 7950, loss[loss=0.08444, simple_loss=0.1134, pruned_loss=0.01672, audio_tagging_loss=0.01103, over 15664.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.09149, pruned_loss=0.01363, audio_tagging_loss=0.009352, over 3030140.40 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 07:56:26,800 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 07:56:27,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2297506.6666666665, ans=0.0 2023-11-23 07:56:52,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344650 2023-11-23 07:56:55,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2023-11-23 07:56:56,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2297640.0, ans=0.0 2023-11-23 07:56:59,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-23 07:57:02,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2297706.6666666665, ans=0.09899494936611666 2023-11-23 07:57:15,736 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8000, loss[loss=0.05923, simple_loss=0.08227, pruned_loss=0.007863, audio_tagging_loss=0.01023, over 16619.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09093, pruned_loss=0.01365, audio_tagging_loss=0.009457, over 3032256.65 frames. ], batch size: 63, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:57:18,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2297773.3333333335, ans=0.125 2023-11-23 07:57:20,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2297773.3333333335, ans=0.2 2023-11-23 07:57:20,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2297773.3333333335, ans=0.0 2023-11-23 07:57:25,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2297773.3333333335, ans=0.0 2023-11-23 07:57:26,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2297773.3333333335, ans=0.0 2023-11-23 07:57:30,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2297840.0, ans=0.125 2023-11-23 07:57:30,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. 
limit=12.0 2023-11-23 07:57:44,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2297906.6666666665, ans=0.125 2023-11-23 07:57:49,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2297906.6666666665, ans=0.0 2023-11-23 07:57:56,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344700 2023-11-23 07:58:03,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2297973.3333333335, ans=0.125 2023-11-23 07:58:10,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.163e+01 8.824e+01 9.486e+01 1.093e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-23 07:58:11,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2298040.0, ans=0.0 2023-11-23 07:58:21,514 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8050, loss[loss=0.04927, simple_loss=0.05679, pruned_loss=0.008619, audio_tagging_loss=0.01225, over 15456.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.0909, pruned_loss=0.01373, audio_tagging_loss=0.009451, over 3036582.26 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:58:22,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-23 07:58:27,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2298106.6666666665, ans=0.025 2023-11-23 07:58:29,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2298106.6666666665, ans=0.0 2023-11-23 07:58:49,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2298240.0, ans=0.5 2023-11-23 07:58:52,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2298240.0, ans=0.125 2023-11-23 07:58:55,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2298240.0, ans=0.125 2023-11-23 07:59:02,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344750 2023-11-23 07:59:02,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2298306.6666666665, ans=0.0 2023-11-23 07:59:04,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2298306.6666666665, ans=0.0 2023-11-23 07:59:15,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2298373.3333333335, ans=0.1 2023-11-23 07:59:23,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2298373.3333333335, ans=0.125 2023-11-23 07:59:25,543 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8100, loss[loss=0.04705, simple_loss=0.05908, pruned_loss=0.006777, audio_tagging_loss=0.01074, over 16045.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09081, pruned_loss=0.01369, audio_tagging_loss=0.009328, over 3040877.22 frames. 
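The `grad_scale` field at the end of each train_asr.py record (32.0 here, 16.0 in some nearby records) is the fp16 loss-scaling factor. The halving-and-recovery pattern is what a dynamic loss scaler produces: the scale drops after an overflowing step and creeps back up after a run of clean steps. A sketch using PyTorch's stock scaler; the surrounding training-step code is an assumption, not an excerpt from train_asr.py:

```python
import torch

# init_scale and growth_interval chosen to mirror the 32.0 <-> 16.0 swings
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # gradients carry the current scale
    scaler.step(optimizer)          # unscales, skips the step on inf/nan
    scaler.update()                 # halves on overflow, grows back later
    return loss.detach()
```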
], batch size: 65, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 07:59:30,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2298440.0, ans=0.125 2023-11-23 07:59:39,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.28 vs. limit=10.0 2023-11-23 08:00:01,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=22.5 2023-11-23 08:00:07,352 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344800 2023-11-23 08:00:15,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2298640.0, ans=0.125 2023-11-23 08:00:18,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-23 08:00:22,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.423e+01 8.748e+01 9.445e+01 1.319e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-23 08:00:28,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2298773.3333333335, ans=0.0 2023-11-23 08:00:29,747 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8150, loss[loss=0.06812, simple_loss=0.08907, pruned_loss=0.01396, audio_tagging_loss=0.009625, over 14644.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09182, pruned_loss=0.01408, audio_tagging_loss=0.00923, over 3046198.75 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 08:00:42,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2298840.0, ans=0.125 2023-11-23 08:00:53,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-23 08:01:10,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=12.0 2023-11-23 08:01:11,339 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344850 2023-11-23 08:01:30,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2299040.0, ans=0.0 2023-11-23 08:01:34,466 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8200, loss[loss=0.05151, simple_loss=0.06696, pruned_loss=0.009182, audio_tagging_loss=0.008848, over 14847.00 frames. ], tot_loss[loss=0.06937, simple_loss=0.09219, pruned_loss=0.01414, audio_tagging_loss=0.009133, over 3046443.12 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 08:01:34,530 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 08:01:40,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2299106.6666666665, ans=0.125 2023-11-23 08:01:48,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2023-11-23 08:02:14,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344900 2023-11-23 08:02:22,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2299306.6666666665, ans=0.125 2023-11-23 08:02:31,497 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.311e+01 8.960e+01 9.864e+01 1.172e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-23 08:02:35,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-11-23 08:02:38,994 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8250, loss[loss=0.05378, simple_loss=0.06413, pruned_loss=0.01, audio_tagging_loss=0.01171, over 15561.00 frames. ], tot_loss[loss=0.06907, simple_loss=0.09184, pruned_loss=0.01407, audio_tagging_loss=0.009078, over 3042052.68 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 08:02:41,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2299440.0, ans=0.0 2023-11-23 08:03:10,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2299573.3333333335, ans=0.0 2023-11-23 08:03:21,194 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 344950 2023-11-23 08:03:32,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2299706.6666666665, ans=0.125 2023-11-23 08:03:32,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2299706.6666666665, ans=0.0 2023-11-23 08:03:42,980 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8300, loss[loss=0.05227, simple_loss=0.06709, pruned_loss=0.008843, audio_tagging_loss=0.009886, over 15190.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09176, pruned_loss=0.01407, audio_tagging_loss=0.009046, over 3043029.70 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 08:03:44,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2299773.3333333335, ans=0.1 2023-11-23 08:03:54,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2299840.0, ans=0.125 2023-11-23 08:04:23,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0 2023-11-23 08:04:23,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345000 2023-11-23 08:04:38,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.704e+01 8.216e+01 8.783e+01 9.423e+01 1.176e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-23 08:04:46,572 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8350, loss[loss=0.05945, simple_loss=0.08339, pruned_loss=0.009449, audio_tagging_loss=0.0083, over 15078.00 frames. 
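The WARNING records above exclude AudioSet placeholder cuts whose encoder output would be shorter than their token sequence: 100 input frames subsample to 23, which cannot align with 24 BPE tokens under a transducer loss. A sketch of that length check, assuming a convolutional front-end that maps T input frames to roughly (T - 7) // 4 output frames; the front-end arithmetic is an assumption, while the 100 -> 23 figures come straight from the log:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # assumed front-end arithmetic; reproduces the logged 100 -> 23
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a transducer alignment needs at least one encoder frame per token
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False   # matches the excluded placeholder cuts
```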
], tot_loss[loss=0.06875, simple_loss=0.09132, pruned_loss=0.014, audio_tagging_loss=0.009086, over 3038700.08 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 16.0 2023-11-23 08:05:05,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2300173.3333333335, ans=0.0 2023-11-23 08:05:25,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2300306.6666666665, ans=0.2 2023-11-23 08:05:27,862 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345050 2023-11-23 08:05:51,640 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8400, loss[loss=0.04941, simple_loss=0.06553, pruned_loss=0.007931, audio_tagging_loss=0.008718, over 15170.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09068, pruned_loss=0.01396, audio_tagging_loss=0.009096, over 3036590.28 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:05:54,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2023-11-23 08:06:10,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2300506.6666666665, ans=0.0 2023-11-23 08:06:16,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2300573.3333333335, ans=0.125 2023-11-23 08:06:19,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2300573.3333333335, ans=0.0 2023-11-23 08:06:33,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345100 2023-11-23 08:06:38,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2023-11-23 08:06:44,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2300706.6666666665, ans=0.05 2023-11-23 08:06:47,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.014e+01 8.846e+01 9.500e+01 1.669e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 08:06:51,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2300706.6666666665, ans=0.0 2023-11-23 08:06:55,293 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8450, loss[loss=0.07516, simple_loss=0.0993, pruned_loss=0.01575, audio_tagging_loss=0.009761, over 13928.00 frames. ], tot_loss[loss=0.06934, simple_loss=0.09191, pruned_loss=0.01432, audio_tagging_loss=0.009069, over 3043146.81 frames. 
], batch size: 54, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:07:00,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2300773.3333333335, ans=0.125 2023-11-23 08:07:07,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2300840.0, ans=0.125 2023-11-23 08:07:18,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2300840.0, ans=0.125 2023-11-23 08:07:36,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345150 2023-11-23 08:07:42,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2300973.3333333335, ans=0.125 2023-11-23 08:07:48,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2301040.0, ans=0.125 2023-11-23 08:07:51,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-23 08:07:54,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2301040.0, ans=0.1 2023-11-23 08:07:58,683 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8500, loss[loss=0.05149, simple_loss=0.06058, pruned_loss=0.008186, audio_tagging_loss=0.01301, over 16162.00 frames. ], tot_loss[loss=0.06977, simple_loss=0.09255, pruned_loss=0.01438, audio_tagging_loss=0.009111, over 3050337.26 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:08:05,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2023-11-23 08:08:13,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2301173.3333333335, ans=0.1 2023-11-23 08:08:39,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345200 2023-11-23 08:08:55,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.259e+01 8.832e+01 9.401e+01 1.675e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-23 08:08:58,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2301373.3333333335, ans=0.0 2023-11-23 08:09:03,815 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8550, loss[loss=0.08032, simple_loss=0.1053, pruned_loss=0.01868, audio_tagging_loss=0.009006, over 15315.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09173, pruned_loss=0.01422, audio_tagging_loss=0.009216, over 3051609.18 frames. 
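The scaling.py Whitening records report, per named activation, a metric against a limit (e.g. metric=12.17 vs. limit=15.0 above, with the limit itself a ScheduledFloat in some records); the module nudges activations toward a covariance proportional to the identity and only intervenes when the metric exceeds the limit. A hedged sketch of one such metric, equal to 1.0 for perfectly white features and growing as variance concentrates in few directions; the exact formula in scaling.py may differ:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns E[lambda^2] / E[lambda]^2 over
    the eigenvalues of the feature covariance, via traces; 1.0 iff white."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

white = torch.randn(10000, 384)       # near-identity covariance
assert whitening_metric(white) < 2.0  # close to 1.0, well under typical limits
```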
], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:09:18,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2301506.6666666665, ans=0.05 2023-11-23 08:09:22,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2301506.6666666665, ans=0.04949747468305833 2023-11-23 08:09:40,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2301640.0, ans=0.125 2023-11-23 08:09:44,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345250 2023-11-23 08:09:46,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0 2023-11-23 08:09:57,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2301706.6666666665, ans=0.1 2023-11-23 08:10:07,649 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8600, loss[loss=0.06892, simple_loss=0.08699, pruned_loss=0.01506, audio_tagging_loss=0.01037, over 14945.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09149, pruned_loss=0.01414, audio_tagging_loss=0.009233, over 3047835.20 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:10:10,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2301773.3333333335, ans=0.125 2023-11-23 08:10:11,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2301773.3333333335, ans=0.0 2023-11-23 08:10:49,384 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345300 2023-11-23 08:10:56,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2301973.3333333335, ans=0.125 2023-11-23 08:10:56,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2301973.3333333335, ans=0.125 2023-11-23 08:11:00,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2302040.0, ans=0.0 2023-11-23 08:11:03,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.320e+01 8.873e+01 9.523e+01 1.247e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-23 08:11:09,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2302040.0, ans=0.125 2023-11-23 08:11:11,299 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8650, loss[loss=0.07492, simple_loss=0.1025, pruned_loss=0.01433, audio_tagging_loss=0.009334, over 14307.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09228, pruned_loss=0.01417, audio_tagging_loss=0.009223, over 3053560.43 frames. 
], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:11:11,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2302106.6666666665, ans=0.1 2023-11-23 08:11:42,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2302240.0, ans=0.125 2023-11-23 08:11:52,591 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345350 2023-11-23 08:12:07,623 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 08:12:12,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. limit=10.0 2023-11-23 08:12:16,459 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8700, loss[loss=0.06911, simple_loss=0.08644, pruned_loss=0.01444, audio_tagging_loss=0.01145, over 14713.00 frames. ], tot_loss[loss=0.06924, simple_loss=0.09181, pruned_loss=0.01403, audio_tagging_loss=0.009298, over 3045282.28 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:12:25,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2302440.0, ans=0.125 2023-11-23 08:12:28,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2302506.6666666665, ans=0.2 2023-11-23 08:12:42,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=22.5 2023-11-23 08:12:48,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2302573.3333333335, ans=0.0 2023-11-23 08:12:52,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2302573.3333333335, ans=0.025 2023-11-23 08:12:54,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2302640.0, ans=0.125 2023-11-23 08:12:54,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2302640.0, ans=0.125 2023-11-23 08:12:57,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345400 2023-11-23 08:13:11,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2302706.6666666665, ans=0.1 2023-11-23 08:13:13,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.372e+01 9.197e+01 9.763e+01 1.279e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-23 08:13:20,769 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8750, loss[loss=0.07056, simple_loss=0.09268, pruned_loss=0.01431, audio_tagging_loss=0.009901, over 15255.00 frames. ], tot_loss[loss=0.06934, simple_loss=0.09196, pruned_loss=0.01401, audio_tagging_loss=0.009355, over 3043594.84 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2023-11-23 08:13:34,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. 
limit=6.0 2023-11-23 08:13:59,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2302973.3333333335, ans=0.2 2023-11-23 08:14:01,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345450 2023-11-23 08:14:05,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2023-11-23 08:14:24,231 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8800, loss[loss=0.07298, simple_loss=0.1005, pruned_loss=0.01434, audio_tagging_loss=0.008407, over 14440.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.09221, pruned_loss=0.01399, audio_tagging_loss=0.009463, over 3044951.13 frames. ], batch size: 53, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:14:24,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-23 08:14:35,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-11-23 08:14:46,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2303173.3333333335, ans=0.1 2023-11-23 08:14:50,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2303240.0, ans=15.0 2023-11-23 08:15:04,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345500 2023-11-23 08:15:07,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2303306.6666666665, ans=0.1 2023-11-23 08:15:20,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.339e+01 9.015e+01 9.606e+01 1.190e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-23 08:15:24,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2303373.3333333335, ans=0.1 2023-11-23 08:15:28,082 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8850, loss[loss=0.08532, simple_loss=0.1115, pruned_loss=0.02166, audio_tagging_loss=0.007931, over 15399.00 frames. ], tot_loss[loss=0.06907, simple_loss=0.09122, pruned_loss=0.01393, audio_tagging_loss=0.009524, over 3047143.23 frames. ], batch size: 57, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:15:39,095 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 08:15:48,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2303506.6666666665, ans=0.0 2023-11-23 08:15:52,246 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 08:16:08,795 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345550 2023-11-23 08:16:08,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2303640.0, ans=0.125 2023-11-23 08:16:10,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2303640.0, ans=0.125 2023-11-23 08:16:24,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2303706.6666666665, ans=0.125 2023-11-23 08:16:25,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2303706.6666666665, ans=0.0 2023-11-23 08:16:31,255 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8900, loss[loss=0.08357, simple_loss=0.1056, pruned_loss=0.01982, audio_tagging_loss=0.01097, over 15259.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09175, pruned_loss=0.01396, audio_tagging_loss=0.009339, over 3047540.50 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:16:42,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2303840.0, ans=0.2 2023-11-23 08:16:52,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2303840.0, ans=0.125 2023-11-23 08:17:11,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345600 2023-11-23 08:17:18,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2303973.3333333335, ans=0.125 2023-11-23 08:17:20,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2304040.0, ans=0.025 2023-11-23 08:17:27,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.364e+01 9.019e+01 9.831e+01 1.164e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 08:17:34,811 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 8950, loss[loss=0.05314, simple_loss=0.0626, pruned_loss=0.009617, audio_tagging_loss=0.01223, over 14773.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09108, pruned_loss=0.01397, audio_tagging_loss=0.009302, over 3046444.26 frames. 
], batch size: 59, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:17:36,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2304106.6666666665, ans=0.0 2023-11-23 08:17:37,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2304106.6666666665, ans=0.1 2023-11-23 08:17:37,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2304106.6666666665, ans=0.125 2023-11-23 08:17:47,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2304173.3333333335, ans=0.0 2023-11-23 08:18:16,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345650 2023-11-23 08:18:35,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2304373.3333333335, ans=0.125 2023-11-23 08:18:40,002 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9000, loss[loss=0.05162, simple_loss=0.07647, pruned_loss=0.006209, audio_tagging_loss=0.007178, over 15524.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09095, pruned_loss=0.01377, audio_tagging_loss=0.009127, over 3050591.98 frames. ], batch size: 57, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:18:40,003 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 08:19:01,457 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.2291, 4.5553, 4.6208, 4.5573], device='cuda:1') 2023-11-23 08:19:12,413 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0395, 5.8987, 5.7833, 5.6356], device='cuda:1') 2023-11-23 08:19:23,258 INFO [train_asr.py:1253] (1/4) Epoch 29, validation: loss=0.05895, simple_loss=0.05118, pruned_loss=0.005121, audio_tagging_loss=0.02824, over 4681554.00 frames. 2023-11-23 08:19:23,259 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 08:19:24,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-11-23 08:19:44,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2304506.6666666665, ans=0.0 2023-11-23 08:19:57,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2023-11-23 08:20:04,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345700 2023-11-23 08:20:08,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2304640.0, ans=0.125 2023-11-23 08:20:14,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2304706.6666666665, ans=0.125 2023-11-23 08:20:19,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.225e+01 9.124e+01 9.733e+01 1.162e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-23 08:20:26,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. 
limit=10.0 2023-11-23 08:20:27,069 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9050, loss[loss=0.07174, simple_loss=0.1017, pruned_loss=0.01289, audio_tagging_loss=0.007982, over 14381.00 frames. ], tot_loss[loss=0.06934, simple_loss=0.09245, pruned_loss=0.0141, audio_tagging_loss=0.009015, over 3061136.63 frames. ], batch size: 54, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:20:42,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2304840.0, ans=0.2 2023-11-23 08:20:58,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2304906.6666666665, ans=0.1 2023-11-23 08:21:01,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2304906.6666666665, ans=0.125 2023-11-23 08:21:08,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345750 2023-11-23 08:21:31,411 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9100, loss[loss=0.06723, simple_loss=0.08331, pruned_loss=0.01446, audio_tagging_loss=0.01112, over 14628.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09256, pruned_loss=0.0141, audio_tagging_loss=0.008922, over 3059746.88 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:21:41,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2305106.6666666665, ans=0.125 2023-11-23 08:21:48,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2305173.3333333335, ans=0.2 2023-11-23 08:21:48,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2305173.3333333335, ans=0.125 2023-11-23 08:22:02,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2305240.0, ans=0.05 2023-11-23 08:22:12,294 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345800 2023-11-23 08:22:20,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-23 08:22:28,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.662e+01 8.178e+01 8.924e+01 9.840e+01 1.213e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-23 08:22:34,157 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9150, loss[loss=0.06423, simple_loss=0.07779, pruned_loss=0.01379, audio_tagging_loss=0.01154, over 15728.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09269, pruned_loss=0.01422, audio_tagging_loss=0.008945, over 3061672.33 frames. ], batch size: 61, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:22:36,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2305440.0, ans=0.125 2023-11-23 08:22:50,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.41 vs. limit=12.0 2023-11-23 08:22:54,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.05 vs. 
limit=12.0 2023-11-23 08:22:57,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.18 vs. limit=22.5 2023-11-23 08:23:04,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2023-11-23 08:23:15,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345850 2023-11-23 08:23:15,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2305640.0, ans=0.125 2023-11-23 08:23:35,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2305706.6666666665, ans=0.0 2023-11-23 08:23:37,709 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9200, loss[loss=0.08012, simple_loss=0.1039, pruned_loss=0.01739, audio_tagging_loss=0.01076, over 14737.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09277, pruned_loss=0.0142, audio_tagging_loss=0.008895, over 3053870.84 frames. ], batch size: 54, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:23:50,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=2305840.0, ans=12.0 2023-11-23 08:23:55,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2305840.0, ans=0.09899494936611666 2023-11-23 08:23:57,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2305840.0, ans=0.1 2023-11-23 08:24:01,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2305840.0, ans=0.0 2023-11-23 08:24:10,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2305906.6666666665, ans=0.05 2023-11-23 08:24:16,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2305973.3333333335, ans=0.1 2023-11-23 08:24:18,696 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345900 2023-11-23 08:24:30,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2306040.0, ans=0.04949747468305833 2023-11-23 08:24:35,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.303e+01 8.816e+01 9.419e+01 1.160e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-23 08:24:42,454 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9250, loss[loss=0.0879, simple_loss=0.1284, pruned_loss=0.01709, audio_tagging_loss=0.006591, over 15425.00 frames. ], tot_loss[loss=0.06937, simple_loss=0.09258, pruned_loss=0.01416, audio_tagging_loss=0.00892, over 3051784.86 frames. 
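
A note on the validation diagnostics a few records up: during each validation pass, zipformer.py:1873 prints one attn_weights_entropy tensor per self-attention module, with one entry per head (the four-element tensors above match 4-head layers). A plausible way such a statistic can be computed, as a hedged sketch rather than the exact zipformer code, is the mean entropy of each head's attention distribution; low values flag heads that have collapsed onto a few keys:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, ..., num_queries, num_keys); each row along the
        # last dim is a probability distribution over keys.
        eps = 1.0e-20
        ent = -(attn * (attn + eps).log()).sum(dim=-1)  # per-query entropy
        return ent.flatten(start_dim=1).mean(dim=1)     # mean entropy per head

    attn = torch.softmax(torch.randn(4, 2, 10, 10), dim=-1)
    print(attn_weights_entropy(attn))  # four entries, one per head

A uniform distribution over K keys has entropy ln K, so the logged values of roughly 4.5 to 6.0 would correspond to effective attention spans on the order of 90 to 400 keys.
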
], batch size: 54, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:24:47,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2306106.6666666665, ans=10.0 2023-11-23 08:24:48,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2306106.6666666665, ans=0.09899494936611666 2023-11-23 08:24:51,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2306106.6666666665, ans=0.125 2023-11-23 08:25:16,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=22.5 2023-11-23 08:25:22,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 345950 2023-11-23 08:25:23,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2023-11-23 08:25:27,116 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 08:25:34,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2306373.3333333335, ans=0.0 2023-11-23 08:25:35,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2306373.3333333335, ans=0.2 2023-11-23 08:25:44,887 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9300, loss[loss=0.067, simple_loss=0.08565, pruned_loss=0.01432, audio_tagging_loss=0.009853, over 14702.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09188, pruned_loss=0.01414, audio_tagging_loss=0.009044, over 3058727.68 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:25:47,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2306440.0, ans=0.125 2023-11-23 08:25:51,126 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 08:26:03,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2306506.6666666665, ans=0.125 2023-11-23 08:26:10,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2306573.3333333335, ans=0.125 2023-11-23 08:26:20,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2306573.3333333335, ans=0.0 2023-11-23 08:26:25,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2023-11-23 08:26:25,808 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346000 2023-11-23 08:26:41,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.561e+01 8.507e+01 8.976e+01 9.625e+01 1.274e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-23 08:26:45,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.79 vs. 
limit=15.0 2023-11-23 08:26:48,035 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9350, loss[loss=0.06582, simple_loss=0.08016, pruned_loss=0.0131, audio_tagging_loss=0.01264, over 15087.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.093, pruned_loss=0.01426, audio_tagging_loss=0.00913, over 3056634.76 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:26:57,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2306773.3333333335, ans=0.125 2023-11-23 08:27:29,176 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346050 2023-11-23 08:27:29,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2306973.3333333335, ans=0.1 2023-11-23 08:27:34,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2023-11-23 08:27:39,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-11-23 08:27:47,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2307040.0, ans=0.015 2023-11-23 08:27:52,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2307106.6666666665, ans=0.1 2023-11-23 08:27:53,081 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9400, loss[loss=0.07508, simple_loss=0.1045, pruned_loss=0.01509, audio_tagging_loss=0.007736, over 15830.00 frames. ], tot_loss[loss=0.07035, simple_loss=0.0934, pruned_loss=0.0144, audio_tagging_loss=0.009251, over 3054510.69 frames. ], batch size: 60, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:28:22,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2307240.0, ans=0.125 2023-11-23 08:28:29,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2307306.6666666665, ans=0.125 2023-11-23 08:28:33,011 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346100 2023-11-23 08:28:36,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2307306.6666666665, ans=0.95 2023-11-23 08:28:39,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2307306.6666666665, ans=0.1 2023-11-23 08:28:44,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2307373.3333333335, ans=0.125 2023-11-23 08:28:51,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.336e+01 8.834e+01 9.625e+01 1.227e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 08:28:52,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2023-11-23 08:28:53,800 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 08:28:56,213 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9450, loss[loss=0.08188, simple_loss=0.1122, pruned_loss=0.01773, audio_tagging_loss=0.008046, over 15182.00 frames. ], tot_loss[loss=0.07075, simple_loss=0.0942, pruned_loss=0.0145, audio_tagging_loss=0.009147, over 3047883.28 frames. ], batch size: 57, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:29:09,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2307506.6666666665, ans=0.1 2023-11-23 08:29:36,708 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346150 2023-11-23 08:29:38,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2307640.0, ans=0.125 2023-11-23 08:29:50,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2307706.6666666665, ans=0.0 2023-11-23 08:29:58,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=12.0 2023-11-23 08:29:58,756 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9500, loss[loss=0.07302, simple_loss=0.09611, pruned_loss=0.01329, audio_tagging_loss=0.01167, over 16267.00 frames. ], tot_loss[loss=0.07033, simple_loss=0.09331, pruned_loss=0.01445, audio_tagging_loss=0.009215, over 3048917.65 frames. ], batch size: 59, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:29:58,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2307773.3333333335, ans=0.0 2023-11-23 08:30:39,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346200 2023-11-23 08:30:52,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2308040.0, ans=0.0 2023-11-23 08:30:56,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.410e+01 8.909e+01 9.826e+01 1.512e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-23 08:31:03,148 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9550, loss[loss=0.07465, simple_loss=0.1025, pruned_loss=0.01441, audio_tagging_loss=0.008967, over 16184.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09258, pruned_loss=0.01427, audio_tagging_loss=0.009329, over 3046097.88 frames. ], batch size: 57, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:31:34,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2308240.0, ans=0.125 2023-11-23 08:31:43,420 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346250 2023-11-23 08:32:00,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2308373.3333333335, ans=0.0 2023-11-23 08:32:07,619 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9600, loss[loss=0.07835, simple_loss=0.1019, pruned_loss=0.01822, audio_tagging_loss=0.009163, over 14119.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.09365, pruned_loss=0.01437, audio_tagging_loss=0.009312, over 3046573.76 frames. 
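
On reading the loss records: throughout this log the reported tot_loss is reproduced by combining its printed components as 0.5 × simple_loss + pruned_loss + 1.0 × audio_tagging_loss; the weights here are inferred from the logged numbers themselves, not restated from the training code. A quick check against the batch 9600 record that ends just above:

    # Values copied from the batch 9600 tot_loss record above.
    simple_loss = 0.09365
    pruned_loss = 0.01437
    audio_tagging_loss = 0.009312

    tot = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(f"{tot:.5f}")  # 0.07051, matching the logged tot_loss
    assert abs(tot - 0.07051) < 1e-5

The same combination reproduces the per-batch loss[...] figures and the validation losses, up to the printed rounding.
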
], batch size: 56, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:32:12,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2308440.0, ans=0.125 2023-11-23 08:32:34,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2023-11-23 08:32:47,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2308640.0, ans=0.0 2023-11-23 08:32:49,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346300 2023-11-23 08:33:06,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.302e+01 9.179e+01 9.827e+01 1.321e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-23 08:33:09,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2308706.6666666665, ans=0.125 2023-11-23 08:33:11,993 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9650, loss[loss=0.06338, simple_loss=0.07818, pruned_loss=0.0121, audio_tagging_loss=0.01219, over 16602.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09316, pruned_loss=0.01424, audio_tagging_loss=0.009319, over 3049214.91 frames. ], batch size: 66, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:33:32,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=12.0 2023-11-23 08:33:36,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2308840.0, ans=0.1 2023-11-23 08:33:40,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=15.0 2023-11-23 08:33:48,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2308906.6666666665, ans=0.0 2023-11-23 08:33:53,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346350 2023-11-23 08:33:56,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2308973.3333333335, ans=0.125 2023-11-23 08:33:58,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2308973.3333333335, ans=0.07 2023-11-23 08:34:02,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2309040.0, ans=0.5 2023-11-23 08:34:08,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2309040.0, ans=0.125 2023-11-23 08:34:15,590 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9700, loss[loss=0.04243, simple_loss=0.05064, pruned_loss=0.003465, audio_tagging_loss=0.01365, over 14528.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.09403, pruned_loss=0.01433, audio_tagging_loss=0.009119, over 3050109.55 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:34:30,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.63 vs. 
limit=15.0 2023-11-23 08:34:38,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2309173.3333333335, ans=0.1 2023-11-23 08:34:49,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2309240.0, ans=0.125 2023-11-23 08:34:58,286 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346400 2023-11-23 08:35:02,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2309306.6666666665, ans=0.1 2023-11-23 08:35:04,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2309306.6666666665, ans=0.2 2023-11-23 08:35:19,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.371e+01 8.992e+01 9.698e+01 1.313e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-23 08:35:23,343 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9750, loss[loss=0.08146, simple_loss=0.1167, pruned_loss=0.01415, audio_tagging_loss=0.008976, over 14653.00 frames. ], tot_loss[loss=0.06968, simple_loss=0.09314, pruned_loss=0.01403, audio_tagging_loss=0.009078, over 3050167.83 frames. ], batch size: 54, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:35:31,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2309440.0, ans=0.1 2023-11-23 08:35:31,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2309440.0, ans=0.125 2023-11-23 08:35:44,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2309506.6666666665, ans=0.0 2023-11-23 08:36:04,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346450 2023-11-23 08:36:14,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2309706.6666666665, ans=0.125 2023-11-23 08:36:15,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2309706.6666666665, ans=0.1 2023-11-23 08:36:27,487 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9800, loss[loss=0.07768, simple_loss=0.1135, pruned_loss=0.01669, audio_tagging_loss=0.004258, over 15524.00 frames. ], tot_loss[loss=0.07037, simple_loss=0.09405, pruned_loss=0.01431, audio_tagging_loss=0.00904, over 3044634.46 frames. 
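
The grad_scale field in these records oscillates between 16.0 and 32.0 (compare batches 9500 and 9600 above). That is dynamic fp16 loss scaling: the loss is multiplied by a scale that is halved whenever an overflow is detected and grown back after a long enough run of finite gradients. A minimal sketch using PyTorch's stock torch.cuda.amp.GradScaler; the init_scale is an assumption, and the training script may wrap this machinery differently:

    import torch

    # Requires a CUDA device, as in this run.
    model = torch.nn.Linear(80, 512).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # assumed starting scale

    features = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(features).pow(2).mean()

    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # unscales gradients; skips the step on overflow
    scaler.update()                # halves the scale on overflow, grows it after
                                   # a stable stretch of steps
    print(scaler.get_scale())      # the number shown as grad_scale above
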
], batch size: 55, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:36:38,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2309840.0, ans=0.0 2023-11-23 08:37:01,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2309906.6666666665, ans=0.125 2023-11-23 08:37:09,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346500 2023-11-23 08:37:09,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2309973.3333333335, ans=0.125 2023-11-23 08:37:19,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2310040.0, ans=0.1 2023-11-23 08:37:23,984 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 08:37:27,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.413e+01 9.129e+01 9.676e+01 1.290e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-23 08:37:27,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2310040.0, ans=0.125 2023-11-23 08:37:31,329 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9850, loss[loss=0.0493, simple_loss=0.06489, pruned_loss=0.008844, audio_tagging_loss=0.008009, over 14657.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.0938, pruned_loss=0.01431, audio_tagging_loss=0.009009, over 3043057.10 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:38:12,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346550 2023-11-23 08:38:24,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-23 08:38:34,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-23 08:38:36,336 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9900, loss[loss=0.07159, simple_loss=0.09354, pruned_loss=0.01594, audio_tagging_loss=0.008878, over 14101.00 frames. ], tot_loss[loss=0.07012, simple_loss=0.09359, pruned_loss=0.01427, audio_tagging_loss=0.00906, over 3038202.33 frames. ], batch size: 54, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:38:47,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-23 08:39:17,735 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346600 2023-11-23 08:39:27,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. 
limit=15.0 2023-11-23 08:39:36,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.245e+01 9.085e+01 9.612e+01 1.420e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-23 08:39:40,503 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 9950, loss[loss=0.07758, simple_loss=0.1006, pruned_loss=0.01648, audio_tagging_loss=0.01081, over 15253.00 frames. ], tot_loss[loss=0.07024, simple_loss=0.09402, pruned_loss=0.01425, audio_tagging_loss=0.008976, over 3050427.91 frames. ], batch size: 54, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:39:44,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2310773.3333333335, ans=0.0 2023-11-23 08:39:50,714 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 08:40:18,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.78 vs. limit=22.5 2023-11-23 08:40:22,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346650 2023-11-23 08:40:23,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2310973.3333333335, ans=0.125 2023-11-23 08:40:28,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-23 08:40:43,954 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10000, loss[loss=0.06464, simple_loss=0.08739, pruned_loss=0.0117, audio_tagging_loss=0.009246, over 15709.00 frames. ], tot_loss[loss=0.07033, simple_loss=0.09397, pruned_loss=0.01434, audio_tagging_loss=0.009007, over 3044904.03 frames. ], batch size: 60, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:40:49,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2311106.6666666665, ans=0.0 2023-11-23 08:41:17,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2311240.0, ans=0.1 2023-11-23 08:41:18,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2311240.0, ans=15.0 2023-11-23 08:41:24,602 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346700 2023-11-23 08:41:33,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2311373.3333333335, ans=0.125 2023-11-23 08:41:34,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2311373.3333333335, ans=0.125 2023-11-23 08:41:42,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0 2023-11-23 08:41:44,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.283e+01 8.893e+01 9.595e+01 1.369e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 08:41:47,682 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10050, loss[loss=0.05801, simple_loss=0.07656, pruned_loss=0.008007, audio_tagging_loss=0.01172, over 15204.00 frames. ], tot_loss[loss=0.06983, simple_loss=0.09344, pruned_loss=0.01413, audio_tagging_loss=0.008978, over 3048690.93 frames. 
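
The optim.py:476 records print five grad-norm quartiles (min, 25th, median, 75th, max percentile) over a window of recent steps, and in every record here the threshold equals Clipping_scale times the median, e.g. 2.0 × 9.085e+01 = 1.817e+02 just above, with percent-clipped=0.0 because even the window maximum sits below that threshold. A sketch of the bookkeeping, hedged in the details of windowing and smoothing:

    import torch

    def clipping_stats(recent_grad_norms, clipping_scale: float = 2.0):
        norms = torch.tensor(recent_grad_norms, dtype=torch.float32)
        quartiles = torch.quantile(
            norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]  # scale times the median
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped

    # With a window whose quartiles match the record above, the threshold
    # comes out near 1.817e+02 and nothing in the window is clipped.
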
], batch size: 59, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:41:52,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2311440.0, ans=0.125 2023-11-23 08:42:28,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346750 2023-11-23 08:42:41,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2311706.6666666665, ans=0.2 2023-11-23 08:42:51,631 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10100, loss[loss=0.07167, simple_loss=0.09397, pruned_loss=0.01769, audio_tagging_loss=0.006994, over 14778.00 frames. ], tot_loss[loss=0.06976, simple_loss=0.09306, pruned_loss=0.01419, audio_tagging_loss=0.009031, over 3046218.99 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:43:00,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2311773.3333333335, ans=0.125 2023-11-23 08:43:09,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2311840.0, ans=0.125 2023-11-23 08:43:25,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2311906.6666666665, ans=0.0 2023-11-23 08:43:32,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346800 2023-11-23 08:43:35,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2311973.3333333335, ans=0.125 2023-11-23 08:43:43,429 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 08:43:53,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.400e+01 8.955e+01 9.679e+01 1.203e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 08:43:55,578 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10150, loss[loss=0.06474, simple_loss=0.09201, pruned_loss=0.01034, audio_tagging_loss=0.008397, over 15165.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09364, pruned_loss=0.01425, audio_tagging_loss=0.009032, over 3049439.22 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:43:56,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2312106.6666666665, ans=0.2 2023-11-23 08:44:19,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2312173.3333333335, ans=0.125 2023-11-23 08:44:25,138 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 08:44:26,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2312240.0, ans=0.2 2023-11-23 08:44:32,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2023-11-23 08:44:36,956 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346850 2023-11-23 08:44:55,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2312373.3333333335, ans=0.0 2023-11-23 08:44:56,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2312373.3333333335, ans=0.125 2023-11-23 08:45:00,215 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10200, loss[loss=0.05722, simple_loss=0.0757, pruned_loss=0.01044, audio_tagging_loss=0.008937, over 14749.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.09301, pruned_loss=0.01417, audio_tagging_loss=0.009071, over 3050679.74 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:45:23,094 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 08:45:23,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2312506.6666666665, ans=0.05 2023-11-23 08:45:28,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2312573.3333333335, ans=15.0 2023-11-23 08:45:28,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.14 vs. limit=22.5 2023-11-23 08:45:33,323 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 08:45:40,935 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346900 2023-11-23 08:45:48,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2312640.0, ans=0.0 2023-11-23 08:46:01,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.340e+01 8.896e+01 9.645e+01 1.207e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 08:46:04,274 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10250, loss[loss=0.05931, simple_loss=0.08491, pruned_loss=0.009318, audio_tagging_loss=0.007534, over 15307.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09323, pruned_loss=0.01404, audio_tagging_loss=0.009125, over 3050429.38 frames. 
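
The WARNING records (train_asr.py:1462) all follow one pattern: a 1-second AudioSet cut carrying the dummy transcript yields 100 feature frames, the convolutional frontend reduces that to 23, and 23 encoder frames cannot align to 24 BPE tokens, since a transducer needs at least one frame per emitted token. A minimal sketch of such a filter; the helper name is hypothetical, not the actual icefall function:

    def should_exclude(frames_after_subsampling: int, num_tokens: int) -> bool:
        # A transducer alignment requires T >= U: no fewer encoder frames
        # than output tokens. The cuts warned about above have T=23, U=24.
        return frames_after_subsampling < num_tokens

    print(should_exclude(23, 24))  # True -> excluded from training, as logged
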
], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:46:04,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2312773.3333333335, ans=0.1 2023-11-23 08:46:09,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2312773.3333333335, ans=0.125 2023-11-23 08:46:20,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-11-23 08:46:21,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2312840.0, ans=0.125 2023-11-23 08:46:23,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2312840.0, ans=0.125 2023-11-23 08:46:26,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2312840.0, ans=0.125 2023-11-23 08:46:37,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2312906.6666666665, ans=0.2 2023-11-23 08:46:45,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 346950 2023-11-23 08:47:08,014 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10300, loss[loss=0.05063, simple_loss=0.06056, pruned_loss=0.009718, audio_tagging_loss=0.01063, over 14724.00 frames. ], tot_loss[loss=0.06945, simple_loss=0.09239, pruned_loss=0.01406, audio_tagging_loss=0.009195, over 3047744.55 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:47:28,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2313173.3333333335, ans=0.0 2023-11-23 08:47:48,534 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347000 2023-11-23 08:47:57,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2313306.6666666665, ans=0.125 2023-11-23 08:48:04,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2313373.3333333335, ans=0.1 2023-11-23 08:48:08,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2313373.3333333335, ans=0.125 2023-11-23 08:48:10,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.826e+01 8.491e+01 9.106e+01 9.880e+01 1.580e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-23 08:48:12,721 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10350, loss[loss=0.05934, simple_loss=0.07832, pruned_loss=0.01074, audio_tagging_loss=0.009436, over 15656.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09282, pruned_loss=0.01419, audio_tagging_loss=0.009245, over 3047960.49 frames. ], batch size: 61, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:48:22,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.86 vs. 
limit=15.0 2023-11-23 08:48:44,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2313573.3333333335, ans=0.1 2023-11-23 08:48:52,795 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347050 2023-11-23 08:49:03,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2313706.6666666665, ans=0.2 2023-11-23 08:49:09,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2313706.6666666665, ans=0.125 2023-11-23 08:49:16,475 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10400, loss[loss=0.06917, simple_loss=0.09093, pruned_loss=0.01605, audio_tagging_loss=0.007657, over 15010.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09287, pruned_loss=0.01423, audio_tagging_loss=0.009342, over 3046347.61 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:49:21,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2313773.3333333335, ans=0.0 2023-11-23 08:49:24,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2313773.3333333335, ans=0.125 2023-11-23 08:49:25,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2313773.3333333335, ans=0.125 2023-11-23 08:49:43,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2313906.6666666665, ans=0.125 2023-11-23 08:49:58,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347100 2023-11-23 08:50:03,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2313973.3333333335, ans=0.05 2023-11-23 08:50:18,923 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.096e+01 8.361e+01 8.771e+01 9.632e+01 1.204e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-23 08:50:20,802 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10450, loss[loss=0.06332, simple_loss=0.07665, pruned_loss=0.01395, audio_tagging_loss=0.01105, over 15717.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.09322, pruned_loss=0.01432, audio_tagging_loss=0.009299, over 3052286.68 frames. ], batch size: 62, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:50:24,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2314106.6666666665, ans=0.125 2023-11-23 08:50:46,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2314240.0, ans=0.125 2023-11-23 08:51:02,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347150 2023-11-23 08:51:04,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2314306.6666666665, ans=0.125 2023-11-23 08:51:23,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2314373.3333333335, ans=0.0 2023-11-23 08:51:26,482 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10500, loss[loss=0.06799, simple_loss=0.09411, pruned_loss=0.01343, audio_tagging_loss=0.007504, over 14938.00 frames. 
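
The scaling.py:213 records that dominate this log each print the current value (ans=...) of a ScheduledFloat: a scalar hyperparameter such as a dropout probability, skip rate, or balancer bound that is interpolated against batch_count. By batch_count ≈ 2.3 million, every schedule here has long since reached its final value, which is why the same ans repeats record after record. A simplified sketch of piecewise-linear scheduling; the breakpoints below are illustrative, not the model's:

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch_count, clamped at both ends."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs.
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(2313906.67))  # 0.1 -- long past the final breakpoint,
                                  # like the dropout_p values logged above
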
], tot_loss[loss=0.06963, simple_loss=0.09232, pruned_loss=0.01423, audio_tagging_loss=0.009235, over 3052124.96 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:51:41,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=12.0 2023-11-23 08:52:06,426 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347200 2023-11-23 08:52:18,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2314706.6666666665, ans=0.125 2023-11-23 08:52:29,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.399e+01 8.762e+01 9.493e+01 1.186e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-23 08:52:30,293 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10550, loss[loss=0.05497, simple_loss=0.06354, pruned_loss=0.01171, audio_tagging_loss=0.01149, over 15407.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.09299, pruned_loss=0.01416, audio_tagging_loss=0.009082, over 3052464.08 frames. ], batch size: 61, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:52:41,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2314840.0, ans=0.125 2023-11-23 08:52:51,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2314840.0, ans=0.125 2023-11-23 08:53:12,208 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347250 2023-11-23 08:53:16,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-11-23 08:53:28,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2315040.0, ans=0.125 2023-11-23 08:53:33,969 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10600, loss[loss=0.06772, simple_loss=0.09227, pruned_loss=0.01342, audio_tagging_loss=0.008161, over 15548.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09258, pruned_loss=0.01398, audio_tagging_loss=0.008951, over 3048745.43 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:53:34,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2315106.6666666665, ans=0.125 2023-11-23 08:53:39,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2315106.6666666665, ans=0.0 2023-11-23 08:53:41,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=22.5 2023-11-23 08:53:54,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2023-11-23 08:54:10,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. 
limit=15.0 2023-11-23 08:54:11,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2315240.0, ans=0.2 2023-11-23 08:54:15,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347300 2023-11-23 08:54:16,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2315306.6666666665, ans=0.125 2023-11-23 08:54:22,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2315306.6666666665, ans=0.0 2023-11-23 08:54:28,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2315373.3333333335, ans=0.07 2023-11-23 08:54:37,202 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.247e+01 8.943e+01 9.697e+01 1.268e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-23 08:54:39,146 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10650, loss[loss=0.07664, simple_loss=0.1111, pruned_loss=0.01376, audio_tagging_loss=0.007325, over 15552.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09267, pruned_loss=0.01396, audio_tagging_loss=0.00897, over 3044457.34 frames. ], batch size: 53, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:54:52,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0 2023-11-23 08:54:53,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-23 08:54:57,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=2315506.6666666665, ans=12.0 2023-11-23 08:54:59,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2315506.6666666665, ans=0.07 2023-11-23 08:55:19,792 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347350 2023-11-23 08:55:20,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2315640.0, ans=0.125 2023-11-23 08:55:36,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0 2023-11-23 08:55:43,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2315773.3333333335, ans=10.0 2023-11-23 08:55:44,137 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10700, loss[loss=0.06167, simple_loss=0.08406, pruned_loss=0.009077, audio_tagging_loss=0.01056, over 15347.00 frames. ], tot_loss[loss=0.07025, simple_loss=0.09401, pruned_loss=0.01435, audio_tagging_loss=0.008886, over 3040512.16 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:55:53,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2315773.3333333335, ans=0.5 2023-11-23 08:56:05,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. 
limit=15.0 2023-11-23 08:56:10,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2315906.6666666665, ans=0.0 2023-11-23 08:56:25,724 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347400 2023-11-23 08:56:32,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2315973.3333333335, ans=0.1 2023-11-23 08:56:33,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2023-11-23 08:56:37,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2316040.0, ans=0.125 2023-11-23 08:56:46,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.151e+01 8.772e+01 9.535e+01 1.207e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-23 08:56:47,778 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10750, loss[loss=0.09481, simple_loss=0.1258, pruned_loss=0.02544, audio_tagging_loss=0.006478, over 14759.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.09392, pruned_loss=0.01429, audio_tagging_loss=0.008889, over 3037259.55 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 08:57:12,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2316173.3333333335, ans=0.125 2023-11-23 08:57:14,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2316240.0, ans=0.125 2023-11-23 08:57:15,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2316240.0, ans=0.125 2023-11-23 08:57:19,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2023-11-23 08:57:20,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2316240.0, ans=0.09899494936611666 2023-11-23 08:57:28,767 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347450 2023-11-23 08:57:35,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2316306.6666666665, ans=0.1 2023-11-23 08:57:38,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2316373.3333333335, ans=0.125 2023-11-23 08:57:44,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2316373.3333333335, ans=0.125 2023-11-23 08:57:50,641 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10800, loss[loss=0.08858, simple_loss=0.1279, pruned_loss=0.01853, audio_tagging_loss=0.006087, over 15856.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09402, pruned_loss=0.01416, audio_tagging_loss=0.008827, over 3049133.34 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:58:06,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.89 vs. 
limit=22.5 2023-11-23 08:58:07,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2316506.6666666665, ans=0.0 2023-11-23 08:58:13,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2316506.6666666665, ans=0.125 2023-11-23 08:58:24,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2316573.3333333335, ans=0.0 2023-11-23 08:58:29,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2316640.0, ans=0.125 2023-11-23 08:58:31,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347500 2023-11-23 08:58:34,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2316640.0, ans=0.2 2023-11-23 08:58:55,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.908e+01 8.196e+01 8.848e+01 9.514e+01 1.233e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-23 08:58:56,362 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10850, loss[loss=0.06133, simple_loss=0.0784, pruned_loss=0.01056, audio_tagging_loss=0.01157, over 14740.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09304, pruned_loss=0.01405, audio_tagging_loss=0.008937, over 3047089.92 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 08:59:10,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-11-23 08:59:17,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2316840.0, ans=0.125 2023-11-23 08:59:33,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2316973.3333333335, ans=0.125 2023-11-23 08:59:38,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347550 2023-11-23 08:59:57,026 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 09:00:00,715 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10900, loss[loss=0.06954, simple_loss=0.09465, pruned_loss=0.01437, audio_tagging_loss=0.007846, over 15340.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09233, pruned_loss=0.01389, audio_tagging_loss=0.009022, over 3053385.27 frames. 
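
On the two bracketed figures in each train_asr.py:1221 record: loss[... over N frames.] is the current batch, while tot_loss[... over ~3e6 frames.] is a running frame-weighted average over the logging window, so long utterances weigh proportionally more than short ones. A sketch of that accumulation, hedged in that the script likely tracks richer per-component statistics:

    class FrameWeightedAverage:
        """Accumulate per-batch losses weighted by their frame counts."""

        def __init__(self) -> None:
            self.weighted_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            self.weighted_sum += loss * num_frames
            self.frames += num_frames

        @property
        def value(self) -> float:
            return self.weighted_sum / max(self.frames, 1.0)

    tot = FrameWeightedAverage()
    tot.update(0.06954, 15340.0)  # the batch 10900 record above
    print(f"{tot.value:.5f} over {tot.frames:.2f} frames")
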
], batch size: 56, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 09:00:02,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2317106.6666666665, ans=0.0 2023-11-23 09:00:13,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2317173.3333333335, ans=0.0 2023-11-23 09:00:18,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2317173.3333333335, ans=0.0 2023-11-23 09:00:43,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347600 2023-11-23 09:01:04,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.420e+01 8.915e+01 9.673e+01 1.270e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 09:01:05,476 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 10950, loss[loss=0.06649, simple_loss=0.08158, pruned_loss=0.01322, audio_tagging_loss=0.01249, over 14119.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09185, pruned_loss=0.01379, audio_tagging_loss=0.009141, over 3051333.63 frames. ], batch size: 54, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 09:01:11,458 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:01:32,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-23 09:01:47,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347650 2023-11-23 09:02:00,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2317706.6666666665, ans=0.125 2023-11-23 09:02:11,973 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11000, loss[loss=0.06646, simple_loss=0.09767, pruned_loss=0.01101, audio_tagging_loss=0.006616, over 16487.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09166, pruned_loss=0.01392, audio_tagging_loss=0.009122, over 3053082.31 frames. ], batch size: 62, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:02:21,749 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 09:02:28,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2317840.0, ans=0.125 2023-11-23 09:02:32,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.49 vs. 
limit=15.0 2023-11-23 09:02:53,187 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347700 2023-11-23 09:02:59,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2317973.3333333335, ans=0.0 2023-11-23 09:03:16,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.585e+01 9.370e+01 1.001e+02 1.222e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-23 09:03:16,130 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11050, loss[loss=0.07475, simple_loss=0.09517, pruned_loss=0.0182, audio_tagging_loss=0.008968, over 15157.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09125, pruned_loss=0.01383, audio_tagging_loss=0.009259, over 3049904.97 frames. ], batch size: 57, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:03:55,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-23 09:03:57,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347750 2023-11-23 09:04:16,054 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:04:19,446 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11100, loss[loss=0.06808, simple_loss=0.09107, pruned_loss=0.01036, audio_tagging_loss=0.01218, over 14934.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09098, pruned_loss=0.01388, audio_tagging_loss=0.009479, over 3047296.64 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:04:22,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2318440.0, ans=0.0 2023-11-23 09:04:23,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2318440.0, ans=0.0 2023-11-23 09:04:23,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2318440.0, ans=0.0 2023-11-23 09:05:01,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347800 2023-11-23 09:05:07,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2023-11-23 09:05:14,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2318706.6666666665, ans=0.2 2023-11-23 09:05:21,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2318706.6666666665, ans=0.0 2023-11-23 09:05:25,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.535e+01 9.196e+01 1.008e+02 1.260e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-23 09:05:25,417 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11150, loss[loss=0.07061, simple_loss=0.08818, pruned_loss=0.01604, audio_tagging_loss=0.01048, over 15066.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09132, pruned_loss=0.01395, audio_tagging_loss=0.009528, over 3054891.94 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:05:26,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. 
limit=15.0 2023-11-23 09:05:29,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2318773.3333333335, ans=0.2 2023-11-23 09:05:29,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2318773.3333333335, ans=0.125 2023-11-23 09:05:37,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2318840.0, ans=0.125 2023-11-23 09:05:42,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0 2023-11-23 09:05:44,628 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:05:55,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2318906.6666666665, ans=0.5 2023-11-23 09:06:05,616 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347850 2023-11-23 09:06:16,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2319040.0, ans=0.0 2023-11-23 09:06:29,315 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11200, loss[loss=0.09274, simple_loss=0.1282, pruned_loss=0.02126, audio_tagging_loss=0.007402, over 16264.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09137, pruned_loss=0.014, audio_tagging_loss=0.009599, over 3050164.43 frames. ], batch size: 60, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 09:06:40,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2319173.3333333335, ans=0.125 2023-11-23 09:06:47,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2319173.3333333335, ans=0.2 2023-11-23 09:06:54,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2319240.0, ans=0.125 2023-11-23 09:06:54,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2319240.0, ans=0.1 2023-11-23 09:07:08,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0 2023-11-23 09:07:09,972 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347900 2023-11-23 09:07:24,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2319373.3333333335, ans=0.0 2023-11-23 09:07:32,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.283e+01 8.930e+01 9.840e+01 1.502e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-23 09:07:32,534 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11250, loss[loss=0.05902, simple_loss=0.08143, pruned_loss=0.01094, audio_tagging_loss=0.007368, over 14749.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.0903, pruned_loss=0.01378, audio_tagging_loss=0.009573, over 3042925.54 frames. 
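Annotation: the ScheduledFloat entries above (name=..., batch_count=..., ans=...) record module constants such as skip rates, dropout probabilities, and balancer probabilities whose values are scheduled on the global batch count. A minimal sketch of such a piecewise-linear schedule follows; the breakpoints and names are illustrative, not this recipe's actual schedule.

import bisect

class ScheduledFloat:
    """A float whose value is piecewise-linear in the global batch count."""

    def __init__(self, *points, default=0.0):
        self.points = sorted(points)   # (batch_count, value) pairs
        self.batch_count = None        # set by the training loop
        self.default = default

    def __float__(self):
        if self.batch_count is None or not self.points:
            return float(self.default)
        b = self.batch_count
        if b <= self.points[0][0]:
            return float(self.points[0][1])
        if b >= self.points[-1][0]:
            return float(self.points[-1][1])
        i = bisect.bisect_right([p[0] for p in self.points], b)
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (b - x0) / (x1 - x0)

# e.g. a skip-rate that decays to 0.0 early in training and stays there,
# which would print "ans=0.0" at batch_count ~2.3e6 as in the entries above:
skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
skip_rate.batch_count = 2317106.67
print(float(skip_rate))  # -> 0.0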
], batch size: 56, lr: 2.33e-03, grad_scale: 32.0 2023-11-23 09:07:36,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2319440.0, ans=0.2 2023-11-23 09:08:13,105 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 347950 2023-11-23 09:08:17,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5 2023-11-23 09:08:18,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-23 09:08:26,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2319706.6666666665, ans=0.125 2023-11-23 09:08:31,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-23 09:08:34,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-11-23 09:08:36,522 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11300, loss[loss=0.07593, simple_loss=0.09737, pruned_loss=0.01913, audio_tagging_loss=0.008119, over 14833.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09037, pruned_loss=0.01372, audio_tagging_loss=0.009467, over 3042148.81 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:08:49,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-23 09:08:54,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2023-11-23 09:09:00,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2319906.6666666665, ans=0.125 2023-11-23 09:09:04,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=15.0 2023-11-23 09:09:11,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2319906.6666666665, ans=0.0 2023-11-23 09:09:16,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348000 2023-11-23 09:09:26,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2319973.3333333335, ans=0.0 2023-11-23 09:09:42,692 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11350, loss[loss=0.05009, simple_loss=0.0674, pruned_loss=0.006963, audio_tagging_loss=0.009425, over 14857.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09161, pruned_loss=0.01386, audio_tagging_loss=0.009246, over 3050780.78 frames. 
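Annotation: in every Clipping_scale=2.0 entry the reported threshold is exactly 2.0 times the median gradient norm (for example 2.0 x 8.930e+01 = 1.786e+02 just above), which suggests the clipper keeps a window of recent gradient norms, reports their min/25%/50%/75%/max, and clips against a scaled median. A sketch under that reading; the window size, class, and helper names are assumptions, not taken from optim.py.

from collections import deque
import torch

class MedianClipper:
    def __init__(self, clipping_scale=2.0, window=100):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent gradient norms

    def clip_(self, parameters):
        parameters = list(parameters)
        # max_norm=inf: measure the total norm without modifying gradients
        norm = torch.nn.utils.clip_grad_norm_(parameters, float("inf"))
        self.norms.append(float(norm))
        t = torch.tensor(sorted(self.norms))
        q = [t[int(p * (len(t) - 1))].item()
             for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.scale * q[2]  # 2.0 * median, as in the log
        clipped = 100.0 * sum(n > threshold for n in self.norms) / len(self.norms)
        if float(norm) > threshold:    # rescale this step's gradients
            for p in parameters:
                if p.grad is not None:
                    p.grad.mul_(threshold / float(norm))
        print(f"grad-norm quartiles {q}, threshold={threshold:.3e}, "
              f"percent-clipped={clipped}")
        return norm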
], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:09:43,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.172e+01 9.052e+01 9.730e+01 1.154e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 09:09:57,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2320173.3333333335, ans=0.0 2023-11-23 09:10:02,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2320173.3333333335, ans=0.125 2023-11-23 09:10:04,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=15.0 2023-11-23 09:10:05,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2320240.0, ans=0.125 2023-11-23 09:10:06,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-23 09:10:17,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2320240.0, ans=0.0 2023-11-23 09:10:23,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348050 2023-11-23 09:10:38,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2320373.3333333335, ans=0.0 2023-11-23 09:10:39,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2320373.3333333335, ans=0.2 2023-11-23 09:10:45,525 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11400, loss[loss=0.09026, simple_loss=0.1253, pruned_loss=0.01998, audio_tagging_loss=0.007628, over 15628.00 frames. ], tot_loss[loss=0.06877, simple_loss=0.09166, pruned_loss=0.01381, audio_tagging_loss=0.009126, over 3044454.90 frames. ], batch size: 56, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:11:10,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2320573.3333333335, ans=0.2 2023-11-23 09:11:16,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2320573.3333333335, ans=0.0 2023-11-23 09:11:26,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348100 2023-11-23 09:11:28,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2320640.0, ans=0.2 2023-11-23 09:11:31,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0 2023-11-23 09:11:35,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2320706.6666666665, ans=0.0 2023-11-23 09:11:43,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=22.5 2023-11-23 09:11:48,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.57 vs. 
limit=22.5 2023-11-23 09:11:49,201 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11450, loss[loss=0.06148, simple_loss=0.08637, pruned_loss=0.00919, audio_tagging_loss=0.009106, over 15467.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09209, pruned_loss=0.01381, audio_tagging_loss=0.009059, over 3048027.23 frames. ], batch size: 57, lr: 2.33e-03, grad_scale: 8.0 2023-11-23 09:11:52,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.173e+01 8.697e+01 9.540e+01 1.271e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-23 09:12:02,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2320840.0, ans=0.1 2023-11-23 09:12:15,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2320906.6666666665, ans=0.125 2023-11-23 09:12:20,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0 2023-11-23 09:12:23,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2320906.6666666665, ans=0.0 2023-11-23 09:12:29,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348150 2023-11-23 09:12:44,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2321040.0, ans=0.0 2023-11-23 09:12:52,909 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11500, loss[loss=0.07857, simple_loss=0.1085, pruned_loss=0.01632, audio_tagging_loss=0.007978, over 16219.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.0915, pruned_loss=0.0137, audio_tagging_loss=0.009054, over 3040276.45 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 8.0 2023-11-23 09:13:03,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2321106.6666666665, ans=0.2 2023-11-23 09:13:06,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2321173.3333333335, ans=0.125 2023-11-23 09:13:12,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-23 09:13:17,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2321240.0, ans=0.0 2023-11-23 09:13:17,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2321240.0, ans=0.125 2023-11-23 09:13:27,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2321240.0, ans=0.125 2023-11-23 09:13:30,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.69 vs. 
limit=15.0 2023-11-23 09:13:32,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2321306.6666666665, ans=0.0 2023-11-23 09:13:33,977 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348200 2023-11-23 09:13:50,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2321373.3333333335, ans=0.0 2023-11-23 09:13:50,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2321373.3333333335, ans=0.0 2023-11-23 09:13:56,833 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11550, loss[loss=0.06851, simple_loss=0.09608, pruned_loss=0.01119, audio_tagging_loss=0.009272, over 14968.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09083, pruned_loss=0.01357, audio_tagging_loss=0.009074, over 3041895.71 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 8.0 2023-11-23 09:13:59,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.220e+01 8.786e+01 9.512e+01 1.197e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-23 09:14:20,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2321506.6666666665, ans=0.1 2023-11-23 09:14:27,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2321573.3333333335, ans=0.0 2023-11-23 09:14:35,087 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 09:14:37,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348250 2023-11-23 09:14:45,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2321640.0, ans=0.0 2023-11-23 09:15:00,599 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11600, loss[loss=0.06972, simple_loss=0.09762, pruned_loss=0.01242, audio_tagging_loss=0.008488, over 15708.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09149, pruned_loss=0.01363, audio_tagging_loss=0.009052, over 3041935.55 frames. ], batch size: 55, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:15:00,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2321773.3333333335, ans=0.125 2023-11-23 09:15:02,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. 
limit=15.0 2023-11-23 09:15:13,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2321840.0, ans=0.125 2023-11-23 09:15:16,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2321840.0, ans=0.125 2023-11-23 09:15:17,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2321840.0, ans=0.0 2023-11-23 09:15:24,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2321906.6666666665, ans=0.1 2023-11-23 09:15:41,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348300 2023-11-23 09:15:54,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2322040.0, ans=0.125 2023-11-23 09:16:04,480 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11650, loss[loss=0.0757, simple_loss=0.101, pruned_loss=0.01728, audio_tagging_loss=0.007917, over 15612.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09159, pruned_loss=0.01375, audio_tagging_loss=0.009045, over 3041026.90 frames. ], batch size: 58, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:16:06,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.255e+01 8.853e+01 9.730e+01 1.242e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-23 09:16:18,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2322173.3333333335, ans=0.0 2023-11-23 09:16:23,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-23 09:16:42,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2023-11-23 09:16:45,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2322306.6666666665, ans=0.125 2023-11-23 09:16:46,111 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348350 2023-11-23 09:16:53,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2322306.6666666665, ans=0.1 2023-11-23 09:16:57,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2322373.3333333335, ans=0.2 2023-11-23 09:16:59,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2322373.3333333335, ans=0.125 2023-11-23 09:17:08,255 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11700, loss[loss=0.07461, simple_loss=0.1024, pruned_loss=0.01554, audio_tagging_loss=0.007853, over 15709.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09153, pruned_loss=0.01376, audio_tagging_loss=0.009086, over 3040875.99 frames. ], batch size: 59, lr: 2.33e-03, grad_scale: 16.0 2023-11-23 09:17:10,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2322440.0, ans=0.2 2023-11-23 09:17:15,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.28 vs. 
limit=15.0 2023-11-23 09:17:46,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2322640.0, ans=0.125 2023-11-23 09:17:46,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2322640.0, ans=0.1 2023-11-23 09:17:50,043 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348400 2023-11-23 09:18:12,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2322773.3333333335, ans=0.0 2023-11-23 09:18:13,156 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11750, loss[loss=0.08315, simple_loss=0.1148, pruned_loss=0.01726, audio_tagging_loss=0.00851, over 14415.00 frames. ], tot_loss[loss=0.0692, simple_loss=0.09239, pruned_loss=0.01391, audio_tagging_loss=0.009096, over 3040117.60 frames. ], batch size: 58, lr: 2.32e-03, grad_scale: 16.0 2023-11-23 09:18:16,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.216e+01 8.845e+01 9.440e+01 1.082e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 09:18:17,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2322773.3333333335, ans=0.0 2023-11-23 09:18:18,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2322773.3333333335, ans=0.125 2023-11-23 09:18:26,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2322840.0, ans=0.015 2023-11-23 09:18:26,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2322840.0, ans=0.125 2023-11-23 09:18:55,053 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348450 2023-11-23 09:18:59,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2322973.3333333335, ans=0.125 2023-11-23 09:19:11,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2323040.0, ans=0.2 2023-11-23 09:19:18,713 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11800, loss[loss=0.05853, simple_loss=0.0726, pruned_loss=0.01379, audio_tagging_loss=0.008445, over 14915.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09238, pruned_loss=0.01391, audio_tagging_loss=0.009063, over 3044960.06 frames. 
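Annotation: the WARNING ... Exclude cut ... entries drop one-second AudioSet placeholder cuts because a transducer cannot align fewer encoder frames than target tokens: 100 input frames shrink to 23 after subsampling, one short of the 24 BPE tokens. The 100 -> 23 arithmetic matches ((T - 7) // 2 + 1) // 2, the usual shape formula for a roughly 4x convolutional front-end; that formula is an assumption consistent with the logged numbers, not read from the code.

def frames_after_subsampling(t: int) -> int:
    # conv front-end shape arithmetic; reproduces the logged 100 -> 23
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # transducer feasibility: need at least one frame per target token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as in the warnings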
], batch size: 58, lr: 2.32e-03, grad_scale: 16.0 2023-11-23 09:19:21,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2323106.6666666665, ans=0.2 2023-11-23 09:19:32,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2323173.3333333335, ans=0.125 2023-11-23 09:19:36,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2323173.3333333335, ans=0.1 2023-11-23 09:19:37,335 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:19:38,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2323173.3333333335, ans=10.0 2023-11-23 09:19:51,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-23 09:19:59,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348500 2023-11-23 09:20:01,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2323306.6666666665, ans=10.0 2023-11-23 09:20:15,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2323373.3333333335, ans=0.125 2023-11-23 09:20:22,058 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11850, loss[loss=0.06248, simple_loss=0.07125, pruned_loss=0.01426, audio_tagging_loss=0.01259, over 15132.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09215, pruned_loss=0.01399, audio_tagging_loss=0.009241, over 3042442.21 frames. ], batch size: 58, lr: 2.32e-03, grad_scale: 16.0 2023-11-23 09:20:23,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2323440.0, ans=0.1 2023-11-23 09:20:24,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.461e+01 9.109e+01 9.788e+01 1.263e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-23 09:20:42,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2323506.6666666665, ans=0.2 2023-11-23 09:20:58,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2323573.3333333335, ans=0.0 2023-11-23 09:21:03,615 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348550 2023-11-23 09:21:22,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2323706.6666666665, ans=0.125 2023-11-23 09:21:25,833 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11900, loss[loss=0.0807, simple_loss=0.1076, pruned_loss=0.01679, audio_tagging_loss=0.01009, over 15792.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.09147, pruned_loss=0.01386, audio_tagging_loss=0.009429, over 3047890.11 frames. ], batch size: 60, lr: 2.32e-03, grad_scale: 16.0 2023-11-23 09:21:34,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.31 vs. 
limit=15.0 2023-11-23 09:21:39,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2323840.0, ans=0.0 2023-11-23 09:21:41,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2323840.0, ans=6.0 2023-11-23 09:21:51,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.06 vs. limit=22.5 2023-11-23 09:21:54,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=15.0 2023-11-23 09:21:55,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2323906.6666666665, ans=0.1 2023-11-23 09:22:06,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348600 2023-11-23 09:22:12,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2323973.3333333335, ans=0.2 2023-11-23 09:22:31,772 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 11950, loss[loss=0.07252, simple_loss=0.09574, pruned_loss=0.01594, audio_tagging_loss=0.008713, over 15652.00 frames. ], tot_loss[loss=0.06916, simple_loss=0.09149, pruned_loss=0.01394, audio_tagging_loss=0.009476, over 3055022.03 frames. ], batch size: 57, lr: 2.32e-03, grad_scale: 16.0 2023-11-23 09:22:34,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.390e+01 8.982e+01 9.534e+01 1.652e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-23 09:22:37,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-11-23 09:22:40,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2324106.6666666665, ans=0.125 2023-11-23 09:23:11,111 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348650 2023-11-23 09:23:17,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2324306.6666666665, ans=0.0 2023-11-23 09:23:32,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2324440.0, ans=0.125 2023-11-23 09:23:33,671 INFO [train_asr.py:1221] (1/4) Epoch 29, batch 12000, loss[loss=0.05986, simple_loss=0.08162, pruned_loss=0.01042, audio_tagging_loss=0.008633, over 15480.00 frames. ], tot_loss[loss=0.06911, simple_loss=0.09159, pruned_loss=0.0138, audio_tagging_loss=0.009516, over 3051238.79 frames. ], batch size: 57, lr: 2.32e-03, grad_scale: 32.0 2023-11-23 09:23:33,672 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 09:24:15,948 INFO [train_asr.py:1253] (1/4) Epoch 29, validation: loss=0.05844, simple_loss=0.05118, pruned_loss=0.005114, audio_tagging_loss=0.02774, over 4681554.00 frames. 2023-11-23 09:24:15,949 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 09:25:23,496 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 0, loss[loss=0.08231, simple_loss=0.09201, pruned_loss=0.01564, audio_tagging_loss=0.02067, over 15163.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.09201, pruned_loss=0.01564, audio_tagging_loss=0.02067, over 15163.00 frames. 
], batch size: 57, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:25:23,497 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 09:26:02,080 INFO [train_asr.py:1253] (1/4) Epoch 30, validation: loss=0.05824, simple_loss=0.05113, pruned_loss=0.005061, audio_tagging_loss=0.02761, over 4681554.00 frames. 2023-11-23 09:26:02,081 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 09:26:09,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2324600.0, ans=0.125 2023-11-23 09:26:11,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348700 2023-11-23 09:26:18,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2023-11-23 09:26:35,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.954e+01 9.660e+01 1.053e+02 1.291e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-23 09:26:49,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2324800.0, ans=0.09899494936611666 2023-11-23 09:27:01,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2324866.6666666665, ans=0.125 2023-11-23 09:27:03,259 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:27:05,460 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 50, loss[loss=0.08845, simple_loss=0.1034, pruned_loss=0.01671, audio_tagging_loss=0.02004, over 14345.00 frames. ], tot_loss[loss=0.07662, simple_loss=0.08928, pruned_loss=0.01391, audio_tagging_loss=0.01807, over 680406.59 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:27:12,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=10.0 2023-11-23 09:27:15,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348750 2023-11-23 09:27:16,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2325000.0, ans=0.125 2023-11-23 09:27:20,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2325000.0, ans=0.035 2023-11-23 09:27:26,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2023-11-23 09:27:28,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2325000.0, ans=0.125 2023-11-23 09:27:34,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2325066.6666666665, ans=0.125 2023-11-23 09:27:35,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.74 vs. 
limit=12.0 2023-11-23 09:27:40,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2325066.6666666665, ans=0.125 2023-11-23 09:27:57,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2325200.0, ans=0.2 2023-11-23 09:28:08,352 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 100, loss[loss=0.0626, simple_loss=0.07079, pruned_loss=0.01131, audio_tagging_loss=0.0159, over 14629.00 frames. ], tot_loss[loss=0.07555, simple_loss=0.08949, pruned_loss=0.01351, audio_tagging_loss=0.0173, over 1208394.69 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:28:18,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348800 2023-11-23 09:28:46,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.985e+01 9.565e+01 1.028e+02 2.272e+02, threshold=1.913e+02, percent-clipped=1.0 2023-11-23 09:29:04,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-11-23 09:29:05,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2325533.3333333335, ans=0.125 2023-11-23 09:29:13,426 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 150, loss[loss=0.08242, simple_loss=0.1106, pruned_loss=0.01747, audio_tagging_loss=0.009657, over 15735.00 frames. ], tot_loss[loss=0.07399, simple_loss=0.09065, pruned_loss=0.01332, audio_tagging_loss=0.01535, over 1618938.71 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:29:13,758 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:29:23,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2325600.0, ans=0.125 2023-11-23 09:29:24,487 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348850 2023-11-23 09:29:33,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2325666.6666666665, ans=0.125 2023-11-23 09:29:39,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2325733.3333333335, ans=0.1 2023-11-23 09:29:45,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2325733.3333333335, ans=0.1 2023-11-23 09:29:54,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2325800.0, ans=0.125 2023-11-23 09:29:58,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2325800.0, ans=10.0 2023-11-23 09:30:05,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2325866.6666666665, ans=0.125 2023-11-23 09:30:18,418 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 200, loss[loss=0.08783, simple_loss=0.1149, pruned_loss=0.01987, audio_tagging_loss=0.01049, over 15653.00 frames. ], tot_loss[loss=0.07344, simple_loss=0.09216, pruned_loss=0.01376, audio_tagging_loss=0.0136, over 1941432.82 frames. 
], batch size: 57, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:30:28,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348900 2023-11-23 09:30:55,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.392e+01 9.007e+01 9.652e+01 1.196e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 09:30:59,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2326133.3333333335, ans=0.0 2023-11-23 09:31:01,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2326133.3333333335, ans=0.125 2023-11-23 09:31:03,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2326133.3333333335, ans=0.125 2023-11-23 09:31:21,699 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 250, loss[loss=0.06791, simple_loss=0.08561, pruned_loss=0.01357, audio_tagging_loss=0.01154, over 15185.00 frames. ], tot_loss[loss=0.07229, simple_loss=0.09211, pruned_loss=0.01394, audio_tagging_loss=0.01229, over 2190318.66 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:31:31,592 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 348950 2023-11-23 09:31:46,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2326400.0, ans=0.0 2023-11-23 09:31:47,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-23 09:31:52,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2326400.0, ans=0.2 2023-11-23 09:32:07,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2023-11-23 09:32:20,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2326533.3333333335, ans=0.0 2023-11-23 09:32:23,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2326533.3333333335, ans=0.125 2023-11-23 09:32:26,079 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 300, loss[loss=0.07767, simple_loss=0.1082, pruned_loss=0.01451, audio_tagging_loss=0.009063, over 16456.00 frames. ], tot_loss[loss=0.07189, simple_loss=0.09297, pruned_loss=0.01404, audio_tagging_loss=0.01137, over 2380423.75 frames. 
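Annotation: the Whitening entries compare a per-module statistic against a limit (metric=4.34 vs. limit=6.0 and so on), apparently measuring how far the activations' covariance is from white. One standard statistic with that behavior, equal to 1.0 when the covariance is proportional to the identity and growing with the eigenvalue spread, is mean(eig^2) / mean(eig)^2; the sketch below computes it per channel group. This formulation is an assumption, not copied from scaling.py.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n         # (groups, cg, cg)
    eig_mean = cov.diagonal(dim1=1, dim2=2).mean(dim=1)  # mean eigenvalue
    eig_sq_mean = (cov ** 2).sum(dim=(1, 2)) / (c // num_groups)
    return float((eig_sq_mean / eig_mean ** 2).mean())

x = torch.randn(1000, 384)
print(whitening_metric(x))            # near 1.0 (plus sampling noise)
print(whitening_metric(x @ torch.randn(384, 384)))  # correlated -> larger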
], batch size: 60, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:32:34,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2326600.0, ans=0.2 2023-11-23 09:32:36,503 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349000 2023-11-23 09:32:38,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2326666.6666666665, ans=0.125 2023-11-23 09:32:52,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2326733.3333333335, ans=0.1 2023-11-23 09:32:54,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2326733.3333333335, ans=0.0 2023-11-23 09:32:58,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2326733.3333333335, ans=0.1 2023-11-23 09:33:02,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.407e+01 8.907e+01 9.577e+01 1.241e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-23 09:33:30,927 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 350, loss[loss=0.06011, simple_loss=0.07652, pruned_loss=0.01055, audio_tagging_loss=0.0113, over 14849.00 frames. ], tot_loss[loss=0.07091, simple_loss=0.09246, pruned_loss=0.01397, audio_tagging_loss=0.01071, over 2529342.01 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 09:33:40,803 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349050 2023-11-23 09:33:43,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2327000.0, ans=0.125 2023-11-23 09:33:49,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2327000.0, ans=0.125 2023-11-23 09:34:02,726 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:34:03,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2023-11-23 09:34:29,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2327200.0, ans=0.125 2023-11-23 09:34:34,493 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 400, loss[loss=0.07054, simple_loss=0.09664, pruned_loss=0.01296, audio_tagging_loss=0.009262, over 15771.00 frames. ], tot_loss[loss=0.07097, simple_loss=0.09313, pruned_loss=0.01405, audio_tagging_loss=0.01035, over 2651470.76 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:34:44,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349100 2023-11-23 09:34:58,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. 
limit=6.0 2023-11-23 09:35:00,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2327400.0, ans=0.125 2023-11-23 09:35:05,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2327400.0, ans=0.125 2023-11-23 09:35:13,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.415e+01 9.145e+01 9.820e+01 1.379e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-23 09:35:30,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2327533.3333333335, ans=0.125 2023-11-23 09:35:33,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2327533.3333333335, ans=0.0 2023-11-23 09:35:39,511 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 450, loss[loss=0.06933, simple_loss=0.09299, pruned_loss=0.01406, audio_tagging_loss=0.008778, over 15095.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09167, pruned_loss=0.01379, audio_tagging_loss=0.01018, over 2735972.65 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:35:47,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2327600.0, ans=0.125 2023-11-23 09:35:50,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349150 2023-11-23 09:36:06,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2327733.3333333335, ans=0.125 2023-11-23 09:36:43,352 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 500, loss[loss=0.07293, simple_loss=0.09046, pruned_loss=0.01853, audio_tagging_loss=0.009169, over 14670.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09227, pruned_loss=0.01395, audio_tagging_loss=0.009804, over 2808844.50 frames. ], batch size: 54, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:36:50,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2327933.3333333335, ans=0.07 2023-11-23 09:36:53,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349200 2023-11-23 09:37:05,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2023-11-23 09:37:07,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2328066.6666666665, ans=0.125 2023-11-23 09:37:22,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.714e+01 8.359e+01 8.809e+01 9.526e+01 1.262e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-23 09:37:30,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2328133.3333333335, ans=0.2 2023-11-23 09:37:31,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2328133.3333333335, ans=0.1 2023-11-23 09:37:48,446 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 550, loss[loss=0.05955, simple_loss=0.0819, pruned_loss=0.01103, audio_tagging_loss=0.007566, over 15843.00 frames. ], tot_loss[loss=0.06964, simple_loss=0.09208, pruned_loss=0.0139, audio_tagging_loss=0.009701, over 2866890.89 frames. 
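Annotation: each loss[...] over N frames block is a single batch, while tot_loss[...] over ~3e6 frames is a frame-weighted running aggregate; the fractional frame counts (for example 2866890.89) point to a decayed rather than exact sum, and the aggregate resets at the epoch boundary (at Epoch 30, batch 0 above, tot_loss equals the batch loss). A minimal sketch with an assumed decay factor; field names mirror the log.

class LossTracker:
    def __init__(self, decay=0.999):
        self.decay = decay   # keeps tot_loss smooth but still responsive
        self.sums = {}       # frame-weighted loss sums per field
        self.frames = 0.0

    def update(self, batch_losses: dict, num_frames: int):
        self.frames = self.decay * self.frames + num_frames
        for k, v in batch_losses.items():
            self.sums[k] = self.decay * self.sums.get(k, 0.0) + v * num_frames

    def tot_loss(self) -> dict:
        return {k: s / self.frames for k, s in self.sums.items()}

tracker = LossTracker()
tracker.update({"loss": 0.06886, "simple_loss": 0.09185}, num_frames=14119)
print(tracker.tot_loss())  # equals the batch values until more batches arrive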
], batch size: 59, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:37:58,379 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349250 2023-11-23 09:38:08,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2328333.3333333335, ans=0.0 2023-11-23 09:38:18,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2328400.0, ans=0.0 2023-11-23 09:38:19,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=15.0 2023-11-23 09:38:23,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2328400.0, ans=0.125 2023-11-23 09:38:32,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2328466.6666666665, ans=6.0 2023-11-23 09:38:52,688 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 600, loss[loss=0.07344, simple_loss=0.09783, pruned_loss=0.01514, audio_tagging_loss=0.00938, over 16087.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.092, pruned_loss=0.01382, audio_tagging_loss=0.009672, over 2906986.40 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:39:03,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349300 2023-11-23 09:39:11,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2328666.6666666665, ans=0.125 2023-11-23 09:39:11,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-23 09:39:18,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2328733.3333333335, ans=0.0 2023-11-23 09:39:31,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.406e+01 8.142e+01 8.700e+01 9.475e+01 1.259e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-23 09:39:45,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2328866.6666666665, ans=0.0 2023-11-23 09:39:52,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2328866.6666666665, ans=0.125 2023-11-23 09:39:57,385 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 650, loss[loss=0.08403, simple_loss=0.1113, pruned_loss=0.02078, audio_tagging_loss=0.007594, over 15520.00 frames. ], tot_loss[loss=0.06965, simple_loss=0.09224, pruned_loss=0.01392, audio_tagging_loss=0.009614, over 2936733.51 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:39:57,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.84 vs. 
limit=22.5 2023-11-23 09:40:05,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2328933.3333333335, ans=0.0 2023-11-23 09:40:07,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349350 2023-11-23 09:40:07,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2328933.3333333335, ans=0.2 2023-11-23 09:40:17,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2329000.0, ans=0.0 2023-11-23 09:40:27,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2329066.6666666665, ans=0.125 2023-11-23 09:40:36,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2329133.3333333335, ans=0.125 2023-11-23 09:40:41,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2329133.3333333335, ans=0.1 2023-11-23 09:40:59,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2329200.0, ans=0.2 2023-11-23 09:40:59,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-11-23 09:41:02,194 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 700, loss[loss=0.07477, simple_loss=0.09637, pruned_loss=0.01768, audio_tagging_loss=0.008904, over 14331.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.09241, pruned_loss=0.01399, audio_tagging_loss=0.009545, over 2961431.92 frames. ], batch size: 54, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:41:06,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2329266.6666666665, ans=0.07 2023-11-23 09:41:12,070 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349400 2023-11-23 09:41:15,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2329333.3333333335, ans=0.125 2023-11-23 09:41:22,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2329333.3333333335, ans=0.0 2023-11-23 09:41:35,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2329400.0, ans=0.025 2023-11-23 09:41:36,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=12.0 2023-11-23 09:41:40,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2329466.6666666665, ans=0.0 2023-11-23 09:41:41,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.086e+01 8.675e+01 9.792e+01 1.198e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-23 09:41:46,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.17 vs. 
limit=15.0 2023-11-23 09:41:47,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2329466.6666666665, ans=0.09899494936611666 2023-11-23 09:41:51,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2329466.6666666665, ans=0.125 2023-11-23 09:41:51,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2329466.6666666665, ans=0.125 2023-11-23 09:41:57,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2329533.3333333335, ans=0.125 2023-11-23 09:42:06,207 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 750, loss[loss=0.07922, simple_loss=0.101, pruned_loss=0.01824, audio_tagging_loss=0.01047, over 14713.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09279, pruned_loss=0.01412, audio_tagging_loss=0.009591, over 2985471.53 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:42:17,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349450 2023-11-23 09:42:23,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-23 09:43:05,546 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:43:12,046 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 800, loss[loss=0.04696, simple_loss=0.05111, pruned_loss=0.009474, audio_tagging_loss=0.01193, over 16023.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09244, pruned_loss=0.01409, audio_tagging_loss=0.00963, over 2998305.62 frames. ], batch size: 63, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:43:18,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2329933.3333333335, ans=0.125 2023-11-23 09:43:21,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349500 2023-11-23 09:43:25,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-23 09:43:38,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-23 09:43:42,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2330066.6666666665, ans=0.125 2023-11-23 09:43:50,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.139e+01 8.858e+01 9.489e+01 1.259e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 09:43:52,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2330133.3333333335, ans=0.04949747468305833 2023-11-23 09:44:09,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. 
limit=15.0 2023-11-23 09:44:10,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2330200.0, ans=0.125 2023-11-23 09:44:15,728 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 850, loss[loss=0.06336, simple_loss=0.07846, pruned_loss=0.0125, audio_tagging_loss=0.01163, over 16334.00 frames. ], tot_loss[loss=0.06942, simple_loss=0.09182, pruned_loss=0.01394, audio_tagging_loss=0.00957, over 3014932.64 frames. ], batch size: 63, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:44:18,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-23 09:44:19,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2330266.6666666665, ans=0.125 2023-11-23 09:44:26,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349550 2023-11-23 09:44:43,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2330400.0, ans=0.2 2023-11-23 09:44:51,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2023-11-23 09:45:02,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2330466.6666666665, ans=0.0 2023-11-23 09:45:03,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2330466.6666666665, ans=0.2 2023-11-23 09:45:09,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2330533.3333333335, ans=0.125 2023-11-23 09:45:09,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2330533.3333333335, ans=0.125 2023-11-23 09:45:19,612 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 900, loss[loss=0.06961, simple_loss=0.09159, pruned_loss=0.0155, audio_tagging_loss=0.008315, over 15314.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09184, pruned_loss=0.01412, audio_tagging_loss=0.009646, over 3021577.34 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:45:22,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-23 09:45:30,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349600 2023-11-23 09:45:41,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2330666.6666666665, ans=0.125 2023-11-23 09:45:45,098 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:45:48,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2330733.3333333335, ans=0.2 2023-11-23 09:45:51,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.94 vs. 
limit=12.0 2023-11-23 09:45:52,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2330733.3333333335, ans=0.125 2023-11-23 09:45:58,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.328e+01 8.818e+01 9.334e+01 1.944e+02, threshold=1.764e+02, percent-clipped=1.0 2023-11-23 09:46:02,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2330800.0, ans=0.125 2023-11-23 09:46:09,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2330866.6666666665, ans=0.125 2023-11-23 09:46:24,191 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 950, loss[loss=0.07403, simple_loss=0.1057, pruned_loss=0.01445, audio_tagging_loss=0.00673, over 15169.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.09224, pruned_loss=0.01419, audio_tagging_loss=0.00938, over 3031409.89 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:46:34,448 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349650 2023-11-23 09:46:35,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2331000.0, ans=0.125 2023-11-23 09:46:35,959 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:46:44,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0 2023-11-23 09:47:07,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2331133.3333333335, ans=0.125 2023-11-23 09:47:28,202 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1000, loss[loss=0.07575, simple_loss=0.1024, pruned_loss=0.01917, audio_tagging_loss=0.005392, over 15280.00 frames. ], tot_loss[loss=0.07057, simple_loss=0.09394, pruned_loss=0.01447, audio_tagging_loss=0.009134, over 3035804.09 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:47:38,065 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349700 2023-11-23 09:47:39,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-23 09:47:56,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-23 09:47:57,319 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 09:48:07,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.508e+01 8.244e+01 9.173e+01 9.794e+01 1.161e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-23 09:48:11,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.14 vs. 
limit=22.5 2023-11-23 09:48:28,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-23 09:48:32,083 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1050, loss[loss=0.06706, simple_loss=0.08582, pruned_loss=0.01295, audio_tagging_loss=0.01119, over 14683.00 frames. ], tot_loss[loss=0.0702, simple_loss=0.09337, pruned_loss=0.01438, audio_tagging_loss=0.009128, over 3035525.22 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:48:43,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349750 2023-11-23 09:48:59,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2331733.3333333335, ans=0.04949747468305833 2023-11-23 09:49:09,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2331733.3333333335, ans=0.2 2023-11-23 09:49:13,094 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:49:26,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2331866.6666666665, ans=0.0 2023-11-23 09:49:28,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2331866.6666666665, ans=0.125 2023-11-23 09:49:30,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=2331866.6666666665, ans=22.5 2023-11-23 09:49:34,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2331866.6666666665, ans=0.2 2023-11-23 09:49:37,732 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1100, loss[loss=0.06741, simple_loss=0.09317, pruned_loss=0.01413, audio_tagging_loss=0.006698, over 15483.00 frames. ], tot_loss[loss=0.06941, simple_loss=0.09229, pruned_loss=0.01414, audio_tagging_loss=0.009119, over 3037507.54 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:49:41,462 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 09:49:48,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349800 2023-11-23 09:49:54,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2332000.0, ans=0.125 2023-11-23 09:50:12,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2332066.6666666665, ans=0.0 2023-11-23 09:50:15,122 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.049e+01 8.629e+01 9.475e+01 1.161e+02, threshold=1.726e+02, percent-clipped=0.0 2023-11-23 09:50:17,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2332133.3333333335, ans=0.125 2023-11-23 09:50:21,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2332133.3333333335, ans=0.0 2023-11-23 09:50:34,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2332200.0, ans=0.125 2023-11-23 09:50:37,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2332200.0, ans=0.2 2023-11-23 09:50:42,001 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1150, loss[loss=0.06945, simple_loss=0.09547, pruned_loss=0.01354, audio_tagging_loss=0.008178, over 16098.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.09172, pruned_loss=0.01397, audio_tagging_loss=0.009134, over 3036524.61 frames. ], batch size: 61, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:50:52,118 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349850 2023-11-23 09:50:58,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2023-11-23 09:50:59,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2332333.3333333335, ans=0.1 2023-11-23 09:51:14,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2332400.0, ans=22.5 2023-11-23 09:51:45,108 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1200, loss[loss=0.06825, simple_loss=0.09307, pruned_loss=0.01072, audio_tagging_loss=0.01099, over 14246.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.09274, pruned_loss=0.01396, audio_tagging_loss=0.009112, over 3043853.71 frames. 
], batch size: 53, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:51:54,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349900 2023-11-23 09:52:07,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2332666.6666666665, ans=0.0 2023-11-23 09:52:08,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2332666.6666666665, ans=0.0 2023-11-23 09:52:12,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2332733.3333333335, ans=0.125 2023-11-23 09:52:24,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.340e+01 9.039e+01 9.535e+01 1.288e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-23 09:52:29,844 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 09:52:38,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2332866.6666666665, ans=0.1 2023-11-23 09:52:40,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2332866.6666666665, ans=0.125 2023-11-23 09:52:41,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2023-11-23 09:52:47,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0 2023-11-23 09:52:49,374 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1250, loss[loss=0.08817, simple_loss=0.1313, pruned_loss=0.01769, audio_tagging_loss=0.004811, over 15259.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.0921, pruned_loss=0.01384, audio_tagging_loss=0.009088, over 3043614.95 frames. ], batch size: 53, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:52:59,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 349950 2023-11-23 09:53:05,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2333000.0, ans=0.125 2023-11-23 09:53:24,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2333066.6666666665, ans=0.0 2023-11-23 09:53:24,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5 2023-11-23 09:53:36,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2333133.3333333335, ans=0.125 2023-11-23 09:53:38,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2333133.3333333335, ans=0.125 2023-11-23 09:53:43,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2333200.0, ans=0.125 2023-11-23 09:53:52,862 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1300, loss[loss=0.05872, simple_loss=0.07716, pruned_loss=0.01372, audio_tagging_loss=0.00642, over 14355.00 frames. 
], tot_loss[loss=0.06923, simple_loss=0.09248, pruned_loss=0.01398, audio_tagging_loss=0.00901, over 3045345.15 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:53:56,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2333266.6666666665, ans=0.0 2023-11-23 09:54:03,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350000 2023-11-23 09:54:04,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2333333.3333333335, ans=0.0 2023-11-23 09:54:06,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2333333.3333333335, ans=0.125 2023-11-23 09:54:13,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=12.0 2023-11-23 09:54:32,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.017e+01 7.999e+01 8.654e+01 9.564e+01 1.101e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-23 09:54:52,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2333533.3333333335, ans=0.125 2023-11-23 09:54:56,652 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1350, loss[loss=0.06659, simple_loss=0.09462, pruned_loss=0.01065, audio_tagging_loss=0.008634, over 16403.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09186, pruned_loss=0.01378, audio_tagging_loss=0.009131, over 3046493.84 frames. ], batch size: 60, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:55:01,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2333600.0, ans=0.125 2023-11-23 09:55:06,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350050 2023-11-23 09:55:35,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2333800.0, ans=0.125 2023-11-23 09:55:43,881 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 09:56:00,484 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1400, loss[loss=0.07337, simple_loss=0.09952, pruned_loss=0.01496, audio_tagging_loss=0.008644, over 15752.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.09216, pruned_loss=0.01382, audio_tagging_loss=0.00914, over 3044914.14 frames. 
], batch size: 57, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:56:10,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2333933.3333333335, ans=0.125 2023-11-23 09:56:11,065 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350100 2023-11-23 09:56:15,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2334000.0, ans=0.125 2023-11-23 09:56:20,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2334000.0, ans=0.125 2023-11-23 09:56:22,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2334000.0, ans=0.125 2023-11-23 09:56:39,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.829e+01 8.084e+01 8.982e+01 9.535e+01 1.222e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-23 09:56:40,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2334133.3333333335, ans=0.09899494936611666 2023-11-23 09:56:41,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2334133.3333333335, ans=0.09899494936611666 2023-11-23 09:56:42,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2023-11-23 09:57:04,683 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1450, loss[loss=0.09431, simple_loss=0.1218, pruned_loss=0.02545, audio_tagging_loss=0.007956, over 15860.00 frames. ], tot_loss[loss=0.06971, simple_loss=0.09279, pruned_loss=0.01411, audio_tagging_loss=0.009209, over 3046095.18 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 09:57:06,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2334266.6666666665, ans=6.0 2023-11-23 09:57:14,475 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350150 2023-11-23 09:57:43,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2334466.6666666665, ans=0.125 2023-11-23 09:57:47,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0 2023-11-23 09:58:04,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2023-11-23 09:58:06,384 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1500, loss[loss=0.07881, simple_loss=0.1098, pruned_loss=0.01596, audio_tagging_loss=0.007945, over 14999.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09185, pruned_loss=0.01389, audio_tagging_loss=0.009266, over 3045306.80 frames. 
], batch size: 54, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 09:58:16,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350200 2023-11-23 09:58:20,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2334666.6666666665, ans=0.1 2023-11-23 09:58:45,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2334800.0, ans=0.1 2023-11-23 09:58:45,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2334800.0, ans=0.5 2023-11-23 09:58:46,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2334800.0, ans=0.0 2023-11-23 09:58:48,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2334800.0, ans=0.0 2023-11-23 09:58:48,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.399e+01 9.056e+01 9.468e+01 1.678e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-23 09:59:06,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2334866.6666666665, ans=0.125 2023-11-23 09:59:09,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0 2023-11-23 09:59:10,059 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1550, loss[loss=0.0718, simple_loss=0.07158, pruned_loss=0.02116, audio_tagging_loss=0.01485, over 14024.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09183, pruned_loss=0.01402, audio_tagging_loss=0.009328, over 3033942.57 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 09:59:10,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2334933.3333333335, ans=0.0 2023-11-23 09:59:19,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2334933.3333333335, ans=0.125 2023-11-23 09:59:20,620 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350250 2023-11-23 09:59:45,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2335066.6666666665, ans=0.05 2023-11-23 10:00:05,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2335200.0, ans=0.125 2023-11-23 10:00:14,397 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1600, loss[loss=0.07832, simple_loss=0.1028, pruned_loss=0.0154, audio_tagging_loss=0.01151, over 14982.00 frames. ], tot_loss[loss=0.06973, simple_loss=0.09251, pruned_loss=0.01413, audio_tagging_loss=0.009346, over 3044273.44 frames. 
], batch size: 56, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:00:24,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350300 2023-11-23 10:00:32,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2335333.3333333335, ans=0.125 2023-11-23 10:00:33,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2335333.3333333335, ans=0.2 2023-11-23 10:00:43,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2335400.0, ans=0.125 2023-11-23 10:00:55,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.176e+01 8.486e+01 9.012e+01 9.974e+01 1.250e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-23 10:00:55,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2335466.6666666665, ans=0.125 2023-11-23 10:01:03,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2335533.3333333335, ans=0.0 2023-11-23 10:01:15,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2335533.3333333335, ans=0.0 2023-11-23 10:01:17,406 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1650, loss[loss=0.08848, simple_loss=0.1239, pruned_loss=0.01852, audio_tagging_loss=0.00802, over 15441.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09253, pruned_loss=0.01404, audio_tagging_loss=0.009357, over 3047385.59 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:01:27,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350350 2023-11-23 10:01:36,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2335666.6666666665, ans=0.1 2023-11-23 10:01:39,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=22.5 2023-11-23 10:01:55,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=22.5 2023-11-23 10:02:00,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2335800.0, ans=0.1 2023-11-23 10:02:21,298 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1700, loss[loss=0.06292, simple_loss=0.08551, pruned_loss=0.01217, audio_tagging_loss=0.007986, over 16333.00 frames. ], tot_loss[loss=0.06969, simple_loss=0.0926, pruned_loss=0.01409, audio_tagging_loss=0.009304, over 3047845.35 frames. 
], batch size: 62, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 10:02:27,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2335933.3333333335, ans=0.125 2023-11-23 10:02:31,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350400 2023-11-23 10:02:34,913 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 10:02:43,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2336000.0, ans=0.125 2023-11-23 10:03:04,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.246e+01 8.760e+01 9.410e+01 1.135e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-23 10:03:07,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2336133.3333333335, ans=0.125 2023-11-23 10:03:17,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2336200.0, ans=0.125 2023-11-23 10:03:25,691 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1750, loss[loss=0.05879, simple_loss=0.07997, pruned_loss=0.007758, audio_tagging_loss=0.01105, over 15599.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09209, pruned_loss=0.01407, audio_tagging_loss=0.009283, over 3047389.73 frames. ], batch size: 60, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 10:03:32,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5 2023-11-23 10:03:36,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350450 2023-11-23 10:03:59,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2023-11-23 10:04:04,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2336466.6666666665, ans=0.05 2023-11-23 10:04:07,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2336466.6666666665, ans=0.0 2023-11-23 10:04:24,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2336533.3333333335, ans=0.1 2023-11-23 10:04:31,142 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1800, loss[loss=0.07103, simple_loss=0.09593, pruned_loss=0.01569, audio_tagging_loss=0.007371, over 14032.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09109, pruned_loss=0.01378, audio_tagging_loss=0.009175, over 3047118.28 frames. 
], batch size: 54, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 10:04:32,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2336600.0, ans=0.0 2023-11-23 10:04:41,016 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350500 2023-11-23 10:04:56,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2336733.3333333335, ans=0.125 2023-11-23 10:05:02,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2336733.3333333335, ans=0.1 2023-11-23 10:05:14,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2336800.0, ans=0.0 2023-11-23 10:05:14,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2336800.0, ans=0.125 2023-11-23 10:05:15,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.590e+01 9.171e+01 9.752e+01 1.397e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-23 10:05:15,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2336800.0, ans=0.125 2023-11-23 10:05:35,398 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1850, loss[loss=0.08965, simple_loss=0.1191, pruned_loss=0.01579, audio_tagging_loss=0.01428, over 15060.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09108, pruned_loss=0.01366, audio_tagging_loss=0.009138, over 3044380.42 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 10:05:37,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-23 10:05:42,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2336933.3333333335, ans=0.125 2023-11-23 10:05:46,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350550 2023-11-23 10:05:46,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2336933.3333333335, ans=0.1 2023-11-23 10:05:52,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2337000.0, ans=0.025 2023-11-23 10:05:59,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2337000.0, ans=0.5 2023-11-23 10:06:18,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. 
limit=6.0 2023-11-23 10:06:23,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2337133.3333333335, ans=0.125 2023-11-23 10:06:25,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2337200.0, ans=0.04949747468305833 2023-11-23 10:06:38,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2337266.6666666665, ans=0.125 2023-11-23 10:06:39,832 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1900, loss[loss=0.07546, simple_loss=0.1051, pruned_loss=0.01523, audio_tagging_loss=0.00768, over 14200.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09206, pruned_loss=0.01383, audio_tagging_loss=0.009088, over 3049639.26 frames. ], batch size: 54, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 10:06:42,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2337266.6666666665, ans=10.0 2023-11-23 10:06:48,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-11-23 10:06:50,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350600 2023-11-23 10:06:51,749 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 10:07:06,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2337400.0, ans=0.125 2023-11-23 10:07:13,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2337400.0, ans=0.2 2023-11-23 10:07:24,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.255e+01 9.028e+01 9.866e+01 1.250e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-23 10:07:27,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2023-11-23 10:07:41,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2337533.3333333335, ans=0.0 2023-11-23 10:07:45,188 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 1950, loss[loss=0.0631, simple_loss=0.08561, pruned_loss=0.01018, audio_tagging_loss=0.01011, over 15412.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09199, pruned_loss=0.01387, audio_tagging_loss=0.009075, over 3043805.57 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 8.0 2023-11-23 10:07:46,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2337600.0, ans=0.2 2023-11-23 10:07:55,968 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350650 2023-11-23 10:08:13,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2337733.3333333335, ans=0.09899494936611666 2023-11-23 10:08:15,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2337733.3333333335, ans=0.07 2023-11-23 10:08:38,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. 
limit=10.0 2023-11-23 10:08:51,211 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2000, loss[loss=0.05646, simple_loss=0.07509, pruned_loss=0.01134, audio_tagging_loss=0.007574, over 15128.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.08997, pruned_loss=0.01351, audio_tagging_loss=0.009172, over 3040257.44 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:08:54,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2337933.3333333335, ans=0.125 2023-11-23 10:09:01,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2023-11-23 10:09:02,114 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350700 2023-11-23 10:09:26,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2338066.6666666665, ans=0.125 2023-11-23 10:09:28,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2338066.6666666665, ans=0.0 2023-11-23 10:09:35,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.268e+01 8.767e+01 9.455e+01 1.201e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-23 10:09:57,055 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2050, loss[loss=0.06031, simple_loss=0.08129, pruned_loss=0.009712, audio_tagging_loss=0.009952, over 15241.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09127, pruned_loss=0.01372, audio_tagging_loss=0.009054, over 3041976.13 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:10:07,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350750 2023-11-23 10:10:13,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2023-11-23 10:10:20,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2338333.3333333335, ans=0.125 2023-11-23 10:10:31,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2338400.0, ans=0.125 2023-11-23 10:10:47,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2023-11-23 10:10:51,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2338533.3333333335, ans=0.125 2023-11-23 10:10:59,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2338533.3333333335, ans=0.0 2023-11-23 10:11:01,819 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2100, loss[loss=0.06222, simple_loss=0.09147, pruned_loss=0.01004, audio_tagging_loss=0.006444, over 14648.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09118, pruned_loss=0.01376, audio_tagging_loss=0.008966, over 3038703.37 frames. 
], batch size: 54, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:11:11,723 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350800 2023-11-23 10:11:13,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2338666.6666666665, ans=0.125 2023-11-23 10:11:18,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2023-11-23 10:11:46,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.232e+01 8.810e+01 9.514e+01 1.231e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-23 10:11:59,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2338866.6666666665, ans=0.125 2023-11-23 10:12:06,508 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2150, loss[loss=0.08293, simple_loss=0.1101, pruned_loss=0.0195, audio_tagging_loss=0.008389, over 15627.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09131, pruned_loss=0.01369, audio_tagging_loss=0.008984, over 3044225.59 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:12:17,134 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350850 2023-11-23 10:12:23,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2339000.0, ans=0.2 2023-11-23 10:12:28,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2339000.0, ans=0.125 2023-11-23 10:12:47,103 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:12:59,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2339200.0, ans=0.125 2023-11-23 10:13:12,307 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2200, loss[loss=0.08541, simple_loss=0.1158, pruned_loss=0.01732, audio_tagging_loss=0.01019, over 15047.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09214, pruned_loss=0.01389, audio_tagging_loss=0.009121, over 3044907.82 frames. 
], batch size: 54, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:13:22,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350900 2023-11-23 10:13:23,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2339266.6666666665, ans=22.5 2023-11-23 10:13:36,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2339400.0, ans=0.1 2023-11-23 10:13:41,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2339400.0, ans=0.2 2023-11-23 10:13:55,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.264e+01 8.958e+01 9.575e+01 1.152e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-23 10:13:57,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2339466.6666666665, ans=0.0 2023-11-23 10:14:03,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2339533.3333333335, ans=0.2 2023-11-23 10:14:06,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2339533.3333333335, ans=0.1 2023-11-23 10:14:17,026 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2250, loss[loss=0.05298, simple_loss=0.07027, pruned_loss=0.008533, audio_tagging_loss=0.00931, over 14758.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.0915, pruned_loss=0.01369, audio_tagging_loss=0.009143, over 3044981.16 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:14:26,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 350950 2023-11-23 10:14:31,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=15.0 2023-11-23 10:14:40,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2339733.3333333335, ans=0.125 2023-11-23 10:15:15,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=22.5 2023-11-23 10:15:21,423 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2300, loss[loss=0.08033, simple_loss=0.1124, pruned_loss=0.0149, audio_tagging_loss=0.00923, over 15116.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.0921, pruned_loss=0.01382, audio_tagging_loss=0.009189, over 3038819.06 frames. 
], batch size: 56, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:15:25,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2339933.3333333335, ans=0.0 2023-11-23 10:15:26,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2339933.3333333335, ans=0.2 2023-11-23 10:15:31,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351000 2023-11-23 10:15:36,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2340000.0, ans=0.0 2023-11-23 10:15:39,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2340000.0, ans=0.125 2023-11-23 10:15:50,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2023-11-23 10:15:50,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2340066.6666666665, ans=0.1 2023-11-23 10:15:57,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2340066.6666666665, ans=0.125 2023-11-23 10:15:58,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2340066.6666666665, ans=0.0 2023-11-23 10:16:02,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2340133.3333333335, ans=0.125 2023-11-23 10:16:06,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.543e+01 9.090e+01 9.740e+01 1.795e+02, threshold=1.818e+02, percent-clipped=1.0 2023-11-23 10:16:11,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2340133.3333333335, ans=0.0 2023-11-23 10:16:19,857 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:16:26,858 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2350, loss[loss=0.06831, simple_loss=0.09652, pruned_loss=0.01118, audio_tagging_loss=0.008869, over 15157.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09182, pruned_loss=0.01377, audio_tagging_loss=0.009228, over 3038415.88 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:16:38,032 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351050 2023-11-23 10:16:39,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2023-11-23 10:16:43,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2023-11-23 10:16:49,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.97 vs. 
limit=22.5 2023-11-23 10:17:02,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2340400.0, ans=0.0 2023-11-23 10:17:18,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2340533.3333333335, ans=0.125 2023-11-23 10:17:29,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.42 vs. limit=10.0 2023-11-23 10:17:32,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2340600.0, ans=0.125 2023-11-23 10:17:32,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2023-11-23 10:17:33,010 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2400, loss[loss=0.05953, simple_loss=0.0797, pruned_loss=0.01139, audio_tagging_loss=0.008291, over 14986.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09087, pruned_loss=0.01372, audio_tagging_loss=0.009332, over 3034448.81 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:17:38,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2340600.0, ans=0.125 2023-11-23 10:17:39,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2340600.0, ans=0.125 2023-11-23 10:17:40,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. limit=6.0 2023-11-23 10:17:42,679 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351100 2023-11-23 10:17:56,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2340733.3333333335, ans=0.1 2023-11-23 10:17:57,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2340733.3333333335, ans=0.0 2023-11-23 10:18:11,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2340800.0, ans=0.2 2023-11-23 10:18:12,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2340800.0, ans=0.0 2023-11-23 10:18:15,894 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.630e+01 8.266e+01 8.725e+01 9.443e+01 1.197e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-23 10:18:20,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2340800.0, ans=0.0 2023-11-23 10:18:33,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=22.5 2023-11-23 10:18:36,480 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2450, loss[loss=0.07762, simple_loss=0.1002, pruned_loss=0.01751, audio_tagging_loss=0.01001, over 15960.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.09165, pruned_loss=0.01379, audio_tagging_loss=0.009333, over 3043235.19 frames. 
], batch size: 58, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:18:41,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2340933.3333333335, ans=0.0 2023-11-23 10:18:46,574 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351150 2023-11-23 10:18:47,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2341000.0, ans=0.0 2023-11-23 10:19:15,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2341133.3333333335, ans=0.0 2023-11-23 10:19:20,828 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 10:19:22,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2341133.3333333335, ans=0.1 2023-11-23 10:19:23,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.53 vs. limit=12.0 2023-11-23 10:19:41,842 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2500, loss[loss=0.06935, simple_loss=0.09109, pruned_loss=0.01163, audio_tagging_loss=0.01218, over 15141.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09218, pruned_loss=0.01368, audio_tagging_loss=0.009364, over 3040239.47 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:19:53,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351200 2023-11-23 10:19:55,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2341333.3333333335, ans=0.1 2023-11-23 10:20:24,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2341466.6666666665, ans=0.0 2023-11-23 10:20:26,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2341466.6666666665, ans=15.0 2023-11-23 10:20:27,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.304e+01 8.850e+01 9.688e+01 1.424e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-23 10:20:51,091 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2550, loss[loss=0.06112, simple_loss=0.08324, pruned_loss=0.01121, audio_tagging_loss=0.00829, over 15611.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09124, pruned_loss=0.01354, audio_tagging_loss=0.009321, over 3040335.82 frames. ], batch size: 60, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:20:57,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2341600.0, ans=0.0 2023-11-23 10:21:01,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351250 2023-11-23 10:21:25,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2341733.3333333335, ans=0.125 2023-11-23 10:21:28,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2341800.0, ans=0.125 2023-11-23 10:21:56,975 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2600, loss[loss=0.05638, simple_loss=0.07287, pruned_loss=0.008927, audio_tagging_loss=0.01102, over 13787.00 frames. 
], tot_loss[loss=0.0681, simple_loss=0.09055, pruned_loss=0.0136, audio_tagging_loss=0.00922, over 3031795.70 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:22:03,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2341933.3333333335, ans=0.1 2023-11-23 10:22:06,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351300 2023-11-23 10:22:20,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=2342000.0, ans=22.5 2023-11-23 10:22:39,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2342133.3333333335, ans=0.5 2023-11-23 10:22:42,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.380e+01 9.027e+01 9.591e+01 1.211e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 10:22:55,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2342200.0, ans=0.0 2023-11-23 10:23:02,629 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2650, loss[loss=0.08295, simple_loss=0.1087, pruned_loss=0.0199, audio_tagging_loss=0.008713, over 15539.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09238, pruned_loss=0.01392, audio_tagging_loss=0.009017, over 3034959.74 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:23:14,171 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351350 2023-11-23 10:23:49,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2342466.6666666665, ans=0.125 2023-11-23 10:23:54,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2342533.3333333335, ans=0.1 2023-11-23 10:24:09,546 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2700, loss[loss=0.0591, simple_loss=0.07901, pruned_loss=0.0108, audio_tagging_loss=0.008795, over 15667.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09215, pruned_loss=0.01386, audio_tagging_loss=0.009, over 3036466.19 frames. ], batch size: 60, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:24:20,920 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351400 2023-11-23 10:24:25,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2023-11-23 10:24:26,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-11-23 10:24:29,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. 
limit=6.0 2023-11-23 10:24:31,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2342666.6666666665, ans=0.1 2023-11-23 10:24:52,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2342800.0, ans=0.125 2023-11-23 10:24:55,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.154e+01 8.358e+01 8.964e+01 9.971e+01 1.230e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-23 10:24:58,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2023-11-23 10:25:15,060 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2750, loss[loss=0.07664, simple_loss=0.1032, pruned_loss=0.01917, audio_tagging_loss=0.00586, over 13890.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.09228, pruned_loss=0.01396, audio_tagging_loss=0.009041, over 3030907.36 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 16.0 2023-11-23 10:25:16,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2342933.3333333335, ans=0.125 2023-11-23 10:25:16,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-11-23 10:25:25,002 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351450 2023-11-23 10:25:29,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-23 10:25:40,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2343066.6666666665, ans=0.1 2023-11-23 10:25:53,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.27 vs. limit=10.0 2023-11-23 10:26:12,087 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:26:19,249 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2800, loss[loss=0.0598, simple_loss=0.08052, pruned_loss=0.01052, audio_tagging_loss=0.009026, over 14803.00 frames. ], tot_loss[loss=0.069, simple_loss=0.0919, pruned_loss=0.01405, audio_tagging_loss=0.009003, over 3038022.30 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 32.0 2023-11-23 10:26:30,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351500 2023-11-23 10:26:45,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. 
limit=6.0 2023-11-23 10:27:05,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.185e+01 8.850e+01 9.511e+01 1.118e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-23 10:27:15,960 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 10:27:24,899 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2850, loss[loss=0.07041, simple_loss=0.09492, pruned_loss=0.01447, audio_tagging_loss=0.008483, over 15943.00 frames. ], tot_loss[loss=0.06947, simple_loss=0.09267, pruned_loss=0.01419, audio_tagging_loss=0.008946, over 3033363.00 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:27:35,421 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351550 2023-11-23 10:27:48,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2343666.6666666665, ans=0.125 2023-11-23 10:27:54,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2023-11-23 10:28:17,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2343866.6666666665, ans=0.125 2023-11-23 10:28:29,908 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2900, loss[loss=0.05739, simple_loss=0.07965, pruned_loss=0.008367, audio_tagging_loss=0.0092, over 16578.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09237, pruned_loss=0.0141, audio_tagging_loss=0.008994, over 3035557.29 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:28:34,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2023-11-23 10:28:36,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2343933.3333333335, ans=0.025 2023-11-23 10:28:40,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351600 2023-11-23 10:28:43,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2344000.0, ans=0.0 2023-11-23 10:28:52,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2344000.0, ans=0.015 2023-11-23 10:29:16,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2344133.3333333335, ans=0.125 2023-11-23 10:29:17,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.374e+01 8.898e+01 9.788e+01 1.211e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-23 10:29:19,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2344133.3333333335, ans=0.2 2023-11-23 10:29:29,109 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 10:29:35,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.62 vs. limit=12.0 2023-11-23 10:29:36,328 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 2950, loss[loss=0.057, simple_loss=0.07307, pruned_loss=0.01049, audio_tagging_loss=0.009969, over 15614.00 frames. 
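A note on the per-batch lines above: each pairs the loss on one sampled batch with a running tot_loss broken into simple_loss, pruned_loss and audio_tagging_loss. The logged totals throughout this section are consistent with a fixed weighting of the components; below is a minimal consistency check, assuming a simple-loss weight of 0.5 and an audio-tagging weight of 1.0 (weights inferred from the numbers themselves, not read from train_asr.py).

```python
# Hypothetical reconstruction of how the logged loss combines its parts;
# the 0.5 / 1.0 weights are assumptions inferred from the logged numbers.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  audio_tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# tot_loss components from the "Epoch 30, batch 2850" line above:
assert abs(combined_loss(0.09267, 0.01419, 0.008946) - 0.06947) < 5e-4
```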
], tot_loss[loss=0.07023, simple_loss=0.0937, pruned_loss=0.01443, audio_tagging_loss=0.008954, over 3040156.88 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:29:47,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351650 2023-11-23 10:30:12,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2344400.0, ans=0.0 2023-11-23 10:30:30,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2344533.3333333335, ans=0.0 2023-11-23 10:30:42,037 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3000, loss[loss=0.07449, simple_loss=0.1043, pruned_loss=0.01606, audio_tagging_loss=0.006304, over 16191.00 frames. ], tot_loss[loss=0.07041, simple_loss=0.09368, pruned_loss=0.01449, audio_tagging_loss=0.009077, over 3041111.61 frames. ], batch size: 59, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:30:42,038 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 10:31:20,378 INFO [train_asr.py:1253] (1/4) Epoch 30, validation: loss=0.05789, simple_loss=0.05111, pruned_loss=0.005034, audio_tagging_loss=0.0273, over 4681554.00 frames. 2023-11-23 10:31:20,379 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 10:31:20,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2344600.0, ans=0.0 2023-11-23 10:31:20,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2344600.0, ans=0.2 2023-11-23 10:31:23,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-23 10:31:30,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2344600.0, ans=0.0 2023-11-23 10:31:31,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351700 2023-11-23 10:31:38,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-23 10:32:08,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.435e+01 9.100e+01 1.011e+02 1.193e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-23 10:32:18,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2344866.6666666665, ans=0.0 2023-11-23 10:32:25,955 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3050, loss[loss=0.07346, simple_loss=0.09859, pruned_loss=0.01566, audio_tagging_loss=0.008497, over 15725.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09329, pruned_loss=0.01438, audio_tagging_loss=0.009063, over 3037051.27 frames. 
], batch size: 57, lr: 2.27e-03, grad_scale: 8.0 2023-11-23 10:32:26,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2344933.3333333335, ans=0.125 2023-11-23 10:32:27,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2344933.3333333335, ans=0.2 2023-11-23 10:32:37,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351750 2023-11-23 10:32:56,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2345066.6666666665, ans=0.125 2023-11-23 10:32:59,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2345066.6666666665, ans=0.2 2023-11-23 10:33:06,557 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:33:11,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2345133.3333333335, ans=0.125 2023-11-23 10:33:13,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2345133.3333333335, ans=0.125 2023-11-23 10:33:18,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2345200.0, ans=0.2 2023-11-23 10:33:32,026 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3100, loss[loss=0.06569, simple_loss=0.08499, pruned_loss=0.01325, audio_tagging_loss=0.009947, over 15339.00 frames. ], tot_loss[loss=0.07025, simple_loss=0.0934, pruned_loss=0.01434, audio_tagging_loss=0.009206, over 3038997.24 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 8.0 2023-11-23 10:33:42,787 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351800 2023-11-23 10:33:49,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2345333.3333333335, ans=0.1 2023-11-23 10:34:11,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2345466.6666666665, ans=0.0 2023-11-23 10:34:11,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2345466.6666666665, ans=0.1 2023-11-23 10:34:21,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.412e+01 9.012e+01 9.624e+01 1.358e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-23 10:34:31,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2345533.3333333335, ans=0.0 2023-11-23 10:34:38,150 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3150, loss[loss=0.08006, simple_loss=0.1048, pruned_loss=0.01906, audio_tagging_loss=0.008599, over 14648.00 frames. ], tot_loss[loss=0.07055, simple_loss=0.09381, pruned_loss=0.01437, audio_tagging_loss=0.009278, over 3040789.84 frames. 
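The "Exclude cut" warnings above drop what look like 1-second AudioSet clips (the unbalanced/... IDs) whose dummy transcripts are longer than the acoustic sequence: after factor-4 subsampling, 100 feature frames become 23, fewer than the 24 BPE tokens, and a transducer cannot emit more tokens than it has frames. A sketch of such a filter; the subsampling formula below reproduces the logged 100 -> 23 but is an assumption about the front end, not the actual code.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # One common factor-4 convolutional subsampling formula; it matches
    # the logged 100 -> 23 reduction, but treat it as an assumption.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one frame per token, so cuts
    # with fewer post-subsampling frames than tokens must be excluded.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # the excluded AudioSet cuts above
```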
], batch size: 56, lr: 2.27e-03, grad_scale: 8.0 2023-11-23 10:34:48,288 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351850 2023-11-23 10:34:51,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2345666.6666666665, ans=0.0 2023-11-23 10:34:52,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2345666.6666666665, ans=0.125 2023-11-23 10:35:00,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2345666.6666666665, ans=0.125 2023-11-23 10:35:20,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2345800.0, ans=0.1 2023-11-23 10:35:21,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2345800.0, ans=0.2 2023-11-23 10:35:38,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2345866.6666666665, ans=0.125 2023-11-23 10:35:38,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2345866.6666666665, ans=0.125 2023-11-23 10:35:43,460 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3200, loss[loss=0.07985, simple_loss=0.1092, pruned_loss=0.01796, audio_tagging_loss=0.007273, over 15598.00 frames. ], tot_loss[loss=0.07059, simple_loss=0.09376, pruned_loss=0.01435, audio_tagging_loss=0.009363, over 3046781.54 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:35:54,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351900 2023-11-23 10:36:22,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2346133.3333333335, ans=0.125 2023-11-23 10:36:31,820 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.179e+01 8.758e+01 9.576e+01 1.199e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-23 10:36:49,761 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3250, loss[loss=0.05016, simple_loss=0.06217, pruned_loss=0.007816, audio_tagging_loss=0.01126, over 14614.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09295, pruned_loss=0.01414, audio_tagging_loss=0.009479, over 3043768.39 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:37:00,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 351950 2023-11-23 10:37:10,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2346333.3333333335, ans=0.0 2023-11-23 10:37:28,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2346466.6666666665, ans=0.0 2023-11-23 10:37:28,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2346466.6666666665, ans=0.125 2023-11-23 10:37:54,525 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3300, loss[loss=0.06466, simple_loss=0.08979, pruned_loss=0.0115, audio_tagging_loss=0.008263, over 15104.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09274, pruned_loss=0.01414, audio_tagging_loss=0.009583, over 3052252.60 frames. 
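In the Clipping_scale lines, the five numbers read as the min / 25% / median / 75% / max of recent gradient norms, and the logged threshold consistently equals clipping_scale times the median (in the line above, 2.0 x 8.758e+01 ~ 1.752e+02). A plausible sketch of how those statistics could be produced; the real logic lives in optim.py and may differ in detail.

```python
import torch

def clipping_stats(recent_grad_norms: torch.Tensor,
                   clipping_scale: float = 2.0):
    # recent_grad_norms: 1-D tensor of gradient norms from recent steps.
    quartiles = torch.quantile(
        recent_grad_norms,
        torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
    )
    threshold = clipping_scale * quartiles[2]   # scale * median, as logged
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```

This reading also fits the one batch later in this section where the max norm (2.102e+02) exceeds the threshold (1.815e+02) and percent-clipped rises from 0.0 to 1.0.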
], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:37:55,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2346600.0, ans=0.125 2023-11-23 10:37:57,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2346600.0, ans=15.0 2023-11-23 10:38:04,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352000 2023-11-23 10:38:17,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2346666.6666666665, ans=0.125 2023-11-23 10:38:18,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2346666.6666666665, ans=0.2 2023-11-23 10:38:30,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-11-23 10:38:34,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=15.0 2023-11-23 10:38:34,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2346733.3333333335, ans=0.1 2023-11-23 10:38:42,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2346800.0, ans=0.125 2023-11-23 10:38:46,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.301e+01 8.918e+01 9.607e+01 1.152e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 10:38:47,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0 2023-11-23 10:38:52,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5 2023-11-23 10:39:02,158 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3350, loss[loss=0.05126, simple_loss=0.0683, pruned_loss=0.008032, audio_tagging_loss=0.009072, over 14241.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.09236, pruned_loss=0.01413, audio_tagging_loss=0.009433, over 3049010.16 frames. ], batch size: 53, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:39:04,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2346933.3333333335, ans=0.0 2023-11-23 10:39:05,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2346933.3333333335, ans=0.2 2023-11-23 10:39:09,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2346933.3333333335, ans=0.125 2023-11-23 10:39:12,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352050 2023-11-23 10:39:13,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.87 vs. 
limit=22.5 2023-11-23 10:39:20,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2347000.0, ans=0.125 2023-11-23 10:39:36,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2347066.6666666665, ans=0.0 2023-11-23 10:39:38,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2347066.6666666665, ans=0.1 2023-11-23 10:39:51,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2023-11-23 10:39:55,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2347200.0, ans=0.125 2023-11-23 10:40:08,302 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3400, loss[loss=0.05725, simple_loss=0.0827, pruned_loss=0.007347, audio_tagging_loss=0.00855, over 15409.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09258, pruned_loss=0.01416, audio_tagging_loss=0.009267, over 3054002.83 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:40:19,051 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352100 2023-11-23 10:40:28,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2347333.3333333335, ans=0.125 2023-11-23 10:40:28,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2347333.3333333335, ans=0.125 2023-11-23 10:40:30,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2347333.3333333335, ans=0.125 2023-11-23 10:40:33,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2347400.0, ans=0.125 2023-11-23 10:40:35,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2347400.0, ans=0.0 2023-11-23 10:40:38,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2347400.0, ans=0.125 2023-11-23 10:40:39,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2347400.0, ans=0.2 2023-11-23 10:40:56,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.131e+01 8.875e+01 9.563e+01 1.133e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-23 10:41:00,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.54 vs. limit=10.0 2023-11-23 10:41:01,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2347533.3333333335, ans=0.125 2023-11-23 10:41:06,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2347533.3333333335, ans=0.125 2023-11-23 10:41:12,940 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3450, loss[loss=0.05591, simple_loss=0.08022, pruned_loss=0.007088, audio_tagging_loss=0.00871, over 15611.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09249, pruned_loss=0.01416, audio_tagging_loss=0.009134, over 3042859.30 frames. 
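Most of the traffic in this log is scaling.py reporting ScheduledFloat values: hyperparameters such as feed-forward dropout_p, conv/attention skip rates, balancer probs, and bypass scale_min that are functions of batch_count rather than constants. A minimal sketch of such a piecewise-linear schedule, assuming (batch_count, value) breakpoints; the breakpoints in the example are illustrative.

```python
class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count (illustrative only)."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)   # (batch_count, value) breakpoints

    def value(self, batch_count: float) -> float:
        (x0, y0) = self.points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in self.points[1:]:
            if batch_count <= x1:
                # Linear interpolation between neighbouring breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0   # past the last breakpoint, hold the final value

# Far past the ramp the schedule is flat, which is why dropout_p is still
# 0.1 at batch_count ~2.34e6 in the lines above.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(2347066.6666666665) == 0.1
```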
], batch size: 57, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:41:23,118 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352150 2023-11-23 10:41:39,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.81 vs. limit=10.0 2023-11-23 10:41:40,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2347733.3333333335, ans=0.125 2023-11-23 10:42:11,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2023-11-23 10:42:16,593 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3500, loss[loss=0.07535, simple_loss=0.09544, pruned_loss=0.02133, audio_tagging_loss=0.006302, over 15441.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09311, pruned_loss=0.01421, audio_tagging_loss=0.008953, over 3048264.91 frames. ], batch size: 60, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:42:22,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.19 vs. limit=22.5 2023-11-23 10:42:25,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2347933.3333333335, ans=0.0 2023-11-23 10:42:26,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352200 2023-11-23 10:42:27,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2348000.0, ans=0.0 2023-11-23 10:42:51,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2348066.6666666665, ans=0.125 2023-11-23 10:42:52,026 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:42:57,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2348133.3333333335, ans=0.0 2023-11-23 10:43:02,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2348133.3333333335, ans=0.2 2023-11-23 10:43:04,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.164e+01 8.763e+01 9.572e+01 1.144e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-23 10:43:20,958 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3550, loss[loss=0.08492, simple_loss=0.1251, pruned_loss=0.01706, audio_tagging_loss=0.005301, over 14808.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09306, pruned_loss=0.01412, audio_tagging_loss=0.008872, over 3055276.74 frames. 
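The Whitening lines compare a per-module statistic against a scheduled limit (e.g. metric=3.43 vs. limit=6.0 for the whiten_keys entry above); the penalty that keeps activations "white" engages only when the metric exceeds the limit, so most of these lines are purely informational. One plausible form of the metric, equal to 1.0 when a group's covariance is a multiple of the identity and growing with eigenvalue spread, is sketched below; the exact formula in scaling.py may differ.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitening group.
    x = x - x.mean(dim=0)
    num_frames, num_channels = x.shape
    cov = (x.t() @ x) / num_frames
    # Ratio of the mean squared eigenvalue to the squared mean eigenvalue:
    # 1.0 for perfectly white features, larger as variance concentrates.
    return (cov * cov).sum() / num_channels / (cov.diagonal().mean() ** 2)

white = torch.randn(10000, 128)
assert whitening_metric(white).item() < 1.1   # near 1.0 for white noise
```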
], batch size: 55, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:43:30,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2348266.6666666665, ans=0.0 2023-11-23 10:43:31,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352250 2023-11-23 10:43:38,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2348333.3333333335, ans=0.0 2023-11-23 10:43:46,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2348400.0, ans=0.1 2023-11-23 10:43:49,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2348400.0, ans=0.125 2023-11-23 10:43:52,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2348400.0, ans=0.125 2023-11-23 10:44:25,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2348600.0, ans=0.125 2023-11-23 10:44:25,952 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3600, loss[loss=0.05982, simple_loss=0.08424, pruned_loss=0.008843, audio_tagging_loss=0.008859, over 14779.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09289, pruned_loss=0.01414, audio_tagging_loss=0.008872, over 3046903.54 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:44:29,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2348600.0, ans=0.125 2023-11-23 10:44:33,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2348600.0, ans=0.125 2023-11-23 10:44:35,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352300 2023-11-23 10:44:36,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2348600.0, ans=6.0 2023-11-23 10:44:37,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2348666.6666666665, ans=0.0 2023-11-23 10:44:50,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. 
limit=15.0 2023-11-23 10:44:53,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2348733.3333333335, ans=0.125 2023-11-23 10:44:59,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2348733.3333333335, ans=0.125 2023-11-23 10:45:04,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2348800.0, ans=0.125 2023-11-23 10:45:07,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2348800.0, ans=0.025 2023-11-23 10:45:13,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.915e+01 8.138e+01 8.784e+01 9.716e+01 1.349e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-23 10:45:29,826 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3650, loss[loss=0.07928, simple_loss=0.1134, pruned_loss=0.01515, audio_tagging_loss=0.007437, over 15830.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09268, pruned_loss=0.01411, audio_tagging_loss=0.008879, over 3046222.68 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:45:30,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2348933.3333333335, ans=0.125 2023-11-23 10:45:35,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2348933.3333333335, ans=0.1 2023-11-23 10:45:39,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352350 2023-11-23 10:45:43,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=15.0 2023-11-23 10:45:43,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2349000.0, ans=0.125 2023-11-23 10:45:53,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.73 vs. limit=10.0 2023-11-23 10:46:14,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2349133.3333333335, ans=0.04949747468305833 2023-11-23 10:46:16,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2349133.3333333335, ans=0.0 2023-11-23 10:46:18,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-23 10:46:19,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2023-11-23 10:46:25,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2349200.0, ans=0.0 2023-11-23 10:46:27,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2349200.0, ans=0.0 2023-11-23 10:46:34,666 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3700, loss[loss=0.0672, simple_loss=0.08927, pruned_loss=0.01335, audio_tagging_loss=0.009213, over 15261.00 frames. 
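The grad_scale field in the batch lines is the mixed-precision loss scale: it is halved after steps that produce inf/NaN gradients and grown back when training is stable, which is why it moves among 8.0, 16.0 and 32.0 across this section. A toy illustration with PyTorch's GradScaler; the model, optimizer and init_scale here are illustrative, not the training recipe's own.

```python
import torch

model = torch.nn.Linear(4, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

x = torch.randn(8, 4, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(optimizer)          # unscales, skips the step on inf/NaN
scaler.update()                 # halves or grows the scale accordingly
print(scaler.get_scale())       # the value that appears as grad_scale
```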
], tot_loss[loss=0.06958, simple_loss=0.0931, pruned_loss=0.01414, audio_tagging_loss=0.008882, over 3043090.61 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:46:46,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352400 2023-11-23 10:46:49,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2349333.3333333335, ans=0.125 2023-11-23 10:47:08,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2023-11-23 10:47:12,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2349400.0, ans=0.125 2023-11-23 10:47:20,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2349466.6666666665, ans=0.0 2023-11-23 10:47:23,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.475e+01 8.984e+01 9.756e+01 1.281e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 10:47:26,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2349533.3333333335, ans=0.125 2023-11-23 10:47:27,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2349533.3333333335, ans=0.125 2023-11-23 10:47:32,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2023-11-23 10:47:40,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2349600.0, ans=0.0 2023-11-23 10:47:42,631 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3750, loss[loss=0.0739, simple_loss=0.09672, pruned_loss=0.01445, audio_tagging_loss=0.0111, over 14708.00 frames. ], tot_loss[loss=0.07002, simple_loss=0.09364, pruned_loss=0.01433, audio_tagging_loss=0.008869, over 3053310.88 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:47:45,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2349600.0, ans=0.1 2023-11-23 10:47:53,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352450 2023-11-23 10:47:57,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2349666.6666666665, ans=0.125 2023-11-23 10:48:07,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=22.5 2023-11-23 10:48:19,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2349733.3333333335, ans=0.125 2023-11-23 10:48:29,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2349800.0, ans=0.125 2023-11-23 10:48:31,263 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:48:31,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2349800.0, ans=0.0 2023-11-23 10:48:49,435 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3800, loss[loss=0.08342, simple_loss=0.1111, pruned_loss=0.01811, audio_tagging_loss=0.009782, over 15528.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09359, pruned_loss=0.0144, audio_tagging_loss=0.008895, over 3051844.06 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:48:57,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2349933.3333333335, ans=0.125 2023-11-23 10:48:59,586 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352500 2023-11-23 10:49:11,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2023-11-23 10:49:18,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2350066.6666666665, ans=0.0 2023-11-23 10:49:38,108 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.465e+01 8.919e+01 9.665e+01 1.243e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 10:49:38,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2350133.3333333335, ans=0.5 2023-11-23 10:49:44,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2350200.0, ans=0.125 2023-11-23 10:49:44,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2350200.0, ans=0.125 2023-11-23 10:49:52,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2350200.0, ans=0.125 2023-11-23 10:49:54,992 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3850, loss[loss=0.07144, simple_loss=0.09836, pruned_loss=0.01371, audio_tagging_loss=0.008551, over 16292.00 frames. ], tot_loss[loss=0.07063, simple_loss=0.09393, pruned_loss=0.01459, audio_tagging_loss=0.009073, over 3060847.13 frames. ], batch size: 60, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 10:50:06,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352550 2023-11-23 10:50:42,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2023-11-23 10:50:42,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-11-23 10:50:54,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2350533.3333333335, ans=0.0 2023-11-23 10:51:02,820 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3900, loss[loss=0.07591, simple_loss=0.1028, pruned_loss=0.01427, audio_tagging_loss=0.01023, over 15810.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09304, pruned_loss=0.0145, audio_tagging_loss=0.009203, over 3054798.94 frames. 
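The balancer entries scattered through these lines (prob=0.125, min_abs=0.5, min_positive=0.025, ...) describe modules that watch per-channel activation statistics and, with probability prob, adjust gradients to pull channels back inside the configured bounds. A sketch of the statistics being monitored; the gradient modification itself is omitted and the function name is illustrative.

```python
import torch

def channel_stats(x: torch.Tensor):
    # x: (..., num_channels); statistics are per channel, over all frames.
    flat = x.reshape(-1, x.shape[-1])
    frac_positive = (flat > 0).float().mean(dim=0)  # vs. min_positive bounds
    mean_abs = flat.abs().mean(dim=0)               # vs. min_abs / max_abs
    return frac_positive, mean_abs

x = torch.randn(100, 16, 256)
frac_pos, mean_abs = channel_stats(x)
# A balancer with min_positive=0.025 would leave these channels alone,
# since roughly half of Gaussian activations are positive.
assert (frac_pos > 0.025).all()
```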
], batch size: 58, lr: 2.27e-03, grad_scale: 8.0 2023-11-23 10:51:14,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352600 2023-11-23 10:51:26,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2350666.6666666665, ans=0.07 2023-11-23 10:51:39,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2023-11-23 10:51:55,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.310e+01 8.820e+01 9.647e+01 1.440e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-23 10:52:10,439 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 3950, loss[loss=0.06239, simple_loss=0.08052, pruned_loss=0.0117, audio_tagging_loss=0.01043, over 15033.00 frames. ], tot_loss[loss=0.07069, simple_loss=0.09384, pruned_loss=0.01449, audio_tagging_loss=0.009276, over 3050362.59 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 8.0 2023-11-23 10:52:20,618 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352650 2023-11-23 10:52:22,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2351000.0, ans=0.125 2023-11-23 10:52:39,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2351066.6666666665, ans=0.2 2023-11-23 10:53:01,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2351133.3333333335, ans=0.125 2023-11-23 10:53:06,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2351200.0, ans=0.125 2023-11-23 10:53:10,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2351200.0, ans=0.125 2023-11-23 10:53:16,298 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4000, loss[loss=0.05681, simple_loss=0.07251, pruned_loss=0.01073, audio_tagging_loss=0.009822, over 14430.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.0944, pruned_loss=0.01449, audio_tagging_loss=0.009261, over 3047878.04 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:53:21,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-23 10:53:27,932 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352700 2023-11-23 10:53:28,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2351266.6666666665, ans=0.125 2023-11-23 10:53:42,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2351400.0, ans=0.125 2023-11-23 10:54:07,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2023-11-23 10:54:08,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.440e+01 9.076e+01 9.924e+01 2.102e+02, threshold=1.815e+02, percent-clipped=1.0 2023-11-23 10:54:19,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. 
limit=22.5 2023-11-23 10:54:23,975 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4050, loss[loss=0.07239, simple_loss=0.09779, pruned_loss=0.0141, audio_tagging_loss=0.009396, over 14881.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.09394, pruned_loss=0.01449, audio_tagging_loss=0.009377, over 3046908.99 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:54:26,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2351600.0, ans=0.125 2023-11-23 10:54:27,748 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:54:34,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352750 2023-11-23 10:54:57,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2351733.3333333335, ans=0.0 2023-11-23 10:55:29,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2351933.3333333335, ans=0.125 2023-11-23 10:55:30,760 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4100, loss[loss=0.0742, simple_loss=0.1093, pruned_loss=0.01458, audio_tagging_loss=0.004962, over 16306.00 frames. ], tot_loss[loss=0.07124, simple_loss=0.09459, pruned_loss=0.01463, audio_tagging_loss=0.009318, over 3049300.93 frames. ], batch size: 60, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:55:39,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2351933.3333333335, ans=0.2 2023-11-23 10:55:41,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352800 2023-11-23 10:55:49,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2352000.0, ans=0.125 2023-11-23 10:55:52,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2352000.0, ans=0.125 2023-11-23 10:56:14,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-23 10:56:22,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.387e+01 8.886e+01 9.725e+01 1.263e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-23 10:56:27,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2352200.0, ans=0.2 2023-11-23 10:56:36,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2023-11-23 10:56:37,354 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4150, loss[loss=0.06191, simple_loss=0.07845, pruned_loss=0.01115, audio_tagging_loss=0.01153, over 14520.00 frames. ], tot_loss[loss=0.07085, simple_loss=0.09399, pruned_loss=0.01463, audio_tagging_loss=0.009224, over 3048483.15 frames. 
], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:56:48,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352850 2023-11-23 10:57:25,904 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 10:57:29,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2352533.3333333335, ans=0.125 2023-11-23 10:57:43,195 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4200, loss[loss=0.06931, simple_loss=0.08629, pruned_loss=0.01724, audio_tagging_loss=0.008922, over 14910.00 frames. ], tot_loss[loss=0.07102, simple_loss=0.09469, pruned_loss=0.01462, audio_tagging_loss=0.009053, over 3044875.92 frames. ], batch size: 59, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:57:46,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2352600.0, ans=0.125 2023-11-23 10:57:49,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2352600.0, ans=0.125 2023-11-23 10:57:53,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352900 2023-11-23 10:57:54,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2023-11-23 10:58:01,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2352666.6666666665, ans=0.125 2023-11-23 10:58:03,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-11-23 10:58:26,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2352800.0, ans=0.125 2023-11-23 10:58:34,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.262e+01 8.280e+01 9.153e+01 9.878e+01 1.174e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-23 10:58:48,824 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4250, loss[loss=0.05777, simple_loss=0.06856, pruned_loss=0.01146, audio_tagging_loss=0.01203, over 14875.00 frames. ], tot_loss[loss=0.07036, simple_loss=0.09385, pruned_loss=0.01441, audio_tagging_loss=0.009024, over 3044576.92 frames. 
], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:58:58,764 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 352950 2023-11-23 10:59:04,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2353000.0, ans=0.125 2023-11-23 10:59:28,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2353133.3333333335, ans=0.0 2023-11-23 10:59:45,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2353200.0, ans=0.125 2023-11-23 10:59:49,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2353200.0, ans=0.1 2023-11-23 10:59:54,205 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4300, loss[loss=0.05701, simple_loss=0.07165, pruned_loss=0.01315, audio_tagging_loss=0.00803, over 14194.00 frames. ], tot_loss[loss=0.07058, simple_loss=0.09442, pruned_loss=0.01448, audio_tagging_loss=0.008898, over 3045639.47 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 10:59:55,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2353266.6666666665, ans=0.125 2023-11-23 10:59:55,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2353266.6666666665, ans=0.0 2023-11-23 11:00:04,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353000 2023-11-23 11:00:10,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2353333.3333333335, ans=0.125 2023-11-23 11:00:19,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2353400.0, ans=0.1 2023-11-23 11:00:21,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2353400.0, ans=0.125 2023-11-23 11:00:29,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=15.0 2023-11-23 11:00:34,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=22.5 2023-11-23 11:00:38,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=12.0 2023-11-23 11:00:45,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2353466.6666666665, ans=0.1 2023-11-23 11:00:45,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.099e+01 8.412e+01 9.137e+01 9.657e+01 1.208e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-23 11:01:01,184 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4350, loss[loss=0.06892, simple_loss=0.0888, pruned_loss=0.01193, audio_tagging_loss=0.0126, over 14435.00 frames. ], tot_loss[loss=0.0701, simple_loss=0.09396, pruned_loss=0.0142, audio_tagging_loss=0.008918, over 3051998.38 frames. 
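Each "Epoch 30, batch N" line pairs the loss on the sampled batch (loss[...], over ~15k frames) with running totals (tot_loss[...], over ~3.0e6 frames), i.e. a frame-weighted average over the recent window. A minimal sketch of that accumulator; the class and field names are illustrative.

```python
from collections import defaultdict

class FrameWeightedAverage:
    """Accumulates losses weighted by frame count, like the tot_loss[...] fields."""

    def __init__(self):
        self.frames = 0.0
        self.sums = defaultdict(float)

    def update(self, losses: dict, num_frames: float) -> None:
        self.frames += num_frames
        for name, value in losses.items():
            self.sums[name] += value * num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

tot = FrameWeightedAverage()
tot.update({"loss": 0.0701, "pruned_loss": 0.0142}, 15000)
tot.update({"loss": 0.0690, "pruned_loss": 0.0139}, 16000)
print(tot.averages())   # frame-weighted, as in "over 3051998.38 frames"
```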
], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:01:11,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353050 2023-11-23 11:01:30,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2023-11-23 11:01:30,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2353733.3333333335, ans=0.125 2023-11-23 11:01:45,431 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.123e-02 2023-11-23 11:01:55,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2353866.6666666665, ans=0.1 2023-11-23 11:02:07,043 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4400, loss[loss=0.05412, simple_loss=0.06697, pruned_loss=0.007923, audio_tagging_loss=0.01271, over 14454.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09366, pruned_loss=0.01421, audio_tagging_loss=0.00897, over 3051536.33 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:02:07,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2353933.3333333335, ans=0.125 2023-11-23 11:02:08,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2353933.3333333335, ans=0.0 2023-11-23 11:02:17,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353100 2023-11-23 11:02:20,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-23 11:02:33,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2354066.6666666665, ans=10.0 2023-11-23 11:02:58,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.258e+01 8.734e+01 9.523e+01 1.170e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-23 11:03:04,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2354200.0, ans=0.09899494936611666 2023-11-23 11:03:12,642 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4450, loss[loss=0.07845, simple_loss=0.107, pruned_loss=0.01739, audio_tagging_loss=0.007545, over 15750.00 frames. ], tot_loss[loss=0.06955, simple_loss=0.09294, pruned_loss=0.01414, audio_tagging_loss=0.008949, over 3050385.13 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:03:23,229 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353150 2023-11-23 11:03:31,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=22.5 2023-11-23 11:03:52,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2354466.6666666665, ans=0.1 2023-11-23 11:04:04,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2354533.3333333335, ans=0.125 2023-11-23 11:04:13,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. 
limit=15.0 2023-11-23 11:04:18,925 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4500, loss[loss=0.05248, simple_loss=0.06413, pruned_loss=0.01038, audio_tagging_loss=0.01003, over 15337.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09235, pruned_loss=0.01405, audio_tagging_loss=0.009039, over 3054221.81 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:04:29,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353200 2023-11-23 11:04:40,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0 2023-11-23 11:04:49,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2354733.3333333335, ans=0.1 2023-11-23 11:04:53,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2354733.3333333335, ans=0.05 2023-11-23 11:04:55,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2354733.3333333335, ans=0.1 2023-11-23 11:05:10,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.746e+01 8.361e+01 8.894e+01 9.678e+01 1.285e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 11:05:13,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2354866.6666666665, ans=0.0 2023-11-23 11:05:14,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.96 vs. limit=10.0 2023-11-23 11:05:18,519 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:05:25,306 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4550, loss[loss=0.06986, simple_loss=0.08949, pruned_loss=0.01521, audio_tagging_loss=0.009907, over 14766.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09156, pruned_loss=0.01394, audio_tagging_loss=0.009034, over 3057242.89 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:05:35,576 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353250 2023-11-23 11:05:43,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2355000.0, ans=0.125 2023-11-23 11:05:43,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2355000.0, ans=0.125 2023-11-23 11:05:50,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2355066.6666666665, ans=0.0 2023-11-23 11:05:58,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2355066.6666666665, ans=0.125 2023-11-23 11:06:16,940 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 11:06:29,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2355266.6666666665, ans=0.125 2023-11-23 11:06:30,706 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4600, loss[loss=0.06741, simple_loss=0.08086, pruned_loss=0.01443, audio_tagging_loss=0.01255, over 16343.00 frames. ], tot_loss[loss=0.06941, simple_loss=0.09219, pruned_loss=0.01417, audio_tagging_loss=0.009146, over 3059341.43 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:06:30,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2355266.6666666665, ans=0.125 2023-11-23 11:06:40,932 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353300 2023-11-23 11:06:42,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2355333.3333333335, ans=0.125 2023-11-23 11:06:54,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2355333.3333333335, ans=0.09899494936611666 2023-11-23 11:06:58,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2355400.0, ans=0.125 2023-11-23 11:07:22,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.374e+01 9.113e+01 9.738e+01 1.636e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-23 11:07:29,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2355533.3333333335, ans=15.0 2023-11-23 11:07:35,290 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4650, loss[loss=0.0585, simple_loss=0.07545, pruned_loss=0.008386, audio_tagging_loss=0.01239, over 14827.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09191, pruned_loss=0.01407, audio_tagging_loss=0.009204, over 3051370.49 frames. 
2023-11-23 11:07:39,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2355600.0, ans=0.125 2023-11-23 11:07:42,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2355600.0, ans=0.0 2023-11-23 11:07:47,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353350 2023-11-23 11:07:59,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2355666.6666666665, ans=0.0 2023-11-23 11:08:06,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2355733.3333333335, ans=0.05 2023-11-23 11:08:23,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2355800.0, ans=0.0 2023-11-23 11:08:38,385 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:08:40,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2355933.3333333335, ans=0.125 2023-11-23 11:08:40,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2355933.3333333335, ans=0.5 2023-11-23 11:08:40,949 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:08:42,517 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4700, loss[loss=0.0774, simple_loss=0.1096, pruned_loss=0.01571, audio_tagging_loss=0.006866, over 15200.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09157, pruned_loss=0.01394, audio_tagging_loss=0.009249, over 3056363.79 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:08:52,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353400 2023-11-23 11:08:53,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.75 vs. limit=22.5 2023-11-23 11:08:54,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2356000.0, ans=0.125 2023-11-23 11:09:02,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2356000.0, ans=0.0 2023-11-23 11:09:09,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2023-11-23 11:09:10,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.80 vs. limit=22.5 2023-11-23 11:09:35,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.787e+01 8.186e+01 8.702e+01 9.591e+01 1.216e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-23 11:09:41,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2023-11-23 11:09:47,646 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4750, loss[loss=0.06568, simple_loss=0.09865, pruned_loss=0.01099, audio_tagging_loss=0.00536, over 15523.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09184, pruned_loss=0.01394, audio_tagging_loss=0.009217, over 3048392.82 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 16.0
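
The [scaling.py:213] lines that dominate this log each print the resolved value (ans) of a ScheduledFloat: a scalar hyperparameter (a dropout rate, skip rate, balancer probability, and so on) that follows a piecewise-linear schedule over batch_count. A minimal sketch of that idea; the helper name and the example knots below are illustrative, not the actual scaling.py implementation.

    def scheduled_float(batch_count: float, *knots) -> float:
        # knots: ((b0, v0), (b1, v1), ...) with b0 < b1 < ...; the value is
        # interpolated linearly between knots and held constant outside them.
        b0, v0 = knots[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in knots[1:]:
            if batch_count <= b1:
                w = (batch_count - b0) / (b1 - b0)
                return v0 + w * (v1 - v0)
            b0, v0 = b1, v1
        return v0  # past the last knot: hold the final value

    # Example: a dropout_p decaying from 0.3 to 0.1 over the first 20k batches
    # would read ans=0.1 at batch_count ~2.355e+06, as in the entries above.
    assert scheduled_float(2355600.0, (0.0, 0.3), (20000.0, 0.1)) == 0.1
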
2023-11-23 11:09:56,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2356266.6666666665, ans=0.125 2023-11-23 11:09:57,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353450 2023-11-23 11:10:15,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2356400.0, ans=0.05 2023-11-23 11:10:46,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2356533.3333333335, ans=0.0 2023-11-23 11:10:52,103 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4800, loss[loss=0.0818, simple_loss=0.1102, pruned_loss=0.01658, audio_tagging_loss=0.01014, over 15404.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09185, pruned_loss=0.01395, audio_tagging_loss=0.00939, over 3044237.05 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:11:04,007 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353500 2023-11-23 11:11:22,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2356733.3333333335, ans=0.0 2023-11-23 11:11:28,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2356733.3333333335, ans=0.0 2023-11-23 11:11:44,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.702e+01 8.167e+01 8.824e+01 9.550e+01 1.189e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-23 11:11:58,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2356933.3333333335, ans=0.0 2023-11-23 11:11:59,390 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4850, loss[loss=0.06447, simple_loss=0.09353, pruned_loss=0.01124, audio_tagging_loss=0.006461, over 15055.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.09179, pruned_loss=0.01387, audio_tagging_loss=0.009446, over 3047412.74 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:12:10,062 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353550 2023-11-23 11:12:27,421 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:12:27,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2357066.6666666665, ans=0.025 2023-11-23 11:12:27,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2357066.6666666665, ans=0.0 2023-11-23 11:12:33,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2357066.6666666665, ans=0.125 2023-11-23 11:12:41,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2357133.3333333335, ans=0.0 2023-11-23 11:13:05,000 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4900, loss[loss=0.07233, simple_loss=0.09297, pruned_loss=0.01654, audio_tagging_loss=0.009305, over 16165.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.09184, pruned_loss=0.01372, audio_tagging_loss=0.00942, over 3045283.17 frames. ], batch size: 59, lr: 2.27e-03, grad_scale: 32.0
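
The [scaling.py:1022] Whitening entries compare a metric against a limit for each Whiten module: the metric is 1.0 when the feature covariance within a group is isotropic (fully white) and approaches num_channels as the energy collapses onto a single direction, with a corrective gradient applied only above the limit. One way to compute such a metric is sketched below; this eigenvalue formulation is an assumption and may differ in detail from the actual scaling.py code.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one group.
        x = x - x.mean(dim=0)
        cov = x.t() @ x / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # Ratio of the mean squared eigenvalue to the squared mean eigenvalue:
        # 1.0 for an isotropic covariance, num_channels for a rank-1 one.
        return (eigs.pow(2).mean() / eigs.mean().pow(2).clamp(min=1e-20)).item()

On this reading, entries such as metric=21.75 vs. limit=22.5 above show features close to, but still inside, their allowed anisotropy.
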
2023-11-23 11:13:05,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2357266.6666666665, ans=0.07 2023-11-23 11:13:12,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2357266.6666666665, ans=0.1 2023-11-23 11:13:15,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353600 2023-11-23 11:13:25,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2357333.3333333335, ans=0.125 2023-11-23 11:13:44,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2023-11-23 11:13:46,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2357466.6666666665, ans=0.125 2023-11-23 11:13:50,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2357466.6666666665, ans=0.5 2023-11-23 11:13:57,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.207e+01 8.666e+01 9.499e+01 1.245e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-23 11:14:00,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2357533.3333333335, ans=0.125 2023-11-23 11:14:00,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2357533.3333333335, ans=0.125 2023-11-23 11:14:05,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2357533.3333333335, ans=0.125 2023-11-23 11:14:10,306 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 4950, loss[loss=0.07827, simple_loss=0.1084, pruned_loss=0.01813, audio_tagging_loss=0.00596, over 15780.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09185, pruned_loss=0.01375, audio_tagging_loss=0.009221, over 3040269.53 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:14:21,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353650 2023-11-23 11:14:27,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2357666.6666666665, ans=0.125 2023-11-23 11:15:17,629 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5000, loss[loss=0.06689, simple_loss=0.08647, pruned_loss=0.01437, audio_tagging_loss=0.009284, over 15591.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09173, pruned_loss=0.01393, audio_tagging_loss=0.0092, over 3042068.53 frames. ], batch size: 59, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:15:21,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0 2023-11-23 11:15:26,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0 2023-11-23 11:15:26,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.10 vs.
limit=15.0 2023-11-23 11:15:28,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2357933.3333333335, ans=0.125 2023-11-23 11:15:29,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353700 2023-11-23 11:15:35,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2358000.0, ans=0.125 2023-11-23 11:15:48,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2358066.6666666665, ans=0.125 2023-11-23 11:16:10,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.172e+01 8.851e+01 9.559e+01 1.178e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-23 11:16:16,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2358200.0, ans=0.07 2023-11-23 11:16:24,099 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5050, loss[loss=0.05007, simple_loss=0.05814, pruned_loss=0.008633, audio_tagging_loss=0.01236, over 14760.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09158, pruned_loss=0.01387, audio_tagging_loss=0.009215, over 3036371.80 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:16:28,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2358266.6666666665, ans=0.2 2023-11-23 11:16:34,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353750 2023-11-23 11:16:44,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2023-11-23 11:17:12,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2358466.6666666665, ans=0.125 2023-11-23 11:17:21,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.24 vs. limit=22.5 2023-11-23 11:17:26,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2358533.3333333335, ans=0.125 2023-11-23 11:17:29,512 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5100, loss[loss=0.07007, simple_loss=0.09751, pruned_loss=0.01422, audio_tagging_loss=0.007095, over 14785.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09101, pruned_loss=0.01359, audio_tagging_loss=0.009122, over 3036002.18 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:17:33,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2358600.0, ans=0.125 2023-11-23 11:17:40,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353800 2023-11-23 11:17:51,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2358666.6666666665, ans=0.125 2023-11-23 11:17:54,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. 
limit=15.0 2023-11-23 11:18:06,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2358733.3333333335, ans=0.2 2023-11-23 11:18:17,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2358800.0, ans=0.125 2023-11-23 11:18:20,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=2358866.6666666665, ans=0.1 2023-11-23 11:18:23,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.214e+01 8.798e+01 9.658e+01 1.127e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-23 11:18:34,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2358933.3333333335, ans=0.1 2023-11-23 11:18:35,635 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5150, loss[loss=0.07244, simple_loss=0.09731, pruned_loss=0.01506, audio_tagging_loss=0.008731, over 14409.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09127, pruned_loss=0.01362, audio_tagging_loss=0.009114, over 3041082.58 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:18:46,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353850 2023-11-23 11:18:57,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2359000.0, ans=0.125 2023-11-23 11:19:06,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2359066.6666666665, ans=0.125 2023-11-23 11:19:09,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2359066.6666666665, ans=0.0 2023-11-23 11:19:17,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2359133.3333333335, ans=0.125 2023-11-23 11:19:33,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2359200.0, ans=0.0 2023-11-23 11:19:38,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2359200.0, ans=0.125 2023-11-23 11:19:42,033 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5200, loss[loss=0.06544, simple_loss=0.0901, pruned_loss=0.01301, audio_tagging_loss=0.007379, over 14199.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09144, pruned_loss=0.01366, audio_tagging_loss=0.009097, over 3033410.79 frames. 
], batch size: 55, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:19:52,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353900 2023-11-23 11:20:03,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2359333.3333333335, ans=0.125 2023-11-23 11:20:36,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.445e+01 9.037e+01 9.922e+01 1.226e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-23 11:20:46,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2359600.0, ans=0.125 2023-11-23 11:20:47,851 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5250, loss[loss=0.07945, simple_loss=0.09599, pruned_loss=0.02175, audio_tagging_loss=0.009703, over 15001.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09193, pruned_loss=0.01382, audio_tagging_loss=0.008954, over 3037425.25 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:20:48,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2359600.0, ans=0.0 2023-11-23 11:20:49,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-11-23 11:20:51,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-23 11:20:57,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 353950 2023-11-23 11:21:14,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2359733.3333333335, ans=0.1 2023-11-23 11:21:33,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2359800.0, ans=0.0 2023-11-23 11:21:54,201 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5300, loss[loss=0.04399, simple_loss=0.05482, pruned_loss=0.008009, audio_tagging_loss=0.008571, over 14577.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.09193, pruned_loss=0.01377, audio_tagging_loss=0.008952, over 3039137.55 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:21:57,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2359933.3333333335, ans=0.125 2023-11-23 11:21:58,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2359933.3333333335, ans=0.2 2023-11-23 11:22:04,411 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354000 2023-11-23 11:22:29,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0 2023-11-23 11:22:49,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.002e+01 8.452e+01 9.049e+01 9.694e+01 1.252e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 11:22:57,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2360200.0, ans=0.125 2023-11-23 11:23:00,035 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5350, loss[loss=0.06095, simple_loss=0.07661, pruned_loss=0.01173, audio_tagging_loss=0.01092, over 13352.00 frames. 
], tot_loss[loss=0.06945, simple_loss=0.09289, pruned_loss=0.01408, audio_tagging_loss=0.008928, over 3031751.26 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:23:03,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2360266.6666666665, ans=0.125 2023-11-23 11:23:10,959 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354050 2023-11-23 11:23:12,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2360333.3333333335, ans=0.125 2023-11-23 11:23:14,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2360333.3333333335, ans=0.125 2023-11-23 11:23:19,842 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:23:30,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2360400.0, ans=0.125 2023-11-23 11:23:52,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2360533.3333333335, ans=0.125 2023-11-23 11:24:06,735 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5400, loss[loss=0.06434, simple_loss=0.08353, pruned_loss=0.01164, audio_tagging_loss=0.01093, over 14839.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09255, pruned_loss=0.01403, audio_tagging_loss=0.008994, over 3029057.90 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:24:08,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2360600.0, ans=0.0 2023-11-23 11:24:16,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354100 2023-11-23 11:24:19,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2360666.6666666665, ans=0.125 2023-11-23 11:24:22,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2360666.6666666665, ans=0.2 2023-11-23 11:24:34,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2360733.3333333335, ans=0.125 2023-11-23 11:24:45,549 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:24:49,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2360800.0, ans=0.1 2023-11-23 11:24:56,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2360800.0, ans=0.0 2023-11-23 11:25:02,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.248e+01 8.837e+01 9.737e+01 1.214e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 11:25:02,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2360866.6666666665, ans=0.2 2023-11-23 11:25:12,628 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5450, loss[loss=0.06965, simple_loss=0.09232, pruned_loss=0.0132, audio_tagging_loss=0.01029, over 15661.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09246, pruned_loss=0.01412, audio_tagging_loss=0.009105, over 3025430.03 frames. 
], batch size: 57, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:25:19,152 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:25:24,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354150 2023-11-23 11:25:26,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2361000.0, ans=0.1 2023-11-23 11:25:29,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2361000.0, ans=0.0 2023-11-23 11:25:52,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=22.5 2023-11-23 11:25:54,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2361133.3333333335, ans=0.1 2023-11-23 11:26:09,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2361200.0, ans=0.0 2023-11-23 11:26:19,494 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5500, loss[loss=0.05408, simple_loss=0.07022, pruned_loss=0.01081, audio_tagging_loss=0.008156, over 13960.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.09307, pruned_loss=0.01435, audio_tagging_loss=0.009146, over 3032088.13 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:26:23,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2361266.6666666665, ans=0.125 2023-11-23 11:26:24,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2361266.6666666665, ans=0.125 2023-11-23 11:26:30,280 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354200 2023-11-23 11:26:43,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2361333.3333333335, ans=0.125 2023-11-23 11:26:48,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2361400.0, ans=0.1 2023-11-23 11:27:10,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2361466.6666666665, ans=0.125 2023-11-23 11:27:16,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.480e+01 8.474e+01 9.083e+01 9.895e+01 1.258e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-23 11:27:22,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2361533.3333333335, ans=0.0 2023-11-23 11:27:26,396 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5550, loss[loss=0.06602, simple_loss=0.08612, pruned_loss=0.01083, audio_tagging_loss=0.01213, over 15084.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09292, pruned_loss=0.01443, audio_tagging_loss=0.009315, over 3036126.27 frames. 
], batch size: 56, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:27:26,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2361600.0, ans=0.0 2023-11-23 11:27:37,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354250 2023-11-23 11:28:07,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2361800.0, ans=0.125 2023-11-23 11:28:22,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2361866.6666666665, ans=0.125 2023-11-23 11:28:27,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2361866.6666666665, ans=0.05 2023-11-23 11:28:32,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2023-11-23 11:28:32,566 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5600, loss[loss=0.07556, simple_loss=0.1022, pruned_loss=0.0163, audio_tagging_loss=0.008161, over 15120.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.09266, pruned_loss=0.01421, audio_tagging_loss=0.009467, over 3041375.16 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 32.0 2023-11-23 11:28:41,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2361933.3333333335, ans=0.0 2023-11-23 11:28:43,753 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354300 2023-11-23 11:29:00,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2362066.6666666665, ans=0.2 2023-11-23 11:29:02,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2362066.6666666665, ans=0.125 2023-11-23 11:29:05,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2362066.6666666665, ans=0.125 2023-11-23 11:29:10,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2023-11-23 11:29:13,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2023-11-23 11:29:15,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2362133.3333333335, ans=0.125 2023-11-23 11:29:21,272 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 11:29:29,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.942e+01 8.297e+01 9.078e+01 9.642e+01 1.395e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-23 11:29:36,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2023-11-23 11:29:38,652 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5650, loss[loss=0.06663, simple_loss=0.09518, pruned_loss=0.01112, audio_tagging_loss=0.007925, over 15672.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09205, pruned_loss=0.01399, audio_tagging_loss=0.009531, over 3045151.80 frames. ], batch size: 59, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:29:49,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354350 2023-11-23 11:29:55,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.14 vs. limit=15.0 2023-11-23 11:29:55,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2362333.3333333335, ans=0.0 2023-11-23 11:30:26,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2362466.6666666665, ans=0.0 2023-11-23 11:30:31,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2362533.3333333335, ans=0.1 2023-11-23 11:30:35,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2023-11-23 11:30:44,117 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5700, loss[loss=0.07827, simple_loss=0.09682, pruned_loss=0.01896, audio_tagging_loss=0.0109, over 14717.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09259, pruned_loss=0.01411, audio_tagging_loss=0.009404, over 3041883.61 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:30:44,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.52 vs. 
limit=15.0 2023-11-23 11:30:54,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354400 2023-11-23 11:30:59,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2362666.6666666665, ans=0.125 2023-11-23 11:31:08,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2362666.6666666665, ans=0.125 2023-11-23 11:31:19,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2362733.3333333335, ans=0.125 2023-11-23 11:31:25,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2362800.0, ans=0.125 2023-11-23 11:31:33,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2362800.0, ans=0.0 2023-11-23 11:31:40,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.276e+01 8.914e+01 9.745e+01 1.403e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 11:31:49,755 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5750, loss[loss=0.06864, simple_loss=0.09139, pruned_loss=0.01468, audio_tagging_loss=0.008273, over 16012.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09196, pruned_loss=0.014, audio_tagging_loss=0.009377, over 3046691.80 frames. ], batch size: 58, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:32:01,159 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354450 2023-11-23 11:32:13,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2363000.0, ans=0.1 2023-11-23 11:32:15,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2363066.6666666665, ans=0.125 2023-11-23 11:32:20,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2363066.6666666665, ans=0.125 2023-11-23 11:32:47,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2363200.0, ans=0.125 2023-11-23 11:32:55,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.82 vs. limit=15.0 2023-11-23 11:32:56,513 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5800, loss[loss=0.0673, simple_loss=0.08675, pruned_loss=0.01547, audio_tagging_loss=0.00845, over 15440.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09253, pruned_loss=0.01407, audio_tagging_loss=0.00915, over 3054052.51 frames. 
], batch size: 57, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:32:58,023 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:33:07,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354500 2023-11-23 11:33:07,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2363266.6666666665, ans=0.0 2023-11-23 11:33:16,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2363333.3333333335, ans=0.125 2023-11-23 11:33:18,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2363333.3333333335, ans=0.125 2023-11-23 11:33:50,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2363533.3333333335, ans=0.0 2023-11-23 11:33:53,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.821e+01 8.322e+01 8.914e+01 9.568e+01 1.180e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 11:34:02,447 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5850, loss[loss=0.05611, simple_loss=0.07836, pruned_loss=0.009002, audio_tagging_loss=0.00793, over 15477.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.09185, pruned_loss=0.01397, audio_tagging_loss=0.009076, over 3048986.31 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:34:10,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2363600.0, ans=0.1 2023-11-23 11:34:12,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354550 2023-11-23 11:34:13,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2363666.6666666665, ans=0.125 2023-11-23 11:34:21,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2363666.6666666665, ans=0.2 2023-11-23 11:34:27,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2363733.3333333335, ans=0.0 2023-11-23 11:34:35,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2363733.3333333335, ans=0.1 2023-11-23 11:35:06,286 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5900, loss[loss=0.06947, simple_loss=0.09634, pruned_loss=0.01252, audio_tagging_loss=0.008777, over 16778.00 frames. ], tot_loss[loss=0.06877, simple_loss=0.0916, pruned_loss=0.01387, audio_tagging_loss=0.009099, over 3043850.07 frames. 
], batch size: 63, lr: 2.27e-03, grad_scale: 16.0 2023-11-23 11:35:17,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354600 2023-11-23 11:35:19,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2364000.0, ans=0.125 2023-11-23 11:36:02,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.354e+01 8.917e+01 9.513e+01 1.247e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 11:36:03,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2364200.0, ans=0.0 2023-11-23 11:36:07,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2364200.0, ans=0.04949747468305833 2023-11-23 11:36:12,719 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 5950, loss[loss=0.07731, simple_loss=0.09874, pruned_loss=0.01577, audio_tagging_loss=0.01217, over 15334.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09159, pruned_loss=0.01397, audio_tagging_loss=0.0091, over 3045465.84 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 11:36:23,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354650 2023-11-23 11:36:25,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. limit=10.0 2023-11-23 11:36:27,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2364333.3333333335, ans=0.125 2023-11-23 11:36:44,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2364400.0, ans=0.125 2023-11-23 11:37:09,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2023-11-23 11:37:13,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2364533.3333333335, ans=0.0 2023-11-23 11:37:17,286 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6000, loss[loss=0.06294, simple_loss=0.08028, pruned_loss=0.01028, audio_tagging_loss=0.01252, over 15064.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.09184, pruned_loss=0.01401, audio_tagging_loss=0.009065, over 3049441.59 frames. 
], batch size: 58, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:37:17,287 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 11:37:37,467 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9466, 2.6068, 4.4583, 2.1579], device='cuda:1') 2023-11-23 11:37:45,494 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9709, 3.2314, 2.9513, 3.2331, 3.3682, 2.8234, 3.4567, 2.6822], device='cuda:1') 2023-11-23 11:37:49,570 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9567, 3.7866, 4.8850, 4.4119], device='cuda:1') 2023-11-23 11:37:52,656 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5964, 3.5420, 3.9198, 3.5287], device='cuda:1') 2023-11-23 11:37:53,462 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2751, 4.2698, 4.4877, 4.4977], device='cuda:1') 2023-11-23 11:37:57,929 INFO [train_asr.py:1253] (1/4) Epoch 30, validation: loss=0.05791, simple_loss=0.05108, pruned_loss=0.005053, audio_tagging_loss=0.02732, over 4681554.00 frames. 2023-11-23 11:37:57,930 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 11:38:08,553 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354700 2023-11-23 11:38:11,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:38:14,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2364666.6666666665, ans=0.1 2023-11-23 11:38:39,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2364800.0, ans=0.04949747468305833 2023-11-23 11:38:42,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2364800.0, ans=0.1 2023-11-23 11:38:45,073 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 11:38:52,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.297e+01 9.002e+01 9.719e+01 1.265e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-23 11:39:02,737 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6050, loss[loss=0.05432, simple_loss=0.06415, pruned_loss=0.009244, audio_tagging_loss=0.013, over 15398.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09176, pruned_loss=0.01398, audio_tagging_loss=0.00908, over 3058882.74 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 32.0
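
During the validation pass above, [zipformer.py:1873] also dumps the entropy of each self-attention module's attention weights, one value per head: higher entropy means attention spread over many keys, lower means sharply peaked attention. A sketch of that diagnostic; the (num_heads, batch, query_len, key_len) weight layout and the function name are assumptions, not the actual zipformer.py code.

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, batch, query_len, key_len), each row a
        # probability distribution over keys. Returns one mean entropy per head.
        p = attn_weights.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=(1, 2))

The logged per-head values (e.g. tensor([3.9466, 2.6068, 4.4583, 2.1579]) for a 4-head module) make it easy to spot heads whose entropy has collapsed toward zero.
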
2023-11-23 11:39:13,264 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354750 2023-11-23 11:39:46,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2365133.3333333335, ans=0.0 2023-11-23 11:39:58,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2365200.0, ans=0.0 2023-11-23 11:40:07,147 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6100, loss[loss=0.05169, simple_loss=0.07104, pruned_loss=0.007213, audio_tagging_loss=0.008956, over 14299.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.09195, pruned_loss=0.01405, audio_tagging_loss=0.009034, over 3051101.72 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:40:16,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-11-23 11:40:17,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354800 2023-11-23 11:40:17,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2365266.6666666665, ans=0.125 2023-11-23 11:40:18,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2365333.3333333335, ans=0.0 2023-11-23 11:40:27,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.64 vs. limit=10.0 2023-11-23 11:41:02,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.198e+01 8.863e+01 9.563e+01 1.175e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-23 11:41:11,367 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6150, loss[loss=0.05586, simple_loss=0.06998, pruned_loss=0.00984, audio_tagging_loss=0.01104, over 14751.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09105, pruned_loss=0.01379, audio_tagging_loss=0.009141, over 3052088.09 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:41:21,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354850 2023-11-23 11:41:56,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2365800.0, ans=0.125 2023-11-23 11:42:16,174 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6200, loss[loss=0.07028, simple_loss=0.07894, pruned_loss=0.018, audio_tagging_loss=0.01281, over 14938.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.09083, pruned_loss=0.01365, audio_tagging_loss=0.009199, over 3049353.54 frames.
], batch size: 60, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:42:27,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354900 2023-11-23 11:42:48,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2366066.6666666665, ans=0.0 2023-11-23 11:42:52,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2366066.6666666665, ans=0.07 2023-11-23 11:43:01,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2366133.3333333335, ans=0.125 2023-11-23 11:43:13,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.235e+01 8.845e+01 9.606e+01 1.266e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 11:43:21,052 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6250, loss[loss=0.05099, simple_loss=0.0582, pruned_loss=0.01083, audio_tagging_loss=0.01106, over 14224.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09049, pruned_loss=0.0136, audio_tagging_loss=0.009294, over 3048100.78 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 11:43:27,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2366266.6666666665, ans=0.1 2023-11-23 11:43:31,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 354950 2023-11-23 11:43:35,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0 2023-11-23 11:43:54,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-23 11:43:58,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2366466.6666666665, ans=0.125 2023-11-23 11:44:04,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=12.0 2023-11-23 11:44:24,651 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6300, loss[loss=0.07686, simple_loss=0.1047, pruned_loss=0.01627, audio_tagging_loss=0.008233, over 15195.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.09145, pruned_loss=0.01369, audio_tagging_loss=0.009316, over 3051681.91 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 11:44:32,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. 
limit=6.0 2023-11-23 11:44:34,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355000 2023-11-23 11:44:47,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2366666.6666666665, ans=0.125 2023-11-23 11:45:01,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2366733.3333333335, ans=0.0 2023-11-23 11:45:02,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2366800.0, ans=0.2 2023-11-23 11:45:13,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2366800.0, ans=0.125 2023-11-23 11:45:17,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2366866.6666666665, ans=0.125 2023-11-23 11:45:20,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.278e+01 8.805e+01 9.594e+01 1.253e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-23 11:45:22,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2366866.6666666665, ans=0.125 2023-11-23 11:45:28,962 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6350, loss[loss=0.066, simple_loss=0.08236, pruned_loss=0.01576, audio_tagging_loss=0.009056, over 16085.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09193, pruned_loss=0.01366, audio_tagging_loss=0.009315, over 3052953.44 frames. ], batch size: 62, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 11:45:29,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2366933.3333333335, ans=0.125 2023-11-23 11:45:39,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355050 2023-11-23 11:45:53,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2367000.0, ans=0.125 2023-11-23 11:46:34,442 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6400, loss[loss=0.07854, simple_loss=0.1058, pruned_loss=0.01417, audio_tagging_loss=0.01145, over 15409.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.09203, pruned_loss=0.01378, audio_tagging_loss=0.00931, over 3053498.52 frames. ], batch size: 58, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:46:37,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2367266.6666666665, ans=0.125 2023-11-23 11:46:44,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355100 2023-11-23 11:46:46,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2367333.3333333335, ans=0.07 2023-11-23 11:46:46,967 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 11:46:50,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.88 vs. 
limit=15.0 2023-11-23 11:47:30,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.031e+01 8.724e+01 9.486e+01 1.143e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-23 11:47:38,267 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6450, loss[loss=0.04902, simple_loss=0.05676, pruned_loss=0.009834, audio_tagging_loss=0.01081, over 14445.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09217, pruned_loss=0.01389, audio_tagging_loss=0.009309, over 3046338.90 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:47:48,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355150 2023-11-23 11:47:55,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2367666.6666666665, ans=0.125 2023-11-23 11:48:43,078 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6500, loss[loss=0.06249, simple_loss=0.08971, pruned_loss=0.01194, audio_tagging_loss=0.005693, over 14664.00 frames. ], tot_loss[loss=0.06924, simple_loss=0.09222, pruned_loss=0.01391, audio_tagging_loss=0.009221, over 3047482.92 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:48:53,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355200 2023-11-23 11:48:59,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2023-11-23 11:49:40,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.456e+01 9.159e+01 9.950e+01 1.283e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-23 11:49:48,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-11-23 11:49:48,680 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6550, loss[loss=0.08103, simple_loss=0.1131, pruned_loss=0.01769, audio_tagging_loss=0.006821, over 15465.00 frames. ], tot_loss[loss=0.06984, simple_loss=0.0936, pruned_loss=0.01402, audio_tagging_loss=0.009026, over 3056573.44 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:49:59,202 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355250 2023-11-23 11:50:15,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2368400.0, ans=0.125 2023-11-23 11:50:39,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2368533.3333333335, ans=0.125 2023-11-23 11:50:53,260 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6600, loss[loss=0.09339, simple_loss=0.1218, pruned_loss=0.02407, audio_tagging_loss=0.00839, over 15284.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09276, pruned_loss=0.01395, audio_tagging_loss=0.00893, over 3053811.94 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:50:56,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2368600.0, ans=0.1 2023-11-23 11:51:01,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. 
limit=12.0 2023-11-23 11:51:03,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355300 2023-11-23 11:51:06,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2368666.6666666665, ans=0.125 2023-11-23 11:51:14,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2368666.6666666665, ans=0.125 2023-11-23 11:51:18,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2368733.3333333335, ans=0.0 2023-11-23 11:51:21,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2368733.3333333335, ans=0.125 2023-11-23 11:51:27,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2368733.3333333335, ans=0.0 2023-11-23 11:51:31,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2368800.0, ans=0.0 2023-11-23 11:51:34,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2023-11-23 11:51:50,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.216e+01 8.892e+01 9.603e+01 1.376e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-23 11:51:55,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2368866.6666666665, ans=0.1 2023-11-23 11:51:58,394 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6650, loss[loss=0.06554, simple_loss=0.0811, pruned_loss=0.0141, audio_tagging_loss=0.01088, over 15734.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09209, pruned_loss=0.01401, audio_tagging_loss=0.008928, over 3052347.89 frames. ], batch size: 59, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:52:07,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2368933.3333333335, ans=0.125 2023-11-23 11:52:08,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355350 2023-11-23 11:52:11,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2369000.0, ans=10.0 2023-11-23 11:52:35,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2369133.3333333335, ans=0.015 2023-11-23 11:52:35,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2369133.3333333335, ans=0.0 2023-11-23 11:52:42,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2369133.3333333335, ans=0.125 2023-11-23 11:53:03,801 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6700, loss[loss=0.06151, simple_loss=0.07696, pruned_loss=0.01374, audio_tagging_loss=0.009292, over 14223.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09176, pruned_loss=0.01359, audio_tagging_loss=0.008952, over 3056655.48 frames. 
], batch size: 55, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:53:10,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2369266.6666666665, ans=0.125 2023-11-23 11:53:10,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=15.0 2023-11-23 11:53:13,914 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355400 2023-11-23 11:53:47,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2369466.6666666665, ans=0.125 2023-11-23 11:53:50,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2369466.6666666665, ans=0.0 2023-11-23 11:54:01,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.203e+01 8.988e+01 9.539e+01 1.363e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-23 11:54:05,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2369533.3333333335, ans=0.0 2023-11-23 11:54:06,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-23 11:54:08,666 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6750, loss[loss=0.07944, simple_loss=0.1079, pruned_loss=0.01813, audio_tagging_loss=0.007373, over 15375.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09191, pruned_loss=0.01364, audio_tagging_loss=0.00902, over 3049081.40 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:54:19,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355450 2023-11-23 11:54:22,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2369666.6666666665, ans=0.1 2023-11-23 11:54:23,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2369666.6666666665, ans=0.1 2023-11-23 11:54:24,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2369666.6666666665, ans=0.125 2023-11-23 11:54:36,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2369733.3333333335, ans=0.1 2023-11-23 11:54:47,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2369800.0, ans=0.0 2023-11-23 11:55:02,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2369866.6666666665, ans=0.0 2023-11-23 11:55:07,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2369866.6666666665, ans=0.125 2023-11-23 11:55:13,492 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6800, loss[loss=0.04655, simple_loss=0.05429, pruned_loss=0.00826, audio_tagging_loss=0.01115, over 15926.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09245, pruned_loss=0.01387, audio_tagging_loss=0.008911, over 3048154.46 frames. 
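Note on the [optim.py:476] entries: in every one of them the printed threshold equals Clipping_scale times the middle grad-norm quartile (for the 11:54:01 line above, 2.0 x 8.988e+01 = 1.798e+02), so the clipping threshold appears to track a running median of recent gradient norms, and percent-clipped reports how many of those norms exceeded it. A minimal reconstruction of the report; the function name and window bookkeeping are assumptions, not ScaledAdam's actual code:

```python
import numpy as np

def clipping_report(recent_grad_norms, clipping_scale=2.0):
    # Five-point summary (min, 25%, median, 75%, max) of recent grad norms,
    # threshold = clipping_scale * median, and the share of norms above it.
    norms = np.asarray(recent_grad_norms, dtype=np.float64)
    quartiles = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    return quartiles, threshold, percent_clipped
```

With a median near 8.988e+01 this yields threshold=1.798e+02, and percent-clipped=0.0 simply means no recent norm reached twice the median.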
], batch size: 61, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:55:21,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2369933.3333333335, ans=0.125 2023-11-23 11:55:24,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355500 2023-11-23 11:55:24,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2369933.3333333335, ans=0.125 2023-11-23 11:55:30,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2370000.0, ans=0.025 2023-11-23 11:55:30,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2370000.0, ans=0.07 2023-11-23 11:56:02,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2370133.3333333335, ans=0.125 2023-11-23 11:56:08,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2370200.0, ans=0.125 2023-11-23 11:56:12,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.201e+01 8.354e+01 8.978e+01 9.690e+01 1.235e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-23 11:56:19,027 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6850, loss[loss=0.08773, simple_loss=0.1215, pruned_loss=0.0181, audio_tagging_loss=0.008892, over 15369.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.09267, pruned_loss=0.01398, audio_tagging_loss=0.008936, over 3043788.00 frames. ], batch size: 58, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:56:29,479 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355550 2023-11-23 11:56:33,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2370333.3333333335, ans=0.0 2023-11-23 11:56:42,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.71 vs. limit=22.5 2023-11-23 11:56:49,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2370400.0, ans=0.04949747468305833 2023-11-23 11:56:49,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2370400.0, ans=0.1 2023-11-23 11:56:54,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2370400.0, ans=0.0 2023-11-23 11:57:24,307 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6900, loss[loss=0.04796, simple_loss=0.07041, pruned_loss=0.005392, audio_tagging_loss=0.007362, over 16554.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09266, pruned_loss=0.014, audio_tagging_loss=0.00897, over 3049262.61 frames. ], batch size: 61, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:57:27,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.92 vs. 
limit=15.0 2023-11-23 11:57:34,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355600 2023-11-23 11:58:05,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2370800.0, ans=0.125 2023-11-23 11:58:09,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2370800.0, ans=0.07 2023-11-23 11:58:15,723 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 11:58:23,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.385e+01 8.957e+01 9.598e+01 1.316e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 11:58:25,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.84 vs. limit=10.0 2023-11-23 11:58:27,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2370866.6666666665, ans=0.0 2023-11-23 11:58:29,827 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 6950, loss[loss=0.05615, simple_loss=0.07518, pruned_loss=0.01052, audio_tagging_loss=0.008032, over 16312.00 frames. ], tot_loss[loss=0.06947, simple_loss=0.09279, pruned_loss=0.01417, audio_tagging_loss=0.008915, over 3050195.50 frames. ], batch size: 66, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:58:41,183 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355650 2023-11-23 11:58:51,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-23 11:58:59,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.56 vs. limit=15.0 2023-11-23 11:59:13,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2371133.3333333335, ans=0.125 2023-11-23 11:59:34,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2371200.0, ans=0.0 2023-11-23 11:59:36,260 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7000, loss[loss=0.06116, simple_loss=0.08066, pruned_loss=0.0143, audio_tagging_loss=0.006531, over 16239.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.09215, pruned_loss=0.01404, audio_tagging_loss=0.008952, over 3049937.76 frames. ], batch size: 64, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 11:59:44,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.51 vs. 
limit=12.0 2023-11-23 11:59:46,810 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355700 2023-11-23 11:59:49,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2371333.3333333335, ans=0.125 2023-11-23 11:59:54,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0 2023-11-23 11:59:54,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5 2023-11-23 11:59:59,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2371333.3333333335, ans=0.2 2023-11-23 12:00:00,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2023-11-23 12:00:18,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2371466.6666666665, ans=0.125 2023-11-23 12:00:24,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2371466.6666666665, ans=0.1 2023-11-23 12:00:34,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.338e+01 8.854e+01 9.577e+01 1.141e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-23 12:00:39,853 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7050, loss[loss=0.05548, simple_loss=0.07403, pruned_loss=0.01079, audio_tagging_loss=0.007678, over 14805.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09113, pruned_loss=0.01387, audio_tagging_loss=0.00907, over 3044090.30 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:00:46,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2371600.0, ans=0.125 2023-11-23 12:00:49,724 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355750 2023-11-23 12:00:49,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2371600.0, ans=0.125 2023-11-23 12:00:53,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2371666.6666666665, ans=0.0 2023-11-23 12:01:08,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2371733.3333333335, ans=0.125 2023-11-23 12:01:12,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2371733.3333333335, ans=0.1 2023-11-23 12:01:26,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2371800.0, ans=0.1 2023-11-23 12:01:34,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2371866.6666666665, ans=0.125 2023-11-23 12:01:43,829 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7100, loss[loss=0.05973, simple_loss=0.08014, pruned_loss=0.008305, audio_tagging_loss=0.01136, over 15756.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.09197, pruned_loss=0.01393, audio_tagging_loss=0.009116, over 3048883.83 frames. 
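Note on the WARNING above (11:58:15, "Exclude cut with ID unbalanced/..."): the AudioSet placeholder cut is dropped because after the frontend's 4x subsampling it retains fewer frames (23) than it has BPE tokens (24), and a transducer alignment needs T >= U. A sketch of that filter, assuming the usual ((n - 7) // 2 + 1) // 2 convolutional-frontend length formula, which does reproduce the logged 100 -> 23 mapping:

```python
def subsampled_frames(num_frames: int) -> int:
    # Assumed frontend length formula for 4x subsampling; it reproduces
    # the logged 100 -> 23 mapping, but the exact expression is a guess.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer loss needs at least as many frames as output tokens.
    return subsampled_frames(num_frames) >= num_tokens

assert subsampled_frames(100) == 23   # matches the warning above
assert not keep_cut(100, 24)          # 23 frames < 24 tokens: excluded
```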
], batch size: 59, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:01:54,858 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355800 2023-11-23 12:02:09,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2372066.6666666665, ans=0.0 2023-11-23 12:02:20,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2372066.6666666665, ans=0.125 2023-11-23 12:02:24,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0 2023-11-23 12:02:27,151 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 12:02:39,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-11-23 12:02:43,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 8.520e+01 9.184e+01 9.861e+01 1.233e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-23 12:02:48,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2023-11-23 12:02:49,718 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7150, loss[loss=0.07003, simple_loss=0.09322, pruned_loss=0.01375, audio_tagging_loss=0.009674, over 14289.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09229, pruned_loss=0.01405, audio_tagging_loss=0.009095, over 3048931.39 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:02:56,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0 2023-11-23 12:03:00,184 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355850 2023-11-23 12:03:00,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2372266.6666666665, ans=22.5 2023-11-23 12:03:03,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2372333.3333333335, ans=0.1 2023-11-23 12:03:05,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2372333.3333333335, ans=0.125 2023-11-23 12:03:05,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2372333.3333333335, ans=0.0 2023-11-23 12:03:21,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2372400.0, ans=0.125 2023-11-23 12:03:28,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5 2023-11-23 12:03:29,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. 
limit=15.0 2023-11-23 12:03:32,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2372466.6666666665, ans=0.2 2023-11-23 12:03:34,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2372466.6666666665, ans=0.1 2023-11-23 12:03:53,731 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7200, loss[loss=0.07086, simple_loss=0.09559, pruned_loss=0.01532, audio_tagging_loss=0.00775, over 14910.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09285, pruned_loss=0.01414, audio_tagging_loss=0.009134, over 3049007.16 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:04:03,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355900 2023-11-23 12:04:06,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2372666.6666666665, ans=0.125 2023-11-23 12:04:07,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2372666.6666666665, ans=0.125 2023-11-23 12:04:08,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2372666.6666666665, ans=0.125 2023-11-23 12:04:08,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2372666.6666666665, ans=0.125 2023-11-23 12:04:08,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2372666.6666666665, ans=0.125 2023-11-23 12:04:08,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2372666.6666666665, ans=0.5 2023-11-23 12:04:29,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2372733.3333333335, ans=0.5 2023-11-23 12:04:30,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2372800.0, ans=0.125 2023-11-23 12:04:52,179 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.499e+01 9.077e+01 9.977e+01 1.549e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-23 12:04:57,113 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7250, loss[loss=0.05738, simple_loss=0.06724, pruned_loss=0.01256, audio_tagging_loss=0.0112, over 15398.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09277, pruned_loss=0.01409, audio_tagging_loss=0.009248, over 3038687.55 frames. 
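Note on "lr: 2.26e-03": the value is consistent with an Eden-style schedule in which the rate decays smoothly with both the global batch index and the epoch, lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2)^(-1/4) * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^(-1/4). With base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and the batch indices around 356,300 logged here, the number matches if the scheduler sees epoch 29, i.e. the displayed epoch minus one; that indexing is an assumption:

```python
def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden-style decay; `epoch` is taken as the displayed epoch minus one,
    # which is the indexing that reproduces the logged value.
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

lr = eden_lr(base_lr=0.045, step=356_300, epoch=29)
assert abs(lr - 2.26e-03) < 1e-5, lr   # matches "lr: 2.26e-03"
```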
], batch size: 60, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:05:07,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 355950 2023-11-23 12:05:19,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2373000.0, ans=0.1 2023-11-23 12:05:25,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2373066.6666666665, ans=0.025 2023-11-23 12:05:29,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2373066.6666666665, ans=0.125 2023-11-23 12:05:30,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-23 12:05:47,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2373200.0, ans=0.125 2023-11-23 12:06:01,448 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7300, loss[loss=0.08902, simple_loss=0.1171, pruned_loss=0.02403, audio_tagging_loss=0.006448, over 14771.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09295, pruned_loss=0.01402, audio_tagging_loss=0.009167, over 3038284.06 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:06:02,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.78 vs. limit=10.0 2023-11-23 12:06:12,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356000 2023-11-23 12:06:34,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=22.5 2023-11-23 12:06:40,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2373400.0, ans=0.125 2023-11-23 12:06:42,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2373466.6666666665, ans=0.1 2023-11-23 12:06:43,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2373466.6666666665, ans=0.125 2023-11-23 12:06:51,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2373466.6666666665, ans=0.125 2023-11-23 12:06:57,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2373533.3333333335, ans=0.0 2023-11-23 12:07:00,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-23 12:07:05,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.323e+01 8.907e+01 9.427e+01 1.100e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-23 12:07:10,570 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7350, loss[loss=0.05943, simple_loss=0.07439, pruned_loss=0.01276, audio_tagging_loss=0.009469, over 14950.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09202, pruned_loss=0.01387, audio_tagging_loss=0.009095, over 3048470.80 frames. 
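Note on the [scaling.py:213] entries: each one dumps the current value of a ScheduledFloat, a hyperparameter (balancer prob, skip rate, dropout_p, min_positive, ...) that is piecewise-linear in batch_count. This deep into training almost all of them sit on their final breakpoint (0.125, 0.0, 0.1, 0.2, ...). A toy stand-in with illustrative breakpoints; the actual schedules live in the model code:

```python
class PiecewiseLinearSchedule:
    """Toy stand-in for icefall's ScheduledFloat: the value is linearly
    interpolated between (batch_count, value) breakpoints and held
    constant past the last one."""
    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# Illustrative breakpoints: a balancer prob annealed from 0.3 to 0.125.
prob = PiecewiseLinearSchedule((0, 0.3), (20_000, 0.125))
assert prob(2_367_666) == 0.125   # far past the last breakpoint, as logged
```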
], batch size: 57, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:07:12,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2373600.0, ans=0.0 2023-11-23 12:07:18,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2373600.0, ans=0.125 2023-11-23 12:07:20,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356050 2023-11-23 12:07:22,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0 2023-11-23 12:07:28,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2373666.6666666665, ans=0.125 2023-11-23 12:07:31,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2373666.6666666665, ans=0.0 2023-11-23 12:07:33,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2373666.6666666665, ans=0.125 2023-11-23 12:07:33,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2023-11-23 12:08:03,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=12.0 2023-11-23 12:08:14,486 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7400, loss[loss=0.06577, simple_loss=0.09158, pruned_loss=0.01121, audio_tagging_loss=0.008769, over 16119.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.091, pruned_loss=0.01349, audio_tagging_loss=0.009047, over 3050634.20 frames. ], batch size: 61, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:08:24,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356100 2023-11-23 12:08:50,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-23 12:08:53,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2374133.3333333335, ans=0.2 2023-11-23 12:09:05,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2374200.0, ans=0.0 2023-11-23 12:09:13,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.433e+01 8.940e+01 9.689e+01 1.513e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-23 12:09:16,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2374200.0, ans=0.125 2023-11-23 12:09:18,583 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7450, loss[loss=0.06268, simple_loss=0.08561, pruned_loss=0.01042, audio_tagging_loss=0.009454, over 15906.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.09124, pruned_loss=0.0136, audio_tagging_loss=0.009037, over 3048575.71 frames. 
], batch size: 63, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:09:29,066 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356150 2023-11-23 12:09:50,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2374400.0, ans=0.125 2023-11-23 12:09:56,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2374466.6666666665, ans=0.125 2023-11-23 12:10:24,406 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7500, loss[loss=0.06742, simple_loss=0.09725, pruned_loss=0.009186, audio_tagging_loss=0.009613, over 14478.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09175, pruned_loss=0.0137, audio_tagging_loss=0.008913, over 3050512.20 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:10:33,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2374600.0, ans=0.0 2023-11-23 12:10:34,374 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356200 2023-11-23 12:10:37,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2374666.6666666665, ans=0.04949747468305833 2023-11-23 12:10:44,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2374666.6666666665, ans=0.5 2023-11-23 12:10:49,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2374733.3333333335, ans=0.015 2023-11-23 12:11:08,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2374800.0, ans=0.04949747468305833 2023-11-23 12:11:23,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.224e+01 8.821e+01 9.416e+01 1.230e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-23 12:11:26,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2374866.6666666665, ans=0.0 2023-11-23 12:11:28,801 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7550, loss[loss=0.0886, simple_loss=0.123, pruned_loss=0.02005, audio_tagging_loss=0.007033, over 15894.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.0921, pruned_loss=0.01364, audio_tagging_loss=0.008807, over 3050141.75 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:11:35,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. limit=10.0 2023-11-23 12:11:38,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.26 vs. 
limit=22.5 2023-11-23 12:11:38,784 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356250 2023-11-23 12:11:41,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2375000.0, ans=0.125 2023-11-23 12:11:52,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2375000.0, ans=0.125 2023-11-23 12:12:01,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2375066.6666666665, ans=0.1 2023-11-23 12:12:08,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2375133.3333333335, ans=0.125 2023-11-23 12:12:34,256 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7600, loss[loss=0.0621, simple_loss=0.08012, pruned_loss=0.01141, audio_tagging_loss=0.01062, over 15465.00 frames. ], tot_loss[loss=0.06877, simple_loss=0.09226, pruned_loss=0.01378, audio_tagging_loss=0.008864, over 3046203.15 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:12:44,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356300 2023-11-23 12:13:00,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2375400.0, ans=0.2 2023-11-23 12:13:02,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=2375400.0, ans=0.02 2023-11-23 12:13:02,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2375400.0, ans=10.0 2023-11-23 12:13:10,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2375400.0, ans=0.2 2023-11-23 12:13:19,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2375466.6666666665, ans=0.125 2023-11-23 12:13:35,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.205e+01 8.929e+01 9.831e+01 1.216e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-23 12:13:39,101 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7650, loss[loss=0.06225, simple_loss=0.07107, pruned_loss=0.01203, audio_tagging_loss=0.01469, over 15893.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09079, pruned_loss=0.01369, audio_tagging_loss=0.008906, over 3039505.81 frames. 
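Note on the [scaling.py:1022] Whitening entries: each compares a per-module statistic against its whitening limit (e.g. "metric=3.38 vs. limit=12.0" above). The metric is 1.0 when the named activation's channels are decorrelated with equal variance and grows as the covariance drifts from a multiple of the identity; a gradient penalty engages only above the limit. One plausible way to compute such a metric, normalized so that cov = c*I gives exactly 1; this is a reconstruction, not the literal scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns 1.0 for perfectly 'white'
    features (covariance proportional to identity) and larger values as
    the channels become correlated or unequal in scale."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    num_channels = x.shape[1]
    mean_diag = torch.diagonal(cov).mean()
    # Frobenius energy of cov, normalized so cov = c*I gives exactly 1.
    return (cov ** 2).sum() / (num_channels * mean_diag ** 2)

torch.manual_seed(0)
white = torch.randn(4096, 512)
assert whitening_metric(white) < 1.5   # near 1 for decorrelated inputs
```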
], batch size: 62, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:13:48,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2375600.0, ans=0.0 2023-11-23 12:13:50,290 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356350 2023-11-23 12:13:50,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2375600.0, ans=0.0 2023-11-23 12:13:54,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2375666.6666666665, ans=0.125 2023-11-23 12:14:00,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2375666.6666666665, ans=0.2 2023-11-23 12:14:06,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2375733.3333333335, ans=0.125 2023-11-23 12:14:12,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2023-11-23 12:14:13,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2375733.3333333335, ans=0.125 2023-11-23 12:14:32,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2375866.6666666665, ans=0.0 2023-11-23 12:14:38,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2375866.6666666665, ans=0.125 2023-11-23 12:14:40,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2375866.6666666665, ans=10.0 2023-11-23 12:14:44,727 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7700, loss[loss=0.05343, simple_loss=0.07038, pruned_loss=0.008084, audio_tagging_loss=0.01015, over 14911.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09121, pruned_loss=0.01362, audio_tagging_loss=0.008956, over 3041328.63 frames. 
], batch size: 56, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:14:44,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2375933.3333333335, ans=0.125 2023-11-23 12:14:45,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2375933.3333333335, ans=0.1 2023-11-23 12:14:54,574 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356400 2023-11-23 12:15:06,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2376000.0, ans=0.1 2023-11-23 12:15:11,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2376066.6666666665, ans=0.125 2023-11-23 12:15:12,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2376066.6666666665, ans=0.1 2023-11-23 12:15:15,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2376066.6666666665, ans=0.2 2023-11-23 12:15:22,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2376133.3333333335, ans=0.0 2023-11-23 12:15:22,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-23 12:15:25,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2376133.3333333335, ans=0.0 2023-11-23 12:15:45,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 8.176e+01 8.917e+01 9.503e+01 1.166e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 12:15:49,145 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7750, loss[loss=0.05401, simple_loss=0.07379, pruned_loss=0.007681, audio_tagging_loss=0.009435, over 14265.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09121, pruned_loss=0.01357, audio_tagging_loss=0.008988, over 3039171.22 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:15:53,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2376266.6666666665, ans=0.125 2023-11-23 12:15:58,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2376266.6666666665, ans=0.025 2023-11-23 12:16:00,179 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356450 2023-11-23 12:16:00,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2376266.6666666665, ans=0.125 2023-11-23 12:16:54,247 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7800, loss[loss=0.04944, simple_loss=0.06381, pruned_loss=0.006581, audio_tagging_loss=0.01095, over 14593.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09094, pruned_loss=0.01362, audio_tagging_loss=0.008973, over 3035369.20 frames. 
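Note on grad_scale: this is the dynamic fp16 loss scale, and it oscillates over this stretch (16.0 at batch 7050, 32.0 at 7200, 16.0 again from 7650): the scaler halves the scale when a scaled gradient overflows and grows it back after a run of clean steps. A generic torch.cuda.amp skeleton showing where that number comes from; model, optimizer and loss_fn are placeholders, not the training script's names:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # matches grad_scale: 32.0

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()   # backprop the scaled loss
    scaler.step(optimizer)          # skips the step on inf/nan grads
    scaler.update()                 # halves the scale on overflow, else grows it
    return loss.detach(), scaler.get_scale()
```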
], batch size: 57, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:17:04,724 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356500 2023-11-23 12:17:12,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5 2023-11-23 12:17:33,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2376800.0, ans=0.1 2023-11-23 12:17:39,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2023-11-23 12:17:54,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.181e+01 8.920e+01 9.526e+01 1.528e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 12:17:58,452 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7850, loss[loss=0.08158, simple_loss=0.1108, pruned_loss=0.01572, audio_tagging_loss=0.01048, over 15651.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09172, pruned_loss=0.01369, audio_tagging_loss=0.009086, over 3033332.27 frames. ], batch size: 59, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:18:09,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356550 2023-11-23 12:18:22,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2377000.0, ans=0.125 2023-11-23 12:18:45,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2377133.3333333335, ans=15.0 2023-11-23 12:18:46,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2377133.3333333335, ans=0.125 2023-11-23 12:18:50,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2377200.0, ans=0.125 2023-11-23 12:18:54,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2377200.0, ans=0.125 2023-11-23 12:19:02,625 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7900, loss[loss=0.08806, simple_loss=0.1217, pruned_loss=0.0191, audio_tagging_loss=0.008133, over 15757.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09209, pruned_loss=0.01383, audio_tagging_loss=0.009099, over 3046851.62 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:19:08,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0 2023-11-23 12:19:13,255 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356600 2023-11-23 12:19:37,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. 
limit=15.0 2023-11-23 12:19:38,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2377400.0, ans=0.05 2023-11-23 12:19:42,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2377466.6666666665, ans=10.0 2023-11-23 12:20:04,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 8.633e+01 9.246e+01 1.018e+02 1.529e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-23 12:20:08,290 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 7950, loss[loss=0.0851, simple_loss=0.1142, pruned_loss=0.02094, audio_tagging_loss=0.007039, over 15785.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09196, pruned_loss=0.01386, audio_tagging_loss=0.009202, over 3047204.11 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:20:12,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2377600.0, ans=0.1 2023-11-23 12:20:18,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356650 2023-11-23 12:20:22,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2377666.6666666665, ans=0.125 2023-11-23 12:20:26,008 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 12:21:03,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2377866.6666666665, ans=0.2 2023-11-23 12:21:12,858 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8000, loss[loss=0.05597, simple_loss=0.08006, pruned_loss=0.008236, audio_tagging_loss=0.007703, over 16008.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09166, pruned_loss=0.01377, audio_tagging_loss=0.009305, over 3045909.29 frames. ], batch size: 61, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:21:22,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2377933.3333333335, ans=0.125 2023-11-23 12:21:23,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356700 2023-11-23 12:21:26,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2378000.0, ans=10.0 2023-11-23 12:21:43,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2378066.6666666665, ans=0.125 2023-11-23 12:21:49,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2023-11-23 12:21:50,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.88 vs. 
limit=12.0 2023-11-23 12:22:09,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2378200.0, ans=0.125 2023-11-23 12:22:15,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.250e+01 8.793e+01 9.379e+01 1.192e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-23 12:22:18,451 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8050, loss[loss=0.06192, simple_loss=0.08155, pruned_loss=0.013, audio_tagging_loss=0.008151, over 14777.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09218, pruned_loss=0.01392, audio_tagging_loss=0.009349, over 3051180.14 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:22:28,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356750 2023-11-23 12:22:29,064 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 12:22:46,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2378400.0, ans=0.5 2023-11-23 12:22:51,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2023-11-23 12:22:56,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-11-23 12:22:59,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=22.5 2023-11-23 12:23:18,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=15.0 2023-11-23 12:23:23,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2378600.0, ans=0.0 2023-11-23 12:23:24,512 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8100, loss[loss=0.06872, simple_loss=0.09558, pruned_loss=0.0135, audio_tagging_loss=0.00743, over 14824.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09275, pruned_loss=0.01398, audio_tagging_loss=0.009155, over 3050426.10 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:23:34,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356800 2023-11-23 12:23:38,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2378666.6666666665, ans=0.1 2023-11-23 12:23:42,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2378666.6666666665, ans=0.1 2023-11-23 12:23:44,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. 
limit=10.0 2023-11-23 12:23:54,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2378733.3333333335, ans=0.125 2023-11-23 12:24:17,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2378866.6666666665, ans=0.0 2023-11-23 12:24:23,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-23 12:24:27,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.660e+01 9.350e+01 9.941e+01 1.281e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-23 12:24:29,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2378933.3333333335, ans=0.125 2023-11-23 12:24:30,483 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8150, loss[loss=0.06738, simple_loss=0.09073, pruned_loss=0.01305, audio_tagging_loss=0.008961, over 15492.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09251, pruned_loss=0.0139, audio_tagging_loss=0.009063, over 3054592.67 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:24:40,534 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356850 2023-11-23 12:24:55,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2379066.6666666665, ans=0.0 2023-11-23 12:25:34,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-11-23 12:25:34,957 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8200, loss[loss=0.09014, simple_loss=0.1229, pruned_loss=0.02082, audio_tagging_loss=0.007855, over 15207.00 frames. ], tot_loss[loss=0.06979, simple_loss=0.09367, pruned_loss=0.01405, audio_tagging_loss=0.008903, over 3049692.29 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:25:37,467 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 12:25:41,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2379266.6666666665, ans=0.1 2023-11-23 12:25:46,487 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356900 2023-11-23 12:25:46,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2379266.6666666665, ans=0.125 2023-11-23 12:25:47,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. 
limit=15.0 2023-11-23 12:26:27,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2379533.3333333335, ans=0.1 2023-11-23 12:26:38,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.430e+01 9.137e+01 9.990e+01 1.706e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-23 12:26:40,849 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8250, loss[loss=0.08857, simple_loss=0.1195, pruned_loss=0.01893, audio_tagging_loss=0.009894, over 15821.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.0946, pruned_loss=0.01417, audio_tagging_loss=0.00881, over 3051202.75 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:26:44,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2379600.0, ans=0.05 2023-11-23 12:26:47,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=22.5 2023-11-23 12:26:51,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 356950 2023-11-23 12:27:13,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2379733.3333333335, ans=0.1 2023-11-23 12:27:13,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2379733.3333333335, ans=0.025 2023-11-23 12:27:14,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2379733.3333333335, ans=0.125 2023-11-23 12:27:23,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2023-11-23 12:27:24,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5 2023-11-23 12:27:25,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2379800.0, ans=0.5 2023-11-23 12:27:30,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2379800.0, ans=0.0 2023-11-23 12:27:45,807 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8300, loss[loss=0.06533, simple_loss=0.0916, pruned_loss=0.01325, audio_tagging_loss=0.006277, over 16230.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.09333, pruned_loss=0.01405, audio_tagging_loss=0.008868, over 3057014.69 frames. ], batch size: 60, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:27:48,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2379933.3333333335, ans=0.125 2023-11-23 12:27:55,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357000 2023-11-23 12:28:09,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.14 vs. 
limit=15.0 2023-11-23 12:28:16,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2380066.6666666665, ans=0.125 2023-11-23 12:28:21,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2380066.6666666665, ans=0.2 2023-11-23 12:28:42,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2380200.0, ans=0.0 2023-11-23 12:28:47,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.543e+01 9.003e+01 9.782e+01 1.205e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 12:28:50,042 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8350, loss[loss=0.06061, simple_loss=0.07308, pruned_loss=0.01353, audio_tagging_loss=0.01054, over 15412.00 frames. ], tot_loss[loss=0.06963, simple_loss=0.09343, pruned_loss=0.01398, audio_tagging_loss=0.00894, over 3061870.30 frames. ], batch size: 61, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:28:50,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2023-11-23 12:28:58,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2023-11-23 12:29:00,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357050 2023-11-23 12:29:16,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2380400.0, ans=0.125 2023-11-23 12:29:54,898 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8400, loss[loss=0.07189, simple_loss=0.08828, pruned_loss=0.01683, audio_tagging_loss=0.01092, over 15300.00 frames. ], tot_loss[loss=0.06932, simple_loss=0.09271, pruned_loss=0.01397, audio_tagging_loss=0.008998, over 3063976.49 frames. 
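Note on tot_loss[...]: it is not a single batch. The fractional frame counts ("over 3063976.49 frames") betray exponentially decayed running sums of per-batch losses and frame counts. With roughly 15k frames per batch, a decay of 0.995 (a 1/200 forgetting rate) gives a steady-state window of about 3e6 frames, matching the logged totals; the exact decay constant is an inference, not read from the code:

```python
class RunningLoss:
    # Decayed sums of (loss * frames) and frames; their ratio is the
    # reported tot_loss. decay=0.995 is inferred from the ~3e6-frame
    # steady state (15_000 / (1 - 0.995) == 3_000_000).
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss: float, num_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames
        return self.loss_sum / self.frame_sum   # the reported tot_loss
```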
], batch size: 60, lr: 2.26e-03, grad_scale: 32.0 2023-11-23 12:29:56,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2380600.0, ans=0.125 2023-11-23 12:30:05,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357100 2023-11-23 12:30:07,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2380666.6666666665, ans=0.125 2023-11-23 12:30:17,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2380666.6666666665, ans=0.125 2023-11-23 12:30:28,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2380733.3333333335, ans=0.025 2023-11-23 12:30:49,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2380866.6666666665, ans=0.07 2023-11-23 12:30:54,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2380866.6666666665, ans=0.2 2023-11-23 12:30:57,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.912e+01 8.179e+01 8.935e+01 9.716e+01 1.287e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-23 12:30:59,209 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8450, loss[loss=0.05702, simple_loss=0.08172, pruned_loss=0.009538, audio_tagging_loss=0.006619, over 16027.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.09253, pruned_loss=0.01404, audio_tagging_loss=0.009042, over 3062276.36 frames. ], batch size: 62, lr: 2.26e-03, grad_scale: 16.0 2023-11-23 12:31:09,735 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357150 2023-11-23 12:31:12,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2381000.0, ans=0.125 2023-11-23 12:31:13,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2381000.0, ans=0.125 2023-11-23 12:31:44,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2381133.3333333335, ans=0.09899494936611666 2023-11-23 12:31:55,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2381200.0, ans=0.0 2023-11-23 12:32:03,524 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8500, loss[loss=0.08677, simple_loss=0.1106, pruned_loss=0.02393, audio_tagging_loss=0.007562, over 15273.00 frames. ], tot_loss[loss=0.06938, simple_loss=0.09244, pruned_loss=0.01407, audio_tagging_loss=0.009093, over 3065749.98 frames. 
2023-11-23 12:32:13,550 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357200
2023-11-23 12:32:21,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2381333.3333333335, ans=0.2
2023-11-23 12:32:41,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2381466.6666666665, ans=0.0
2023-11-23 12:32:44,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2381466.6666666665, ans=0.125
2023-11-23 12:32:45,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2381466.6666666665, ans=0.125
2023-11-23 12:32:59,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2381533.3333333335, ans=0.1
2023-11-23 12:33:06,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.727e+01 8.293e+01 8.960e+01 9.755e+01 1.208e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-23 12:33:08,140 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8550, loss[loss=0.05734, simple_loss=0.07471, pruned_loss=0.009632, audio_tagging_loss=0.01035, over 14787.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09156, pruned_loss=0.01384, audio_tagging_loss=0.009238, over 3058311.27 frames. ], batch size: 57, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:33:12,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2381600.0, ans=0.125
2023-11-23 12:33:18,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2381600.0, ans=0.125
2023-11-23 12:33:18,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2023-11-23 12:33:19,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357250
2023-11-23 12:33:24,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2381666.6666666665, ans=0.125
2023-11-23 12:33:30,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0
2023-11-23 12:33:37,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2381733.3333333335, ans=0.125
2023-11-23 12:33:44,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0
2023-11-23 12:33:47,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2023-11-23 12:34:00,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2381866.6666666665, ans=0.0
2023-11-23 12:34:11,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2381866.6666666665, ans=0.04949747468305833
2023-11-23 12:34:13,813 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8600, loss[loss=0.07966, simple_loss=0.1072, pruned_loss=0.01451, audio_tagging_loss=0.01155, over 16015.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.09184, pruned_loss=0.01384, audio_tagging_loss=0.009282, over 3054613.31 frames. ], batch size: 58, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:34:15,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0
2023-11-23 12:34:24,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357300
2023-11-23 12:34:32,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=12.0
2023-11-23 12:35:10,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2382200.0, ans=0.0
2023-11-23 12:35:11,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=8.0
2023-11-23 12:35:11,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2382200.0, ans=0.0
2023-11-23 12:35:17,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.339e+01 9.203e+01 9.998e+01 1.297e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-23 12:35:18,839 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8650, loss[loss=0.07498, simple_loss=0.09536, pruned_loss=0.01772, audio_tagging_loss=0.009577, over 14962.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.09135, pruned_loss=0.0137, audio_tagging_loss=0.009316, over 3057509.83 frames. ], batch size: 58, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:35:20,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2382266.6666666665, ans=0.2
2023-11-23 12:35:22,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0
2023-11-23 12:35:26,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2382266.6666666665, ans=0.125
2023-11-23 12:35:28,757 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357350
2023-11-23 12:35:53,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2382400.0, ans=0.125
2023-11-23 12:35:59,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2382466.6666666665, ans=0.0
2023-11-23 12:35:59,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2382466.6666666665, ans=0.125
2023-11-23 12:36:00,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2382466.6666666665, ans=0.0
2023-11-23 12:36:02,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2382466.6666666665, ans=0.125
2023-11-23 12:36:02,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2382466.6666666665, ans=0.0
2023-11-23 12:36:17,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2382533.3333333335, ans=0.1
2023-11-23 12:36:22,783 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8700, loss[loss=0.07375, simple_loss=0.09114, pruned_loss=0.0163, audio_tagging_loss=0.01188, over 16514.00 frames. ], tot_loss[loss=0.06909, simple_loss=0.09177, pruned_loss=0.01379, audio_tagging_loss=0.009417, over 3059538.27 frames. ], batch size: 63, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:36:33,991 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357400
2023-11-23 12:37:05,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0
2023-11-23 12:37:27,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.508e+01 9.323e+01 9.917e+01 1.440e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-23 12:37:28,927 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8750, loss[loss=0.05103, simple_loss=0.06399, pruned_loss=0.00918, audio_tagging_loss=0.009853, over 13492.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09141, pruned_loss=0.01385, audio_tagging_loss=0.009446, over 3059162.19 frames. ], batch size: 52, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:37:32,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2382933.3333333335, ans=0.125
2023-11-23 12:37:40,182 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357450
2023-11-23 12:37:54,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2383066.6666666665, ans=0.1
2023-11-23 12:37:57,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2383066.6666666665, ans=0.0
2023-11-23 12:38:35,186 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8800, loss[loss=0.07913, simple_loss=0.1119, pruned_loss=0.01481, audio_tagging_loss=0.008381, over 15229.00 frames. ], tot_loss[loss=0.0692, simple_loss=0.09187, pruned_loss=0.01392, audio_tagging_loss=0.009342, over 3059037.35 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 32.0
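In the train_asr.py:1221 entries, loss[... over N frames.] is the current batch and tot_loss[... over ~3M frames.] reads as a frame-weighted running aggregate of the same components. A sketch under that assumption (the decay constant and class name are hypothetical):

class LossTracker:
    # Frame-weighted running average: tot_loss[...] is assumed to be a
    # decayed sum of per-batch losses weighted by their frame counts,
    # divided by the decayed frame total ("over 3059037.35 frames").
    def __init__(self, decay=0.999):
        self.sums = {}
        self.frames = 0.0
        self.decay = decay

    def update(self, losses, num_frames):
        self.frames = self.decay * self.frames + num_frames
        for name, value in losses.items():
            prev = self.decay * self.sums.get(name, 0.0)
            self.sums[name] = prev + value * num_frames

    def averages(self):
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = LossTracker()
tracker.update({"loss": 0.07913, "simple_loss": 0.1119}, 15229.0)  # batch 8800 above
print(tracker.averages())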
2023-11-23 12:38:39,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2383266.6666666665, ans=0.125
2023-11-23 12:38:44,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357500
2023-11-23 12:38:45,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2383266.6666666665, ans=0.1
2023-11-23 12:38:46,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2383333.3333333335, ans=0.125
2023-11-23 12:38:52,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0
2023-11-23 12:38:55,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2383333.3333333335, ans=0.125
2023-11-23 12:39:06,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2383400.0, ans=0.1
2023-11-23 12:39:39,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.582e+01 9.222e+01 9.925e+01 1.869e+02, threshold=1.844e+02, percent-clipped=1.0
2023-11-23 12:39:39,795 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8850, loss[loss=0.06401, simple_loss=0.08505, pruned_loss=0.01201, audio_tagging_loss=0.009476, over 14366.00 frames. ], tot_loss[loss=0.06968, simple_loss=0.09242, pruned_loss=0.01408, audio_tagging_loss=0.009379, over 3058144.64 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:39:43,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2383600.0, ans=0.125
2023-11-23 12:39:50,398 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357550
2023-11-23 12:39:54,618 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 12:40:27,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.51 vs. limit=5.0
2023-11-23 12:40:35,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2383866.6666666665, ans=0.2
2023-11-23 12:40:37,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2383866.6666666665, ans=10.0
2023-11-23 12:40:45,536 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8900, loss[loss=0.05691, simple_loss=0.08105, pruned_loss=0.008399, audio_tagging_loss=0.007987, over 15360.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09244, pruned_loss=0.01418, audio_tagging_loss=0.009259, over 3063325.70 frames. ], batch size: 58, lr: 2.26e-03, grad_scale: 16.0
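The WARNING above drops a 1-second AudioSet cut whose encoder output (23 frames after the ~4x subsampling of 100 feature frames) is shorter than its 24-token dummy transcript, which a transducer loss cannot align. A sketch of such a filter; the exact front-end arithmetic is inferred from the logged numbers, not copied from the recipe:

def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # Assumed front-end arithmetic: (100 - 7) // 4 = 23, matching
    # "Number of frames (after subsampling): 23" in the warning above.
    frames_after = (num_frames - 7) // subsampling_factor
    # A transducer cannot emit more symbols than there are encoder frames,
    # so cuts with frames_after < num_tokens are excluded from training.
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False: excluded, exactly as logged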
2023-11-23 12:40:55,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5
2023-11-23 12:40:55,931 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357600
2023-11-23 12:40:57,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2384000.0, ans=0.125
2023-11-23 12:41:21,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2384066.6666666665, ans=0.0
2023-11-23 12:41:26,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2384133.3333333335, ans=0.0
2023-11-23 12:41:37,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5
2023-11-23 12:41:41,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2384200.0, ans=0.125
2023-11-23 12:41:48,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2384200.0, ans=0.07
2023-11-23 12:41:51,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.078e+01 8.629e+01 9.554e+01 1.140e+02, threshold=1.726e+02, percent-clipped=0.0
2023-11-23 12:41:51,064 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 8950, loss[loss=0.06052, simple_loss=0.08643, pruned_loss=0.00825, audio_tagging_loss=0.009055, over 14971.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09232, pruned_loss=0.01397, audio_tagging_loss=0.009051, over 3062160.21 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:41:55,014 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-23 12:42:00,998 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357650
2023-11-23 12:42:11,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0
2023-11-23 12:42:33,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2384466.6666666665, ans=0.125
2023-11-23 12:42:39,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0
2023-11-23 12:42:54,532 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9000, loss[loss=0.07841, simple_loss=0.105, pruned_loss=0.019, audio_tagging_loss=0.006916, over 15954.00 frames. ], tot_loss[loss=0.06932, simple_loss=0.09266, pruned_loss=0.01403, audio_tagging_loss=0.008954, over 3064932.09 frames. ], batch size: 62, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:42:54,532 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-23 12:43:13,935 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3151, 4.8303, 5.2093, 4.5725], device='cuda:1')
2023-11-23 12:43:18,584 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3271, 4.9599, 5.3016, 4.8250], device='cuda:1')
2023-11-23 12:43:36,343 INFO [train_asr.py:1253] (1/4) Epoch 30, validation: loss=0.05877, simple_loss=0.051, pruned_loss=0.005026, audio_tagging_loss=0.02824, over 4681554.00 frames.
2023-11-23 12:43:36,344 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-23 12:43:40,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=22.5
2023-11-23 12:43:46,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=2384600.0, ans=0.1
2023-11-23 12:43:46,863 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357700
2023-11-23 12:43:47,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2384600.0, ans=0.125
2023-11-23 12:44:25,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2384800.0, ans=0.1
2023-11-23 12:44:32,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0
2023-11-23 12:44:41,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.530e+01 8.986e+01 9.928e+01 1.282e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-23 12:44:41,177 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9050, loss[loss=0.07952, simple_loss=0.1081, pruned_loss=0.01862, audio_tagging_loss=0.006852, over 14607.00 frames. ], tot_loss[loss=0.06964, simple_loss=0.09324, pruned_loss=0.01413, audio_tagging_loss=0.008888, over 3065396.17 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 16.0
2023-11-23 12:44:51,051 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357750
2023-11-23 12:44:52,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2385000.0, ans=0.125
2023-11-23 12:45:39,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2385200.0, ans=0.125
2023-11-23 12:45:44,791 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9100, loss[loss=0.0676, simple_loss=0.09052, pruned_loss=0.01444, audio_tagging_loss=0.007902, over 14969.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09229, pruned_loss=0.01381, audio_tagging_loss=0.008918, over 3060727.68 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 16.0
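The zipformer.py:1873 entries printed while computing validation loss give one attention-entropy value per head; values in the 4.6-5.3 range are consistent with fairly diffuse attention over utterances a few hundred frames long (a uniform distribution over 200 positions has entropy log(200) ~ 5.3 nats). A sketch of how such a diagnostic can be computed; the tensor shape is an assumption:

import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len), each row a distribution over src.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # entropy per (head, position)
    return ent.mean(dim=-1)                         # mean entropy per head, in nats

attn = torch.softmax(torch.randn(4, 200, 200), dim=-1)
print(attn_weights_entropy(attn))  # ~4.8 per head, comparable to the logged tensors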
2023-11-23 12:45:55,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357800
2023-11-23 12:46:49,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.272e+01 8.936e+01 9.614e+01 1.250e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-23 12:46:49,849 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9150, loss[loss=0.07178, simple_loss=0.09433, pruned_loss=0.01633, audio_tagging_loss=0.008288, over 14730.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09154, pruned_loss=0.01374, audio_tagging_loss=0.008934, over 3051167.23 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 12:46:50,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2385600.0, ans=0.0
2023-11-23 12:46:51,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2385600.0, ans=0.125
2023-11-23 12:46:51,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0
2023-11-23 12:47:00,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357850
2023-11-23 12:47:01,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2385666.6666666665, ans=0.2
2023-11-23 12:47:20,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2385733.3333333335, ans=0.0
2023-11-23 12:47:25,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2385733.3333333335, ans=0.2
2023-11-23 12:47:26,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2385800.0, ans=0.0
2023-11-23 12:47:38,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2385800.0, ans=0.035
2023-11-23 12:47:42,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2385866.6666666665, ans=0.05
2023-11-23 12:47:53,028 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9200, loss[loss=0.06852, simple_loss=0.09094, pruned_loss=0.01378, audio_tagging_loss=0.009267, over 15591.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09075, pruned_loss=0.01371, audio_tagging_loss=0.008974, over 3052376.85 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 32.0
2023-11-23 12:47:54,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2385933.3333333335, ans=0.125
2023-11-23 12:48:03,477 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357900
2023-11-23 12:48:05,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2386000.0, ans=0.1
2023-11-23 12:48:07,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2386000.0, ans=0.0
2023-11-23 12:48:31,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2386133.3333333335, ans=0.125
2023-11-23 12:48:45,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2386200.0, ans=0.2
2023-11-23 12:48:53,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2386200.0, ans=0.2
2023-11-23 12:48:56,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.180e+01 8.978e+01 9.494e+01 1.363e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-23 12:48:56,556 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9250, loss[loss=0.0825, simple_loss=0.1091, pruned_loss=0.01768, audio_tagging_loss=0.01026, over 16178.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09047, pruned_loss=0.01356, audio_tagging_loss=0.009022, over 3049577.11 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 32.0
2023-11-23 12:48:58,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2386266.6666666665, ans=0.2
2023-11-23 12:49:06,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 357950
2023-11-23 12:49:08,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2386333.3333333335, ans=0.2
2023-11-23 12:49:17,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2386333.3333333335, ans=0.125
2023-11-23 12:49:36,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0
2023-11-23 12:49:48,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2386533.3333333335, ans=0.125
2023-11-23 12:49:58,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2386533.3333333335, ans=0.02
2023-11-23 12:49:59,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2386600.0, ans=0.125
2023-11-23 12:50:00,436 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9300, loss[loss=0.0823, simple_loss=0.1126, pruned_loss=0.01877, audio_tagging_loss=0.007238, over 15096.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09163, pruned_loss=0.01378, audio_tagging_loss=0.008955, over 3056190.99 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 32.0
2023-11-23 12:50:05,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2386600.0, ans=0.04949747468305833
2023-11-23 12:50:06,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2386600.0, ans=0.1
2023-11-23 12:50:10,751 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358000
2023-11-23 12:50:13,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2386666.6666666665, ans=0.0
2023-11-23 12:50:33,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2386733.3333333335, ans=0.0
2023-11-23 12:50:43,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2386800.0, ans=0.2
2023-11-23 12:50:45,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0
2023-11-23 12:50:50,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2386866.6666666665, ans=0.125
2023-11-23 12:50:57,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0
2023-11-23 12:51:03,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=15.0
2023-11-23 12:51:04,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.283e+01 8.829e+01 9.752e+01 1.235e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-23 12:51:04,247 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9350, loss[loss=0.06642, simple_loss=0.08728, pruned_loss=0.01406, audio_tagging_loss=0.008718, over 14570.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.0921, pruned_loss=0.01388, audio_tagging_loss=0.009014, over 3056872.09 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 32.0
2023-11-23 12:51:08,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2386933.3333333335, ans=0.0
2023-11-23 12:51:13,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358050
2023-11-23 12:51:26,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2387000.0, ans=0.0
2023-11-23 12:51:42,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2387133.3333333335, ans=0.125
2023-11-23 12:52:07,399 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9400, loss[loss=0.07755, simple_loss=0.11, pruned_loss=0.01522, audio_tagging_loss=0.007355, over 14180.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09245, pruned_loss=0.01411, audio_tagging_loss=0.00913, over 3047902.94 frames. ], batch size: 51, lr: 2.25e-03, grad_scale: 8.0
2023-11-23 12:52:17,965 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358100
2023-11-23 12:52:31,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2387333.3333333335, ans=0.1
2023-11-23 12:52:35,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0
2023-11-23 12:52:36,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=12.0
2023-11-23 12:52:52,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5
2023-11-23 12:53:09,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2387533.3333333335, ans=0.1
2023-11-23 12:53:10,722 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 12:53:11,878 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9450, loss[loss=0.07849, simple_loss=0.1072, pruned_loss=0.01525, audio_tagging_loss=0.009619, over 16154.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09195, pruned_loss=0.01403, audio_tagging_loss=0.009293, over 3056375.03 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 8.0
2023-11-23 12:53:14,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.523e+01 9.315e+01 1.049e+02 1.250e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-23 12:53:22,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358150
2023-11-23 12:53:27,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2387666.6666666665, ans=0.125
2023-11-23 12:53:41,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2387733.3333333335, ans=0.125
2023-11-23 12:53:52,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2387800.0, ans=0.125
2023-11-23 12:53:58,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2387800.0, ans=0.05
2023-11-23 12:54:03,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2387866.6666666665, ans=0.2
2023-11-23 12:54:08,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2387866.6666666665, ans=0.1
2023-11-23 12:54:08,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2387866.6666666665, ans=0.0
2023-11-23 12:54:16,494 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9500, loss[loss=0.05139, simple_loss=0.06739, pruned_loss=0.01108, audio_tagging_loss=0.006617, over 15437.00 frames. ], tot_loss[loss=0.06907, simple_loss=0.09148, pruned_loss=0.01395, audio_tagging_loss=0.009383, over 3054374.94 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 8.0
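The grad_scale field in the training entries moves between 8.0, 16.0 and 32.0 because fp16 training (use_fp16=True in this run's setup) uses a dynamic loss scale: it is halved when a step overflows and doubled again after a run of clean steps. A toy model of that behaviour (the growth interval is hypothetical; torch.cuda.amp.GradScaler implements the real mechanism):

class DynamicGradScaler:
    # Toy dynamic loss scaling: halve on overflow, double after a run of
    # clean steps. The grad_scale column above moving 32 -> 16 -> 8 -> 16
    # is this mechanism at work.
    def __init__(self, scale=16.0, growth_interval=2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, found_inf: bool):
        if found_inf:
            self.scale /= 2.0
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.growth_interval:
                self.scale *= 2.0
                self.good_steps = 0

scaler = DynamicGradScaler(scale=16.0, growth_interval=3)
for overflow in (False, False, False, True):
    scaler.update(overflow)
print(scaler.scale)  # 16.0: doubled to 32 after 3 clean steps, halved on overflow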
2023-11-23 12:54:26,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358200
2023-11-23 12:54:47,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2388066.6666666665, ans=0.125
2023-11-23 12:54:49,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2388066.6666666665, ans=0.0
2023-11-23 12:54:54,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2388133.3333333335, ans=0.125
2023-11-23 12:55:19,848 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9550, loss[loss=0.085, simple_loss=0.1017, pruned_loss=0.02259, audio_tagging_loss=0.01156, over 14385.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09154, pruned_loss=0.01388, audio_tagging_loss=0.009344, over 3051592.73 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 8.0
2023-11-23 12:55:22,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.501e+01 9.046e+01 9.743e+01 1.461e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-23 12:55:29,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358250
2023-11-23 12:55:37,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2388333.3333333335, ans=0.05
2023-11-23 12:55:40,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2388333.3333333335, ans=0.125
2023-11-23 12:55:44,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2388400.0, ans=0.0
2023-11-23 12:55:58,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5
2023-11-23 12:56:03,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.77 vs. limit=22.5
2023-11-23 12:56:06,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2388466.6666666665, ans=0.2
2023-11-23 12:56:23,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5
2023-11-23 12:56:24,035 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9600, loss[loss=0.07355, simple_loss=0.09976, pruned_loss=0.01503, audio_tagging_loss=0.008638, over 14594.00 frames. ], tot_loss[loss=0.06932, simple_loss=0.09188, pruned_loss=0.014, audio_tagging_loss=0.009385, over 3050532.85 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 12:56:24,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2388600.0, ans=0.125
2023-11-23 12:56:34,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358300
2023-11-23 12:56:44,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2388666.6666666665, ans=0.0
2023-11-23 12:56:53,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2388733.3333333335, ans=0.125
2023-11-23 12:56:59,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0
2023-11-23 12:57:09,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2388800.0, ans=0.0
2023-11-23 12:57:27,864 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9650, loss[loss=0.0494, simple_loss=0.06895, pruned_loss=0.006467, audio_tagging_loss=0.008452, over 15338.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09185, pruned_loss=0.01396, audio_tagging_loss=0.009336, over 3042994.14 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 12:57:30,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.376e+01 8.856e+01 9.540e+01 1.217e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-23 12:57:38,302 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358350
2023-11-23 12:57:43,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2389000.0, ans=0.1
2023-11-23 12:57:50,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2389000.0, ans=0.125
2023-11-23 12:57:58,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0
2023-11-23 12:58:08,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5
2023-11-23 12:58:27,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2389200.0, ans=0.125
2023-11-23 12:58:30,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2389266.6666666665, ans=0.05
2023-11-23 12:58:31,814 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9700, loss[loss=0.05203, simple_loss=0.06018, pruned_loss=0.009849, audio_tagging_loss=0.01209, over 15632.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.09175, pruned_loss=0.01397, audio_tagging_loss=0.00919, over 3046205.75 frames. ], batch size: 62, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 12:58:41,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358400
2023-11-23 12:59:11,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2389466.6666666665, ans=0.0
2023-11-23 12:59:12,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2389466.6666666665, ans=0.1
2023-11-23 12:59:18,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2389466.6666666665, ans=0.2
2023-11-23 12:59:20,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2389466.6666666665, ans=0.1
2023-11-23 12:59:21,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2389466.6666666665, ans=0.125
2023-11-23 12:59:36,163 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9750, loss[loss=0.06811, simple_loss=0.1012, pruned_loss=0.01114, audio_tagging_loss=0.006385, over 14527.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.0922, pruned_loss=0.01403, audio_tagging_loss=0.009006, over 3054885.48 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 12:59:39,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.327e+01 8.730e+01 9.519e+01 2.872e+02, threshold=1.746e+02, percent-clipped=1.0
2023-11-23 12:59:47,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358450
2023-11-23 12:59:51,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=12.0
2023-11-23 12:59:57,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2389666.6666666665, ans=0.125
2023-11-23 13:00:04,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0
2023-11-23 13:00:05,011 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 13:00:40,785 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9800, loss[loss=0.07282, simple_loss=0.09727, pruned_loss=0.01463, audio_tagging_loss=0.009549, over 15984.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09244, pruned_loss=0.01402, audio_tagging_loss=0.008985, over 3054497.72 frames. ], batch size: 61, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:00:49,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0
2023-11-23 13:00:51,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358500
2023-11-23 13:00:55,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2390000.0, ans=0.035
2023-11-23 13:01:05,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2390066.6666666665, ans=0.125
2023-11-23 13:01:08,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2390066.6666666665, ans=0.1
2023-11-23 13:01:21,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2390133.3333333335, ans=0.125
2023-11-23 13:01:29,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2390133.3333333335, ans=0.125
2023-11-23 13:01:38,906 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 13:01:45,028 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9850, loss[loss=0.06108, simple_loss=0.07801, pruned_loss=0.01323, audio_tagging_loss=0.008848, over 15516.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09226, pruned_loss=0.01391, audio_tagging_loss=0.008896, over 3055947.13 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:01:46,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2390266.6666666665, ans=0.125
2023-11-23 13:01:47,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.668e+01 8.339e+01 8.951e+01 9.575e+01 1.283e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-23 13:01:47,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2390266.6666666665, ans=0.1
2023-11-23 13:01:55,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358550
2023-11-23 13:02:41,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2390533.3333333335, ans=0.2
2023-11-23 13:02:43,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2390533.3333333335, ans=0.95
2023-11-23 13:02:48,996 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9900, loss[loss=0.07183, simple_loss=0.101, pruned_loss=0.01028, audio_tagging_loss=0.01108, over 15687.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09198, pruned_loss=0.01382, audio_tagging_loss=0.008919, over 3048963.52 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:02:59,910 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358600
2023-11-23 13:03:13,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2390666.6666666665, ans=0.1
2023-11-23 13:03:15,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0
2023-11-23 13:03:53,917 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 9950, loss[loss=0.06358, simple_loss=0.0866, pruned_loss=0.01178, audio_tagging_loss=0.008495, over 14442.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09194, pruned_loss=0.01373, audio_tagging_loss=0.008871, over 3049272.27 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:03:55,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2390933.3333333335, ans=0.125
2023-11-23 13:03:56,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.358e+01 9.006e+01 9.900e+01 1.170e+02, threshold=1.801e+02, percent-clipped=0.0
2023-11-23 13:03:57,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2390933.3333333335, ans=0.015
2023-11-23 13:04:04,284 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358650
2023-11-23 13:04:36,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2391133.3333333335, ans=0.125
2023-11-23 13:04:42,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0
2023-11-23 13:04:55,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2391200.0, ans=0.1
2023-11-23 13:04:57,628 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10000, loss[loss=0.05735, simple_loss=0.07547, pruned_loss=0.009536, audio_tagging_loss=0.01008, over 15115.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.09209, pruned_loss=0.01369, audio_tagging_loss=0.008795, over 3045011.87 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 32.0
2023-11-23 13:05:03,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2391266.6666666665, ans=0.125
2023-11-23 13:05:06,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2391266.6666666665, ans=0.125
2023-11-23 13:05:07,298 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358700
2023-11-23 13:05:07,402 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-23 13:05:23,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0
2023-11-23 13:05:28,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2391400.0, ans=0.5
2023-11-23 13:05:37,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2391466.6666666665, ans=0.2
2023-11-23 13:05:48,185 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-23 13:05:56,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2391533.3333333335, ans=0.125
2023-11-23 13:06:01,419 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10050, loss[loss=0.07085, simple_loss=0.09393, pruned_loss=0.01746, audio_tagging_loss=0.006428, over 14897.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09185, pruned_loss=0.01361, audio_tagging_loss=0.008912, over 3048769.94 frames. ], batch size: 53, lr: 2.25e-03, grad_scale: 32.0
2023-11-23 13:06:03,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.755e+01 8.322e+01 8.907e+01 9.751e+01 1.198e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-23 13:06:05,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0
2023-11-23 13:06:11,100 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358750
2023-11-23 13:06:18,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2391666.6666666665, ans=0.2
2023-11-23 13:06:22,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2391666.6666666665, ans=0.1
2023-11-23 13:06:46,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2391800.0, ans=0.125
2023-11-23 13:07:02,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0
2023-11-23 13:07:05,469 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10100, loss[loss=0.0645, simple_loss=0.09052, pruned_loss=0.009908, audio_tagging_loss=0.009333, over 14787.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.09169, pruned_loss=0.01355, audio_tagging_loss=0.008865, over 3049699.71 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 32.0
2023-11-23 13:07:06,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2391933.3333333335, ans=0.0
2023-11-23 13:07:10,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2391933.3333333335, ans=0.04949747468305833
2023-11-23 13:07:16,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358800
2023-11-23 13:07:54,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2392133.3333333335, ans=0.0
2023-11-23 13:07:57,875 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 13:08:10,169 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10150, loss[loss=0.07819, simple_loss=0.09955, pruned_loss=0.01656, audio_tagging_loss=0.01185, over 16116.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09197, pruned_loss=0.01357, audio_tagging_loss=0.008941, over 3054462.93 frames. ], batch size: 61, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:08:13,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.548e+01 9.170e+01 9.737e+01 1.223e+02, threshold=1.834e+02, percent-clipped=0.0
2023-11-23 13:08:18,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2392266.6666666665, ans=0.2
2023-11-23 13:08:19,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358850
2023-11-23 13:08:36,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2392400.0, ans=0.125
2023-11-23 13:08:40,882 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 13:08:52,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2392466.6666666665, ans=0.025
2023-11-23 13:08:52,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2392466.6666666665, ans=0.125
2023-11-23 13:09:10,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0
2023-11-23 13:09:13,914 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10200, loss[loss=0.06841, simple_loss=0.09357, pruned_loss=0.01299, audio_tagging_loss=0.00864, over 14848.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09159, pruned_loss=0.01346, audio_tagging_loss=0.009016, over 3054041.77 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:09:16,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2392600.0, ans=0.2
2023-11-23 13:09:21,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2392600.0, ans=0.0
2023-11-23 13:09:21,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2392600.0, ans=0.2
2023-11-23 13:09:23,745 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358900
2023-11-23 13:09:39,572 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 13:09:39,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2392733.3333333335, ans=0.0
2023-11-23 13:09:54,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2392800.0, ans=0.125
2023-11-23 13:09:56,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2392800.0, ans=0.125
2023-11-23 13:10:01,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2392800.0, ans=0.125
2023-11-23 13:10:18,483 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10250, loss[loss=0.06513, simple_loss=0.07609, pruned_loss=0.01306, audio_tagging_loss=0.01402, over 16483.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09145, pruned_loss=0.01349, audio_tagging_loss=0.009246, over 3054384.28 frames. ], batch size: 62, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:10:22,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.314e+01 9.044e+01 9.616e+01 1.457e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-23 13:10:28,371 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 358950
2023-11-23 13:10:34,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2393000.0, ans=0.1
2023-11-23 13:10:43,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0
2023-11-23 13:10:44,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5
2023-11-23 13:10:47,990 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 13:11:09,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2393200.0, ans=0.125
2023-11-23 13:11:23,061 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10300, loss[loss=0.05064, simple_loss=0.06204, pruned_loss=0.008898, audio_tagging_loss=0.01072, over 14955.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.0912, pruned_loss=0.01349, audio_tagging_loss=0.009322, over 3053303.31 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:11:33,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359000
2023-11-23 13:11:48,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2393400.0, ans=0.125
2023-11-23 13:12:00,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2393466.6666666665, ans=15.0
2023-11-23 13:12:24,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2393533.3333333335, ans=0.125
2023-11-23 13:12:26,615 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10350, loss[loss=0.06138, simple_loss=0.08137, pruned_loss=0.009783, audio_tagging_loss=0.01091, over 15702.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09178, pruned_loss=0.0136, audio_tagging_loss=0.009391, over 3061719.89 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 16.0
2023-11-23 13:12:30,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.344e+01 8.692e+01 9.326e+01 1.149e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-23 13:12:36,420 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359050
2023-11-23 13:12:43,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2393666.6666666665, ans=0.125
2023-11-23 13:12:45,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2393666.6666666665, ans=0.05
2023-11-23 13:12:57,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2393733.3333333335, ans=0.125
2023-11-23 13:13:03,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2393800.0, ans=0.5
2023-11-23 13:13:11,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2393800.0, ans=0.0
2023-11-23 13:13:14,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2393800.0, ans=0.1
2023-11-23 13:13:30,172 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10400, loss[loss=0.05744, simple_loss=0.07013, pruned_loss=0.009989, audio_tagging_loss=0.01238, over 13640.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.09167, pruned_loss=0.01373, audio_tagging_loss=0.00953, over 3054747.38 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 32.0
], batch size: 55, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:13:39,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359100 2023-11-23 13:13:51,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2394000.0, ans=0.0 2023-11-23 13:14:01,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2394066.6666666665, ans=0.125 2023-11-23 13:14:03,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2394066.6666666665, ans=0.125 2023-11-23 13:14:28,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2394200.0, ans=0.0 2023-11-23 13:14:30,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0 2023-11-23 13:14:32,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2394266.6666666665, ans=0.125 2023-11-23 13:14:33,790 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10450, loss[loss=0.05935, simple_loss=0.08198, pruned_loss=0.008507, audio_tagging_loss=0.009855, over 14610.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09101, pruned_loss=0.01359, audio_tagging_loss=0.009444, over 3046570.43 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:14:37,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.401e+01 9.132e+01 1.001e+02 1.338e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-23 13:14:43,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359150 2023-11-23 13:14:57,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2394400.0, ans=0.0 2023-11-23 13:15:14,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2394466.6666666665, ans=0.125 2023-11-23 13:15:37,332 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10500, loss[loss=0.04953, simple_loss=0.06297, pruned_loss=0.008317, audio_tagging_loss=0.009731, over 15551.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09087, pruned_loss=0.01354, audio_tagging_loss=0.009311, over 3043601.12 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:15:45,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2394600.0, ans=0.125 2023-11-23 13:15:47,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359200 2023-11-23 13:15:50,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. 
limit=15.0 2023-11-23 13:16:05,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2394733.3333333335, ans=0.125 2023-11-23 13:16:15,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2394800.0, ans=0.125 2023-11-23 13:16:35,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2394866.6666666665, ans=0.125 2023-11-23 13:16:41,122 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10550, loss[loss=0.0499, simple_loss=0.06712, pruned_loss=0.007844, audio_tagging_loss=0.008492, over 15100.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09014, pruned_loss=0.01343, audio_tagging_loss=0.009275, over 3042475.05 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:16:45,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.241e+01 8.844e+01 9.462e+01 1.240e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 13:16:51,711 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359250 2023-11-23 13:16:52,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-23 13:17:45,205 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10600, loss[loss=0.06049, simple_loss=0.08431, pruned_loss=0.009874, audio_tagging_loss=0.00846, over 15647.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.09108, pruned_loss=0.0135, audio_tagging_loss=0.00911, over 3047018.66 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:17:55,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359300 2023-11-23 13:17:59,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=22.5 2023-11-23 13:18:11,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2395400.0, ans=0.125 2023-11-23 13:18:12,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2023-11-23 13:18:49,241 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10650, loss[loss=0.07077, simple_loss=0.0914, pruned_loss=0.01671, audio_tagging_loss=0.008361, over 14483.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09035, pruned_loss=0.01339, audio_tagging_loss=0.009124, over 3043103.90 frames. ], batch size: 54, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:18:53,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.219e+01 8.921e+01 9.796e+01 1.166e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 13:18:59,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359350 2023-11-23 13:19:15,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2395733.3333333335, ans=0.125 2023-11-23 13:19:17,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. 
limit=15.0 2023-11-23 13:19:44,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2395866.6666666665, ans=0.0 2023-11-23 13:19:47,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2395866.6666666665, ans=0.125 2023-11-23 13:19:52,820 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10700, loss[loss=0.05463, simple_loss=0.07093, pruned_loss=0.01015, audio_tagging_loss=0.009014, over 15239.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09121, pruned_loss=0.01345, audio_tagging_loss=0.00903, over 3051454.59 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:19:58,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2395933.3333333335, ans=0.1 2023-11-23 13:20:02,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2395933.3333333335, ans=0.125 2023-11-23 13:20:03,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359400 2023-11-23 13:20:18,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2396066.6666666665, ans=0.125 2023-11-23 13:20:18,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2396066.6666666665, ans=0.0 2023-11-23 13:20:19,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2396066.6666666665, ans=0.1 2023-11-23 13:20:48,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2396200.0, ans=0.1 2023-11-23 13:20:57,060 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10750, loss[loss=0.07419, simple_loss=0.1013, pruned_loss=0.01592, audio_tagging_loss=0.007609, over 15097.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09106, pruned_loss=0.01354, audio_tagging_loss=0.008995, over 3047263.51 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:21:02,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.432e+01 8.422e+01 8.871e+01 9.738e+01 1.299e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-23 13:21:07,224 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359450 2023-11-23 13:21:11,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-11-23 13:21:22,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2396400.0, ans=0.125 2023-11-23 13:21:33,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.57 vs. 
limit=15.0 2023-11-23 13:21:37,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2396466.6666666665, ans=0.0 2023-11-23 13:21:41,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2396466.6666666665, ans=0.0 2023-11-23 13:21:47,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2396533.3333333335, ans=0.1 2023-11-23 13:22:01,023 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10800, loss[loss=0.0576, simple_loss=0.07004, pruned_loss=0.01406, audio_tagging_loss=0.00851, over 15796.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09082, pruned_loss=0.01353, audio_tagging_loss=0.009, over 3048802.26 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:22:11,200 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359500 2023-11-23 13:22:28,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2396733.3333333335, ans=0.125 2023-11-23 13:22:47,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2396800.0, ans=0.125 2023-11-23 13:22:47,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2023-11-23 13:22:58,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2396866.6666666665, ans=0.04949747468305833 2023-11-23 13:23:04,148 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10850, loss[loss=0.05476, simple_loss=0.06994, pruned_loss=0.00812, audio_tagging_loss=0.01167, over 14910.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09031, pruned_loss=0.01365, audio_tagging_loss=0.009058, over 3048280.06 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:23:11,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.664e+01 8.074e+01 8.715e+01 9.516e+01 1.146e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-23 13:23:15,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359550 2023-11-23 13:23:17,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2397000.0, ans=0.0 2023-11-23 13:23:19,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2397000.0, ans=0.1 2023-11-23 13:23:59,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2397200.0, ans=0.0 2023-11-23 13:24:05,387 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 13:24:07,827 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10900, loss[loss=0.08366, simple_loss=0.116, pruned_loss=0.01757, audio_tagging_loss=0.008072, over 14713.00 frames. 
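Note on the Whitening records: each reports how far a module's activations are from having a white (isotropic) covariance, and a penalty only engages when the metric exceeds the per-module limit (10.0, 15.0, or 22.5 above). One plausible metric with the right behaviour is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as variance concentrates in a few directions. The exact formula is an assumption here, not quoted from scaling.py.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Ratio mean(eig^2) / mean(eig)^2 of
    the per-group feature-covariance eigenvalues, averaged over groups;
    1.0 means perfectly whitened (assumed formulation)."""
    num_frames, num_channels = x.shape
    per_group = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, per_group).transpose(0, 1)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames  # (groups, C, C)
    eigs = torch.linalg.eigvalsh(cov)  # real, since cov is symmetric
    metric = (eigs.pow(2).mean(dim=-1) / eigs.mean(dim=-1).pow(2)).mean()
    return metric.item()

x = torch.randn(1000, 512)
print(whitening_metric(x))  # ~1.5 for finite-sample Gaussian noise,
                            # well below limits like 15.0 or 22.5
```

Read this way, a record like "metric=3.50 vs. limit=15.0" means the activations are comfortably white and the whitening constraint is inactive on that batch.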
], tot_loss[loss=0.06793, simple_loss=0.09066, pruned_loss=0.01347, audio_tagging_loss=0.009126, over 3050250.05 frames. ], batch size: 53, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:24:18,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359600 2023-11-23 13:24:23,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2397333.3333333335, ans=0.125 2023-11-23 13:24:46,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2397466.6666666665, ans=0.125 2023-11-23 13:24:57,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2397466.6666666665, ans=0.125 2023-11-23 13:24:58,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2397533.3333333335, ans=0.1 2023-11-23 13:25:03,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2397533.3333333335, ans=0.1 2023-11-23 13:25:05,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=22.5 2023-11-23 13:25:12,097 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 10950, loss[loss=0.0565, simple_loss=0.07019, pruned_loss=0.01118, audio_tagging_loss=0.01022, over 16111.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09137, pruned_loss=0.01358, audio_tagging_loss=0.009182, over 3055252.30 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:25:18,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.360e+01 9.047e+01 9.764e+01 1.279e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 13:25:19,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=22.5 2023-11-23 13:25:20,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2397600.0, ans=0.1 2023-11-23 13:25:21,988 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359650 2023-11-23 13:25:35,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2023-11-23 13:25:44,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2397733.3333333335, ans=0.125 2023-11-23 13:26:02,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2397866.6666666665, ans=0.125 2023-11-23 13:26:16,459 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11000, loss[loss=0.07015, simple_loss=0.09081, pruned_loss=0.01452, audio_tagging_loss=0.01023, over 14689.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09245, pruned_loss=0.01378, audio_tagging_loss=0.009168, over 3055509.38 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:26:27,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359700 2023-11-23 13:26:28,852 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 13:26:30,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2398000.0, ans=0.1 2023-11-23 13:27:06,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2023-11-23 13:27:13,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2398200.0, ans=0.125 2023-11-23 13:27:22,148 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11050, loss[loss=0.0678, simple_loss=0.09174, pruned_loss=0.01223, audio_tagging_loss=0.009692, over 15032.00 frames. ], tot_loss[loss=0.07006, simple_loss=0.09354, pruned_loss=0.01405, audio_tagging_loss=0.009245, over 3058712.20 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:27:26,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2398266.6666666665, ans=0.1 2023-11-23 13:27:28,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.620e+01 9.222e+01 9.990e+01 1.274e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-23 13:27:31,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2398266.6666666665, ans=0.0 2023-11-23 13:27:32,654 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359750 2023-11-23 13:27:34,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2023-11-23 13:27:46,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2398400.0, ans=0.0 2023-11-23 13:27:52,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2398400.0, ans=0.1 2023-11-23 13:28:11,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0 2023-11-23 13:28:16,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2398533.3333333335, ans=0.0 2023-11-23 13:28:27,248 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11100, loss[loss=0.06673, simple_loss=0.07824, pruned_loss=0.01436, audio_tagging_loss=0.01325, over 15236.00 frames. ], tot_loss[loss=0.07048, simple_loss=0.09381, pruned_loss=0.01424, audio_tagging_loss=0.009336, over 3058061.58 frames. 
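Note on the "Exclude cut" WARNING above: it shows the filtering rule in action. A one-second AudioSet cut yields 100 feature frames, the encoder's subsampling reduces that to 23, and the placeholder transcript tokenizes to 24 BPE tokens; since a transducer cannot emit more symbols than it has encoder frames (24 > 23), the cut is dropped. A hedged sketch of such a filter, with the frame-count formula assumed as one factor-4 front-end that reproduces the logged 100 -> 23 mapping:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed approximation of the convolutional front-end: two stride-2
    # stages with a small edge loss, overall factor ~4.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer cannot emit more symbols than it has encoder frames.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> the WARNING above fires
```

Every excluded cut in this log is an `unbalanced/...` AudioSet clip carrying the same dummy transcript, so the exclusions cost audio-tagging examples rather than real ASR supervision.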
], batch size: 57, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:28:31,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2398600.0, ans=0.1 2023-11-23 13:28:36,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2398600.0, ans=0.0 2023-11-23 13:28:37,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359800 2023-11-23 13:28:38,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2398666.6666666665, ans=0.125 2023-11-23 13:28:43,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=15.0 2023-11-23 13:28:43,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2398666.6666666665, ans=0.1 2023-11-23 13:29:02,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2398733.3333333335, ans=0.2 2023-11-23 13:29:18,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2398866.6666666665, ans=0.0 2023-11-23 13:29:19,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2398866.6666666665, ans=0.125 2023-11-23 13:29:25,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2398866.6666666665, ans=0.0 2023-11-23 13:29:31,461 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11150, loss[loss=0.08154, simple_loss=0.1092, pruned_loss=0.01751, audio_tagging_loss=0.009424, over 15353.00 frames. ], tot_loss[loss=0.07079, simple_loss=0.09395, pruned_loss=0.01436, audio_tagging_loss=0.009461, over 3051482.36 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:29:32,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2398933.3333333335, ans=0.0 2023-11-23 13:29:37,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.369e+01 8.812e+01 9.424e+01 1.136e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-23 13:29:39,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2023-11-23 13:29:41,374 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359850 2023-11-23 13:29:56,592 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:29:57,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2399066.6666666665, ans=0.1 2023-11-23 13:30:02,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. 
limit=15.0 2023-11-23 13:30:04,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2399066.6666666665, ans=0.125 2023-11-23 13:30:18,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2399133.3333333335, ans=0.125 2023-11-23 13:30:35,616 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11200, loss[loss=0.0475, simple_loss=0.06119, pruned_loss=0.006071, audio_tagging_loss=0.01083, over 14009.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09297, pruned_loss=0.01409, audio_tagging_loss=0.009462, over 3055132.32 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:30:46,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359900 2023-11-23 13:30:47,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2399333.3333333335, ans=0.125 2023-11-23 13:31:01,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-23 13:31:10,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. limit=10.0 2023-11-23 13:31:36,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2399533.3333333335, ans=0.04949747468305833 2023-11-23 13:31:39,961 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11250, loss[loss=0.06831, simple_loss=0.08832, pruned_loss=0.01438, audio_tagging_loss=0.00978, over 14740.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.09215, pruned_loss=0.01413, audio_tagging_loss=0.009543, over 3048681.82 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:31:47,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.449e+01 9.308e+01 1.008e+02 1.171e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-23 13:31:49,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 359950 2023-11-23 13:31:57,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2399666.6666666665, ans=10.0 2023-11-23 13:31:58,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2399666.6666666665, ans=10.0 2023-11-23 13:32:18,287 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:32:42,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2399933.3333333335, ans=0.125 2023-11-23 13:32:43,475 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11300, loss[loss=0.08698, simple_loss=0.1216, pruned_loss=0.01874, audio_tagging_loss=0.007454, over 14495.00 frames. ], tot_loss[loss=0.06931, simple_loss=0.09188, pruned_loss=0.01395, audio_tagging_loss=0.009421, over 3048341.13 frames. 
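Note on the loss fields: the four numbers in each record are internally consistent with a fixed combination, total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, which matches both the running tot_loss values and the per-utterance loss[...] values to within display precision. A quick check against two records from this log:

```python
records = [
    # (loss, simple_loss, pruned_loss, audio_tagging_loss)
    (0.06846, 0.09145, 0.01349, 0.009246),  # tot_loss, epoch 30 batch 10250
    (0.08698, 0.1216, 0.01874, 0.007454),   # cut loss, epoch 30 batch 11300
]
for loss, simple, pruned, tagging in records:
    combined = 0.5 * simple + pruned + tagging
    print(f"logged={loss:.5f}  combined={combined:.5f}")
    assert abs(loss - combined) < 5e-4  # agreement to display precision
```

That is, the pruned-transducer simple loss enters at half weight while the pruned loss and the audio-tagging distillation loss enter at full weight in the quantity being minimized.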
], batch size: 54, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:32:51,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2399933.3333333335, ans=0.2 2023-11-23 13:32:53,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360000 2023-11-23 13:32:53,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2399933.3333333335, ans=0.125 2023-11-23 13:32:59,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2400000.0, ans=0.125 2023-11-23 13:33:18,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-11-23 13:33:29,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2023-11-23 13:33:50,671 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11350, loss[loss=0.0565, simple_loss=0.07425, pruned_loss=0.01022, audio_tagging_loss=0.009151, over 15843.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.0917, pruned_loss=0.01387, audio_tagging_loss=0.009266, over 3055805.36 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:33:54,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2400266.6666666665, ans=0.1 2023-11-23 13:33:57,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.206e+01 9.153e+01 9.834e+01 1.397e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-23 13:33:59,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2400266.6666666665, ans=0.125 2023-11-23 13:34:00,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360050 2023-11-23 13:34:18,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2400400.0, ans=0.125 2023-11-23 13:34:38,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2400466.6666666665, ans=0.125 2023-11-23 13:34:54,318 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11400, loss[loss=0.06122, simple_loss=0.0751, pruned_loss=0.01422, audio_tagging_loss=0.00945, over 16194.00 frames. ], tot_loss[loss=0.06868, simple_loss=0.09141, pruned_loss=0.01376, audio_tagging_loss=0.009213, over 3058505.42 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:35:04,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360100 2023-11-23 13:35:07,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2023-11-23 13:35:14,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2400666.6666666665, ans=0.2 2023-11-23 13:35:15,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.61 vs. 
limit=15.0 2023-11-23 13:35:18,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2400733.3333333335, ans=0.125 2023-11-23 13:35:20,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2400733.3333333335, ans=0.0 2023-11-23 13:35:26,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2400733.3333333335, ans=0.125 2023-11-23 13:35:40,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2400800.0, ans=10.0 2023-11-23 13:35:46,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2400866.6666666665, ans=0.125 2023-11-23 13:35:48,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2400866.6666666665, ans=0.125 2023-11-23 13:35:57,407 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11450, loss[loss=0.06144, simple_loss=0.08446, pruned_loss=0.01015, audio_tagging_loss=0.009063, over 16127.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09145, pruned_loss=0.01372, audio_tagging_loss=0.009123, over 3057101.02 frames. ], batch size: 59, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:35:57,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2400933.3333333335, ans=0.0 2023-11-23 13:36:00,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2400933.3333333335, ans=0.04949747468305833 2023-11-23 13:36:00,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-11-23 13:36:04,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 8.159e+01 8.794e+01 9.452e+01 1.261e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-23 13:36:06,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2400933.3333333335, ans=0.07 2023-11-23 13:36:07,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360150 2023-11-23 13:36:18,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2401000.0, ans=0.125 2023-11-23 13:36:42,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2401133.3333333335, ans=0.125 2023-11-23 13:36:49,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2401200.0, ans=0.95 2023-11-23 13:36:58,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2401200.0, ans=0.04949747468305833 2023-11-23 13:37:01,739 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11500, loss[loss=0.07164, simple_loss=0.1004, pruned_loss=0.01323, audio_tagging_loss=0.008206, over 15153.00 frames. ], tot_loss[loss=0.0687, simple_loss=0.09156, pruned_loss=0.01377, audio_tagging_loss=0.009155, over 3052230.26 frames. 
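Note on `grad_scale` flipping between 16.0 and 32.0 across these records: that is the signature of dynamic loss scaling under fp16 training. The scaler multiplies the loss before backward, doubles the scale after a run of overflow-free steps, and halves it (skipping the optimizer step) when a step produces inf/nan gradients. A standard torch.cuda.amp loop showing the mechanism; the model, optimizer, and loss here are stand-ins, not the training script's own:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()   # stand-in for the real model
optim = torch.optim.AdamW(model.parameters(), lr=2.25e-3)
scaler = torch.cuda.amp.GradScaler()      # dynamic loss scaling

batches = [torch.randn(8, 80, device="cuda") for _ in range(4)]
for feats in batches:
    optim.zero_grad()
    with torch.cuda.amp.autocast():       # fp16 forward pass
        loss = model(feats).logsumexp(-1).mean()  # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optim)    # skipped when inf/nan gradients are found
    scaler.update()       # scale doubles after clean runs, halves on overflow
    print(scaler.get_scale())  # values as small as 16.0/32.0, as in the log,
                               # imply repeated earlier halvings
```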
], batch size: 56, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:37:04,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0 2023-11-23 13:37:09,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2401266.6666666665, ans=0.1 2023-11-23 13:37:12,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=15.0 2023-11-23 13:37:12,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360200 2023-11-23 13:37:29,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2401400.0, ans=0.0 2023-11-23 13:37:40,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2401466.6666666665, ans=0.2 2023-11-23 13:37:44,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2401466.6666666665, ans=0.0 2023-11-23 13:37:59,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2401533.3333333335, ans=0.125 2023-11-23 13:38:07,649 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11550, loss[loss=0.07881, simple_loss=0.1056, pruned_loss=0.01839, audio_tagging_loss=0.007602, over 14979.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09203, pruned_loss=0.01394, audio_tagging_loss=0.009063, over 3047907.77 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:38:15,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.476e+01 9.086e+01 9.753e+01 1.372e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-23 13:38:18,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360250 2023-11-23 13:38:19,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2401666.6666666665, ans=0.125 2023-11-23 13:38:27,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-23 13:38:35,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2401733.3333333335, ans=0.1 2023-11-23 13:38:41,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2401733.3333333335, ans=0.125 2023-11-23 13:38:47,262 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 13:39:11,784 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11600, loss[loss=0.05423, simple_loss=0.06652, pruned_loss=0.0126, audio_tagging_loss=0.008364, over 15239.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.09284, pruned_loss=0.01402, audio_tagging_loss=0.009092, over 3048163.12 frames. 
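Note on the learning rate: it sits at 2.25e-03 throughout these epoch-30 records and drops to 2.21e-03 once epoch 31 begins below. Both values are reproduced by an Eden-style schedule with base_lr=0.045, lr_batches=7500, and lr_epochs=3.5, provided the epoch term counts completed epochs; that convention is an assumption, but it matches the log at both epochs.

```python
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # `epoch` is taken as the number of completed epochs (assumed
    # convention; it reproduces both logged values).
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 360000, 29):.2e}")  # 2.25e-03, the epoch-30 records
print(f"{eden_lr(0.045, 360700, 30):.2e}")  # 2.21e-03, once epoch 31 begins
```

At 360k batches the batch factor changes only in the fourth significant digit per thousand batches, so the visible lr steps in this log come almost entirely from the epoch term.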
], batch size: 56, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:39:18,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2401933.3333333335, ans=0.2 2023-11-23 13:39:21,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360300 2023-11-23 13:39:26,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2402000.0, ans=0.1 2023-11-23 13:39:40,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2402066.6666666665, ans=0.125 2023-11-23 13:39:43,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2402066.6666666665, ans=0.1 2023-11-23 13:39:43,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2402066.6666666665, ans=0.2 2023-11-23 13:39:46,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2023-11-23 13:40:09,786 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:40:14,605 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11650, loss[loss=0.0683, simple_loss=0.09421, pruned_loss=0.01062, audio_tagging_loss=0.01057, over 16139.00 frames. ], tot_loss[loss=0.06951, simple_loss=0.09295, pruned_loss=0.01398, audio_tagging_loss=0.009066, over 3046864.53 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:40:19,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2402266.6666666665, ans=0.125 2023-11-23 13:40:22,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.635e+01 8.364e+01 9.300e+01 1.004e+02 1.361e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-23 13:40:25,046 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360350 2023-11-23 13:40:27,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2402333.3333333335, ans=0.2 2023-11-23 13:40:57,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2402466.6666666665, ans=0.125 2023-11-23 13:41:02,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2402466.6666666665, ans=0.09899494936611666 2023-11-23 13:41:06,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2402533.3333333335, ans=0.95 2023-11-23 13:41:18,465 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11700, loss[loss=0.08583, simple_loss=0.1002, pruned_loss=0.02312, audio_tagging_loss=0.01263, over 13741.00 frames. ], tot_loss[loss=0.06911, simple_loss=0.09209, pruned_loss=0.01394, audio_tagging_loss=0.00913, over 3049264.37 frames. 
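Note on the WithLoss records (e.g. loss-sum=0.000e+00 for a self_attn_weights module): they suggest auxiliary penalties attached directly to intermediate tensors, so the regularizer flows into the same backward pass without threading extra return values through the model. A minimal sketch of that pattern with a custom autograd function; the mechanism is an assumption, not copied from scaling.py:

```python
import torch

class AttachLoss(torch.autograd.Function):
    """Pass x through unchanged; in backward, add the gradient of an
    auxiliary loss computed on x to x's incoming gradient, so the penalty
    trains the module without changing its forward output."""

    @staticmethod
    def forward(ctx, x, loss_fn):
        ctx.save_for_backward(x)
        ctx.loss_fn = loss_fn
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            x = x.detach().requires_grad_(True)
            aux = ctx.loss_fn(x)          # e.g. the quantity logged as loss-sum
            (aux_grad,) = torch.autograd.grad(aux, x)
        return grad_out + aux_grad, None

x = torch.randn(4, 8, requires_grad=True)
# Hypothetical penalty on attention-weight statistics; loss-sum=0.000e+00
# in the log would simply mean the penalty was inactive on that batch.
y = AttachLoss.apply(x, lambda t: 1e-3 * t.pow(2).sum())
y.sum().backward()  # x.grad now includes the auxiliary term
print(x.grad.shape)
```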
], batch size: 54, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:41:20,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2402600.0, ans=6.0 2023-11-23 13:41:27,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2402600.0, ans=0.015 2023-11-23 13:41:29,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360400 2023-11-23 13:41:29,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2402600.0, ans=0.04949747468305833 2023-11-23 13:41:41,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2023-11-23 13:41:46,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2402733.3333333335, ans=0.0 2023-11-23 13:41:48,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2402733.3333333335, ans=0.2 2023-11-23 13:41:53,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2402733.3333333335, ans=0.125 2023-11-23 13:42:00,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2402800.0, ans=0.125 2023-11-23 13:42:00,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2402800.0, ans=0.125 2023-11-23 13:42:23,001 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11750, loss[loss=0.0677, simple_loss=0.08574, pruned_loss=0.01435, audio_tagging_loss=0.01048, over 16111.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.09276, pruned_loss=0.01404, audio_tagging_loss=0.009159, over 3053038.52 frames. ], batch size: 61, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:42:32,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 8.361e+01 8.957e+01 9.749e+01 1.339e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 13:42:33,402 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360450 2023-11-23 13:42:36,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2403000.0, ans=0.125 2023-11-23 13:42:37,300 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:42:39,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2403000.0, ans=0.0 2023-11-23 13:42:52,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2403066.6666666665, ans=10.0 2023-11-23 13:43:03,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2403133.3333333335, ans=0.0 2023-11-23 13:43:10,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2403133.3333333335, ans=0.125 2023-11-23 13:43:21,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=22.5 2023-11-23 13:43:25,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2023-11-23 13:43:26,973 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11800, loss[loss=0.06995, simple_loss=0.08673, pruned_loss=0.01554, audio_tagging_loss=0.01104, over 15544.00 frames. ], tot_loss[loss=0.06924, simple_loss=0.09242, pruned_loss=0.01393, audio_tagging_loss=0.009099, over 3054798.57 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:43:37,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360500 2023-11-23 13:43:49,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2403333.3333333335, ans=0.125 2023-11-23 13:43:54,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5 2023-11-23 13:44:16,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2403533.3333333335, ans=0.125 2023-11-23 13:44:24,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2403533.3333333335, ans=0.0 2023-11-23 13:44:31,206 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11850, loss[loss=0.05056, simple_loss=0.06749, pruned_loss=0.007036, audio_tagging_loss=0.009777, over 14829.00 frames. ], tot_loss[loss=0.06932, simple_loss=0.0924, pruned_loss=0.01403, audio_tagging_loss=0.009094, over 3052205.17 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:44:40,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.325e+01 8.959e+01 9.729e+01 1.337e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-23 13:44:41,695 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360550 2023-11-23 13:44:41,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2403600.0, ans=0.0 2023-11-23 13:44:42,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-23 13:44:55,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. 
limit=15.0 2023-11-23 13:45:01,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2403733.3333333335, ans=0.0 2023-11-23 13:45:02,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2403733.3333333335, ans=0.125 2023-11-23 13:45:21,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2403866.6666666665, ans=0.125 2023-11-23 13:45:27,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2403866.6666666665, ans=0.125 2023-11-23 13:45:30,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2403866.6666666665, ans=0.125 2023-11-23 13:45:35,376 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11900, loss[loss=0.06627, simple_loss=0.08174, pruned_loss=0.01412, audio_tagging_loss=0.01128, over 14474.00 frames. ], tot_loss[loss=0.07039, simple_loss=0.09378, pruned_loss=0.01433, audio_tagging_loss=0.009176, over 3047881.40 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:45:45,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360600 2023-11-23 13:45:56,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2404000.0, ans=0.1 2023-11-23 13:45:56,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2023-11-23 13:46:11,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2404066.6666666665, ans=0.125 2023-11-23 13:46:19,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-23 13:46:23,199 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:46:25,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2404133.3333333335, ans=0.125 2023-11-23 13:46:29,628 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:46:40,764 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 11950, loss[loss=0.06731, simple_loss=0.09044, pruned_loss=0.01348, audio_tagging_loss=0.008612, over 15487.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09326, pruned_loss=0.01434, audio_tagging_loss=0.009263, over 3038474.80 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 16.0 2023-11-23 13:46:45,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. 
limit=15.0 2023-11-23 13:46:50,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.271e+01 8.961e+01 9.651e+01 1.560e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-23 13:46:51,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360650 2023-11-23 13:47:03,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2404333.3333333335, ans=0.125 2023-11-23 13:47:33,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2404533.3333333335, ans=0.125 2023-11-23 13:47:36,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=12.0 2023-11-23 13:47:42,938 INFO [train_asr.py:1221] (1/4) Epoch 30, batch 12000, loss[loss=0.06051, simple_loss=0.0751, pruned_loss=0.0134, audio_tagging_loss=0.009558, over 14787.00 frames. ], tot_loss[loss=0.07034, simple_loss=0.09327, pruned_loss=0.01435, audio_tagging_loss=0.009356, over 3039644.09 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 32.0 2023-11-23 13:47:42,939 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 13:48:05,355 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2148, 4.1391, 4.3725, 4.2605], device='cuda:1') 2023-11-23 13:48:25,604 INFO [train_asr.py:1253] (1/4) Epoch 30, validation: loss=0.05798, simple_loss=0.05115, pruned_loss=0.00515, audio_tagging_loss=0.02725, over 4681554.00 frames. 2023-11-23 13:48:25,605 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 13:48:35,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360700 2023-11-23 13:48:42,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2404666.6666666665, ans=0.2 2023-11-23 13:49:30,362 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 0, loss[loss=0.09598, simple_loss=0.1164, pruned_loss=0.01918, audio_tagging_loss=0.01858, over 16345.00 frames. ], tot_loss[loss=0.09598, simple_loss=0.1164, pruned_loss=0.01918, audio_tagging_loss=0.01858, over 16345.00 frames. ], batch size: 57, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:49:30,362 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 13:50:05,322 INFO [train_asr.py:1253] (1/4) Epoch 31, validation: loss=0.05797, simple_loss=0.05105, pruned_loss=0.005059, audio_tagging_loss=0.02738, over 4681554.00 frames. 2023-11-23 13:50:05,323 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 13:50:15,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2023-11-23 13:50:18,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.78 vs. 
limit=22.5 2023-11-23 13:50:30,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2404900.0, ans=0.125 2023-11-23 13:50:32,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2404900.0, ans=0.2 2023-11-23 13:50:36,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2404900.0, ans=0.125 2023-11-23 13:50:37,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2404900.0, ans=0.0 2023-11-23 13:50:40,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2023-11-23 13:50:45,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-23 13:50:47,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.784e+01 9.433e+01 1.048e+02 1.296e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-23 13:50:47,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2404966.6666666665, ans=0.1 2023-11-23 13:50:48,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360750 2023-11-23 13:50:51,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2404966.6666666665, ans=0.125 2023-11-23 13:50:54,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2404966.6666666665, ans=10.0 2023-11-23 13:51:03,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.53 vs. limit=10.0 2023-11-23 13:51:10,455 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 50, loss[loss=0.08834, simple_loss=0.1021, pruned_loss=0.02145, audio_tagging_loss=0.01585, over 15741.00 frames. ], tot_loss[loss=0.07808, simple_loss=0.09332, pruned_loss=0.0141, audio_tagging_loss=0.01732, over 690143.09 frames. ], batch size: 58, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:51:28,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2405166.6666666665, ans=0.125 2023-11-23 13:51:29,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=15.0 2023-11-23 13:51:30,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2405166.6666666665, ans=0.125 2023-11-23 13:51:32,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2405166.6666666665, ans=0.0 2023-11-23 13:51:35,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2405233.3333333335, ans=0.04949747468305833 2023-11-23 13:51:48,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2405300.0, ans=0.0 2023-11-23 13:51:53,542 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360800 2023-11-23 13:51:59,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-23 13:52:16,515 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 100, loss[loss=0.07113, simple_loss=0.08219, pruned_loss=0.01183, audio_tagging_loss=0.01821, over 14458.00 frames. ], tot_loss[loss=0.07672, simple_loss=0.09224, pruned_loss=0.01393, audio_tagging_loss=0.01667, over 1219962.08 frames. ], batch size: 54, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:52:26,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2405433.3333333335, ans=0.2 2023-11-23 13:52:37,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.22 vs. limit=12.0 2023-11-23 13:52:46,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2405566.6666666665, ans=0.125 2023-11-23 13:52:57,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.952e+01 8.885e+01 9.566e+01 1.092e+02 1.525e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-23 13:52:59,052 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360850 2023-11-23 13:53:01,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2405633.3333333335, ans=0.0 2023-11-23 13:53:02,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-23 13:53:18,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2405700.0, ans=0.125 2023-11-23 13:53:20,663 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 150, loss[loss=0.05475, simple_loss=0.06513, pruned_loss=0.00851, audio_tagging_loss=0.01367, over 15894.00 frames. ], tot_loss[loss=0.07412, simple_loss=0.09084, pruned_loss=0.01363, audio_tagging_loss=0.01507, over 1623710.77 frames. 
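Note on the epoch-30/31 boundary above: at batch 12000 and again at the first batch of epoch 31, training pauses to compute a full validation loss over the same 4,681,554 frames, logs the peak GPU memory (25607MB on this rank), and dumps diagnostic entropies of selected self-attention weight distributions. A compact sketch of that bookkeeping; `compute_loss`, the loader, and the head layout are placeholders for the real train_asr.py and zipformer.py interfaces:

```python
import torch

def run_validation(model, valid_loader, device="cuda:1"):
    """Sketch of the periodic validation pass."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model.compute_loss(batch)  # hypothetical API
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.5f}, "
          f"over {tot_frames} frames; max memory {max_mb}MB")

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Entropy (nats) of attention distributions along the last axis,
    averaged over everything but an assumed leading head axis."""
    p = attn.clamp_min(1e-20)
    ent = -(p * p.log()).sum(dim=-1)             # one value per query
    return ent.flatten(start_dim=1).mean(dim=1)  # one value per head

attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)  # 4 heads, as above
print(attn_weights_entropy(attn))  # near log(100) ~ 4.6; compare the logged
                                   # tensor([4.2148, 4.1391, 4.3725, 4.2605])
```

The jump in tot_loss right after the epoch boundary (0.096 at batch 0, decaying through 0.078, 0.077, 0.074, ... over the first few hundred batches) is the running average re-warming from scratch, driven mostly by the audio_tagging_loss term, not a regression in the model.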
], batch size: 61, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:53:20,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2405766.6666666665, ans=0.0 2023-11-23 13:53:43,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2405833.3333333335, ans=0.0 2023-11-23 13:53:50,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2405900.0, ans=0.0 2023-11-23 13:54:00,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2405966.6666666665, ans=0.2 2023-11-23 13:54:03,687 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360900 2023-11-23 13:54:05,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2405966.6666666665, ans=0.2 2023-11-23 13:54:25,014 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 200, loss[loss=0.06657, simple_loss=0.08685, pruned_loss=0.01442, audio_tagging_loss=0.008715, over 14430.00 frames. ], tot_loss[loss=0.07379, simple_loss=0.0931, pruned_loss=0.014, audio_tagging_loss=0.01324, over 1936061.81 frames. ], batch size: 55, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:54:39,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2406166.6666666665, ans=0.125 2023-11-23 13:54:59,226 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:55:06,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.350e+01 9.246e+01 9.961e+01 1.409e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-23 13:55:07,605 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 360950 2023-11-23 13:55:10,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2406300.0, ans=0.04949747468305833 2023-11-23 13:55:24,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2406366.6666666665, ans=0.1 2023-11-23 13:55:30,782 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 250, loss[loss=0.06997, simple_loss=0.08891, pruned_loss=0.01408, audio_tagging_loss=0.01144, over 15857.00 frames. ], tot_loss[loss=0.07221, simple_loss=0.09246, pruned_loss=0.01384, audio_tagging_loss=0.01213, over 2181379.59 frames. ], batch size: 59, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:55:40,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2406433.3333333335, ans=0.0 2023-11-23 13:55:41,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. 
limit=10.0 2023-11-23 13:56:12,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2406633.3333333335, ans=0.125 2023-11-23 13:56:13,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361000 2023-11-23 13:56:34,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2406766.6666666665, ans=0.0 2023-11-23 13:56:35,023 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 300, loss[loss=0.06498, simple_loss=0.08577, pruned_loss=0.01114, audio_tagging_loss=0.01095, over 16261.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.09264, pruned_loss=0.01388, audio_tagging_loss=0.01126, over 2375572.70 frames. ], batch size: 60, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:56:47,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2406833.3333333335, ans=0.0 2023-11-23 13:56:48,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2406833.3333333335, ans=0.1 2023-11-23 13:56:49,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-11-23 13:56:52,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2406833.3333333335, ans=0.1 2023-11-23 13:57:00,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2406900.0, ans=0.2 2023-11-23 13:57:15,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2406966.6666666665, ans=0.2 2023-11-23 13:57:16,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.477e+01 9.124e+01 9.859e+01 1.340e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-23 13:57:17,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361050 2023-11-23 13:57:20,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2406966.6666666665, ans=0.0 2023-11-23 13:57:24,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2406966.6666666665, ans=0.0 2023-11-23 13:57:38,719 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 350, loss[loss=0.07365, simple_loss=0.1024, pruned_loss=0.01367, audio_tagging_loss=0.008805, over 14703.00 frames. ], tot_loss[loss=0.07153, simple_loss=0.09403, pruned_loss=0.01403, audio_tagging_loss=0.01049, over 2525306.96 frames. ], batch size: 57, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:57:59,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2407166.6666666665, ans=0.0 2023-11-23 13:58:00,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2407166.6666666665, ans=0.04949747468305833 2023-11-23 13:58:07,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. 
limit=22.5 2023-11-23 13:58:08,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2407233.3333333335, ans=0.0 2023-11-23 13:58:21,939 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361100 2023-11-23 13:58:41,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-11-23 13:58:44,375 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 400, loss[loss=0.05718, simple_loss=0.07385, pruned_loss=0.01148, audio_tagging_loss=0.008773, over 14571.00 frames. ], tot_loss[loss=0.07085, simple_loss=0.09342, pruned_loss=0.01393, audio_tagging_loss=0.0102, over 2645574.04 frames. ], batch size: 57, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:58:51,402 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 13:59:02,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2407500.0, ans=0.0 2023-11-23 13:59:04,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-23 13:59:12,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2407566.6666666665, ans=0.2 2023-11-23 13:59:13,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2407566.6666666665, ans=0.95 2023-11-23 13:59:26,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.67 vs. limit=10.0 2023-11-23 13:59:27,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.381e+01 8.890e+01 9.709e+01 1.192e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-23 13:59:27,204 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361150 2023-11-23 13:59:31,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2407633.3333333335, ans=0.1 2023-11-23 13:59:47,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-23 13:59:49,031 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 450, loss[loss=0.06069, simple_loss=0.0819, pruned_loss=0.009778, audio_tagging_loss=0.009961, over 14393.00 frames. ], tot_loss[loss=0.07027, simple_loss=0.09298, pruned_loss=0.01382, audio_tagging_loss=0.009962, over 2726888.98 frames. ], batch size: 56, lr: 2.21e-03, grad_scale: 32.0 2023-11-23 13:59:59,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. 
limit=15.0 2023-11-23 14:00:04,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2407833.3333333335, ans=0.0 2023-11-23 14:00:14,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2407900.0, ans=0.125 2023-11-23 14:00:16,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2407900.0, ans=0.1 2023-11-23 14:00:20,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2407900.0, ans=0.1 2023-11-23 14:00:25,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2407900.0, ans=0.2 2023-11-23 14:00:30,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2407966.6666666665, ans=0.0 2023-11-23 14:00:31,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361200 2023-11-23 14:00:40,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2408033.3333333335, ans=0.1 2023-11-23 14:00:41,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2408033.3333333335, ans=0.125 2023-11-23 14:00:52,587 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 500, loss[loss=0.06441, simple_loss=0.07922, pruned_loss=0.01317, audio_tagging_loss=0.01163, over 15055.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09274, pruned_loss=0.01382, audio_tagging_loss=0.009818, over 2789495.11 frames. ], batch size: 57, lr: 2.21e-03, grad_scale: 8.0 2023-11-23 14:00:52,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2408100.0, ans=0.1 2023-11-23 14:01:16,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2408166.6666666665, ans=0.09899494936611666 2023-11-23 14:01:21,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2408233.3333333335, ans=0.125 2023-11-23 14:01:35,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361250 2023-11-23 14:01:38,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 8.202e+01 8.738e+01 9.216e+01 1.113e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-23 14:01:48,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2408366.6666666665, ans=0.0 2023-11-23 14:01:55,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2408366.6666666665, ans=0.125 2023-11-23 14:01:58,081 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 550, loss[loss=0.08367, simple_loss=0.112, pruned_loss=0.01882, audio_tagging_loss=0.008832, over 14780.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.09272, pruned_loss=0.01387, audio_tagging_loss=0.009667, over 2845917.85 frames. 
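Note on the scaling.py:213 entries: each one prints the current value ("ans") of a ScheduledFloat at the given batch_count, so quantities such as dropout_p, balancer prob, bypass scale_min, and the various skip rates are functions of training progress rather than constants. A plausible minimal sketch of such a schedule, piecewise-linear in batch count (illustrative, not the actual scaling.py implementation):

    import numpy as np

    class ScheduledFloatSketch:
        """Piecewise-linear float schedule keyed on the global batch count.

        Example: ScheduledFloatSketch((0, 0.3), (20000, 0.1)) starts at 0.3
        and decays linearly to 0.1 by batch 20000, staying flat afterwards.
        """
        def __init__(self, *points):
            xs, ys = zip(*points)
            self.xs = np.asarray(xs, dtype=float)
            self.ys = np.asarray(ys, dtype=float)

        def value(self, batch_count: float) -> float:
            # np.interp clamps outside the breakpoint range, so the schedule
            # holds its last value forever once past the final breakpoint.
            return float(np.interp(batch_count, self.xs, self.ys))

    dropout_p = ScheduledFloatSketch((0, 0.3), (20000, 0.1))
    print(dropout_p.value(2405166.67))  # -> 0.1, flat this late in training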
], batch size: 56, lr: 2.21e-03, grad_scale: 8.0 2023-11-23 14:02:01,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2408433.3333333335, ans=0.125 2023-11-23 14:02:04,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2408433.3333333335, ans=0.0 2023-11-23 14:02:19,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2408500.0, ans=0.125 2023-11-23 14:02:28,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2408566.6666666665, ans=0.125 2023-11-23 14:02:30,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2408566.6666666665, ans=0.2 2023-11-23 14:02:31,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-11-23 14:02:36,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2408633.3333333335, ans=0.125 2023-11-23 14:02:36,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2408633.3333333335, ans=0.125 2023-11-23 14:02:40,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361300 2023-11-23 14:02:44,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2023-11-23 14:02:45,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.14 vs. limit=10.0 2023-11-23 14:03:02,847 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 600, loss[loss=0.0889, simple_loss=0.1267, pruned_loss=0.01863, audio_tagging_loss=0.00694, over 15396.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.09207, pruned_loss=0.01381, audio_tagging_loss=0.009593, over 2891962.90 frames. ], batch size: 53, lr: 2.21e-03, grad_scale: 8.0 2023-11-23 14:03:04,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.47 vs. 
limit=12.0 2023-11-23 14:03:16,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2408833.3333333335, ans=0.2 2023-11-23 14:03:32,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2408900.0, ans=0.125 2023-11-23 14:03:34,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2408900.0, ans=0.125 2023-11-23 14:03:39,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2408966.6666666665, ans=0.1 2023-11-23 14:03:45,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361350 2023-11-23 14:03:48,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.522e+01 9.044e+01 1.001e+02 1.721e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 14:03:52,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2408966.6666666665, ans=0.125 2023-11-23 14:04:06,527 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 650, loss[loss=0.1041, simple_loss=0.137, pruned_loss=0.02779, audio_tagging_loss=0.007836, over 15369.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09273, pruned_loss=0.01409, audio_tagging_loss=0.009493, over 2924777.94 frames. ], batch size: 57, lr: 2.21e-03, grad_scale: 8.0 2023-11-23 14:04:10,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2409100.0, ans=0.125 2023-11-23 14:04:13,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2023-11-23 14:04:20,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2409166.6666666665, ans=0.0 2023-11-23 14:04:22,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2409166.6666666665, ans=0.0 2023-11-23 14:04:35,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0 2023-11-23 14:04:39,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. 
limit=15.0 2023-11-23 14:04:42,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2409233.3333333335, ans=0.0 2023-11-23 14:04:48,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2409300.0, ans=0.125 2023-11-23 14:04:49,497 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361400 2023-11-23 14:05:06,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2409366.6666666665, ans=0.125 2023-11-23 14:05:09,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2409366.6666666665, ans=0.125 2023-11-23 14:05:12,024 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 700, loss[loss=0.05845, simple_loss=0.08016, pruned_loss=0.01003, audio_tagging_loss=0.008339, over 14853.00 frames. ], tot_loss[loss=0.07033, simple_loss=0.09379, pruned_loss=0.01415, audio_tagging_loss=0.009285, over 2954638.06 frames. ], batch size: 59, lr: 2.21e-03, grad_scale: 8.0 2023-11-23 14:05:13,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2409433.3333333335, ans=0.125 2023-11-23 14:05:27,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2409500.0, ans=0.025 2023-11-23 14:05:45,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2409566.6666666665, ans=0.1 2023-11-23 14:05:54,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361450 2023-11-23 14:05:55,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2409633.3333333335, ans=0.125 2023-11-23 14:05:57,069 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.383e+01 9.026e+01 9.852e+01 1.230e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 14:06:12,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2023-11-23 14:06:13,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2409700.0, ans=0.125 2023-11-23 14:06:17,510 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 750, loss[loss=0.06711, simple_loss=0.09093, pruned_loss=0.01161, audio_tagging_loss=0.01003, over 15254.00 frames. ], tot_loss[loss=0.07089, simple_loss=0.0946, pruned_loss=0.0143, audio_tagging_loss=0.009292, over 2982157.92 frames. 
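Note on the scaling.py:1022 "Whitening" entries: each compares a measured whiteness statistic of a module's activations against a fixed limit (e.g. metric=12.85 vs. limit=15.0 just above); the module only pushes back on the activations when the metric exceeds its limit, which is why these lines report but do not intervene here. A hedged sketch of one such statistic, the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as variance concentrates in few directions (illustrative; the exact statistic in scaling.py may differ):

    import torch

    def whiteness_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels split into groups as logged.
        _, num_channels = x.shape
        group = num_channels // num_groups
        metrics = []
        for g in range(num_groups):
            xg = x[:, g * group:(g + 1) * group]
            xg = xg - xg.mean(dim=0, keepdim=True)
            cov = (xg.T @ xg) / xg.shape[0]
            eig = torch.linalg.eigvalsh(cov)
            # 1.0 when all eigenvalues are equal (white); larger when skewed.
            metrics.append((eig ** 2).mean() / (eig.mean() ** 2 + 1e-20))
        return float(torch.stack(metrics).mean())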
], batch size: 58, lr: 2.21e-03, grad_scale: 8.0 2023-11-23 14:06:27,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2409766.6666666665, ans=0.09899494936611666 2023-11-23 14:06:33,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2409833.3333333335, ans=0.2 2023-11-23 14:06:34,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2409833.3333333335, ans=0.2 2023-11-23 14:06:35,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2409833.3333333335, ans=0.0 2023-11-23 14:06:39,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2409833.3333333335, ans=0.0 2023-11-23 14:06:59,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361500 2023-11-23 14:07:03,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2409966.6666666665, ans=0.0 2023-11-23 14:07:12,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2410033.3333333335, ans=0.0 2023-11-23 14:07:15,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2410033.3333333335, ans=0.125 2023-11-23 14:07:20,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2023-11-23 14:07:21,098 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 800, loss[loss=0.06498, simple_loss=0.08333, pruned_loss=0.01195, audio_tagging_loss=0.01136, over 15645.00 frames. ], tot_loss[loss=0.07061, simple_loss=0.09425, pruned_loss=0.01417, audio_tagging_loss=0.009319, over 2999336.66 frames. ], batch size: 59, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:07:22,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2410100.0, ans=0.125 2023-11-23 14:07:44,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2410166.6666666665, ans=0.2 2023-11-23 14:07:46,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2410233.3333333335, ans=0.125 2023-11-23 14:07:52,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-23 14:07:54,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. 
limit=12.0 2023-11-23 14:08:04,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361550 2023-11-23 14:08:06,969 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.503e+01 9.179e+01 9.718e+01 1.363e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-23 14:08:13,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2410366.6666666665, ans=0.1 2023-11-23 14:08:13,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2410366.6666666665, ans=0.0 2023-11-23 14:08:26,883 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 850, loss[loss=0.06265, simple_loss=0.07427, pruned_loss=0.0141, audio_tagging_loss=0.01141, over 13619.00 frames. ], tot_loss[loss=0.07036, simple_loss=0.09385, pruned_loss=0.01404, audio_tagging_loss=0.009386, over 3005443.11 frames. ], batch size: 54, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:09:09,272 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361600 2023-11-23 14:09:32,503 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 900, loss[loss=0.0941, simple_loss=0.1375, pruned_loss=0.02022, audio_tagging_loss=0.005108, over 14943.00 frames. ], tot_loss[loss=0.07086, simple_loss=0.09466, pruned_loss=0.01421, audio_tagging_loss=0.009324, over 3008062.19 frames. ], batch size: 54, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:09:34,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.68 vs. limit=10.0 2023-11-23 14:09:44,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2410833.3333333335, ans=0.0 2023-11-23 14:10:15,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361650 2023-11-23 14:10:18,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.422e+01 9.031e+01 9.667e+01 1.145e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-23 14:10:24,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2411033.3333333335, ans=0.125 2023-11-23 14:10:37,454 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 950, loss[loss=0.05894, simple_loss=0.07765, pruned_loss=0.01134, audio_tagging_loss=0.008773, over 14421.00 frames. ], tot_loss[loss=0.07061, simple_loss=0.09462, pruned_loss=0.01413, audio_tagging_loss=0.009178, over 3014934.82 frames. ], batch size: 55, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:10:42,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2411100.0, ans=10.0 2023-11-23 14:10:46,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2411100.0, ans=0.0 2023-11-23 14:11:19,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361700 2023-11-23 14:11:37,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2411366.6666666665, ans=0.1 2023-11-23 14:11:41,904 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1000, loss[loss=0.0804, simple_loss=0.1053, pruned_loss=0.02192, audio_tagging_loss=0.005833, over 15701.00 frames. 
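Note on the optim.py:476 entries: they summarize recent gradient norms as five order statistics (min, 25%, median, 75%, max) and derive the clipping threshold from the median. With Clipping_scale=2.0 the logged thresholds equal twice the logged median, e.g. 2.0 x 9.179e+01 = 1.836e+02 in the entry above, and percent-clipped is the share of recent batches whose norm exceeded that threshold. A minimal sketch of this bookkeeping (illustrative; the real optimizer folds this into its update step):

    import collections
    import statistics
    import torch

    class MedianClipper:
        """Clip gradient norms at clipping_scale x median of recent norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def clip_(self, params: list) -> tuple[float, float]:
            # max_norm=inf: compute the total grad norm without clipping yet.
            norm = torch.nn.utils.clip_grad_norm_(params, float("inf")).item()
            self.norms.append(norm)
            threshold = self.scale * statistics.median(self.norms)
            if norm > threshold:
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            clipped = sum(n > threshold for n in self.norms) / len(self.norms)
            return threshold, clipped  # the values logged as threshold / percent-clipped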
], tot_loss[loss=0.06961, simple_loss=0.09293, pruned_loss=0.01403, audio_tagging_loss=0.009123, over 3014103.87 frames. ], batch size: 55, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:11:42,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2023-11-23 14:11:43,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2023-11-23 14:12:01,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2411500.0, ans=0.125 2023-11-23 14:12:08,897 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 14:12:24,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361750 2023-11-23 14:12:26,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.336e+01 9.096e+01 9.576e+01 1.309e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-23 14:12:26,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2411633.3333333335, ans=0.125 2023-11-23 14:12:30,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2411633.3333333335, ans=0.125 2023-11-23 14:12:43,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2411700.0, ans=0.125 2023-11-23 14:12:47,003 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1050, loss[loss=0.0666, simple_loss=0.08464, pruned_loss=0.01339, audio_tagging_loss=0.01089, over 15675.00 frames. ], tot_loss[loss=0.06962, simple_loss=0.09297, pruned_loss=0.01406, audio_tagging_loss=0.009077, over 3029481.49 frames. ], batch size: 60, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:13:01,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-23 14:13:05,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2411833.3333333335, ans=0.125 2023-11-23 14:13:10,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2411833.3333333335, ans=0.125 2023-11-23 14:13:30,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361800 2023-11-23 14:13:48,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2412033.3333333335, ans=0.0 2023-11-23 14:13:51,457 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1100, loss[loss=0.06237, simple_loss=0.08203, pruned_loss=0.01117, audio_tagging_loss=0.01019, over 15041.00 frames. ], tot_loss[loss=0.06878, simple_loss=0.09194, pruned_loss=0.01375, audio_tagging_loss=0.00906, over 3035871.14 frames. 
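Note on the train_asr.py:1462 WARNING above: the AudioSet placeholder cut is dropped because, after subsampling, it has fewer frames (23) than BPE tokens (24), and the transducer loss needs at least one encoder frame per emitted token. A hedged sketch of such a filter (the exact criterion and the subsampling arithmetic in train_asr.py may include extra margins):

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        """Drop cuts too short for the transducer loss.

        Inferred rule: after subsampling, the encoder must emit at least
        as many frames as there are target tokens. The -7 is a rough
        estimate of the convolutional front-end's boundary trimming.
        """
        frames_after = (num_frames - 7) // subsampling_factor
        return frames_after >= num_tokens

    # The excluded cut above: 100 input frames -> 23 after subsampling,
    # but 24 tokens, so it is filtered out.
    assert keep_cut(100, 24) is False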
], batch size: 57, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:13:53,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=12.0 2023-11-23 14:13:53,965 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 14:14:27,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2412233.3333333335, ans=0.0 2023-11-23 14:14:31,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2412300.0, ans=0.125 2023-11-23 14:14:32,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2412300.0, ans=0.0 2023-11-23 14:14:34,859 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361850 2023-11-23 14:14:37,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.240e+01 8.744e+01 9.286e+01 1.135e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-23 14:14:37,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2412300.0, ans=0.0 2023-11-23 14:14:56,284 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1150, loss[loss=0.05403, simple_loss=0.07228, pruned_loss=0.007234, audio_tagging_loss=0.01065, over 15501.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09202, pruned_loss=0.01369, audio_tagging_loss=0.009049, over 3036060.77 frames. ], batch size: 60, lr: 2.21e-03, grad_scale: 16.0 2023-11-23 14:14:56,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2412433.3333333335, ans=0.1 2023-11-23 14:14:56,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2412433.3333333335, ans=0.125 2023-11-23 14:15:39,815 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361900 2023-11-23 14:16:02,440 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1200, loss[loss=0.07434, simple_loss=0.1097, pruned_loss=0.01538, audio_tagging_loss=0.004107, over 15193.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09141, pruned_loss=0.01368, audio_tagging_loss=0.009018, over 3035276.71 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:16:44,764 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 361950 2023-11-23 14:16:48,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.465e+01 9.389e+01 1.006e+02 1.334e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-23 14:17:06,339 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1250, loss[loss=0.05168, simple_loss=0.06324, pruned_loss=0.0111, audio_tagging_loss=0.008955, over 14717.00 frames. ], tot_loss[loss=0.06865, simple_loss=0.09188, pruned_loss=0.01376, audio_tagging_loss=0.008947, over 3045544.82 frames. 
], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:17:14,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2413100.0, ans=0.0 2023-11-23 14:17:49,035 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.700e-03 2023-11-23 14:17:50,048 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362000 2023-11-23 14:18:06,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2413366.6666666665, ans=0.125 2023-11-23 14:18:11,299 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1300, loss[loss=0.07462, simple_loss=0.09161, pruned_loss=0.02, audio_tagging_loss=0.008811, over 15507.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09098, pruned_loss=0.01367, audio_tagging_loss=0.009023, over 3050334.44 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:18:28,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2413500.0, ans=0.0 2023-11-23 14:18:29,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2413500.0, ans=0.125 2023-11-23 14:18:31,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2413500.0, ans=0.0 2023-11-23 14:18:45,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2413566.6666666665, ans=0.0 2023-11-23 14:18:54,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362050 2023-11-23 14:18:58,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.540e+01 8.227e+01 8.895e+01 9.502e+01 1.940e+02, threshold=1.779e+02, percent-clipped=1.0 2023-11-23 14:19:04,408 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 14:19:17,498 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1350, loss[loss=0.06398, simple_loss=0.08505, pruned_loss=0.009835, audio_tagging_loss=0.01162, over 14935.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09162, pruned_loss=0.01373, audio_tagging_loss=0.009025, over 3045379.65 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:19:35,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2413833.3333333335, ans=0.2 2023-11-23 14:20:01,081 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362100 2023-11-23 14:20:03,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2413966.6666666665, ans=0.1 2023-11-23 14:20:04,680 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
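Note on the scaling.py:1118 "WithLoss" entries: they report the summed auxiliary penalty attached to a module's attention weights; it is 0.000e+00 when the activations are within tolerance and small and positive (7.700e-03 above) when a penalty is flowing back. A sketch of the general pattern, a pass-through module whose forward is the identity but which exposes an extra term for the trainer to add to the loss (illustrative only; icefall wires this through autograd rather than an attribute):

    import torch

    class WithLossSketch(torch.nn.Module):
        """Identity in forward; stashes an auxiliary penalty for the trainer.

        Illustrative pattern: the training loop adds `module.aux_loss` to
        the main loss each step, mirroring the loss-sum values logged above.
        """

        def __init__(self, limit: float = 1.0, scale: float = 0.01):
            super().__init__()
            self.limit = limit
            self.scale = scale
            self.aux_loss = torch.zeros(())

        def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
            # Penalize only the amount by which the statistic exceeds the limit,
            # so the logged loss-sum is exactly zero most of the time.
            excess = (attn_weights ** 2).mean() - self.limit
            self.aux_loss = self.scale * excess.clamp(min=0.0)
            return attn_weights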
Number of tokens: 24 2023-11-23 14:20:04,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2413966.6666666665, ans=0.125 2023-11-23 14:20:11,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2414033.3333333335, ans=0.125 2023-11-23 14:20:16,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2023-11-23 14:20:22,789 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1400, loss[loss=0.06537, simple_loss=0.09344, pruned_loss=0.01075, audio_tagging_loss=0.007896, over 15663.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09123, pruned_loss=0.01357, audio_tagging_loss=0.009078, over 3048799.36 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:21:06,220 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362150 2023-11-23 14:21:07,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2414300.0, ans=0.04949747468305833 2023-11-23 14:21:09,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.379e+01 8.906e+01 9.568e+01 1.474e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-23 14:21:15,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2414366.6666666665, ans=0.125 2023-11-23 14:21:23,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.31 vs. limit=15.0 2023-11-23 14:21:27,364 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1450, loss[loss=0.07836, simple_loss=0.1148, pruned_loss=0.01309, audio_tagging_loss=0.007884, over 16184.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09122, pruned_loss=0.01355, audio_tagging_loss=0.009258, over 3048306.95 frames. ], batch size: 58, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:21:35,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2414433.3333333335, ans=0.0 2023-11-23 14:21:37,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2414433.3333333335, ans=0.0 2023-11-23 14:21:41,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2414500.0, ans=0.1 2023-11-23 14:21:49,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2414500.0, ans=0.2 2023-11-23 14:21:57,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-23 14:22:10,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362200 2023-11-23 14:22:20,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=22.5 2023-11-23 14:22:34,345 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1500, loss[loss=0.0498, simple_loss=0.05818, pruned_loss=0.006965, audio_tagging_loss=0.01375, over 16135.00 frames. 
], tot_loss[loss=0.06873, simple_loss=0.09148, pruned_loss=0.01368, audio_tagging_loss=0.009315, over 3047815.25 frames. ], batch size: 62, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:22:35,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2414766.6666666665, ans=0.0 2023-11-23 14:22:59,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2414900.0, ans=0.125 2023-11-23 14:23:01,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2414900.0, ans=0.0 2023-11-23 14:23:16,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362250 2023-11-23 14:23:20,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.572e+01 9.255e+01 1.007e+02 1.352e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-23 14:23:30,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2415033.3333333335, ans=0.0 2023-11-23 14:23:38,720 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1550, loss[loss=0.06708, simple_loss=0.09604, pruned_loss=0.009415, audio_tagging_loss=0.009646, over 14417.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09223, pruned_loss=0.0139, audio_tagging_loss=0.009274, over 3047131.19 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:23:39,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2415100.0, ans=0.125 2023-11-23 14:23:40,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2415100.0, ans=0.125 2023-11-23 14:23:42,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2415100.0, ans=0.125 2023-11-23 14:24:21,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-23 14:24:22,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362300 2023-11-23 14:24:40,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2415366.6666666665, ans=0.125 2023-11-23 14:24:42,968 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1600, loss[loss=0.07665, simple_loss=0.1051, pruned_loss=0.0162, audio_tagging_loss=0.007879, over 14228.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09197, pruned_loss=0.01385, audio_tagging_loss=0.009402, over 3043730.28 frames. 
], batch size: 54, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:25:05,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2415500.0, ans=0.125 2023-11-23 14:25:13,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2415566.6666666665, ans=0.09899494936611666 2023-11-23 14:25:14,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2415566.6666666665, ans=0.125 2023-11-23 14:25:16,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2415566.6666666665, ans=0.125 2023-11-23 14:25:18,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2415566.6666666665, ans=0.1 2023-11-23 14:25:25,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2415633.3333333335, ans=0.125 2023-11-23 14:25:26,555 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362350 2023-11-23 14:25:26,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2415633.3333333335, ans=0.125 2023-11-23 14:25:27,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2023-11-23 14:25:30,063 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.201e+01 8.953e+01 9.579e+01 1.488e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 14:25:48,503 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1650, loss[loss=0.06565, simple_loss=0.08583, pruned_loss=0.01336, audio_tagging_loss=0.009375, over 13871.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09141, pruned_loss=0.01388, audio_tagging_loss=0.009368, over 3044553.52 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:26:06,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2023-11-23 14:26:31,326 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362400 2023-11-23 14:26:53,708 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1700, loss[loss=0.06252, simple_loss=0.08973, pruned_loss=0.01171, audio_tagging_loss=0.005949, over 15506.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.0914, pruned_loss=0.01385, audio_tagging_loss=0.009411, over 3052534.70 frames. 
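Note on the grad_scale field in the train_asr.py:1221 entries: it is the mixed-precision loss scale, and its movement across this section (32.0 at batch 50, down to 8.0 by batch 500, back up through 16.0 to 32.0 here at batch 1600) is the usual dynamic-scaling behaviour: halve when a scaled gradient overflows, double after a run of overflow-free steps. A minimal sketch with PyTorch's stock scaler (model, batch, and criterion names are placeholders; the actual training loop differs in detail):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # matches the scale logged at the top of this section
        backoff_factor=0.5,   # halve on overflow (32 -> 16 -> 8)
        growth_factor=2.0,    # double after growth_interval clean steps
        growth_interval=2000,
    )

    def training_step(model, batch, optimizer, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # skips the step if grads overflowed
        scaler.update()                 # adjusts grad_scale as logged here
        return loss.detach()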
], batch size: 59, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:27:23,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2416233.3333333335, ans=0.125 2023-11-23 14:27:34,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2416300.0, ans=0.0 2023-11-23 14:27:36,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362450 2023-11-23 14:27:36,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2416300.0, ans=0.125 2023-11-23 14:27:40,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.569e+01 8.211e+01 9.052e+01 9.581e+01 1.125e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 14:27:42,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2416300.0, ans=0.125 2023-11-23 14:27:45,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-23 14:27:49,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2416366.6666666665, ans=0.1 2023-11-23 14:27:51,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2416366.6666666665, ans=0.0 2023-11-23 14:27:57,271 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1750, loss[loss=0.07862, simple_loss=0.09851, pruned_loss=0.01666, audio_tagging_loss=0.01271, over 13762.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.09187, pruned_loss=0.01379, audio_tagging_loss=0.009317, over 3051884.10 frames. ], batch size: 54, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:28:06,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2416433.3333333335, ans=0.125 2023-11-23 14:28:17,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.44 vs. 
limit=15.0 2023-11-23 14:28:20,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2416500.0, ans=0.125 2023-11-23 14:28:24,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2416566.6666666665, ans=0.125 2023-11-23 14:28:34,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2416566.6666666665, ans=0.1 2023-11-23 14:28:40,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362500 2023-11-23 14:28:43,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2416633.3333333335, ans=0.1 2023-11-23 14:28:46,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2416633.3333333335, ans=0.125 2023-11-23 14:28:47,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2416633.3333333335, ans=0.0 2023-11-23 14:28:48,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2416700.0, ans=0.125 2023-11-23 14:28:48,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2416700.0, ans=0.125 2023-11-23 14:29:02,033 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1800, loss[loss=0.05807, simple_loss=0.07689, pruned_loss=0.01069, audio_tagging_loss=0.008936, over 15050.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09215, pruned_loss=0.01381, audio_tagging_loss=0.009188, over 3051131.07 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:29:07,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2416766.6666666665, ans=0.0 2023-11-23 14:29:44,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362550 2023-11-23 14:29:48,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.278e+01 8.801e+01 9.362e+01 1.432e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-23 14:30:07,995 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1850, loss[loss=0.05227, simple_loss=0.07045, pruned_loss=0.008545, audio_tagging_loss=0.008495, over 14840.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.0919, pruned_loss=0.0138, audio_tagging_loss=0.009165, over 3049350.93 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:30:10,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2417100.0, ans=0.2 2023-11-23 14:30:39,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2417233.3333333335, ans=0.125 2023-11-23 14:30:47,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2417300.0, ans=0.125 2023-11-23 14:30:50,706 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362600 2023-11-23 14:30:57,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. 
limit=15.0 2023-11-23 14:31:05,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=8.0 2023-11-23 14:31:12,190 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1900, loss[loss=0.07459, simple_loss=0.1099, pruned_loss=0.01228, audio_tagging_loss=0.007343, over 15647.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09128, pruned_loss=0.01357, audio_tagging_loss=0.009154, over 3047629.96 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:31:13,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2417433.3333333335, ans=0.2 2023-11-23 14:31:14,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.70 vs. limit=22.5 2023-11-23 14:31:27,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2417500.0, ans=0.125 2023-11-23 14:31:28,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2417500.0, ans=0.0 2023-11-23 14:31:40,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2417566.6666666665, ans=22.5 2023-11-23 14:31:54,845 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362650 2023-11-23 14:31:58,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.275e+01 8.858e+01 9.597e+01 1.368e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 14:32:02,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-23 14:32:16,146 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 1950, loss[loss=0.06788, simple_loss=0.09398, pruned_loss=0.01293, audio_tagging_loss=0.007958, over 16394.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09094, pruned_loss=0.01362, audio_tagging_loss=0.009116, over 3043095.54 frames. ], batch size: 62, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:32:25,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2417766.6666666665, ans=0.1 2023-11-23 14:32:38,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2023-11-23 14:32:58,336 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362700 2023-11-23 14:33:15,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2023-11-23 14:33:18,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2418033.3333333335, ans=0.125 2023-11-23 14:33:20,868 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2000, loss[loss=0.0583, simple_loss=0.07384, pruned_loss=0.009516, audio_tagging_loss=0.01187, over 14268.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.0908, pruned_loss=0.01354, audio_tagging_loss=0.009092, over 3037740.12 frames. 
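Note on the two frame counts in the train_asr.py:1221 entries: loss[..., over N frames] covers only the current batch, while tot_loss[..., over M frames] is a decayed, frame-weighted running average. Its frame count climbs early in the epoch and then plateaus near 3.0M frames (as in the batch 2000 entry above), which is roughly 200 batches' worth at the ~15k frames per batch seen here, so the effective window appears to be about 200 batches. A small sketch of that accumulator (the window size is inferred from the plateau, not read from the script):

    class DecayingAverage:
        """Frame-weighted, exponentially decayed loss tracker.

        Matches the logged behaviour: the frame count in
        tot_loss[..., over M frames] grows early on, then plateaus near
        window * frames_per_batch (about 3.0M frames here for window=200).
        """

        def __init__(self, window: int = 200):
            self.decay = 1.0 - 1.0 / window
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames  # the value printed as tot_loss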
], batch size: 56, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:33:35,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2418166.6666666665, ans=0.125 2023-11-23 14:33:53,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2418233.3333333335, ans=0.125 2023-11-23 14:33:53,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2418233.3333333335, ans=0.125 2023-11-23 14:34:03,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362750 2023-11-23 14:34:05,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2418300.0, ans=0.2 2023-11-23 14:34:10,255 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.308e+01 8.872e+01 9.432e+01 1.250e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-23 14:34:22,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2418366.6666666665, ans=0.125 2023-11-23 14:34:24,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2418433.3333333335, ans=0.125 2023-11-23 14:34:26,002 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2050, loss[loss=0.04382, simple_loss=0.05522, pruned_loss=0.006889, audio_tagging_loss=0.009325, over 15474.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09143, pruned_loss=0.01375, audio_tagging_loss=0.009033, over 3044114.94 frames. ], batch size: 61, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:34:26,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2418433.3333333335, ans=0.1 2023-11-23 14:34:46,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2418500.0, ans=0.2 2023-11-23 14:35:01,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2418566.6666666665, ans=0.125 2023-11-23 14:35:06,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2418633.3333333335, ans=0.125 2023-11-23 14:35:06,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2418633.3333333335, ans=0.04949747468305833 2023-11-23 14:35:09,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362800 2023-11-23 14:35:31,085 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2100, loss[loss=0.06469, simple_loss=0.09214, pruned_loss=0.0107, audio_tagging_loss=0.00792, over 15249.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09144, pruned_loss=0.01374, audio_tagging_loss=0.009031, over 3042425.85 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:35:31,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.49 vs. 
2023-11-23 14:36:01,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2418900.0, ans=0.2 2023-11-23 14:36:03,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2418900.0, ans=0.2 2023-11-23 14:36:14,739 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362850 2023-11-23 14:36:20,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.402e+01 8.985e+01 9.676e+01 1.115e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 14:36:21,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2418966.6666666665, ans=0.125 2023-11-23 14:36:37,680 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2150, loss[loss=0.08217, simple_loss=0.1156, pruned_loss=0.01674, audio_tagging_loss=0.007613, over 15410.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09107, pruned_loss=0.01382, audio_tagging_loss=0.009116, over 3044579.25 frames. ], batch size: 54, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:36:38,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2419100.0, ans=0.125 2023-11-23 14:36:39,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2419100.0, ans=0.0 2023-11-23 14:36:53,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2419166.6666666665, ans=0.0 2023-11-23 14:36:55,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2419166.6666666665, ans=0.125 2023-11-23 14:36:59,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2419166.6666666665, ans=0.125 2023-11-23 14:37:00,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2419166.6666666665, ans=0.125 2023-11-23 14:37:05,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2419233.3333333335, ans=0.125 2023-11-23 14:37:05,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2419233.3333333335, ans=0.125 2023-11-23 14:37:08,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2419233.3333333335, ans=0.125 2023-11-23 14:37:14,948 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-23 14:37:19,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2419300.0, ans=0.0 2023-11-23 14:37:20,607 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362900 2023-11-23 14:37:29,944 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 14:37:42,018 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2200, loss[loss=0.05829, simple_loss=0.0804, pruned_loss=0.01075, audio_tagging_loss=0.007343, over 14528.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.0915, pruned_loss=0.01387, audio_tagging_loss=0.00907, over 3048638.09 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:37:45,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-23 14:37:48,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2023-11-23 14:37:54,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2023-11-23 14:38:05,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2419500.0, ans=0.125 2023-11-23 14:38:25,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 362950 2023-11-23 14:38:31,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 8.445e+01 9.122e+01 9.777e+01 1.344e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-23 14:38:33,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2419700.0, ans=0.0 2023-11-23 14:38:34,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2419700.0, ans=0.1 2023-11-23 14:38:47,332 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2250, loss[loss=0.05804, simple_loss=0.08164, pruned_loss=0.01077, audio_tagging_loss=0.006443, over 16174.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.09158, pruned_loss=0.01401, audio_tagging_loss=0.009082, over 3056120.29 frames. 
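], batch size: 59, lr: 2.20e-03, grad_scale: 16.0

The WARNING above (and the identical ones later in the log) drops one-second AudioSet clips that carry only the dummy placeholder transcript: 100 feature frames shrink to 23 after the frontend's subsampling, fewer than the 24 BPE tokens of the dummy text, and the transducer loss used here appears to require at least as many frames as output tokens. A minimal sketch of such a filter follows; the exact subsampling arithmetic is an assumption, but it reproduces the 100 -> 23 reported in the warning.

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Assumed subsampling arithmetic: ((T - 7) // 2 + 1) // 2, which maps 100 -> 23.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    # Assumed constraint: the transducer needs at least one frame per output token.
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded, matching the warning
```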
2023-11-23 14:38:47,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2419766.6666666665, ans=0.125 2023-11-23 14:38:57,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2419766.6666666665, ans=0.1 2023-11-23 14:39:14,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2419900.0, ans=0.09899494936611666 2023-11-23 14:39:31,106 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363000 2023-11-23 14:39:31,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2419966.6666666665, ans=0.125 2023-11-23 14:39:50,452 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 14:39:54,457 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2300, loss[loss=0.07521, simple_loss=0.1016, pruned_loss=0.01567, audio_tagging_loss=0.008747, over 15654.00 frames. ], tot_loss[loss=0.0687, simple_loss=0.09135, pruned_loss=0.01389, audio_tagging_loss=0.009136, over 3049776.49 frames. ], batch size: 58, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:40:05,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2420166.6666666665, ans=0.0 2023-11-23 14:40:09,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2420166.6666666665, ans=0.2 2023-11-23 14:40:17,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2420166.6666666665, ans=0.125 2023-11-23 14:40:18,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2023-11-23 14:40:35,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-11-23 14:40:36,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363050 2023-11-23 14:40:43,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.427e+01 9.145e+01 9.809e+01 1.490e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-23 14:40:47,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2420366.6666666665, ans=0.0 2023-11-23 14:40:49,900 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 14:40:53,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2420366.6666666665, ans=0.125 2023-11-23 14:40:58,431 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2350, loss[loss=0.07576, simple_loss=0.1029, pruned_loss=0.01518, audio_tagging_loss=0.009117, over 15837.00 frames.
], tot_loss[loss=0.06847, simple_loss=0.09091, pruned_loss=0.01381, audio_tagging_loss=0.009206, over 3047443.49 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:41:01,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2420433.3333333335, ans=0.04949747468305833 2023-11-23 14:41:13,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0 2023-11-23 14:41:34,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2420566.6666666665, ans=0.125 2023-11-23 14:41:41,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363100 2023-11-23 14:42:02,799 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2400, loss[loss=0.07252, simple_loss=0.09879, pruned_loss=0.01293, audio_tagging_loss=0.01019, over 14836.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09226, pruned_loss=0.01391, audio_tagging_loss=0.009228, over 3046000.20 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:42:04,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2420766.6666666665, ans=0.125 2023-11-23 14:42:14,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=15.0 2023-11-23 14:42:17,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2420833.3333333335, ans=0.1 2023-11-23 14:42:17,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2420833.3333333335, ans=0.125 2023-11-23 14:42:39,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2420900.0, ans=0.125 2023-11-23 14:42:43,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2420966.6666666665, ans=0.125 2023-11-23 14:42:44,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2420966.6666666665, ans=0.125 2023-11-23 14:42:45,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363150 2023-11-23 14:42:46,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2420966.6666666665, ans=0.1 2023-11-23 14:42:49,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-23 14:42:51,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.436e+01 9.019e+01 9.718e+01 1.173e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 14:42:53,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. 
limit=15.0 2023-11-23 14:42:56,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2421033.3333333335, ans=0.125 2023-11-23 14:43:07,837 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2450, loss[loss=0.05205, simple_loss=0.07293, pruned_loss=0.006839, audio_tagging_loss=0.008744, over 15749.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.09173, pruned_loss=0.0138, audio_tagging_loss=0.009367, over 3038352.88 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:43:35,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2421233.3333333335, ans=0.125 2023-11-23 14:43:45,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2421300.0, ans=0.125 2023-11-23 14:43:50,382 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363200 2023-11-23 14:44:12,623 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2500, loss[loss=0.07122, simple_loss=0.09442, pruned_loss=0.01526, audio_tagging_loss=0.008748, over 15000.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.09237, pruned_loss=0.01398, audio_tagging_loss=0.009413, over 3039248.85 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:44:15,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-11-23 14:44:20,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2421433.3333333335, ans=0.125 2023-11-23 14:44:22,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2421433.3333333335, ans=0.125 2023-11-23 14:44:28,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2421500.0, ans=0.1 2023-11-23 14:44:28,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2421500.0, ans=0.125 2023-11-23 14:44:33,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2421500.0, ans=0.125 2023-11-23 14:44:38,170 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 14:44:51,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2421633.3333333335, ans=0.2 2023-11-23 14:44:56,110 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363250 2023-11-23 14:45:01,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.211e+01 9.083e+01 9.843e+01 1.373e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-23 14:45:17,036 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2550, loss[loss=0.06944, simple_loss=0.0817, pruned_loss=0.01896, audio_tagging_loss=0.009629, over 15351.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09284, pruned_loss=0.01403, audio_tagging_loss=0.009254, over 3043636.78 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:45:25,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.34 vs. 
limit=15.0 2023-11-23 14:45:34,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=12.0 2023-11-23 14:46:00,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363300 2023-11-23 14:46:05,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2421966.6666666665, ans=0.1 2023-11-23 14:46:12,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2422033.3333333335, ans=0.125 2023-11-23 14:46:17,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2422033.3333333335, ans=0.1 2023-11-23 14:46:22,958 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2600, loss[loss=0.08154, simple_loss=0.1035, pruned_loss=0.02005, audio_tagging_loss=0.009739, over 15279.00 frames. ], tot_loss[loss=0.069, simple_loss=0.0922, pruned_loss=0.01376, audio_tagging_loss=0.009136, over 3043380.29 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:46:42,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2422166.6666666665, ans=0.125 2023-11-23 14:46:45,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-23 14:46:56,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2422233.3333333335, ans=0.0 2023-11-23 14:47:05,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363350 2023-11-23 14:47:05,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2422300.0, ans=0.125 2023-11-23 14:47:09,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-23 14:47:10,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2422300.0, ans=0.0 2023-11-23 14:47:12,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.370e+01 9.064e+01 9.837e+01 1.240e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-23 14:47:28,333 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2650, loss[loss=0.06856, simple_loss=0.09026, pruned_loss=0.01288, audio_tagging_loss=0.01055, over 15662.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09216, pruned_loss=0.01372, audio_tagging_loss=0.009144, over 3048569.87 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:47:45,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2422500.0, ans=0.125 2023-11-23 14:48:02,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-11-23 14:48:11,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363400 2023-11-23 14:48:32,428 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2700, loss[loss=0.08003, simple_loss=0.1109, pruned_loss=0.01616, audio_tagging_loss=0.008422, over 15285.00 frames. 
], tot_loss[loss=0.06942, simple_loss=0.0928, pruned_loss=0.01398, audio_tagging_loss=0.009046, over 3044176.23 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:48:52,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2422833.3333333335, ans=0.0 2023-11-23 14:48:57,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2422833.3333333335, ans=0.2 2023-11-23 14:49:04,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2023-11-23 14:49:15,903 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363450 2023-11-23 14:49:23,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.399e+01 8.847e+01 9.605e+01 1.237e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 14:49:29,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2423033.3333333335, ans=0.0 2023-11-23 14:49:35,730 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 14:49:37,790 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2750, loss[loss=0.08311, simple_loss=0.1082, pruned_loss=0.01948, audio_tagging_loss=0.009519, over 15196.00 frames. ], tot_loss[loss=0.06876, simple_loss=0.09188, pruned_loss=0.01379, audio_tagging_loss=0.009037, over 3043933.66 frames. ], batch size: 58, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:49:44,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2023-11-23 14:49:49,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2423100.0, ans=0.125 2023-11-23 14:50:20,855 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363500 2023-11-23 14:50:27,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2423300.0, ans=0.09899494936611666 2023-11-23 14:50:33,613 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 14:50:43,971 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2800, loss[loss=0.05312, simple_loss=0.0745, pruned_loss=0.007469, audio_tagging_loss=0.008395, over 14142.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09123, pruned_loss=0.01368, audio_tagging_loss=0.009011, over 3039735.85 frames. 
], batch size: 56, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 14:50:46,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2423433.3333333335, ans=0.1 2023-11-23 14:51:06,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2423500.0, ans=0.125 2023-11-23 14:51:12,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2423566.6666666665, ans=0.2 2023-11-23 14:51:27,453 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363550 2023-11-23 14:51:36,067 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.140e+01 8.811e+01 9.422e+01 1.418e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-23 14:51:41,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0 2023-11-23 14:51:48,470 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2850, loss[loss=0.05735, simple_loss=0.07504, pruned_loss=0.01054, audio_tagging_loss=0.009285, over 14947.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09141, pruned_loss=0.01363, audio_tagging_loss=0.008959, over 3040328.39 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:51:56,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-23 14:52:01,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=22.5 2023-11-23 14:52:20,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2423900.0, ans=0.0 2023-11-23 14:52:21,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2423900.0, ans=0.125 2023-11-23 14:52:30,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2423966.6666666665, ans=0.1 2023-11-23 14:52:31,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363600 2023-11-23 14:52:45,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2023-11-23 14:52:46,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0 2023-11-23 14:52:52,479 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2900, loss[loss=0.06173, simple_loss=0.08041, pruned_loss=0.01139, audio_tagging_loss=0.01014, over 14826.00 frames. ], tot_loss[loss=0.06901, simple_loss=0.09241, pruned_loss=0.01383, audio_tagging_loss=0.008979, over 3039041.30 frames. 
], batch size: 56, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:52:57,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2424100.0, ans=0.0 2023-11-23 14:53:07,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2424166.6666666665, ans=0.125 2023-11-23 14:53:23,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2424233.3333333335, ans=0.2 2023-11-23 14:53:35,950 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363650 2023-11-23 14:53:44,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 8.514e+01 9.320e+01 9.888e+01 1.457e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-23 14:53:59,188 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 2950, loss[loss=0.07674, simple_loss=0.1089, pruned_loss=0.0147, audio_tagging_loss=0.007587, over 15806.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.0931, pruned_loss=0.01393, audio_tagging_loss=0.00896, over 3043817.36 frames. ], batch size: 61, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:54:06,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2023-11-23 14:54:19,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2023-11-23 14:54:37,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2424633.3333333335, ans=0.09899494936611666 2023-11-23 14:54:41,427 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363700 2023-11-23 14:54:51,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2424700.0, ans=0.125 2023-11-23 14:55:03,634 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3000, loss[loss=0.07563, simple_loss=0.09919, pruned_loss=0.0147, audio_tagging_loss=0.01133, over 15761.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.09241, pruned_loss=0.01391, audio_tagging_loss=0.009093, over 3047852.92 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:55:03,635 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 14:55:45,739 INFO [train_asr.py:1253] (1/4) Epoch 31, validation: loss=0.0577, simple_loss=0.05103, pruned_loss=0.005016, audio_tagging_loss=0.02717, over 4681554.00 frames. 
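The validation line above decomposes the objective the same way the per-batch lines do, and the logged totals are consistent with the total loss being a weighted sum of the three components, with simple_loss down-weighted by 0.5 and the audio-tagging loss at weight 1.0: for the validation entry, 0.5 * 0.05103 + 0.005016 + 0.02717 ~= 0.0577. A minimal sketch of that combination; the weights are inferred from the logged numbers, and `total_loss` is an illustrative helper, not the training script's API.

```python
def total_loss(simple_loss: float, pruned_loss: float, audio_tagging_loss: float,
               simple_scale: float = 0.5, audio_tagging_scale: float = 1.0) -> float:
    # Weighted sum inferred from the log's loss breakdowns.
    return simple_scale * simple_loss + pruned_loss + audio_tagging_scale * audio_tagging_loss

# Validation entry above reports loss=0.0577:
print(total_loss(0.05103, 0.005016, 0.02717))  # ~0.0577
```

The same arithmetic holds for the training tot_loss entries, e.g. batch 1900 at the start of this excerpt: 0.5 * 0.09128 + 0.01357 + 0.009154 ~= 0.06837.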
2023-11-23 14:55:45,740 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 14:55:58,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2424833.3333333335, ans=0.0 2023-11-23 14:56:28,214 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363750 2023-11-23 14:56:37,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.570e+01 9.368e+01 9.997e+01 1.561e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-23 14:56:46,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2425033.3333333335, ans=0.2 2023-11-23 14:56:46,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2023-11-23 14:56:50,878 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3050, loss[loss=0.08062, simple_loss=0.1076, pruned_loss=0.01758, audio_tagging_loss=0.00925, over 16038.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.09198, pruned_loss=0.01395, audio_tagging_loss=0.009196, over 3050190.31 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:57:05,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2425166.6666666665, ans=0.125 2023-11-23 14:57:27,977 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 14:57:32,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2425300.0, ans=0.125 2023-11-23 14:57:34,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363800 2023-11-23 14:57:40,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2425300.0, ans=0.125 2023-11-23 14:57:52,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2425366.6666666665, ans=0.1 2023-11-23 14:57:55,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2425433.3333333335, ans=0.0 2023-11-23 14:57:56,140 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3100, loss[loss=0.06525, simple_loss=0.096, pruned_loss=0.009693, audio_tagging_loss=0.007557, over 15766.00 frames. ], tot_loss[loss=0.06916, simple_loss=0.09212, pruned_loss=0.01388, audio_tagging_loss=0.009229, over 3040822.65 frames. ], batch size: 60, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:58:17,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.37 vs. 
limit=22.5 2023-11-23 14:58:39,601 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363850 2023-11-23 14:58:42,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2425633.3333333335, ans=0.2 2023-11-23 14:58:47,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.546e+01 8.908e+01 9.738e+01 1.357e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-23 14:59:00,935 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3150, loss[loss=0.06611, simple_loss=0.08919, pruned_loss=0.01248, audio_tagging_loss=0.009039, over 15503.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09219, pruned_loss=0.01377, audio_tagging_loss=0.00926, over 3047494.85 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 14:59:05,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2425766.6666666665, ans=0.125 2023-11-23 14:59:09,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2425766.6666666665, ans=0.1 2023-11-23 14:59:10,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2425766.6666666665, ans=0.2 2023-11-23 14:59:31,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2023-11-23 14:59:43,595 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363900 2023-11-23 14:59:54,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2426033.3333333335, ans=0.1 2023-11-23 15:00:02,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2426033.3333333335, ans=0.125 2023-11-23 15:00:06,087 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3200, loss[loss=0.06675, simple_loss=0.09601, pruned_loss=0.01126, audio_tagging_loss=0.00748, over 15599.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.0922, pruned_loss=0.01374, audio_tagging_loss=0.009354, over 3046030.31 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:00:06,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2426100.0, ans=0.125 2023-11-23 15:00:13,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2426100.0, ans=0.125 2023-11-23 15:00:30,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=15.0 2023-11-23 15:00:43,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2426300.0, ans=0.125 2023-11-23 15:00:46,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2426300.0, ans=0.125 2023-11-23 15:00:48,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2426300.0, ans=0.125 2023-11-23 15:00:49,088 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 363950 2023-11-23 15:00:58,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.380e+01 9.015e+01 9.537e+01 1.251e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-23 15:01:11,007 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3250, loss[loss=0.06282, simple_loss=0.08016, pruned_loss=0.0125, audio_tagging_loss=0.01025, over 15479.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09131, pruned_loss=0.01356, audio_tagging_loss=0.009455, over 3045694.68 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:01:31,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-23 15:01:40,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2426566.6666666665, ans=0.125 2023-11-23 15:01:41,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2426566.6666666665, ans=0.07 2023-11-23 15:01:53,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364000 2023-11-23 15:02:18,576 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3300, loss[loss=0.072, simple_loss=0.09381, pruned_loss=0.0168, audio_tagging_loss=0.008292, over 14432.00 frames. ], tot_loss[loss=0.06876, simple_loss=0.09128, pruned_loss=0.01364, audio_tagging_loss=0.009478, over 3041513.86 frames. ], batch size: 54, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:02:18,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2426766.6666666665, ans=0.125 2023-11-23 15:02:29,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2426766.6666666665, ans=0.1 2023-11-23 15:02:34,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.89 vs. limit=15.0 2023-11-23 15:02:42,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2426833.3333333335, ans=0.1 2023-11-23 15:02:43,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2426900.0, ans=0.2 2023-11-23 15:02:45,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. 
limit=12.0 2023-11-23 15:02:52,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2426900.0, ans=0.1 2023-11-23 15:02:57,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2426966.6666666665, ans=0.09899494936611666 2023-11-23 15:03:01,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364050 2023-11-23 15:03:10,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.350e+01 9.048e+01 9.771e+01 1.182e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 15:03:23,927 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3350, loss[loss=0.05609, simple_loss=0.06564, pruned_loss=0.01211, audio_tagging_loss=0.01116, over 15347.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.09183, pruned_loss=0.01382, audio_tagging_loss=0.009457, over 3045222.01 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:03:42,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5 2023-11-23 15:04:05,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2427300.0, ans=0.125 2023-11-23 15:04:06,607 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364100 2023-11-23 15:04:14,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2427366.6666666665, ans=0.125 2023-11-23 15:04:28,269 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3400, loss[loss=0.07359, simple_loss=0.09959, pruned_loss=0.01525, audio_tagging_loss=0.008541, over 15553.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09112, pruned_loss=0.0137, audio_tagging_loss=0.009308, over 3044180.89 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:04:30,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2427433.3333333335, ans=0.125 2023-11-23 15:04:36,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0 2023-11-23 15:05:06,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2427633.3333333335, ans=0.125 2023-11-23 15:05:11,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364150 2023-11-23 15:05:19,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2427700.0, ans=0.2 2023-11-23 15:05:21,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.657e+01 9.005e+01 9.760e+01 1.355e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 15:05:23,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2427700.0, ans=0.05 2023-11-23 15:05:32,853 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3450, loss[loss=0.05974, simple_loss=0.06802, pruned_loss=0.01551, audio_tagging_loss=0.01022, over 15190.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.0916, pruned_loss=0.01379, audio_tagging_loss=0.009245, over 3040914.12 frames. 
], batch size: 60, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:05:50,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2427833.3333333335, ans=0.125 2023-11-23 15:06:14,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-23 15:06:16,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364200 2023-11-23 15:06:39,846 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3500, loss[loss=0.05758, simple_loss=0.07847, pruned_loss=0.007963, audio_tagging_loss=0.01039, over 15817.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.091, pruned_loss=0.01358, audio_tagging_loss=0.009223, over 3046949.97 frames. ], batch size: 61, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:06:43,720 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 15:06:46,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2428100.0, ans=0.0 2023-11-23 15:06:51,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2428166.6666666665, ans=0.125 2023-11-23 15:06:51,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2023-11-23 15:07:10,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2428233.3333333335, ans=0.1 2023-11-23 15:07:11,429 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 15:07:21,967 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364250 2023-11-23 15:07:32,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.333e+01 8.735e+01 9.330e+01 1.206e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-23 15:07:37,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2428366.6666666665, ans=0.125 2023-11-23 15:07:39,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2428366.6666666665, ans=0.95 2023-11-23 15:07:43,888 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3550, loss[loss=0.04998, simple_loss=0.06138, pruned_loss=0.008458, audio_tagging_loss=0.01084, over 15891.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09172, pruned_loss=0.01373, audio_tagging_loss=0.009133, over 3052018.56 frames. 
], batch size: 60, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:07:54,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2428433.3333333335, ans=0.0 2023-11-23 15:08:25,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2428633.3333333335, ans=0.1 2023-11-23 15:08:26,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2428633.3333333335, ans=0.1 2023-11-23 15:08:27,278 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364300 2023-11-23 15:08:28,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2428633.3333333335, ans=0.125 2023-11-23 15:08:38,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2428700.0, ans=0.1 2023-11-23 15:08:49,089 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3600, loss[loss=0.08162, simple_loss=0.1104, pruned_loss=0.01721, audio_tagging_loss=0.009213, over 15362.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.09131, pruned_loss=0.01365, audio_tagging_loss=0.009074, over 3045401.51 frames. ], batch size: 58, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:08:50,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2428766.6666666665, ans=0.125 2023-11-23 15:09:11,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2428833.3333333335, ans=0.2 2023-11-23 15:09:12,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2428833.3333333335, ans=0.1 2023-11-23 15:09:22,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2428900.0, ans=0.1 2023-11-23 15:09:31,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364350 2023-11-23 15:09:42,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.312e+01 8.847e+01 9.562e+01 1.157e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-23 15:09:48,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2023-11-23 15:09:54,398 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3650, loss[loss=0.08142, simple_loss=0.1157, pruned_loss=0.01703, audio_tagging_loss=0.006514, over 15503.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.09225, pruned_loss=0.01384, audio_tagging_loss=0.009018, over 3054668.75 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:10:25,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2429233.3333333335, ans=0.2 2023-11-23 15:10:32,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2429300.0, ans=0.1 2023-11-23 15:10:36,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364400 2023-11-23 15:10:59,370 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3700, loss[loss=0.06686, simple_loss=0.09126, pruned_loss=0.01358, audio_tagging_loss=0.007647, over 15273.00 frames. 
], tot_loss[loss=0.06894, simple_loss=0.09203, pruned_loss=0.0139, audio_tagging_loss=0.009019, over 3050984.91 frames. ], batch size: 58, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:10:59,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2429433.3333333335, ans=0.125 2023-11-23 15:11:00,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2429433.3333333335, ans=0.1 2023-11-23 15:11:03,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2023-11-23 15:11:06,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2429433.3333333335, ans=0.1 2023-11-23 15:11:07,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2429433.3333333335, ans=0.05 2023-11-23 15:11:08,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2429433.3333333335, ans=0.0 2023-11-23 15:11:13,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2429500.0, ans=0.0 2023-11-23 15:11:14,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2429500.0, ans=0.125 2023-11-23 15:11:38,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2429633.3333333335, ans=0.0 2023-11-23 15:11:42,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364450 2023-11-23 15:11:45,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2429633.3333333335, ans=0.0 2023-11-23 15:11:52,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.697e+01 8.537e+01 9.041e+01 9.821e+01 1.313e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-23 15:11:56,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2429700.0, ans=0.125 2023-11-23 15:12:01,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2429700.0, ans=0.125 2023-11-23 15:12:03,711 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3750, loss[loss=0.06087, simple_loss=0.0708, pruned_loss=0.01121, audio_tagging_loss=0.01425, over 15439.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09213, pruned_loss=0.01387, audio_tagging_loss=0.00901, over 3048653.83 frames. 
], batch size: 61, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:12:07,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2429766.6666666665, ans=0.0 2023-11-23 15:12:34,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2429900.0, ans=0.05 2023-11-23 15:12:44,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2429966.6666666665, ans=0.1 2023-11-23 15:12:46,534 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364500 2023-11-23 15:12:47,573 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 15:12:47,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2429966.6666666665, ans=0.125 2023-11-23 15:12:50,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-23 15:12:51,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0 2023-11-23 15:13:04,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.35 vs. limit=5.0 2023-11-23 15:13:08,465 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3800, loss[loss=0.09997, simple_loss=0.1364, pruned_loss=0.02389, audio_tagging_loss=0.007889, over 15684.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09284, pruned_loss=0.01386, audio_tagging_loss=0.009018, over 3053640.53 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:13:14,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. 
limit=15.0 2023-11-23 15:13:17,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2430100.0, ans=0.95 2023-11-23 15:13:17,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2430100.0, ans=0.125 2023-11-23 15:13:48,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2430300.0, ans=0.0 2023-11-23 15:13:50,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364550 2023-11-23 15:14:04,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.445e+01 8.986e+01 9.684e+01 1.334e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 15:14:04,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2430366.6666666665, ans=0.04949747468305833 2023-11-23 15:14:14,495 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3850, loss[loss=0.07105, simple_loss=0.0886, pruned_loss=0.01666, audio_tagging_loss=0.01009, over 15208.00 frames. ], tot_loss[loss=0.06938, simple_loss=0.09264, pruned_loss=0.01392, audio_tagging_loss=0.009139, over 3046274.49 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:14:25,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2430500.0, ans=0.125 2023-11-23 15:14:26,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2023-11-23 15:14:57,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364600 2023-11-23 15:15:09,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2430700.0, ans=0.2 2023-11-23 15:15:18,953 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3900, loss[loss=0.08001, simple_loss=0.0956, pruned_loss=0.02293, audio_tagging_loss=0.009281, over 15309.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09209, pruned_loss=0.01389, audio_tagging_loss=0.00912, over 3044924.21 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:15:33,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2023-11-23 15:16:01,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-11-23 15:16:02,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364650 2023-11-23 15:16:14,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.382e+01 9.163e+01 9.741e+01 1.346e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-23 15:16:18,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2431033.3333333335, ans=0.0 2023-11-23 15:16:23,925 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 3950, loss[loss=0.074, simple_loss=0.1015, pruned_loss=0.01458, audio_tagging_loss=0.008664, over 14650.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09254, pruned_loss=0.01393, audio_tagging_loss=0.009192, over 3046390.79 frames. 
], batch size: 55, lr: 2.20e-03, grad_scale: 8.0 2023-11-23 15:16:42,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2431166.6666666665, ans=0.2 2023-11-23 15:16:46,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2431166.6666666665, ans=0.2 2023-11-23 15:16:49,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2431233.3333333335, ans=0.1 2023-11-23 15:17:06,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364700 2023-11-23 15:17:14,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2431366.6666666665, ans=0.125 2023-11-23 15:17:24,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. limit=10.0 2023-11-23 15:17:29,987 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4000, loss[loss=0.06821, simple_loss=0.09187, pruned_loss=0.01314, audio_tagging_loss=0.009135, over 14741.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09271, pruned_loss=0.01406, audio_tagging_loss=0.009291, over 3046603.62 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:17:36,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2023-11-23 15:17:48,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2023-11-23 15:18:12,976 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364750 2023-11-23 15:18:19,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2431633.3333333335, ans=10.0 2023-11-23 15:18:25,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.571e+01 8.345e+01 8.888e+01 9.598e+01 2.808e+02, threshold=1.778e+02, percent-clipped=1.0 2023-11-23 15:18:28,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2431700.0, ans=0.125 2023-11-23 15:18:29,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2431700.0, ans=0.125 2023-11-23 15:18:33,724 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4050, loss[loss=0.07952, simple_loss=0.1095, pruned_loss=0.01666, audio_tagging_loss=0.008084, over 15416.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09255, pruned_loss=0.01387, audio_tagging_loss=0.009353, over 3041789.15 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:18:34,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2431766.6666666665, ans=0.0 2023-11-23 15:18:36,210 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 15:18:54,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2431833.3333333335, ans=0.2 2023-11-23 15:18:58,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2431900.0, ans=0.125 2023-11-23 15:18:58,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2431900.0, ans=0.125 2023-11-23 15:19:11,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2431966.6666666665, ans=0.125 2023-11-23 15:19:12,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2431966.6666666665, ans=0.125 2023-11-23 15:19:16,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364800 2023-11-23 15:19:22,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2431966.6666666665, ans=0.0 2023-11-23 15:19:38,079 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4100, loss[loss=0.07654, simple_loss=0.1092, pruned_loss=0.01433, audio_tagging_loss=0.007587, over 15099.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.09236, pruned_loss=0.01362, audio_tagging_loss=0.009267, over 3045079.22 frames. ], batch size: 54, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:19:39,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2023-11-23 15:19:47,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=2432100.0, ans=22.5 2023-11-23 15:19:55,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2432166.6666666665, ans=0.125 2023-11-23 15:19:59,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.97 vs. limit=10.0 2023-11-23 15:20:05,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-23 15:20:13,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2432233.3333333335, ans=0.035 2023-11-23 15:20:15,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2432300.0, ans=0.125 2023-11-23 15:20:18,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. limit=10.0 2023-11-23 15:20:19,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364850 2023-11-23 15:20:32,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.595e+01 9.054e+01 9.904e+01 2.115e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-23 15:20:43,312 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4150, loss[loss=0.05449, simple_loss=0.06873, pruned_loss=0.01021, audio_tagging_loss=0.009921, over 13620.00 frames. 
], tot_loss[loss=0.06898, simple_loss=0.09226, pruned_loss=0.01367, audio_tagging_loss=0.009185, over 3043179.71 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:20:43,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2432433.3333333335, ans=0.5 2023-11-23 15:20:47,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2432433.3333333335, ans=0.125 2023-11-23 15:20:48,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2432433.3333333335, ans=0.125 2023-11-23 15:20:54,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2432500.0, ans=0.125 2023-11-23 15:21:04,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2432500.0, ans=0.0 2023-11-23 15:21:07,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2432566.6666666665, ans=0.0 2023-11-23 15:21:11,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2432566.6666666665, ans=0.1 2023-11-23 15:21:19,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2432633.3333333335, ans=0.125 2023-11-23 15:21:22,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2432633.3333333335, ans=0.125 2023-11-23 15:21:25,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364900 2023-11-23 15:21:29,311 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 15:21:31,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2023-11-23 15:21:33,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2432700.0, ans=0.1 2023-11-23 15:21:43,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2432700.0, ans=0.0 2023-11-23 15:21:47,301 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4200, loss[loss=0.08911, simple_loss=0.1193, pruned_loss=0.02081, audio_tagging_loss=0.008664, over 15287.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09228, pruned_loss=0.0137, audio_tagging_loss=0.009119, over 3039878.60 frames. ], batch size: 55, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:22:12,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.27 vs. 
limit=22.5 2023-11-23 15:22:29,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2432966.6666666665, ans=0.125 2023-11-23 15:22:30,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 364950 2023-11-23 15:22:42,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.314e+01 9.166e+01 9.730e+01 1.323e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-23 15:22:51,138 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4250, loss[loss=0.0691, simple_loss=0.09508, pruned_loss=0.01368, audio_tagging_loss=0.007887, over 15956.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09149, pruned_loss=0.01351, audio_tagging_loss=0.009179, over 3038907.38 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:23:18,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2433233.3333333335, ans=0.125 2023-11-23 15:23:18,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-11-23 15:23:33,576 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365000 2023-11-23 15:23:33,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2433300.0, ans=0.5 2023-11-23 15:23:40,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2433300.0, ans=0.0 2023-11-23 15:23:42,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0 2023-11-23 15:23:52,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2433366.6666666665, ans=0.125 2023-11-23 15:23:56,032 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4300, loss[loss=0.089, simple_loss=0.1271, pruned_loss=0.02078, audio_tagging_loss=0.004662, over 15117.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09221, pruned_loss=0.01369, audio_tagging_loss=0.009007, over 3036106.83 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:24:26,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2433566.6666666665, ans=0.2 2023-11-23 15:24:38,093 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365050 2023-11-23 15:24:38,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2023-11-23 15:24:44,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2433633.3333333335, ans=0.125 2023-11-23 15:24:50,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.539e+01 9.202e+01 9.861e+01 1.335e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-23 15:25:00,114 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4350, loss[loss=0.07415, simple_loss=0.09072, pruned_loss=0.0182, audio_tagging_loss=0.01059, over 15700.00 frames. ], tot_loss[loss=0.06862, simple_loss=0.09175, pruned_loss=0.01374, audio_tagging_loss=0.009011, over 3032782.65 frames. 
], batch size: 57, lr: 2.20e-03, grad_scale: 16.0 2023-11-23 15:25:32,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2433900.0, ans=0.125 2023-11-23 15:25:38,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.94 vs. limit=10.0 2023-11-23 15:25:42,423 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365100 2023-11-23 15:26:03,785 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4400, loss[loss=0.06446, simple_loss=0.07651, pruned_loss=0.01735, audio_tagging_loss=0.008861, over 15344.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09218, pruned_loss=0.01383, audio_tagging_loss=0.008918, over 3041437.98 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:26:13,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2434100.0, ans=0.09899494936611666 2023-11-23 15:26:14,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2434100.0, ans=0.125 2023-11-23 15:26:24,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2434166.6666666665, ans=0.125 2023-11-23 15:26:33,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.08 vs. limit=15.0 2023-11-23 15:26:35,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2023-11-23 15:26:39,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2434233.3333333335, ans=0.1 2023-11-23 15:26:46,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365150 2023-11-23 15:26:58,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=12.0 2023-11-23 15:26:59,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.264e+01 9.016e+01 9.697e+01 1.171e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-23 15:27:01,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2434366.6666666665, ans=0.1 2023-11-23 15:27:09,406 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4450, loss[loss=0.07248, simple_loss=0.09488, pruned_loss=0.01518, audio_tagging_loss=0.009865, over 14708.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09303, pruned_loss=0.01399, audio_tagging_loss=0.008787, over 3043028.10 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 32.0 2023-11-23 15:27:32,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2434500.0, ans=0.1 2023-11-23 15:27:39,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.89 vs. 
limit=22.5 2023-11-23 15:27:51,849 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365200 2023-11-23 15:28:08,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2434700.0, ans=0.125 2023-11-23 15:28:14,254 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4500, loss[loss=0.07825, simple_loss=0.09768, pruned_loss=0.01706, audio_tagging_loss=0.01235, over 15351.00 frames. ], tot_loss[loss=0.06995, simple_loss=0.09407, pruned_loss=0.01412, audio_tagging_loss=0.008792, over 3046732.76 frames. ], batch size: 55, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:28:34,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2434833.3333333335, ans=0.125 2023-11-23 15:28:46,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2434900.0, ans=0.125 2023-11-23 15:28:47,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0 2023-11-23 15:28:48,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2434900.0, ans=0.125 2023-11-23 15:28:51,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2434900.0, ans=0.125 2023-11-23 15:28:57,929 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365250 2023-11-23 15:29:03,059 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.502e-03 2023-11-23 15:29:05,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2435033.3333333335, ans=0.125 2023-11-23 15:29:11,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.316e+01 8.938e+01 9.967e+01 1.364e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-23 15:29:18,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2023-11-23 15:29:19,396 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4550, loss[loss=0.04399, simple_loss=0.05897, pruned_loss=0.005711, audio_tagging_loss=0.008798, over 14435.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09311, pruned_loss=0.01391, audio_tagging_loss=0.008788, over 3053592.56 frames. ], batch size: 55, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:29:22,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2435100.0, ans=0.125 2023-11-23 15:29:29,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2435100.0, ans=0.2 2023-11-23 15:29:45,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. 
limit=22.5 2023-11-23 15:30:03,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365300 2023-11-23 15:30:06,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2435300.0, ans=0.1 2023-11-23 15:30:09,133 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 15:30:10,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2435366.6666666665, ans=0.0 2023-11-23 15:30:16,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2435366.6666666665, ans=0.0 2023-11-23 15:30:22,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2435366.6666666665, ans=0.125 2023-11-23 15:30:24,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-23 15:30:24,790 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4600, loss[loss=0.06566, simple_loss=0.08227, pruned_loss=0.01211, audio_tagging_loss=0.01241, over 15753.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.09234, pruned_loss=0.01386, audio_tagging_loss=0.008916, over 3050494.39 frames. ], batch size: 61, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:30:28,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2435433.3333333335, ans=0.0 2023-11-23 15:30:36,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2435433.3333333335, ans=0.125 2023-11-23 15:30:51,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2435566.6666666665, ans=0.125 2023-11-23 15:31:08,065 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365350 2023-11-23 15:31:20,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2435700.0, ans=0.0 2023-11-23 15:31:22,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.583e+01 9.120e+01 9.726e+01 1.437e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-23 15:31:24,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2435700.0, ans=0.1 2023-11-23 15:31:29,879 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4650, loss[loss=0.06566, simple_loss=0.08479, pruned_loss=0.01314, audio_tagging_loss=0.01013, over 15170.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.09296, pruned_loss=0.01387, audio_tagging_loss=0.009083, over 3055100.46 frames. 
], batch size: 57, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:31:42,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2435833.3333333335, ans=0.125 2023-11-23 15:32:07,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2435966.6666666665, ans=0.0 2023-11-23 15:32:07,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2435966.6666666665, ans=0.125 2023-11-23 15:32:12,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365400 2023-11-23 15:32:33,929 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4700, loss[loss=0.07162, simple_loss=0.1014, pruned_loss=0.0141, audio_tagging_loss=0.006841, over 15973.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.09257, pruned_loss=0.01384, audio_tagging_loss=0.009069, over 3052780.57 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:32:39,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=22.5 2023-11-23 15:32:41,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2436100.0, ans=0.125 2023-11-23 15:32:45,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2436100.0, ans=0.1 2023-11-23 15:33:01,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2436233.3333333335, ans=0.125 2023-11-23 15:33:08,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2436233.3333333335, ans=0.125 2023-11-23 15:33:11,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2436233.3333333335, ans=0.95 2023-11-23 15:33:17,171 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365450 2023-11-23 15:33:17,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2436300.0, ans=0.0 2023-11-23 15:33:23,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2436300.0, ans=0.125 2023-11-23 15:33:26,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2436366.6666666665, ans=0.07 2023-11-23 15:33:31,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.245e+01 8.834e+01 9.507e+01 1.184e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 15:33:39,240 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4750, loss[loss=0.07857, simple_loss=0.1066, pruned_loss=0.01466, audio_tagging_loss=0.01063, over 14705.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09194, pruned_loss=0.01388, audio_tagging_loss=0.009198, over 3048176.75 frames. 
], batch size: 53, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:34:06,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2436566.6666666665, ans=0.125 2023-11-23 15:34:07,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2436566.6666666665, ans=0.035 2023-11-23 15:34:22,768 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365500 2023-11-23 15:34:40,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-11-23 15:34:42,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2436700.0, ans=0.1 2023-11-23 15:34:44,603 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4800, loss[loss=0.08743, simple_loss=0.1168, pruned_loss=0.01517, audio_tagging_loss=0.01384, over 16136.00 frames. ], tot_loss[loss=0.06982, simple_loss=0.09291, pruned_loss=0.01407, audio_tagging_loss=0.009297, over 3048182.19 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 15:34:51,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2436766.6666666665, ans=0.1 2023-11-23 15:34:59,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2436833.3333333335, ans=0.95 2023-11-23 15:35:18,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2436900.0, ans=0.0 2023-11-23 15:35:19,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2436900.0, ans=0.125 2023-11-23 15:35:27,576 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365550 2023-11-23 15:35:42,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.333e+01 8.912e+01 9.620e+01 1.211e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-23 15:35:48,317 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4850, loss[loss=0.05447, simple_loss=0.07002, pruned_loss=0.008934, audio_tagging_loss=0.01053, over 13970.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09218, pruned_loss=0.01394, audio_tagging_loss=0.009435, over 3045445.04 frames. ], batch size: 56, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:35:49,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2437100.0, ans=0.0 2023-11-23 15:35:49,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2437100.0, ans=10.0 2023-11-23 15:36:07,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2437166.6666666665, ans=0.125 2023-11-23 15:36:08,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2437166.6666666665, ans=0.125 2023-11-23 15:36:16,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.53 vs. 
limit=22.5 2023-11-23 15:36:31,582 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365600 2023-11-23 15:36:32,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2023-11-23 15:36:40,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2437366.6666666665, ans=0.125 2023-11-23 15:36:50,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2437366.6666666665, ans=0.07 2023-11-23 15:36:53,497 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4900, loss[loss=0.06587, simple_loss=0.08632, pruned_loss=0.01406, audio_tagging_loss=0.008651, over 15158.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.09182, pruned_loss=0.01373, audio_tagging_loss=0.009314, over 3042561.03 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:37:28,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2437566.6666666665, ans=0.0 2023-11-23 15:37:35,626 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365650 2023-11-23 15:37:35,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2437633.3333333335, ans=0.2 2023-11-23 15:37:37,096 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 15:37:53,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.240e+01 8.743e+01 9.594e+01 1.319e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-23 15:37:58,809 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 4950, loss[loss=0.07909, simple_loss=0.1171, pruned_loss=0.01397, audio_tagging_loss=0.006585, over 14376.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09208, pruned_loss=0.0138, audio_tagging_loss=0.009001, over 3039871.31 frames. ], batch size: 52, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 15:38:41,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365700 2023-11-23 15:38:47,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=22.5 2023-11-23 15:39:02,635 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5000, loss[loss=0.07989, simple_loss=0.112, pruned_loss=0.01498, audio_tagging_loss=0.008914, over 15967.00 frames. ], tot_loss[loss=0.0687, simple_loss=0.09185, pruned_loss=0.01385, audio_tagging_loss=0.008931, over 3034218.84 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 15:39:15,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-23 15:39:38,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2438233.3333333335, ans=0.125 2023-11-23 15:39:45,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365750 2023-11-23 15:39:47,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. 
limit=15.0 2023-11-23 15:39:52,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2438300.0, ans=0.125 2023-11-23 15:39:55,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-23 15:40:01,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.229e+01 8.801e+01 9.337e+01 1.229e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-23 15:40:07,578 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5050, loss[loss=0.07993, simple_loss=0.1148, pruned_loss=0.01251, audio_tagging_loss=0.01004, over 15452.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09139, pruned_loss=0.01366, audio_tagging_loss=0.008882, over 3038941.27 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 15:40:12,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2438433.3333333335, ans=0.125 2023-11-23 15:40:34,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=22.5 2023-11-23 15:40:49,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365800 2023-11-23 15:40:55,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2438633.3333333335, ans=0.0 2023-11-23 15:40:56,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5 2023-11-23 15:41:12,963 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5100, loss[loss=0.05738, simple_loss=0.07493, pruned_loss=0.01066, audio_tagging_loss=0.009258, over 14343.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09122, pruned_loss=0.01369, audio_tagging_loss=0.009024, over 3039235.75 frames. ], batch size: 54, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 15:41:14,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2438766.6666666665, ans=0.5 2023-11-23 15:41:21,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2438766.6666666665, ans=0.0 2023-11-23 15:41:28,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2023-11-23 15:41:34,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.18 vs. 
limit=12.0 2023-11-23 15:41:40,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2438900.0, ans=0.5 2023-11-23 15:41:53,818 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365850 2023-11-23 15:42:00,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2438966.6666666665, ans=0.5 2023-11-23 15:42:10,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2439033.3333333335, ans=0.2 2023-11-23 15:42:11,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.471e+01 8.600e+01 9.200e+01 9.982e+01 4.087e+02, threshold=1.840e+02, percent-clipped=1.0 2023-11-23 15:42:16,211 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5150, loss[loss=0.05692, simple_loss=0.08071, pruned_loss=0.009785, audio_tagging_loss=0.006779, over 15376.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09135, pruned_loss=0.01375, audio_tagging_loss=0.009023, over 3042458.46 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 15:42:27,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2439166.6666666665, ans=0.04949747468305833 2023-11-23 15:42:31,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2439166.6666666665, ans=0.125 2023-11-23 15:42:59,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365900 2023-11-23 15:43:17,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2439366.6666666665, ans=0.125 2023-11-23 15:43:20,525 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5200, loss[loss=0.07953, simple_loss=0.1081, pruned_loss=0.01679, audio_tagging_loss=0.008683, over 16218.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.09203, pruned_loss=0.01393, audio_tagging_loss=0.009031, over 3041531.50 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:43:49,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2023-11-23 15:43:50,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2439566.6666666665, ans=0.125 2023-11-23 15:44:04,209 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 365950 2023-11-23 15:44:18,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2439700.0, ans=0.125 2023-11-23 15:44:21,109 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.221e+01 8.985e+01 9.483e+01 1.522e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 15:44:25,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2439766.6666666665, ans=0.125 2023-11-23 15:44:27,291 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5250, loss[loss=0.07718, simple_loss=0.1129, pruned_loss=0.0142, audio_tagging_loss=0.006554, over 15187.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.0923, pruned_loss=0.01387, audio_tagging_loss=0.009022, over 3046769.25 frames. 
], batch size: 54, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:45:09,944 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366000 2023-11-23 15:45:29,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2440033.3333333335, ans=0.0 2023-11-23 15:45:33,376 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5300, loss[loss=0.05949, simple_loss=0.08284, pruned_loss=0.00836, audio_tagging_loss=0.009711, over 13314.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09275, pruned_loss=0.01399, audio_tagging_loss=0.009042, over 3048619.81 frames. ], batch size: 53, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:45:43,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2440100.0, ans=0.1 2023-11-23 15:45:55,649 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 15:46:16,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2440300.0, ans=0.125 2023-11-23 15:46:17,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366050 2023-11-23 15:46:32,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.490e+01 8.834e+01 9.644e+01 1.833e+02, threshold=1.767e+02, percent-clipped=1.0 2023-11-23 15:46:37,971 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5350, loss[loss=0.08501, simple_loss=0.1114, pruned_loss=0.02205, audio_tagging_loss=0.007249, over 15561.00 frames. ], tot_loss[loss=0.06924, simple_loss=0.09261, pruned_loss=0.01397, audio_tagging_loss=0.008961, over 3045014.48 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:46:39,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-23 15:47:02,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2440500.0, ans=0.0 2023-11-23 15:47:11,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2440566.6666666665, ans=0.2 2023-11-23 15:47:21,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366100 2023-11-23 15:47:28,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2440700.0, ans=0.125 2023-11-23 15:47:33,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2440700.0, ans=0.125 2023-11-23 15:47:39,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2440700.0, ans=0.125 2023-11-23 15:47:42,587 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5400, loss[loss=0.07166, simple_loss=0.09032, pruned_loss=0.01632, audio_tagging_loss=0.01018, over 14414.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09189, pruned_loss=0.01392, audio_tagging_loss=0.009027, over 3034534.84 frames. 
], batch size: 54, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:47:54,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2440766.6666666665, ans=0.2 2023-11-23 15:48:25,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.07 vs. limit=15.0 2023-11-23 15:48:26,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366150 2023-11-23 15:48:42,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 8.475e+01 9.130e+01 9.785e+01 1.452e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-23 15:48:48,679 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5450, loss[loss=0.07054, simple_loss=0.09041, pruned_loss=0.01454, audio_tagging_loss=0.01079, over 15075.00 frames. ], tot_loss[loss=0.06871, simple_loss=0.09149, pruned_loss=0.01393, audio_tagging_loss=0.009039, over 3032086.96 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:48:51,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-23 15:48:54,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2441100.0, ans=0.0 2023-11-23 15:49:19,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2441233.3333333335, ans=0.2 2023-11-23 15:49:29,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2023-11-23 15:49:31,086 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366200 2023-11-23 15:49:34,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2023-11-23 15:49:35,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2441300.0, ans=0.125 2023-11-23 15:49:52,716 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5500, loss[loss=0.07804, simple_loss=0.111, pruned_loss=0.01459, audio_tagging_loss=0.007947, over 15665.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09114, pruned_loss=0.01373, audio_tagging_loss=0.009108, over 3038856.71 frames. ], batch size: 56, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:50:26,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2441566.6666666665, ans=0.0 2023-11-23 15:50:32,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0 2023-11-23 15:50:35,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366250 2023-11-23 15:50:52,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.303e+01 8.937e+01 9.609e+01 1.241e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-23 15:50:57,363 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5550, loss[loss=0.07665, simple_loss=0.1184, pruned_loss=0.01214, audio_tagging_loss=0.005336, over 14592.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.09166, pruned_loss=0.01391, audio_tagging_loss=0.009285, over 3040642.40 frames. 
], batch size: 52, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:50:57,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2441766.6666666665, ans=0.1 2023-11-23 15:50:59,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2441766.6666666665, ans=0.2 2023-11-23 15:51:40,010 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366300 2023-11-23 15:52:01,530 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5600, loss[loss=0.07877, simple_loss=0.1087, pruned_loss=0.01813, audio_tagging_loss=0.006299, over 15309.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.0912, pruned_loss=0.01366, audio_tagging_loss=0.009403, over 3041617.27 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 15:52:01,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2442100.0, ans=0.2 2023-11-23 15:52:04,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2442100.0, ans=0.125 2023-11-23 15:52:09,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2442100.0, ans=0.125 2023-11-23 15:52:11,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2442100.0, ans=0.125 2023-11-23 15:52:19,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2442166.6666666665, ans=0.125 2023-11-23 15:52:44,447 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366350 2023-11-23 15:52:48,047 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 15:52:51,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2442366.6666666665, ans=0.125 2023-11-23 15:53:00,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.291e+01 8.935e+01 9.655e+01 1.271e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-23 15:53:02,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2442366.6666666665, ans=0.0 2023-11-23 15:53:05,683 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5650, loss[loss=0.07242, simple_loss=0.09562, pruned_loss=0.01439, audio_tagging_loss=0.01022, over 15215.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09143, pruned_loss=0.0137, audio_tagging_loss=0.009469, over 3046893.68 frames. ], batch size: 56, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 15:53:17,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0 2023-11-23 15:53:18,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. 
limit=10.0 2023-11-23 15:53:47,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366400 2023-11-23 15:54:02,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-23 15:54:09,827 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5700, loss[loss=0.08285, simple_loss=0.1104, pruned_loss=0.01811, audio_tagging_loss=0.009519, over 15561.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09083, pruned_loss=0.01354, audio_tagging_loss=0.009482, over 3046347.64 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 15:54:45,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2442900.0, ans=0.125 2023-11-23 15:54:52,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366450 2023-11-23 15:55:08,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.337e+01 8.162e+01 8.678e+01 9.471e+01 1.109e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-23 15:55:13,707 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5750, loss[loss=0.06021, simple_loss=0.08879, pruned_loss=0.009222, audio_tagging_loss=0.006592, over 14107.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09005, pruned_loss=0.01339, audio_tagging_loss=0.009406, over 3044998.79 frames. ], batch size: 52, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 15:55:34,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2443166.6666666665, ans=0.125 2023-11-23 15:55:42,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2443233.3333333335, ans=0.125 2023-11-23 15:55:53,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2443300.0, ans=0.1 2023-11-23 15:55:56,271 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366500 2023-11-23 15:56:17,466 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5800, loss[loss=0.06709, simple_loss=0.09347, pruned_loss=0.01111, audio_tagging_loss=0.009252, over 15321.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.0918, pruned_loss=0.01373, audio_tagging_loss=0.009248, over 3051751.98 frames. ], batch size: 54, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:56:24,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2443433.3333333335, ans=0.0 2023-11-23 15:56:30,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. 
limit=15.0 2023-11-23 15:56:36,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2443500.0, ans=0.0 2023-11-23 15:56:40,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2443500.0, ans=0.125 2023-11-23 15:57:00,787 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366550 2023-11-23 15:57:11,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2443700.0, ans=0.0 2023-11-23 15:57:18,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.535e+01 9.028e+01 9.591e+01 1.185e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-23 15:57:23,521 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5850, loss[loss=0.07555, simple_loss=0.0987, pruned_loss=0.01643, audio_tagging_loss=0.009765, over 16184.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09152, pruned_loss=0.01361, audio_tagging_loss=0.009219, over 3058664.86 frames. ], batch size: 63, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:58:01,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2443966.6666666665, ans=0.125 2023-11-23 15:58:07,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366600 2023-11-23 15:58:16,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-11-23 15:58:19,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2444033.3333333335, ans=0.1 2023-11-23 15:58:30,437 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5900, loss[loss=0.05039, simple_loss=0.05868, pruned_loss=0.009404, audio_tagging_loss=0.01165, over 15673.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09191, pruned_loss=0.01383, audio_tagging_loss=0.009141, over 3054458.40 frames. ], batch size: 61, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 15:58:40,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2444100.0, ans=0.1 2023-11-23 15:58:44,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2444166.6666666665, ans=0.2 2023-11-23 15:59:03,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2444233.3333333335, ans=0.125 2023-11-23 15:59:10,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2444300.0, ans=0.125 2023-11-23 15:59:14,264 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366650 2023-11-23 15:59:14,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.29 vs. 
limit=15.0 2023-11-23 15:59:15,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2444300.0, ans=0.0 2023-11-23 15:59:19,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2444300.0, ans=0.2 2023-11-23 15:59:28,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2444366.6666666665, ans=0.0 2023-11-23 15:59:31,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.501e+01 9.185e+01 9.833e+01 1.392e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-23 15:59:35,329 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 5950, loss[loss=0.0709, simple_loss=0.09314, pruned_loss=0.01329, audio_tagging_loss=0.01103, over 14667.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.09198, pruned_loss=0.01379, audio_tagging_loss=0.009103, over 3058455.81 frames. ], batch size: 54, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:00:11,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0 2023-11-23 16:00:19,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366700 2023-11-23 16:00:30,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2444700.0, ans=0.125 2023-11-23 16:00:41,916 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6000, loss[loss=0.06103, simple_loss=0.07828, pruned_loss=0.01282, audio_tagging_loss=0.009074, over 15968.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09226, pruned_loss=0.01385, audio_tagging_loss=0.00898, over 3055447.08 frames. ], batch size: 61, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:00:41,917 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 16:01:26,550 INFO [train_asr.py:1253] (1/4) Epoch 31, validation: loss=0.05852, simple_loss=0.05109, pruned_loss=0.00511, audio_tagging_loss=0.02786, over 4681554.00 frames. 2023-11-23 16:01:26,551 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 16:01:27,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2023-11-23 16:02:06,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2444966.6666666665, ans=0.2 2023-11-23 16:02:08,790 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366750 2023-11-23 16:02:13,146 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 16:02:22,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2445033.3333333335, ans=0.0 2023-11-23 16:02:26,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.104e+01 8.225e+01 8.734e+01 9.674e+01 1.175e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-23 16:02:30,459 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6050, loss[loss=0.07043, simple_loss=0.0852, pruned_loss=0.01714, audio_tagging_loss=0.01069, over 15620.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.09218, pruned_loss=0.01413, audio_tagging_loss=0.008998, over 3055253.25 frames. ], batch size: 60, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:02:48,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=12.0 2023-11-23 16:02:52,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2445166.6666666665, ans=0.125 2023-11-23 16:03:04,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2445233.3333333335, ans=0.2 2023-11-23 16:03:09,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2445300.0, ans=0.0 2023-11-23 16:03:14,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366800 2023-11-23 16:03:36,438 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6100, loss[loss=0.06546, simple_loss=0.0844, pruned_loss=0.01345, audio_tagging_loss=0.009813, over 16336.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09124, pruned_loss=0.014, audio_tagging_loss=0.009009, over 3050372.91 frames. ], batch size: 61, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:03:43,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2445433.3333333335, ans=0.0 2023-11-23 16:03:46,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2445433.3333333335, ans=0.125 2023-11-23 16:03:56,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2445500.0, ans=0.125 2023-11-23 16:03:57,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2445500.0, ans=0.125 2023-11-23 16:04:05,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2445566.6666666665, ans=0.0 2023-11-23 16:04:19,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366850 2023-11-23 16:04:22,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2445633.3333333335, ans=0.0 2023-11-23 16:04:38,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.426e+01 8.983e+01 9.647e+01 1.287e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 16:04:42,623 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6150, loss[loss=0.08771, simple_loss=0.1181, pruned_loss=0.02333, audio_tagging_loss=0.005346, over 15756.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09204, pruned_loss=0.01392, audio_tagging_loss=0.008952, over 3054908.92 frames. 
2023-11-23 16:04:52,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2445766.6666666665, ans=0.0 2023-11-23 16:05:10,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2445900.0, ans=0.2 2023-11-23 16:05:17,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2445900.0, ans=0.125 2023-11-23 16:05:19,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2445966.6666666665, ans=0.0 2023-11-23 16:05:23,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2445966.6666666665, ans=0.0 2023-11-23 16:05:25,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366900 2023-11-23 16:05:28,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2445966.6666666665, ans=0.125 2023-11-23 16:05:29,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-23 16:05:34,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5 2023-11-23 16:05:46,405 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 16:05:47,278 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6200, loss[loss=0.08075, simple_loss=0.1179, pruned_loss=0.01535, audio_tagging_loss=0.006469, over 15835.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09169, pruned_loss=0.01393, audio_tagging_loss=0.009066, over 3052688.42 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:05:55,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2446100.0, ans=0.125 2023-11-23 16:06:04,677 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 16:06:09,752 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 16:06:31,182 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 366950 2023-11-23 16:06:31,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2446300.0, ans=0.1 2023-11-23 16:06:40,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=22.5 2023-11-23 16:06:48,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.382e+01 9.012e+01 9.950e+01 2.061e+02, threshold=1.802e+02, percent-clipped=1.0 2023-11-23 16:06:52,770 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6250, loss[loss=0.07233, simple_loss=0.08386, pruned_loss=0.01522, audio_tagging_loss=0.01518, over 15381.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09103, pruned_loss=0.01376, audio_tagging_loss=0.009229, over 3046767.67 frames.
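], batch size: 56, lr: 2.19e-03, grad_scale: 32.0

The recurring [optim.py:476] lines report the min/25%/50%/75%/max of recently observed gradient norms, and in every entry the printed threshold equals Clipping_scale times the median (just above: 2.0 * 9.012e+01 = 1.802e+02, with percent-clipped=1.0 indicating one recent batch exceeded it). A hedged reconstruction of that bookkeeping, with illustrative names rather than the actual ScaledAdam internals:

import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles of the recent gradient norms, as printed in the log.
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # Clipping_scale times the median norm
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped
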
2023-11-23 16:06:58,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-11-23 16:06:59,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2446433.3333333335, ans=0.125 2023-11-23 16:07:02,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2446433.3333333335, ans=0.2 2023-11-23 16:07:08,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2446500.0, ans=0.2 2023-11-23 16:07:27,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2446566.6666666665, ans=0.125 2023-11-23 16:07:30,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2446566.6666666665, ans=0.125 2023-11-23 16:07:32,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2446633.3333333335, ans=0.0 2023-11-23 16:07:35,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367000 2023-11-23 16:07:41,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2446633.3333333335, ans=0.09899494936611666 2023-11-23 16:07:53,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2446700.0, ans=0.125 2023-11-23 16:07:58,535 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6300, loss[loss=0.06274, simple_loss=0.0813, pruned_loss=0.01324, audio_tagging_loss=0.008851, over 15468.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.0919, pruned_loss=0.01379, audio_tagging_loss=0.009206, over 3050520.56 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:08:22,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2446900.0, ans=0.125 2023-11-23 16:08:26,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2446900.0, ans=0.5 2023-11-23 16:08:40,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367050 2023-11-23 16:08:59,067 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.957e+01 8.355e+01 9.008e+01 9.768e+01 1.209e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-23 16:09:02,794 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6350, loss[loss=0.08086, simple_loss=0.1012, pruned_loss=0.02082, audio_tagging_loss=0.009428, over 14493.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09093, pruned_loss=0.01342, audio_tagging_loss=0.009344, over 3047920.16 frames.
], batch size: 54, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:09:05,379 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 16:09:22,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2447166.6666666665, ans=0.1 2023-11-23 16:09:25,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2447166.6666666665, ans=0.0 2023-11-23 16:09:43,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2447300.0, ans=0.125 2023-11-23 16:09:45,586 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367100 2023-11-23 16:09:48,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2447300.0, ans=0.0 2023-11-23 16:09:50,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2447300.0, ans=0.1 2023-11-23 16:10:06,307 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6400, loss[loss=0.06504, simple_loss=0.08075, pruned_loss=0.01561, audio_tagging_loss=0.009062, over 14400.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09163, pruned_loss=0.01364, audio_tagging_loss=0.009355, over 3042328.81 frames. ], batch size: 54, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:10:35,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2447566.6666666665, ans=0.5 2023-11-23 16:10:49,455 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367150 2023-11-23 16:10:57,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2447700.0, ans=0.0 2023-11-23 16:10:58,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2447700.0, ans=0.125 2023-11-23 16:11:08,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.384e+01 9.024e+01 9.742e+01 1.267e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 16:11:11,820 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6450, loss[loss=0.07345, simple_loss=0.1076, pruned_loss=0.0107, audio_tagging_loss=0.008948, over 15073.00 frames. ], tot_loss[loss=0.0692, simple_loss=0.09214, pruned_loss=0.01378, audio_tagging_loss=0.009349, over 3041008.51 frames. 
], batch size: 56, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:11:16,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2447766.6666666665, ans=0.05 2023-11-23 16:11:28,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2447833.3333333335, ans=0.95 2023-11-23 16:11:34,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2447833.3333333335, ans=0.1 2023-11-23 16:11:53,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367200 2023-11-23 16:12:02,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2448033.3333333335, ans=0.1 2023-11-23 16:12:17,096 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6500, loss[loss=0.07766, simple_loss=0.108, pruned_loss=0.017, audio_tagging_loss=0.006646, over 15986.00 frames. ], tot_loss[loss=0.06998, simple_loss=0.09346, pruned_loss=0.01397, audio_tagging_loss=0.009274, over 3046054.99 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:12:17,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2448100.0, ans=0.125 2023-11-23 16:12:40,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2448166.6666666665, ans=0.125 2023-11-23 16:12:50,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2448233.3333333335, ans=0.07 2023-11-23 16:13:00,535 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367250 2023-11-23 16:13:03,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2448300.0, ans=0.1 2023-11-23 16:13:18,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.406e+01 9.076e+01 9.815e+01 1.354e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-23 16:13:21,314 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6550, loss[loss=0.09317, simple_loss=0.1228, pruned_loss=0.02149, audio_tagging_loss=0.01025, over 15873.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09377, pruned_loss=0.01392, audio_tagging_loss=0.009192, over 3052631.04 frames. ], batch size: 56, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:13:26,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2023-11-23 16:13:44,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2448500.0, ans=0.0 2023-11-23 16:13:52,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2448566.6666666665, ans=0.0 2023-11-23 16:14:04,638 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367300 2023-11-23 16:14:17,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2448700.0, ans=0.2 2023-11-23 16:14:25,927 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6600, loss[loss=0.0789, simple_loss=0.1048, pruned_loss=0.0156, audio_tagging_loss=0.01087, over 15428.00 frames. 
], tot_loss[loss=0.07006, simple_loss=0.09405, pruned_loss=0.01395, audio_tagging_loss=0.009081, over 3056957.23 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:14:45,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0 2023-11-23 16:15:01,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2448900.0, ans=0.0 2023-11-23 16:15:08,967 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367350 2023-11-23 16:15:16,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2023-11-23 16:15:24,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2449033.3333333335, ans=0.1 2023-11-23 16:15:31,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.404e+01 9.056e+01 9.809e+01 1.686e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-23 16:15:31,522 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6650, loss[loss=0.06762, simple_loss=0.08397, pruned_loss=0.01436, audio_tagging_loss=0.01127, over 15644.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09326, pruned_loss=0.01388, audio_tagging_loss=0.009013, over 3055722.24 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:16:14,123 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367400 2023-11-23 16:16:20,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2449300.0, ans=0.125 2023-11-23 16:16:24,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=22.5 2023-11-23 16:16:25,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2023-11-23 16:16:35,756 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6700, loss[loss=0.07818, simple_loss=0.1047, pruned_loss=0.01699, audio_tagging_loss=0.008846, over 15459.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.09407, pruned_loss=0.0141, audio_tagging_loss=0.008919, over 3055577.72 frames. ], batch size: 60, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:16:37,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2449433.3333333335, ans=0.125 2023-11-23 16:16:44,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2449433.3333333335, ans=0.125 2023-11-23 16:16:51,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0 2023-11-23 16:17:05,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2449566.6666666665, ans=0.1 2023-11-23 16:17:19,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367450 2023-11-23 16:17:28,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. 
limit=15.0 2023-11-23 16:17:38,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.62 vs. limit=10.0 2023-11-23 16:17:40,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.542e+01 9.005e+01 9.931e+01 1.410e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 16:17:40,299 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6750, loss[loss=0.08742, simple_loss=0.1143, pruned_loss=0.02124, audio_tagging_loss=0.009049, over 15890.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.09424, pruned_loss=0.01414, audio_tagging_loss=0.008957, over 3051478.66 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:17:52,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2449833.3333333335, ans=0.0 2023-11-23 16:17:53,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2449833.3333333335, ans=0.0 2023-11-23 16:18:01,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2023-11-23 16:18:16,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2449900.0, ans=0.0 2023-11-23 16:18:23,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367500 2023-11-23 16:18:24,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2449966.6666666665, ans=0.0 2023-11-23 16:18:45,571 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6800, loss[loss=0.07676, simple_loss=0.09794, pruned_loss=0.0198, audio_tagging_loss=0.007984, over 15917.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09358, pruned_loss=0.01417, audio_tagging_loss=0.008986, over 3047155.93 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:18:49,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2450100.0, ans=0.125 2023-11-23 16:19:00,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2450166.6666666665, ans=0.125 2023-11-23 16:19:06,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2450166.6666666665, ans=0.125 2023-11-23 16:19:06,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2450166.6666666665, ans=0.0 2023-11-23 16:19:10,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2450233.3333333335, ans=0.0 2023-11-23 16:19:12,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-23 16:19:22,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2450300.0, ans=0.125 2023-11-23 16:19:27,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.23 vs. 
limit=22.5 2023-11-23 16:19:28,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367550 2023-11-23 16:19:28,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2450300.0, ans=0.125 2023-11-23 16:19:37,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2450366.6666666665, ans=0.125 2023-11-23 16:19:50,482 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6850, loss[loss=0.06749, simple_loss=0.08914, pruned_loss=0.009729, audio_tagging_loss=0.01319, over 14706.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.09266, pruned_loss=0.0139, audio_tagging_loss=0.008964, over 3045941.68 frames. ], batch size: 55, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:19:51,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.211e+01 8.952e+01 9.815e+01 1.222e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-23 16:19:53,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2450433.3333333335, ans=0.0 2023-11-23 16:20:03,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2450500.0, ans=0.05 2023-11-23 16:20:09,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=22.5 2023-11-23 16:20:27,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2450566.6666666665, ans=0.0 2023-11-23 16:20:33,211 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367600 2023-11-23 16:20:56,014 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6900, loss[loss=0.06756, simple_loss=0.08557, pruned_loss=0.01428, audio_tagging_loss=0.0105, over 16859.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09222, pruned_loss=0.01374, audio_tagging_loss=0.008993, over 3056510.98 frames. ], batch size: 65, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:21:06,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2450766.6666666665, ans=0.0 2023-11-23 16:21:33,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5 2023-11-23 16:21:38,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367650 2023-11-23 16:21:39,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2450966.6666666665, ans=0.125 2023-11-23 16:21:46,572 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 16:21:49,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2451033.3333333335, ans=0.0 2023-11-23 16:21:51,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2451033.3333333335, ans=0.125 2023-11-23 16:21:56,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2451033.3333333335, ans=0.1 2023-11-23 16:21:56,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=12.0 2023-11-23 16:22:01,286 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 6950, loss[loss=0.0634, simple_loss=0.0836, pruned_loss=0.01092, audio_tagging_loss=0.01069, over 15057.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.09152, pruned_loss=0.01353, audio_tagging_loss=0.009087, over 3058752.15 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:22:02,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.116e+01 8.969e+01 9.790e+01 1.259e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-23 16:22:04,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2451100.0, ans=0.0 2023-11-23 16:22:44,018 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367700 2023-11-23 16:23:05,247 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7000, loss[loss=0.06843, simple_loss=0.08439, pruned_loss=0.01509, audio_tagging_loss=0.01114, over 14737.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09111, pruned_loss=0.01343, audio_tagging_loss=0.009205, over 3049450.92 frames. ], batch size: 56, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:23:13,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2451433.3333333335, ans=0.0 2023-11-23 16:23:14,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2451433.3333333335, ans=0.0 2023-11-23 16:23:17,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-11-23 16:23:35,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2451566.6666666665, ans=0.0 2023-11-23 16:23:40,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-23 16:23:41,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2023-11-23 16:23:48,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367750 2023-11-23 16:23:52,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2451633.3333333335, ans=0.125 2023-11-23 16:23:55,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. 
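limit=15.0

The [scaling.py:1022] Whitening lines compare a per-module statistic against a scheduled limit (just above: metric=4.18 vs. limit=15.0); the whitening penalty appears to activate only when the metric exceeds its limit, which is why most readings sit below it. One plausible form of that metric, equal to 1.0 for a perfectly white (identity-like) covariance and growing as the eigenvalue spread widens; this is an illustrative definition, not the exact scaling.py code:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations of one module.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # Ratio of mean squared eigenvalue to squared mean eigenvalue:
    # 1.0 when all eigenvalues are equal, larger as covariance skews.
    return eigs.pow(2).mean() / (eigs.mean().pow(2) + 1e-20)
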
2023-11-23 16:24:02,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2451700.0, ans=0.0 2023-11-23 16:24:07,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2451700.0, ans=0.2 2023-11-23 16:24:10,630 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7050, loss[loss=0.07845, simple_loss=0.1073, pruned_loss=0.01555, audio_tagging_loss=0.009269, over 17375.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09071, pruned_loss=0.01342, audio_tagging_loss=0.009212, over 3046255.57 frames. ], batch size: 65, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:24:11,784 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.295e+01 8.920e+01 9.938e+01 1.396e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 16:24:14,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=12.0 2023-11-23 16:24:26,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2451833.3333333335, ans=0.1 2023-11-23 16:24:29,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2451833.3333333335, ans=0.5 2023-11-23 16:24:46,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2451900.0, ans=10.0 2023-11-23 16:24:53,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367800 2023-11-23 16:24:53,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2451966.6666666665, ans=0.1 2023-11-23 16:24:54,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2451966.6666666665, ans=0.125 2023-11-23 16:25:13,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2452033.3333333335, ans=0.1 2023-11-23 16:25:15,563 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7100, loss[loss=0.06516, simple_loss=0.09111, pruned_loss=0.01236, audio_tagging_loss=0.007245, over 14334.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09063, pruned_loss=0.01337, audio_tagging_loss=0.009303, over 3045388.00 frames. ], batch size: 54, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:25:17,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2452100.0, ans=0.04949747468305833 2023-11-23 16:25:36,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2452166.6666666665, ans=0.125 2023-11-23 16:25:44,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-11-23 16:25:58,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367850 2023-11-23 16:25:59,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.36 vs.
limit=12.0 2023-11-23 16:26:15,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2452366.6666666665, ans=0.1 2023-11-23 16:26:20,278 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7150, loss[loss=0.07125, simple_loss=0.09202, pruned_loss=0.01702, audio_tagging_loss=0.008219, over 15784.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.0909, pruned_loss=0.01341, audio_tagging_loss=0.009253, over 3050507.34 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:26:21,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 8.537e+01 9.022e+01 9.919e+01 1.379e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-23 16:26:23,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2023-11-23 16:26:50,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-23 16:26:54,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2023-11-23 16:27:03,251 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367900 2023-11-23 16:27:24,442 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7200, loss[loss=0.07246, simple_loss=0.1018, pruned_loss=0.01387, audio_tagging_loss=0.007703, over 15323.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09078, pruned_loss=0.01333, audio_tagging_loss=0.009244, over 3041775.86 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:27:47,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2452833.3333333335, ans=0.125 2023-11-23 16:28:06,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 367950 2023-11-23 16:28:15,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2453033.3333333335, ans=0.125 2023-11-23 16:28:27,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2453033.3333333335, ans=0.0 2023-11-23 16:28:27,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2023-11-23 16:28:29,967 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7250, loss[loss=0.07204, simple_loss=0.09679, pruned_loss=0.01462, audio_tagging_loss=0.009028, over 15238.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09133, pruned_loss=0.01359, audio_tagging_loss=0.009252, over 3041210.10 frames. 
], batch size: 57, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:28:31,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.952e+01 8.513e+01 9.044e+01 9.572e+01 1.281e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 16:28:44,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2453166.6666666665, ans=0.125 2023-11-23 16:28:55,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2453233.3333333335, ans=0.125 2023-11-23 16:29:01,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2453233.3333333335, ans=0.0 2023-11-23 16:29:12,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368000 2023-11-23 16:29:38,300 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7300, loss[loss=0.06276, simple_loss=0.07268, pruned_loss=0.01605, audio_tagging_loss=0.01037, over 14663.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.09223, pruned_loss=0.01384, audio_tagging_loss=0.009299, over 3042464.91 frames. ], batch size: 56, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:29:46,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-23 16:29:53,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2453500.0, ans=0.2 2023-11-23 16:29:56,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2453500.0, ans=0.0 2023-11-23 16:30:14,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2453566.6666666665, ans=0.125 2023-11-23 16:30:22,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368050 2023-11-23 16:30:40,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2453700.0, ans=0.2 2023-11-23 16:30:42,890 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7350, loss[loss=0.05338, simple_loss=0.0674, pruned_loss=0.01143, audio_tagging_loss=0.008253, over 14658.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09104, pruned_loss=0.01372, audio_tagging_loss=0.009196, over 3044216.51 frames. 
], batch size: 56, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:30:43,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2453766.6666666665, ans=0.0 2023-11-23 16:30:44,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.269e+01 8.908e+01 9.511e+01 1.304e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-23 16:30:46,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2453766.6666666665, ans=0.2 2023-11-23 16:30:47,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2453766.6666666665, ans=0.2 2023-11-23 16:31:00,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2453833.3333333335, ans=0.125 2023-11-23 16:31:07,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2453833.3333333335, ans=0.5 2023-11-23 16:31:16,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=2453900.0, ans=15.0 2023-11-23 16:31:18,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2453900.0, ans=0.125 2023-11-23 16:31:26,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368100 2023-11-23 16:31:37,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2454033.3333333335, ans=0.125 2023-11-23 16:31:38,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2454033.3333333335, ans=0.0 2023-11-23 16:31:46,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2454033.3333333335, ans=0.0 2023-11-23 16:31:49,061 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7400, loss[loss=0.07175, simple_loss=0.09608, pruned_loss=0.0142, audio_tagging_loss=0.009509, over 15910.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09101, pruned_loss=0.0138, audio_tagging_loss=0.009134, over 3038242.45 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:31:56,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2454100.0, ans=0.125 2023-11-23 16:31:59,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2454100.0, ans=0.0 2023-11-23 16:32:08,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2454166.6666666665, ans=0.0 2023-11-23 16:32:20,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2454233.3333333335, ans=0.0 2023-11-23 16:32:32,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368150 2023-11-23 16:32:55,329 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7450, loss[loss=0.08613, simple_loss=0.13, pruned_loss=0.01546, audio_tagging_loss=0.005674, over 14932.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09159, pruned_loss=0.01386, audio_tagging_loss=0.008958, over 3035175.06 frames. 
], batch size: 54, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:32:56,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.234e+01 8.919e+01 9.567e+01 1.256e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 16:33:16,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2454500.0, ans=15.0 2023-11-23 16:33:38,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368200 2023-11-23 16:33:57,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2454700.0, ans=0.1 2023-11-23 16:33:59,416 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7500, loss[loss=0.07329, simple_loss=0.09732, pruned_loss=0.01649, audio_tagging_loss=0.008143, over 15245.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09184, pruned_loss=0.01397, audio_tagging_loss=0.008983, over 3038666.17 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:34:24,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2454900.0, ans=0.05 2023-11-23 16:34:35,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2454900.0, ans=0.0 2023-11-23 16:34:42,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368250 2023-11-23 16:34:55,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2455033.3333333335, ans=0.125 2023-11-23 16:34:58,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2455033.3333333335, ans=0.0 2023-11-23 16:35:04,294 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7550, loss[loss=0.03959, simple_loss=0.04232, pruned_loss=0.007424, audio_tagging_loss=0.01101, over 15167.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09079, pruned_loss=0.01374, audio_tagging_loss=0.009071, over 3045538.73 frames. ], batch size: 60, lr: 2.19e-03, grad_scale: 16.0 2023-11-23 16:35:05,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.421e+01 8.983e+01 9.712e+01 1.318e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 16:35:44,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2455300.0, ans=0.0 2023-11-23 16:35:48,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368300 2023-11-23 16:35:56,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0 2023-11-23 16:35:57,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2455366.6666666665, ans=0.0 2023-11-23 16:36:10,441 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7600, loss[loss=0.06362, simple_loss=0.0881, pruned_loss=0.011, audio_tagging_loss=0.008566, over 15039.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09099, pruned_loss=0.01402, audio_tagging_loss=0.00901, over 3048649.24 frames. 
2023-11-23 16:36:15,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2455433.3333333335, ans=0.0 2023-11-23 16:36:17,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-11-23 16:36:32,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2455500.0, ans=0.1 2023-11-23 16:36:53,351 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368350 2023-11-23 16:36:57,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2023-11-23 16:36:59,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-23 16:37:02,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2455700.0, ans=0.0 2023-11-23 16:37:05,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2455700.0, ans=0.035 2023-11-23 16:37:09,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2455700.0, ans=0.0 2023-11-23 16:37:11,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2455700.0, ans=0.2 2023-11-23 16:37:12,751 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 16:37:15,075 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7650, loss[loss=0.07123, simple_loss=0.0967, pruned_loss=0.015, audio_tagging_loss=0.007875, over 15711.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09, pruned_loss=0.01368, audio_tagging_loss=0.00908, over 3043272.49 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 32.0 2023-11-23 16:37:15,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2455766.6666666665, ans=0.0 2023-11-23 16:37:16,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.296e+01 8.999e+01 9.815e+01 1.332e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-23 16:37:33,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2455833.3333333335, ans=0.1 2023-11-23 16:37:34,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2455833.3333333335, ans=0.125 2023-11-23 16:37:53,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2455966.6666666665, ans=0.125 2023-11-23 16:37:54,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2455966.6666666665, ans=0.125 2023-11-23 16:37:58,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368400 2023-11-23 16:38:20,885 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7700, loss[loss=0.06435, simple_loss=0.0835, pruned_loss=0.01266, audio_tagging_loss=0.009941, over 14040.00 frames.
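], tot_loss[loss=0.0676, simple_loss=0.08991, pruned_loss=0.01359, audio_tagging_loss=0.009058, over 3039017.49 frames. ], batch size: 53, lr: 2.19e-03, grad_scale: 8.0

The grad_scale field in the tot_loss lines drifts between 8.0 and 32.0 over this stretch, the signature of dynamic loss scaling in fp16 training: the scale is halved when a gradient overflow is detected and doubled again after a run of clean steps. A toy version of that update rule; PyTorch's GradScaler implements this internally, and the constants here are illustrative:

def update_grad_scale(scale: float, found_inf: bool, good_steps: int,
                      growth_interval: int = 2000) -> tuple[float, int]:
    if found_inf:
        return scale * 0.5, 0      # back off after an overflow
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0      # grow back after a stable run
    return scale, good_steps
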
2023-11-23 16:38:26,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2456100.0, ans=0.035 2023-11-23 16:38:26,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2456100.0, ans=0.2 2023-11-23 16:38:36,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2456166.6666666665, ans=0.1 2023-11-23 16:38:36,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-11-23 16:38:37,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2456166.6666666665, ans=0.2 2023-11-23 16:38:38,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2456166.6666666665, ans=0.125 2023-11-23 16:38:51,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-11-23 16:39:03,994 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368450 2023-11-23 16:39:18,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=22.5 2023-11-23 16:39:22,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-23 16:39:26,195 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7750, loss[loss=0.07288, simple_loss=0.0951, pruned_loss=0.01105, audio_tagging_loss=0.01428, over 16571.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09092, pruned_loss=0.01361, audio_tagging_loss=0.008988, over 3046102.32 frames.
], batch size: 63, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:39:29,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.166e+01 8.983e+01 1.017e+02 1.300e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 16:39:44,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2456500.0, ans=0.0 2023-11-23 16:39:45,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2456500.0, ans=0.07 2023-11-23 16:39:55,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2456566.6666666665, ans=0.125 2023-11-23 16:40:03,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2456633.3333333335, ans=0.2 2023-11-23 16:40:04,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2456633.3333333335, ans=0.1 2023-11-23 16:40:05,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2456633.3333333335, ans=0.2 2023-11-23 16:40:08,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368500 2023-11-23 16:40:22,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2456700.0, ans=0.125 2023-11-23 16:40:29,716 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 16:40:30,644 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7800, loss[loss=0.07892, simple_loss=0.1088, pruned_loss=0.01739, audio_tagging_loss=0.007128, over 16233.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.0904, pruned_loss=0.01356, audio_tagging_loss=0.008972, over 3041247.23 frames. ], batch size: 60, lr: 2.19e-03, grad_scale: 8.0 2023-11-23 16:40:41,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2456766.6666666665, ans=0.125 2023-11-23 16:40:48,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2456833.3333333335, ans=0.125 2023-11-23 16:40:48,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2456833.3333333335, ans=0.125 2023-11-23 16:40:50,019 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 16:40:58,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2456900.0, ans=0.0 2023-11-23 16:41:00,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.27 vs. 
limit=15.0 2023-11-23 16:41:13,753 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368550 2023-11-23 16:41:17,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2456966.6666666665, ans=0.125 2023-11-23 16:41:34,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2457100.0, ans=0.125 2023-11-23 16:41:35,455 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7850, loss[loss=0.07719, simple_loss=0.1086, pruned_loss=0.01455, audio_tagging_loss=0.00832, over 15910.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09164, pruned_loss=0.01377, audio_tagging_loss=0.008906, over 3043887.91 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 8.0 2023-11-23 16:41:39,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.884e+01 8.382e+01 9.207e+01 1.008e+02 1.649e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-23 16:41:50,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2023-11-23 16:42:02,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2023-11-23 16:42:08,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2457233.3333333335, ans=0.125 2023-11-23 16:42:10,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2457233.3333333335, ans=0.125 2023-11-23 16:42:17,998 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368600 2023-11-23 16:42:20,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2457300.0, ans=0.2 2023-11-23 16:42:37,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2457366.6666666665, ans=0.0 2023-11-23 16:42:38,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2457366.6666666665, ans=0.125 2023-11-23 16:42:40,591 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7900, loss[loss=0.07456, simple_loss=0.1021, pruned_loss=0.01473, audio_tagging_loss=0.008753, over 15379.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.0924, pruned_loss=0.01383, audio_tagging_loss=0.009025, over 3049515.44 frames. 
], batch size: 56, lr: 2.18e-03, grad_scale: 8.0 2023-11-23 16:42:48,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2457433.3333333335, ans=0.125 2023-11-23 16:42:55,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2457500.0, ans=0.125 2023-11-23 16:42:56,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2457500.0, ans=10.0 2023-11-23 16:42:59,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2457500.0, ans=0.125 2023-11-23 16:43:23,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368650 2023-11-23 16:43:23,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2457633.3333333335, ans=0.2 2023-11-23 16:43:33,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2457700.0, ans=0.0 2023-11-23 16:43:37,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2457700.0, ans=0.0 2023-11-23 16:43:45,887 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 7950, loss[loss=0.04949, simple_loss=0.06437, pruned_loss=0.007516, audio_tagging_loss=0.009788, over 14228.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09273, pruned_loss=0.01392, audio_tagging_loss=0.009178, over 3043037.60 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 8.0 2023-11-23 16:43:49,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.234e+01 8.856e+01 9.622e+01 1.359e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-23 16:43:58,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2457833.3333333335, ans=0.125 2023-11-23 16:44:01,291 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 16:44:16,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2457900.0, ans=0.1 2023-11-23 16:44:28,253 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368700 2023-11-23 16:44:33,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-23 16:44:50,951 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8000, loss[loss=0.05131, simple_loss=0.06608, pruned_loss=0.008485, audio_tagging_loss=0.009785, over 14845.00 frames. ], tot_loss[loss=0.06871, simple_loss=0.0914, pruned_loss=0.01378, audio_tagging_loss=0.009228, over 3041086.28 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 16.0 2023-11-23 16:45:00,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.24 vs. 
limit=10.0 2023-11-23 16:45:00,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-23 16:45:05,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=22.5 2023-11-23 16:45:06,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2458166.6666666665, ans=0.1 2023-11-23 16:45:09,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2458166.6666666665, ans=0.2 2023-11-23 16:45:33,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368750 2023-11-23 16:45:44,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2458366.6666666665, ans=0.2 2023-11-23 16:45:55,523 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8050, loss[loss=0.06236, simple_loss=0.0851, pruned_loss=0.01035, audio_tagging_loss=0.009452, over 16266.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09072, pruned_loss=0.01364, audio_tagging_loss=0.009322, over 3038013.97 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 8.0 2023-11-23 16:46:00,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.362e+01 8.796e+01 9.330e+01 1.321e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-23 16:46:06,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2458433.3333333335, ans=0.125 2023-11-23 16:46:29,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2458566.6666666665, ans=0.125 2023-11-23 16:46:38,211 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368800 2023-11-23 16:47:00,270 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8100, loss[loss=0.04993, simple_loss=0.06265, pruned_loss=0.009027, audio_tagging_loss=0.009574, over 14299.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.0919, pruned_loss=0.01403, audio_tagging_loss=0.009205, over 3035871.65 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 8.0 2023-11-23 16:47:09,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2458766.6666666665, ans=0.0 2023-11-23 16:47:38,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2458966.6666666665, ans=0.125 2023-11-23 16:47:43,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368850 2023-11-23 16:48:04,216 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8150, loss[loss=0.08412, simple_loss=0.1149, pruned_loss=0.02, audio_tagging_loss=0.006664, over 15868.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09188, pruned_loss=0.01404, audio_tagging_loss=0.009153, over 3039036.78 frames. 
2023-11-23 16:48:09,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.557e+01 9.149e+01 9.819e+01 1.226e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-23 16:48:11,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2459100.0, ans=0.04949747468305833
2023-11-23 16:48:12,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2459100.0, ans=0.125
2023-11-23 16:48:17,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2459166.6666666665, ans=0.0
2023-11-23 16:48:39,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2459233.3333333335, ans=0.025
2023-11-23 16:48:46,918 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368900
2023-11-23 16:48:51,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0
2023-11-23 16:49:06,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2459366.6666666665, ans=0.125
2023-11-23 16:49:08,926 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8200, loss[loss=0.07331, simple_loss=0.09964, pruned_loss=0.01371, audio_tagging_loss=0.009778, over 15376.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09178, pruned_loss=0.01398, audio_tagging_loss=0.009091, over 3039748.43 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 16:49:10,218 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 16:49:11,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2459433.3333333335, ans=0.2
2023-11-23 16:49:15,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2459433.3333333335, ans=0.2
2023-11-23 16:49:19,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2459433.3333333335, ans=0.1
2023-11-23 16:49:49,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2459633.3333333335, ans=0.025
2023-11-23 16:49:51,301 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 368950
2023-11-23 16:50:11,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0
2023-11-23 16:50:13,321 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8250, loss[loss=0.07567, simple_loss=0.1017, pruned_loss=0.01742, audio_tagging_loss=0.007418, over 16456.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09162, pruned_loss=0.01376, audio_tagging_loss=0.008974, over 3035049.38 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 8.0
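The recurring "Exclude cut" warnings are self-consistent: a 100-frame AudioSet placeholder clip shrinks to 23 frames after the convolutional front-end, and a transducer cannot emit 24 tokens over 23 frames, so the cut is dropped. A sketch of that filter follows, assuming the usual Conv2dSubsampling-style arithmetic ((T - 7) // 2 + 1) // 2, which does reproduce 100 -> 23; the exact formula depends on the encoder_embed in use.

def frames_after_subsampling(T: int) -> int:
    # Assumed front-end arithmetic; yields 23 for T=100, matching the
    # warnings above. Verify against the actual encoder_embed module.
    return ((T - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one encoder frame per output token.
    return num_tokens <= frames_after_subsampling(num_frames)

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as logged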
2023-11-23 16:50:15,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2459766.6666666665, ans=0.0
2023-11-23 16:50:18,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.120e+01 8.966e+01 9.710e+01 1.268e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-23 16:50:22,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0
2023-11-23 16:50:25,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2459833.3333333335, ans=0.125
2023-11-23 16:50:56,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369000
2023-11-23 16:50:57,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2459966.6666666665, ans=0.125
2023-11-23 16:51:18,622 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8300, loss[loss=0.0856, simple_loss=0.1194, pruned_loss=0.02017, audio_tagging_loss=0.005752, over 15930.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.09177, pruned_loss=0.01387, audio_tagging_loss=0.008903, over 3040963.22 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 16:51:55,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5
2023-11-23 16:52:01,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369050
2023-11-23 16:52:21,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2460366.6666666665, ans=0.09899494936611666
2023-11-23 16:52:23,776 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8350, loss[loss=0.04978, simple_loss=0.06248, pruned_loss=0.008113, audio_tagging_loss=0.01042, over 15123.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.09101, pruned_loss=0.01374, audio_tagging_loss=0.008869, over 3043675.54 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 16:52:24,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2460433.3333333335, ans=0.125
2023-11-23 16:52:29,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.427e+01 9.150e+01 9.760e+01 1.195e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-23 16:53:03,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5
2023-11-23 16:53:06,756 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369100
2023-11-23 16:53:09,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2460633.3333333335, ans=0.125
2023-11-23 16:53:18,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2460700.0, ans=0.0
2023-11-23 16:53:28,827 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8400, loss[loss=0.06888, simple_loss=0.09331, pruned_loss=0.01525, audio_tagging_loss=0.006981, over 14937.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09054, pruned_loss=0.01372, audio_tagging_loss=0.008922, over 3041279.76 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 16.0
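The scaling.py:213 lines report module hyper-parameters (dropout_p, balancer prob/min_positive/max_abs, skip rates, bypass scale_min) that are scheduled against batch_count rather than fixed. Below is a minimal piecewise-linear re-implementation of the idea; the breakpoints are invented for illustration, and the real ScheduledFloat in icefall's scaling.py is richer.

# Minimal piecewise-linear schedule keyed on batch count, in the spirit of
# the ScheduledFloat values logged above (name=..., batch_count=..., ans=...).
class PiecewiseLinear:
    def __init__(self, *points):
        # points: (batch_count, value) pairs.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# Hypothetical ramp: most schedules here have long since flattened out,
# which is why the logged ans values are constant at this point in training.
skip_rate = PiecewiseLinear((0, 0.2), (4000, 0.05), (16000, 0.0))
print(skip_rate(2457700.0))  # 0.0, like the ans=0.0 entries above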
2023-11-23 16:53:45,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0
2023-11-23 16:53:47,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2460833.3333333335, ans=0.0
2023-11-23 16:53:48,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2460833.3333333335, ans=0.125
2023-11-23 16:53:54,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2460900.0, ans=0.125
2023-11-23 16:54:01,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2460900.0, ans=0.0
2023-11-23 16:54:10,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369150
2023-11-23 16:54:31,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0
2023-11-23 16:54:32,574 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8450, loss[loss=0.07909, simple_loss=0.09557, pruned_loss=0.02008, audio_tagging_loss=0.01123, over 14768.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09138, pruned_loss=0.01385, audio_tagging_loss=0.008909, over 3043002.35 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 16:54:37,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.445e+01 8.837e+01 9.573e+01 1.326e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-23 16:54:40,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5
2023-11-23 16:54:50,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2461166.6666666665, ans=0.0
2023-11-23 16:54:56,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2461166.6666666665, ans=0.125
2023-11-23 16:55:05,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2461233.3333333335, ans=0.1
2023-11-23 16:55:15,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0
2023-11-23 16:55:16,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369200
2023-11-23 16:55:21,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2461300.0, ans=0.1
2023-11-23 16:55:38,832 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8500, loss[loss=0.05597, simple_loss=0.07023, pruned_loss=0.009955, audio_tagging_loss=0.0109, over 16621.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09186, pruned_loss=0.01386, audio_tagging_loss=0.00896, over 3041523.09 frames. ], batch size: 64, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 16:55:46,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5
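The scaling.py:1022 "Whitening" lines compare a statistic of the feature covariance against a limit; while metric stays below limit the module is inert, and above it a gradient penalty pushes activations back toward a whiter (more isotropic) covariance. Below is a toy metric that equals 1.0 for perfectly white features, assuming the statistic is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue; the authoritative definition is in icefall's scaling.py.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns 1.0 when the covariance is
    # isotropic (already "white") and grows as the eigenvalues spread out.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]                    # (C, C)
    mean_sq_eig = torch.diagonal(cov @ cov).mean()  # E[lambda^2]
    sq_mean_eig = torch.diagonal(cov).mean() ** 2   # (E[lambda])^2
    return mean_sq_eig / sq_mean_eig

x = torch.randn(10000, 384)    # near-white features
print(whitening_metric(x))     # close to 1.0; logged values like 8.34 vs.
                               # limit=15.0 indicate headroom before penalty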
2023-11-23 16:56:12,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2461566.6666666665, ans=0.1
2023-11-23 16:56:21,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369250
2023-11-23 16:56:23,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2461633.3333333335, ans=0.09899494936611666
2023-11-23 16:56:44,047 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8550, loss[loss=0.06753, simple_loss=0.09691, pruned_loss=0.01083, audio_tagging_loss=0.00824, over 15430.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09131, pruned_loss=0.01368, audio_tagging_loss=0.008987, over 3044716.75 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 16:56:48,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2461766.6666666665, ans=0.0
2023-11-23 16:56:49,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.318e+01 8.758e+01 9.491e+01 1.282e+02, threshold=1.752e+02, percent-clipped=0.0
2023-11-23 16:57:24,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0
2023-11-23 16:57:26,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369300
2023-11-23 16:57:35,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.48 vs. limit=10.0
2023-11-23 16:57:43,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2462033.3333333335, ans=0.125
2023-11-23 16:57:48,082 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8600, loss[loss=0.06435, simple_loss=0.08338, pruned_loss=0.01192, audio_tagging_loss=0.01074, over 14913.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09123, pruned_loss=0.0138, audio_tagging_loss=0.009051, over 3045276.21 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 16:57:56,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2462100.0, ans=0.0
2023-11-23 16:57:57,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2462100.0, ans=0.1
2023-11-23 16:57:59,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2462166.6666666665, ans=0.1
2023-11-23 16:58:08,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2462166.6666666665, ans=0.1
2023-11-23 16:58:25,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=12.0
2023-11-23 16:58:31,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369350
2023-11-23 16:58:36,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2462300.0, ans=0.125
2023-11-23 16:58:52,577 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8650, loss[loss=0.06819, simple_loss=0.08961, pruned_loss=0.01358, audio_tagging_loss=0.009811, over 14879.00 frames. ], tot_loss[loss=0.06865, simple_loss=0.09139, pruned_loss=0.01382, audio_tagging_loss=0.009132, over 3047602.62 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 16:58:53,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2462433.3333333335, ans=0.125
2023-11-23 16:58:57,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.582e+01 9.201e+01 9.858e+01 1.215e+02, threshold=1.840e+02, percent-clipped=0.0
2023-11-23 16:59:00,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2462433.3333333335, ans=0.04949747468305833
2023-11-23 16:59:06,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2462500.0, ans=0.2
2023-11-23 16:59:35,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369400
2023-11-23 16:59:53,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2462700.0, ans=0.5
2023-11-23 16:59:57,835 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8700, loss[loss=0.0839, simple_loss=0.1162, pruned_loss=0.01759, audio_tagging_loss=0.008225, over 16024.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09096, pruned_loss=0.01372, audio_tagging_loss=0.00923, over 3051280.66 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:00:36,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2462966.6666666665, ans=0.1
2023-11-23 17:00:39,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2462966.6666666665, ans=0.125
2023-11-23 17:00:39,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2462966.6666666665, ans=0.0
2023-11-23 17:00:39,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2462966.6666666665, ans=0.0
2023-11-23 17:00:41,067 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369450
2023-11-23 17:00:56,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2463033.3333333335, ans=0.05
2023-11-23 17:01:02,652 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8750, loss[loss=0.08567, simple_loss=0.1093, pruned_loss=0.02186, audio_tagging_loss=0.009164, over 15280.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09144, pruned_loss=0.01373, audio_tagging_loss=0.009305, over 3047079.24 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:01:07,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.270e+01 8.973e+01 9.908e+01 1.550e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-23 17:01:23,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5
2023-11-23 17:01:29,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.84 vs. limit=22.5
2023-11-23 17:01:41,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2463300.0, ans=0.125
2023-11-23 17:01:45,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369500
2023-11-23 17:01:48,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2463300.0, ans=0.025
2023-11-23 17:02:06,820 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8800, loss[loss=0.07111, simple_loss=0.09466, pruned_loss=0.01385, audio_tagging_loss=0.009933, over 15143.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09201, pruned_loss=0.01395, audio_tagging_loss=0.009402, over 3049552.77 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:02:24,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2463500.0, ans=0.125
2023-11-23 17:02:25,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2463500.0, ans=0.125
2023-11-23 17:02:29,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0
2023-11-23 17:02:30,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2463500.0, ans=0.125
2023-11-23 17:02:37,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2463566.6666666665, ans=0.125
2023-11-23 17:02:37,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0
2023-11-23 17:02:47,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2463633.3333333335, ans=0.125
2023-11-23 17:02:49,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369550
2023-11-23 17:03:12,311 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8850, loss[loss=0.06325, simple_loss=0.08011, pruned_loss=0.01344, audio_tagging_loss=0.009754, over 14304.00 frames. ], tot_loss[loss=0.06982, simple_loss=0.09274, pruned_loss=0.01411, audio_tagging_loss=0.009344, over 3055159.17 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:03:17,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.515e+01 9.142e+01 9.954e+01 1.396e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-23 17:03:19,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.30 vs. limit=10.0
2023-11-23 17:03:22,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2463766.6666666665, ans=0.2
2023-11-23 17:03:24,198 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
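use_fp16 is enabled for this run, and the logged grad_scale drifts between 8.0, 16.0 and 32.0 across these batches, the signature of dynamic loss scaling: halve on overflow, double after a stretch of overflow-free steps. A generic torch.cuda.amp sketch of the mechanism follows; model, optimizer and compute_loss are placeholders rather than icefall's objects, and icefall's training loop additionally nudges the scale upward when it sinks too low.

import torch

# Dynamic loss scaling with torch.cuda.amp, the mechanism behind the logged
# grad_scale values. init_scale/growth_interval are illustrative choices.
scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()          # halves the scale on overflow; doubles it after
                             # growth_interval consecutive clean steps
    return loss.detach(), scaler.get_scale()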
2023-11-23 17:03:51,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2463966.6666666665, ans=0.1
2023-11-23 17:03:54,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369600
2023-11-23 17:04:06,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2464033.3333333335, ans=0.125
2023-11-23 17:04:10,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2464033.3333333335, ans=0.1
2023-11-23 17:04:11,636 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:04:17,295 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8900, loss[loss=0.08128, simple_loss=0.1176, pruned_loss=0.01588, audio_tagging_loss=0.006616, over 15660.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.09266, pruned_loss=0.01402, audio_tagging_loss=0.00918, over 3050543.25 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:04:27,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2464100.0, ans=0.2
2023-11-23 17:04:30,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0
2023-11-23 17:04:33,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2464166.6666666665, ans=0.0
2023-11-23 17:04:39,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.35 vs. limit=15.0
2023-11-23 17:04:40,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2464166.6666666665, ans=0.025
2023-11-23 17:04:58,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2464300.0, ans=0.0
2023-11-23 17:04:58,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2464300.0, ans=0.1
2023-11-23 17:05:00,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369650
2023-11-23 17:05:03,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0
2023-11-23 17:05:10,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2464366.6666666665, ans=0.1
2023-11-23 17:05:22,414 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 8950, loss[loss=0.06448, simple_loss=0.08879, pruned_loss=0.01228, audio_tagging_loss=0.00781, over 15364.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09213, pruned_loss=0.01391, audio_tagging_loss=0.009111, over 3049059.64 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:05:26,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2464433.3333333335, ans=0.125
2023-11-23 17:05:27,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.323e+01 9.092e+01 9.814e+01 1.259e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-23 17:05:29,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2464433.3333333335, ans=0.125
2023-11-23 17:05:49,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2464566.6666666665, ans=0.0
2023-11-23 17:06:04,939 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369700
2023-11-23 17:06:10,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0
2023-11-23 17:06:27,167 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9000, loss[loss=0.07034, simple_loss=0.1015, pruned_loss=0.009836, audio_tagging_loss=0.009749, over 14526.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09225, pruned_loss=0.0139, audio_tagging_loss=0.009093, over 3052584.21 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:06:27,167 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-23 17:06:53,819 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.9994, 4.0273, 3.7828, 3.0326], device='cuda:1')
2023-11-23 17:07:11,615 INFO [train_asr.py:1253] (1/4) Epoch 31, validation: loss=0.05916, simple_loss=0.05108, pruned_loss=0.005166, audio_tagging_loss=0.02845, over 4681554.00 frames.
2023-11-23 17:07:11,616 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-23 17:07:16,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2464766.6666666665, ans=0.125
2023-11-23 17:07:33,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2464833.3333333335, ans=0.0
2023-11-23 17:07:39,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. limit=6.0
2023-11-23 17:07:48,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2464900.0, ans=0.1
2023-11-23 17:07:51,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2464966.6666666665, ans=0.125
2023-11-23 17:07:54,857 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369750
2023-11-23 17:08:15,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2465100.0, ans=0.125
2023-11-23 17:08:15,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0
2023-11-23 17:08:16,236 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9050, loss[loss=0.05438, simple_loss=0.06768, pruned_loss=0.01129, audio_tagging_loss=0.009244, over 15158.00 frames. ], tot_loss[loss=0.06957, simple_loss=0.09329, pruned_loss=0.01402, audio_tagging_loss=0.008906, over 3054128.21 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
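The block above is a mid-epoch validation pass (triggered every valid_interval batches): training pauses at batch 9000, one attention-entropy diagnostic tensor is printed, and a frame-weighted validation loss plus peak CUDA memory are reported. A sketch of that bookkeeping with illustrative names, not train_asr.py's actual code:

import torch

def compute_validation_loss(model, valid_loader, compute_loss, device):
    # Frame-weighted average, matching "validation: loss=... over N frames."
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss_sum, num_frames = compute_loss(model, batch)  # per-batch sums
            tot_loss += loss_sum.item()
            tot_frames += num_frames
    model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4g}, "
          f"over {tot_frames:.2f} frames.")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated(device) // (1024 * 1024)}MB")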
2023-11-23 17:08:16,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2465100.0, ans=0.125
2023-11-23 17:08:16,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2465100.0, ans=0.125
2023-11-23 17:08:21,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.456e+01 8.970e+01 9.727e+01 1.359e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-23 17:08:31,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2465166.6666666665, ans=0.0
2023-11-23 17:08:33,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2465166.6666666665, ans=0.2
2023-11-23 17:08:42,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2465233.3333333335, ans=0.2
2023-11-23 17:08:47,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2465233.3333333335, ans=0.0
2023-11-23 17:08:51,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2465233.3333333335, ans=0.125
2023-11-23 17:08:57,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2465300.0, ans=0.1
2023-11-23 17:08:58,820 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369800
2023-11-23 17:09:00,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2465300.0, ans=0.0
2023-11-23 17:09:09,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2465366.6666666665, ans=0.125
2023-11-23 17:09:20,917 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9100, loss[loss=0.09688, simple_loss=0.137, pruned_loss=0.02275, audio_tagging_loss=0.005654, over 14717.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09183, pruned_loss=0.01376, audio_tagging_loss=0.008953, over 3052783.95 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:10:03,296 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369850
2023-11-23 17:10:13,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2465700.0, ans=0.125
2023-11-23 17:10:25,944 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9150, loss[loss=0.06269, simple_loss=0.08677, pruned_loss=0.01049, audio_tagging_loss=0.008817, over 15388.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09142, pruned_loss=0.01363, audio_tagging_loss=0.008939, over 3052090.44 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:10:28,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2465766.6666666665, ans=0.2
2023-11-23 17:10:30,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.198e+01 8.578e+01 9.347e+01 1.177e+02, threshold=1.716e+02, percent-clipped=0.0
2023-11-23 17:11:04,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.40 vs. limit=10.0
2023-11-23 17:11:04,683 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:11:06,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2465966.6666666665, ans=0.09899494936611666
2023-11-23 17:11:08,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369900
2023-11-23 17:11:08,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2465966.6666666665, ans=0.0
2023-11-23 17:11:23,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2466033.3333333335, ans=0.125
2023-11-23 17:11:30,961 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9200, loss[loss=0.07843, simple_loss=0.095, pruned_loss=0.01935, audio_tagging_loss=0.01158, over 15530.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.09126, pruned_loss=0.01363, audio_tagging_loss=0.00886, over 3049386.80 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:12:06,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2466233.3333333335, ans=0.125
2023-11-23 17:12:07,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2466233.3333333335, ans=0.2
2023-11-23 17:12:07,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2466233.3333333335, ans=0.04949747468305833
2023-11-23 17:12:13,756 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 369950
2023-11-23 17:12:15,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2466300.0, ans=10.0
2023-11-23 17:12:17,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2466300.0, ans=0.1
2023-11-23 17:12:35,938 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9250, loss[loss=0.07325, simple_loss=0.09247, pruned_loss=0.01594, audio_tagging_loss=0.01108, over 15427.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09125, pruned_loss=0.01371, audio_tagging_loss=0.008959, over 3051081.55 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:12:40,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.301e+01 8.925e+01 9.795e+01 1.226e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-23 17:12:43,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2466433.3333333335, ans=0.0
2023-11-23 17:12:56,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2466500.0, ans=0.0
2023-11-23 17:13:18,962 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370000
2023-11-23 17:13:19,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2466633.3333333335, ans=0.125
2023-11-23 17:13:19,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2466633.3333333335, ans=0.1
2023-11-23 17:13:42,033 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9300, loss[loss=0.06842, simple_loss=0.09548, pruned_loss=0.01231, audio_tagging_loss=0.008369, over 14227.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09097, pruned_loss=0.01366, audio_tagging_loss=0.009004, over 3045243.85 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:13:51,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2466766.6666666665, ans=0.1
2023-11-23 17:13:54,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2466833.3333333335, ans=0.0
2023-11-23 17:14:15,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2466900.0, ans=0.09899494936611666
2023-11-23 17:14:25,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370050
2023-11-23 17:14:29,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0
2023-11-23 17:14:38,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2467033.3333333335, ans=0.1
2023-11-23 17:14:47,293 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9350, loss[loss=0.07562, simple_loss=0.09893, pruned_loss=0.01474, audio_tagging_loss=0.01141, over 15337.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09039, pruned_loss=0.01348, audio_tagging_loss=0.00905, over 3047105.04 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:14:52,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.132e+01 8.275e+01 9.065e+01 9.778e+01 1.188e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-23 17:15:00,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2467166.6666666665, ans=0.0
2023-11-23 17:15:00,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0
2023-11-23 17:15:01,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0
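The flat lr: 2.18e-03 across these batches is consistent with icefall's Eden schedule given the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5: lr = base_lr * ((batch^2 + lr_batches^2) / lr_batches^2)^-0.25 * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^-0.25. With the batch count near 3.7e5 the batch factor is essentially frozen, which is why the value barely moves within the epoch; the exact epoch count fed to the scheduler (about 30 here) is an assumption.

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden-style decay in both batch and epoch, as in icefall's optim.py.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, batch=369500, epoch=30):.2e}")  # 2.18e-03, as logged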
2023-11-23 17:15:19,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2467233.3333333335, ans=0.125
2023-11-23 17:15:29,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370100
2023-11-23 17:15:50,778 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9400, loss[loss=0.06148, simple_loss=0.08801, pruned_loss=0.009466, audio_tagging_loss=0.008013, over 15312.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09044, pruned_loss=0.01347, audio_tagging_loss=0.009195, over 3044129.39 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:16:00,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:16:13,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2467500.0, ans=0.125
2023-11-23 17:16:23,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2467566.6666666665, ans=0.07
2023-11-23 17:16:33,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370150
2023-11-23 17:16:36,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2467633.3333333335, ans=0.125
2023-11-23 17:16:46,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2467700.0, ans=0.1
2023-11-23 17:16:53,388 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 17:16:54,584 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9450, loss[loss=0.06678, simple_loss=0.09627, pruned_loss=0.01249, audio_tagging_loss=0.006148, over 16007.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09048, pruned_loss=0.01342, audio_tagging_loss=0.009259, over 3044530.62 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:16:54,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2467766.6666666665, ans=0.125
2023-11-23 17:16:56,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2467766.6666666665, ans=0.2
2023-11-23 17:16:59,340 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.446e+01 8.380e+01 9.034e+01 1.002e+02 1.394e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-23 17:17:08,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2467833.3333333335, ans=0.2
2023-11-23 17:17:24,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2467900.0, ans=0.125
2023-11-23 17:17:30,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2467900.0, ans=0.1
2023-11-23 17:17:33,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2467966.6666666665, ans=0.0
2023-11-23 17:17:36,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370200
2023-11-23 17:17:55,714 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:17:58,776 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9500, loss[loss=0.05875, simple_loss=0.07547, pruned_loss=0.01217, audio_tagging_loss=0.008844, over 16143.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09111, pruned_loss=0.01356, audio_tagging_loss=0.009155, over 3046977.45 frames. ], batch size: 64, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:18:40,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370250
2023-11-23 17:18:42,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2468300.0, ans=0.1
2023-11-23 17:19:02,040 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9550, loss[loss=0.07165, simple_loss=0.09511, pruned_loss=0.01665, audio_tagging_loss=0.007445, over 16076.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09139, pruned_loss=0.01361, audio_tagging_loss=0.009253, over 3051251.26 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:19:07,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.559e+01 9.097e+01 9.877e+01 1.537e+02, threshold=1.819e+02, percent-clipped=0.0
2023-11-23 17:19:28,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2468566.6666666665, ans=0.125
2023-11-23 17:19:37,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2468566.6666666665, ans=0.125
2023-11-23 17:19:43,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370300
2023-11-23 17:19:52,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2468700.0, ans=0.0
2023-11-23 17:19:57,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0
2023-11-23 17:19:59,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2468700.0, ans=0.125
2023-11-23 17:20:02,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2468700.0, ans=0.125
2023-11-23 17:20:05,168 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9600, loss[loss=0.0725, simple_loss=0.1028, pruned_loss=0.01378, audio_tagging_loss=0.007302, over 15020.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.09129, pruned_loss=0.01364, audio_tagging_loss=0.009291, over 3052300.65 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:20:06,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.70 vs. limit=15.0
2023-11-23 17:20:46,374 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370350
2023-11-23 17:20:58,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=22.5
2023-11-23 17:21:06,833 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9650, loss[loss=0.07316, simple_loss=0.09518, pruned_loss=0.01551, audio_tagging_loss=0.01006, over 15899.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09165, pruned_loss=0.01374, audio_tagging_loss=0.009344, over 3040229.01 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:21:11,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.236e+01 8.814e+01 9.557e+01 1.553e+02, threshold=1.763e+02, percent-clipped=0.0
2023-11-23 17:21:22,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2469166.6666666665, ans=0.125
2023-11-23 17:21:35,426 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:21:39,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5
2023-11-23 17:21:48,760 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370400
2023-11-23 17:21:57,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2469366.6666666665, ans=0.125
2023-11-23 17:22:10,461 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9700, loss[loss=0.07101, simple_loss=0.1038, pruned_loss=0.01292, audio_tagging_loss=0.006191, over 16592.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.09289, pruned_loss=0.01399, audio_tagging_loss=0.009147, over 3044579.38 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:22:35,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2469566.6666666665, ans=0.2
2023-11-23 17:22:42,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2469566.6666666665, ans=0.125
2023-11-23 17:22:52,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370450
2023-11-23 17:23:13,575 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9750, loss[loss=0.05462, simple_loss=0.06716, pruned_loss=0.009495, audio_tagging_loss=0.01155, over 14355.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09349, pruned_loss=0.01407, audio_tagging_loss=0.009003, over 3045203.21 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:23:20,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.363e+01 8.980e+01 9.912e+01 1.548e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-23 17:23:23,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5
2023-11-23 17:23:46,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2469900.0, ans=0.1
2023-11-23 17:23:55,039 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370500
2023-11-23 17:24:14,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2470100.0, ans=0.125
2023-11-23 17:24:16,033 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9800, loss[loss=0.08839, simple_loss=0.124, pruned_loss=0.01776, audio_tagging_loss=0.008628, over 14855.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09287, pruned_loss=0.01392, audio_tagging_loss=0.008918, over 3043342.53 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:24:40,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2470233.3333333335, ans=0.125
2023-11-23 17:24:57,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370550
2023-11-23 17:25:04,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2470366.6666666665, ans=0.125
2023-11-23 17:25:10,347 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 17:25:10,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2470366.6666666665, ans=0.2
2023-11-23 17:25:11,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0
2023-11-23 17:25:18,002 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9850, loss[loss=0.05667, simple_loss=0.07403, pruned_loss=0.009385, audio_tagging_loss=0.01028, over 14721.00 frames. ], tot_loss[loss=0.06979, simple_loss=0.09378, pruned_loss=0.01412, audio_tagging_loss=0.008788, over 3049411.54 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 16.0
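For monitoring a run like this outside TensorBoard, the train_asr.py:1221 summaries are regular enough to scrape with a single regex per record. The pattern below targets exactly the "Epoch .., batch .., loss[...] tot_loss[...]" shape used throughout this section (one record reconstructed from the log above serves as a test string):

import re

# Scrape (epoch, batch, tot_loss, lr) from batch-summary records.
PATTERN = re.compile(
    r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([\d.]+)"
    r".*?\], batch size: \d+, lr: ([\d.e-]+)"
)

line = ("2023-11-23 17:24:16,033 INFO [train_asr.py:1221] (1/4) Epoch 31, "
        "batch 9800, loss[loss=0.08839, simple_loss=0.124, pruned_loss=0.01776, "
        "audio_tagging_loss=0.008628, over 14855.00 frames. ], tot_loss[loss=0.06927, "
        "simple_loss=0.09287, pruned_loss=0.01392, audio_tagging_loss=0.008918, "
        "over 3043342.53 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 16.0")
m = PATTERN.search(line)
print(int(m.group(1)), int(m.group(2)),
      float(m.group(3)), float(m.group(4)))  # 31 9800 0.06927 0.00218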
2023-11-23 17:25:22,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2470433.3333333335, ans=0.1
2023-11-23 17:25:24,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.482e+01 9.119e+01 9.855e+01 1.403e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-23 17:25:24,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2470433.3333333335, ans=0.125
2023-11-23 17:25:40,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.06 vs. limit=10.0
2023-11-23 17:25:58,186 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370600
2023-11-23 17:26:01,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2470633.3333333335, ans=0.125
2023-11-23 17:26:05,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2470633.3333333335, ans=0.04949747468305833
2023-11-23 17:26:08,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2470700.0, ans=0.95
2023-11-23 17:26:10,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2470700.0, ans=0.125
2023-11-23 17:26:11,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2470700.0, ans=0.125
2023-11-23 17:26:20,136 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9900, loss[loss=0.06557, simple_loss=0.08663, pruned_loss=0.01375, audio_tagging_loss=0.008506, over 15024.00 frames. ], tot_loss[loss=0.06945, simple_loss=0.09328, pruned_loss=0.01401, audio_tagging_loss=0.008798, over 3043216.19 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:26:41,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2470833.3333333335, ans=0.125
2023-11-23 17:27:00,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370650
2023-11-23 17:27:10,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2471033.3333333335, ans=0.125
2023-11-23 17:27:14,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2471033.3333333335, ans=0.2
2023-11-23 17:27:21,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2471100.0, ans=0.0
2023-11-23 17:27:22,241 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 9950, loss[loss=0.05688, simple_loss=0.07434, pruned_loss=0.009539, audio_tagging_loss=0.01018, over 14467.00 frames. ], tot_loss[loss=0.06961, simple_loss=0.09333, pruned_loss=0.01413, audio_tagging_loss=0.008819, over 3038687.59 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:27:28,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.205e+01 8.691e+01 9.411e+01 1.200e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-23 17:27:31,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2471100.0, ans=0.125
2023-11-23 17:28:03,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370700
2023-11-23 17:28:24,110 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10000, loss[loss=0.07189, simple_loss=0.09632, pruned_loss=0.01566, audio_tagging_loss=0.008066, over 14953.00 frames. ], tot_loss[loss=0.06955, simple_loss=0.09344, pruned_loss=0.0141, audio_tagging_loss=0.008733, over 3041319.18 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:28:26,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2471433.3333333335, ans=0.125
2023-11-23 17:28:27,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2471433.3333333335, ans=0.125
2023-11-23 17:28:59,303 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:28:59,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2471566.6666666665, ans=0.5
2023-11-23 17:29:04,999 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370750
2023-11-23 17:29:07,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2471633.3333333335, ans=0.125
2023-11-23 17:29:18,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2471700.0, ans=0.2
2023-11-23 17:29:21,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2471700.0, ans=0.125
2023-11-23 17:29:26,668 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10050, loss[loss=0.07518, simple_loss=0.1037, pruned_loss=0.0145, audio_tagging_loss=0.008816, over 15719.00 frames. ], tot_loss[loss=0.06942, simple_loss=0.09307, pruned_loss=0.01408, audio_tagging_loss=0.008802, over 3042521.13 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:29:29,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2471766.6666666665, ans=0.125
2023-11-23 17:29:33,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.306e+01 9.076e+01 9.554e+01 1.175e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-23 17:29:44,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2471833.3333333335, ans=0.1
2023-11-23 17:29:46,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.75 vs. limit=5.0
2023-11-23 17:29:47,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2471833.3333333335, ans=0.125
2023-11-23 17:29:47,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2471833.3333333335, ans=0.0
2023-11-23 17:29:48,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2471833.3333333335, ans=0.2
2023-11-23 17:30:06,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370800
2023-11-23 17:30:08,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2471966.6666666665, ans=0.0
2023-11-23 17:30:25,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0
2023-11-23 17:30:28,773 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10100, loss[loss=0.07449, simple_loss=0.1098, pruned_loss=0.01361, audio_tagging_loss=0.005951, over 15823.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09366, pruned_loss=0.01415, audio_tagging_loss=0.00888, over 3042167.94 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:30:32,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2472100.0, ans=0.125
2023-11-23 17:31:07,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2472300.0, ans=0.1
2023-11-23 17:31:10,030 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370850
2023-11-23 17:31:12,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0
2023-11-23 17:31:13,763 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:31:17,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0
2023-11-23 17:31:18,147 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 17:31:24,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2472366.6666666665, ans=0.125
2023-11-23 17:31:29,902 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10150, loss[loss=0.07102, simple_loss=0.09442, pruned_loss=0.01153, audio_tagging_loss=0.01228, over 15017.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09433, pruned_loss=0.01416, audio_tagging_loss=0.008938, over 3041740.78 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:31:36,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.103e+01 8.204e+01 9.033e+01 9.611e+01 1.160e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-23 17:31:49,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2472500.0, ans=0.0
2023-11-23 17:31:59,356 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 17:31:59,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2472566.6666666665, ans=0.125
2023-11-23 17:32:04,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5
2023-11-23 17:32:05,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2472566.6666666665, ans=0.125
2023-11-23 17:32:11,328 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370900
2023-11-23 17:32:16,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5
2023-11-23 17:32:25,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2472700.0, ans=0.1
2023-11-23 17:32:31,947 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10200, loss[loss=0.07411, simple_loss=0.09745, pruned_loss=0.01605, audio_tagging_loss=0.009335, over 15115.00 frames. ], tot_loss[loss=0.06984, simple_loss=0.09367, pruned_loss=0.01401, audio_tagging_loss=0.008997, over 3047239.64 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:32:51,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0
2023-11-23 17:32:55,242 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 17:33:13,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 370950
2023-11-23 17:33:14,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2472966.6666666665, ans=0.2
2023-11-23 17:33:35,206 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10250, loss[loss=0.04747, simple_loss=0.05364, pruned_loss=0.009776, audio_tagging_loss=0.01088, over 16119.00 frames. ], tot_loss[loss=0.06991, simple_loss=0.09367, pruned_loss=0.01405, audio_tagging_loss=0.009028, over 3053810.11 frames. ], batch size: 62, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:33:42,409 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.633e+01 9.319e+01 1.005e+02 1.656e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-23 17:33:49,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0
2023-11-23 17:33:54,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2473166.6666666665, ans=0.125
2023-11-23 17:34:06,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2473233.3333333335, ans=0.0
2023-11-23 17:34:16,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371000
2023-11-23 17:34:23,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2473366.6666666665, ans=0.125
2023-11-23 17:34:28,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2473366.6666666665, ans=0.125
2023-11-23 17:34:36,429 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10300, loss[loss=0.05404, simple_loss=0.07737, pruned_loss=0.004854, audio_tagging_loss=0.0105, over 16014.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09335, pruned_loss=0.01409, audio_tagging_loss=0.009086, over 3060098.83 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:34:38,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0
2023-11-23 17:34:44,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2473433.3333333335, ans=0.0
2023-11-23 17:34:44,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2473433.3333333335, ans=0.125
2023-11-23 17:34:44,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0
2023-11-23 17:34:46,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2473433.3333333335, ans=0.1
2023-11-23 17:34:48,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0
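[Editor's note: the logged lr: 2.18e-03 is consistent with icefall's Eden schedule given this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5, taking "epoch" as the completed-epoch count (~30 here; the 2.14e-03 seen after the Epoch 32 boundary below matches epoch=31). A sketch of that formula, stated as an assumption and ignoring the warm-up factor, which is 1.0 this far into training:

def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2) ** -0.25
    #              * ((epoch^2 + lr_epochs^2) / lr_epochs^2) ** -0.25
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, step=371000, epoch=30))  # ~2.18e-03, as logged]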
2023-11-23 17:34:49,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2473500.0, ans=0.125
2023-11-23 17:35:00,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2473566.6666666665, ans=0.0
2023-11-23 17:35:12,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2473566.6666666665, ans=0.125
2023-11-23 17:35:17,965 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371050
2023-11-23 17:35:19,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2473633.3333333335, ans=0.2
2023-11-23 17:35:28,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2473700.0, ans=0.1
2023-11-23 17:35:28,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0
2023-11-23 17:35:34,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-11-23 17:35:38,907 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10350, loss[loss=0.07476, simple_loss=0.09942, pruned_loss=0.01639, audio_tagging_loss=0.008658, over 17262.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09233, pruned_loss=0.01383, audio_tagging_loss=0.009283, over 3056483.63 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 16.0
2023-11-23 17:35:46,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.733e+01 8.382e+01 9.122e+01 9.955e+01 1.360e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-23 17:36:15,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2473900.0, ans=0.1
2023-11-23 17:36:17,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2473966.6666666665, ans=0.1
2023-11-23 17:36:20,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371100
2023-11-23 17:36:34,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2474033.3333333335, ans=0.125
2023-11-23 17:36:40,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2474100.0, ans=0.1
2023-11-23 17:36:41,593 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10400, loss[loss=0.05285, simple_loss=0.06977, pruned_loss=0.007081, audio_tagging_loss=0.01088, over 14515.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09187, pruned_loss=0.01368, audio_tagging_loss=0.009321, over 3052875.92 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:36:43,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2474100.0, ans=0.1
2023-11-23 17:36:48,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2474100.0, ans=0.125
2023-11-23 17:36:50,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2474100.0, ans=0.2
2023-11-23 17:36:57,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2474166.6666666665, ans=0.2
2023-11-23 17:36:58,889 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:37:17,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2474300.0, ans=0.0
2023-11-23 17:37:22,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371150
2023-11-23 17:37:43,936 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10450, loss[loss=0.06395, simple_loss=0.09147, pruned_loss=0.009727, audio_tagging_loss=0.008486, over 14225.00 frames. ], tot_loss[loss=0.06916, simple_loss=0.09202, pruned_loss=0.01385, audio_tagging_loss=0.009304, over 3046536.29 frames. ], batch size: 52, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:37:46,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2474433.3333333335, ans=0.0
2023-11-23 17:37:51,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.367e+01 9.085e+01 9.738e+01 1.469e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-23 17:38:05,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2474500.0, ans=0.0
2023-11-23 17:38:25,399 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371200
2023-11-23 17:38:46,218 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10500, loss[loss=0.0756, simple_loss=0.1075, pruned_loss=0.01345, audio_tagging_loss=0.008425, over 15616.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09176, pruned_loss=0.01373, audio_tagging_loss=0.009141, over 3044429.22 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:39:00,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5
2023-11-23 17:39:01,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2474833.3333333335, ans=0.0
2023-11-23 17:39:05,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2474833.3333333335, ans=0.125
2023-11-23 17:39:27,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371250
2023-11-23 17:39:32,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0
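[Editor's note: the Whitening lines compare a whiteness metric of an activation's covariance against a limit. To my understanding of scaling.py, stated here as an assumption, the metric is mean(eigenvalue^2) / mean(eigenvalue)^2 of the covariance: exactly 1.0 for perfectly "white" features and growing as variance concentrates in few directions; when it exceeds the limit, a gradient penalty nudges the features back toward whiteness. A sketch of the metric:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); single group for simplicity.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()      # trace(C) / d
    mean_eig_sq = (cov * cov).sum() / d        # trace(C @ C) / d, C symmetric
    return mean_eig_sq / mean_eig ** 2         # >= 1.0; == 1.0 iff C is a multiple of I

x = torch.randn(1000, 128)                     # near-white input
print(whitening_metric(x))                     # close to 1.0]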
2023-11-23 17:39:44,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2475033.3333333335, ans=0.125
2023-11-23 17:39:48,775 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10550, loss[loss=0.05318, simple_loss=0.07055, pruned_loss=0.006959, audio_tagging_loss=0.01094, over 14745.00 frames. ], tot_loss[loss=0.06862, simple_loss=0.09165, pruned_loss=0.01372, audio_tagging_loss=0.009078, over 3041709.36 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:39:50,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2475100.0, ans=0.0
2023-11-23 17:39:50,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0
2023-11-23 17:39:55,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.253e+01 8.407e+01 8.928e+01 9.553e+01 1.582e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-23 17:40:29,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371300
2023-11-23 17:40:30,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2475300.0, ans=0.125
2023-11-23 17:40:32,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2475300.0, ans=0.2
2023-11-23 17:40:44,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2475366.6666666665, ans=0.2
2023-11-23 17:40:50,997 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10600, loss[loss=0.08543, simple_loss=0.1092, pruned_loss=0.02162, audio_tagging_loss=0.009225, over 15011.00 frames. ], tot_loss[loss=0.06878, simple_loss=0.09227, pruned_loss=0.01371, audio_tagging_loss=0.008937, over 3038663.86 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:40:54,721 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:40:56,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2475433.3333333335, ans=0.125
2023-11-23 17:41:07,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2475500.0, ans=0.125
2023-11-23 17:41:07,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=22.5
2023-11-23 17:41:31,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371350
2023-11-23 17:41:36,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2475633.3333333335, ans=0.125
2023-11-23 17:41:53,198 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10650, loss[loss=0.08226, simple_loss=0.1083, pruned_loss=0.01774, audio_tagging_loss=0.01039, over 14922.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.0919, pruned_loss=0.01383, audio_tagging_loss=0.008953, over 3041306.73 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:41:53,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2475766.6666666665, ans=0.0
2023-11-23 17:42:00,370 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.024e+01 8.195e+01 8.706e+01 9.399e+01 1.158e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-23 17:42:17,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2475900.0, ans=0.2
2023-11-23 17:42:19,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2475900.0, ans=0.5
2023-11-23 17:42:20,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2475900.0, ans=0.1
2023-11-23 17:42:23,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2475900.0, ans=0.125
2023-11-23 17:42:34,803 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371400
2023-11-23 17:42:42,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2476033.3333333335, ans=0.125
2023-11-23 17:42:43,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0
2023-11-23 17:42:55,549 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10700, loss[loss=0.05472, simple_loss=0.07262, pruned_loss=0.00924, audio_tagging_loss=0.009173, over 14979.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.0914, pruned_loss=0.01362, audio_tagging_loss=0.009029, over 3049185.70 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:42:55,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2476100.0, ans=0.1
2023-11-23 17:43:04,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2476100.0, ans=0.125
2023-11-23 17:43:07,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2476166.6666666665, ans=0.2
2023-11-23 17:43:36,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371450
2023-11-23 17:43:37,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2476300.0, ans=0.0
2023-11-23 17:43:53,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2476366.6666666665, ans=0.1
2023-11-23 17:43:58,058 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10750, loss[loss=0.06393, simple_loss=0.0891, pruned_loss=0.01255, audio_tagging_loss=0.00683, over 15827.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09046, pruned_loss=0.01344, audio_tagging_loss=0.008993, over 3044873.06 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:44:05,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.362e+01 8.891e+01 9.533e+01 1.093e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-23 17:44:23,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2476566.6666666665, ans=0.125
2023-11-23 17:44:37,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2476633.3333333335, ans=0.1
2023-11-23 17:44:39,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371500
2023-11-23 17:44:44,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2476633.3333333335, ans=0.0
2023-11-23 17:44:45,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2023-11-23 17:45:01,411 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10800, loss[loss=0.0599, simple_loss=0.07953, pruned_loss=0.01091, audio_tagging_loss=0.009227, over 15620.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09185, pruned_loss=0.01371, audio_tagging_loss=0.008937, over 3045127.36 frames. ], batch size: 62, lr: 2.18e-03, grad_scale: 32.0
2023-11-23 17:45:01,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2476766.6666666665, ans=0.0
2023-11-23 17:45:25,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2476900.0, ans=0.2
2023-11-23 17:45:35,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2476900.0, ans=0.1
2023-11-23 17:45:41,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371550
2023-11-23 17:46:03,189 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10850, loss[loss=0.04738, simple_loss=0.06061, pruned_loss=0.007442, audio_tagging_loss=0.009635, over 15706.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09281, pruned_loss=0.01386, audio_tagging_loss=0.009027, over 3044969.07 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 17:46:06,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5
2023-11-23 17:46:06,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0
2023-11-23 17:46:14,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.439e+01 8.913e+01 9.439e+01 1.225e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-23 17:46:30,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2477233.3333333335, ans=0.125
2023-11-23 17:46:32,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2477233.3333333335, ans=0.0
2023-11-23 17:46:43,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371600
2023-11-23 17:47:01,441 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
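[Editor's note: these recurring WARNING lines come from a filter that drops utterances whose encoder output would be shorter than the token sequence; here 100 input frames subsample to 23, but the dummy transcript has 24 BPE tokens, so the pruned transducer loss would be undefined. The logged 23 matches T' = ((T - 7) // 2 + 1) // 2, the usual icefall convention for this conv frontend, taken here as an assumption. A sketch of the rule:

def frames_after_subsampling(num_frames: int) -> int:
    # Conv frontend reduces T -> ((T - 7) // 2 + 1) // 2 (assumed formula).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t = frames_after_subsampling(num_frames)
    return t >= num_tokens  # the transducer needs at least one frame per token

print(frames_after_subsampling(100))  # 23, as logged
print(keep_cut(100, 24))              # False -> "Exclude cut ... from training."]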
2023-11-23 17:47:02,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0
2023-11-23 17:47:04,938 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10900, loss[loss=0.0755, simple_loss=0.09966, pruned_loss=0.01381, audio_tagging_loss=0.01186, over 14627.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09263, pruned_loss=0.0137, audio_tagging_loss=0.008974, over 3034526.28 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 17:47:05,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:47:10,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0
2023-11-23 17:47:15,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2477500.0, ans=0.04949747468305833
2023-11-23 17:47:25,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2477500.0, ans=0.1
2023-11-23 17:47:32,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2477566.6666666665, ans=0.125
2023-11-23 17:47:43,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2477633.3333333335, ans=0.1
2023-11-23 17:47:46,036 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371650
2023-11-23 17:47:52,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0
2023-11-23 17:47:56,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2477700.0, ans=0.0
2023-11-23 17:48:06,477 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 10950, loss[loss=0.07164, simple_loss=0.0969, pruned_loss=0.01432, audio_tagging_loss=0.008867, over 16322.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09252, pruned_loss=0.01363, audio_tagging_loss=0.009005, over 3043901.07 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 17:48:13,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2477766.6666666665, ans=0.0
2023-11-23 17:48:17,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.169e+01 8.798e+01 9.612e+01 4.443e+02, threshold=1.760e+02, percent-clipped=1.0
2023-11-23 17:48:19,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0
2023-11-23 17:48:23,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0
2023-11-23 17:48:47,236 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371700
2023-11-23 17:48:58,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2478033.3333333335, ans=0.2
2023-11-23 17:49:03,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2023-11-23 17:49:08,783 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11000, loss[loss=0.08028, simple_loss=0.1157, pruned_loss=0.01612, audio_tagging_loss=0.006305, over 15338.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09172, pruned_loss=0.01355, audio_tagging_loss=0.009099, over 3045043.94 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 17:49:15,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2478100.0, ans=0.125
2023-11-23 17:49:18,889 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 17:49:23,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2478166.6666666665, ans=0.0
2023-11-23 17:49:43,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2478233.3333333335, ans=0.125
2023-11-23 17:49:44,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2478300.0, ans=0.1
2023-11-23 17:49:49,412 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371750
2023-11-23 17:49:53,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2478300.0, ans=0.0
2023-11-23 17:49:53,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2478300.0, ans=0.125
2023-11-23 17:49:54,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0
2023-11-23 17:50:02,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2478366.6666666665, ans=0.0
2023-11-23 17:50:11,107 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11050, loss[loss=0.07906, simple_loss=0.1097, pruned_loss=0.01603, audio_tagging_loss=0.008163, over 15696.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09121, pruned_loss=0.01344, audio_tagging_loss=0.00925, over 3051606.74 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 17:50:15,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2478433.3333333335, ans=0.125
2023-11-23 17:50:21,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.456e+01 8.354e+01 9.034e+01 9.901e+01 1.113e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-23 17:50:38,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2478566.6666666665, ans=15.0
2023-11-23 17:50:52,526 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371800
2023-11-23 17:51:13,278 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11100, loss[loss=0.07222, simple_loss=0.09599, pruned_loss=0.01238, audio_tagging_loss=0.01184, over 15053.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09111, pruned_loss=0.01358, audio_tagging_loss=0.009385, over 3048235.38 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 17:51:18,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2478766.6666666665, ans=0.0
2023-11-23 17:51:43,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2478900.0, ans=0.125
2023-11-23 17:51:44,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2478900.0, ans=0.125
2023-11-23 17:51:54,215 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371850
2023-11-23 17:52:00,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2478966.6666666665, ans=0.125
2023-11-23 17:52:05,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2479033.3333333335, ans=0.125
2023-11-23 17:52:12,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=22.5
2023-11-23 17:52:15,284 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11150, loss[loss=0.06675, simple_loss=0.08618, pruned_loss=0.01208, audio_tagging_loss=0.01158, over 14917.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09125, pruned_loss=0.01353, audio_tagging_loss=0.009479, over 3039661.28 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 8.0
2023-11-23 17:52:26,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.139e+01 8.417e+01 9.306e+01 9.835e+01 1.557e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-23 17:52:28,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2479166.6666666665, ans=0.0
2023-11-23 17:52:55,914 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371900
2023-11-23 17:53:05,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2479366.6666666665, ans=0.2
2023-11-23 17:53:17,759 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11200, loss[loss=0.05927, simple_loss=0.07051, pruned_loss=0.01141, audio_tagging_loss=0.0126, over 15070.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09085, pruned_loss=0.01345, audio_tagging_loss=0.009612, over 3049027.79 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 16.0
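[Editor's note: the grad_scale value stepping between 8.0, 16.0 and 32.0 (it just moved 8.0 -> 16.0 above) is standard AMP dynamic loss scaling: the scale is halved when inf/nan gradients appear and doubled after a run of clean steps. A generic PyTorch sketch, not the icefall loop itself:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,        # the log shows the scale hovering around 8-32
    growth_factor=2.0,      # doubled after `growth_interval` clean steps
    backoff_factor=0.5,     # halved when an overflow is detected
    growth_interval=2000,
)

# Inside a training loop (model, optimizer, batch assumed defined elsewhere):
# with torch.cuda.amp.autocast():
#     loss = model(batch)
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()
# print(scaler.get_scale())  # the grad_scale value printed in the log]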
2023-11-23 17:53:32,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0
2023-11-23 17:53:36,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2479500.0, ans=0.125
2023-11-23 17:53:58,319 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 371950
2023-11-23 17:54:06,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2479700.0, ans=0.1
2023-11-23 17:54:19,187 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11250, loss[loss=0.07243, simple_loss=0.1031, pruned_loss=0.01222, audio_tagging_loss=0.008642, over 15917.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.0919, pruned_loss=0.01363, audio_tagging_loss=0.009445, over 3055487.01 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 17:54:30,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.457e+01 8.293e+01 8.818e+01 9.873e+01 1.266e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-23 17:54:41,271 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 17:54:47,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2479900.0, ans=0.0
2023-11-23 17:55:00,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372000
2023-11-23 17:55:24,130 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11300, loss[loss=0.07729, simple_loss=0.1107, pruned_loss=0.01407, audio_tagging_loss=0.007861, over 15252.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.09182, pruned_loss=0.01366, audio_tagging_loss=0.009304, over 3051694.82 frames. ], batch size: 55, lr: 2.17e-03, grad_scale: 8.0
2023-11-23 17:55:51,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0
2023-11-23 17:56:05,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372050
2023-11-23 17:56:21,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2480366.6666666665, ans=0.125
2023-11-23 17:56:27,514 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11350, loss[loss=0.07832, simple_loss=0.1031, pruned_loss=0.01937, audio_tagging_loss=0.007424, over 15563.00 frames. ], tot_loss[loss=0.06939, simple_loss=0.09269, pruned_loss=0.01385, audio_tagging_loss=0.009192, over 3054401.87 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 8.0
2023-11-23 17:56:27,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2480433.3333333335, ans=0.125
2023-11-23 17:56:39,395 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.419e+01 8.973e+01 9.627e+01 1.152e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-23 17:56:50,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2480566.6666666665, ans=0.1
2023-11-23 17:56:51,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2480566.6666666665, ans=0.125
2023-11-23 17:56:53,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2480566.6666666665, ans=0.125
2023-11-23 17:57:00,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2480566.6666666665, ans=10.0
2023-11-23 17:57:08,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372100
2023-11-23 17:57:28,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0
2023-11-23 17:57:28,753 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11400, loss[loss=0.06946, simple_loss=0.09188, pruned_loss=0.01275, audio_tagging_loss=0.01076, over 14697.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09215, pruned_loss=0.01387, audio_tagging_loss=0.009084, over 3050513.23 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 8.0
2023-11-23 17:57:36,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.73 vs. limit=15.0
2023-11-23 17:58:03,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5
2023-11-23 17:58:04,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2480900.0, ans=0.125
2023-11-23 17:58:09,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2480966.6666666665, ans=0.125
2023-11-23 17:58:10,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372150
2023-11-23 17:58:12,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0
2023-11-23 17:58:20,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2481033.3333333335, ans=0.0
2023-11-23 17:58:25,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0
2023-11-23 17:58:30,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0
2023-11-23 17:58:31,095 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11450, loss[loss=0.07729, simple_loss=0.1024, pruned_loss=0.02182, audio_tagging_loss=0.00426, over 15447.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.09191, pruned_loss=0.0138, audio_tagging_loss=0.008934, over 3045679.91 frames. ], batch size: 55, lr: 2.17e-03, grad_scale: 8.0
2023-11-23 17:58:39,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2481100.0, ans=0.0
2023-11-23 17:58:39,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2481100.0, ans=0.015
2023-11-23 17:58:43,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.644e+01 8.130e+01 8.677e+01 9.441e+01 1.245e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-23 17:58:57,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2481233.3333333335, ans=0.0
2023-11-23 17:59:06,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2481233.3333333335, ans=0.07
2023-11-23 17:59:12,551 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372200
2023-11-23 17:59:18,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2481300.0, ans=0.0
2023-11-23 17:59:22,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2481366.6666666665, ans=0.0
2023-11-23 17:59:33,997 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11500, loss[loss=0.05528, simple_loss=0.06809, pruned_loss=0.009225, audio_tagging_loss=0.01201, over 15362.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09243, pruned_loss=0.01389, audio_tagging_loss=0.008949, over 3045646.88 frames. ], batch size: 58, lr: 2.17e-03, grad_scale: 8.0
2023-11-23 17:59:55,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2481500.0, ans=0.125
2023-11-23 17:59:58,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2481566.6666666665, ans=0.0
2023-11-23 18:00:10,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0
2023-11-23 18:00:14,399 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372250
2023-11-23 18:00:33,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2481700.0, ans=0.125
2023-11-23 18:00:35,651 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11550, loss[loss=0.05031, simple_loss=0.06229, pruned_loss=0.008333, audio_tagging_loss=0.01083, over 15163.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09229, pruned_loss=0.01387, audio_tagging_loss=0.009, over 3052411.18 frames. ], batch size: 60, lr: 2.17e-03, grad_scale: 8.0
2023-11-23 18:00:48,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.420e+01 9.006e+01 9.672e+01 1.345e+02, threshold=1.801e+02, percent-clipped=0.0
2023-11-23 18:01:10,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2481900.0, ans=0.125
2023-11-23 18:01:13,606 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
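[Editor's note: in every Clipping_scale line the threshold equals Clipping_scale times the logged median grad-norm (e.g. 2.0 x 8.677e+01 = 1.735e+02 just above), which suggests the optimizer clips gradients to a multiple of the recent median norm and reports min/25%/50%/75%/max of that history. A sketch of the bookkeeping, offered as an assumption about optim.py:

import numpy as np

clipping_scale = 2.0
grad_norm_history: list = []    # grad norms collected over recent steps

def record_and_threshold(grad_norm: float) -> float:
    grad_norm_history.append(grad_norm)
    q = np.quantile(grad_norm_history, [0.0, 0.25, 0.5, 0.75, 1.0])
    print("grad-norm quartiles", q, "threshold", clipping_scale * q[2])
    return clipping_scale * q[2]  # clip gradients whose norm exceeds this

"percent-clipped" would then be the fraction of recent steps whose gradient norm exceeded the threshold; it is 0.0 almost everywhere in this section, with a single 1.0 after the 4.443e+02 outlier above.]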
2023-11-23 18:01:17,176 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372300
2023-11-23 18:01:36,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2482100.0, ans=0.0
2023-11-23 18:01:37,298 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11600, loss[loss=0.1003, simple_loss=0.1386, pruned_loss=0.02402, audio_tagging_loss=0.006962, over 14833.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09242, pruned_loss=0.01384, audio_tagging_loss=0.008989, over 3049577.99 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 18:01:37,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2482100.0, ans=0.125
2023-11-23 18:01:43,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2482100.0, ans=0.2
2023-11-23 18:01:44,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2482100.0, ans=0.1
2023-11-23 18:01:47,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2482100.0, ans=0.1
2023-11-23 18:01:49,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2482166.6666666665, ans=0.0
2023-11-23 18:02:09,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2482233.3333333335, ans=0.125
2023-11-23 18:02:12,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0
2023-11-23 18:02:13,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2482300.0, ans=0.2
2023-11-23 18:02:16,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2482300.0, ans=0.1
2023-11-23 18:02:18,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372350
2023-11-23 18:02:22,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2482300.0, ans=0.04949747468305833
2023-11-23 18:02:38,542 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11650, loss[loss=0.1022, simple_loss=0.1323, pruned_loss=0.02747, audio_tagging_loss=0.008567, over 15206.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09268, pruned_loss=0.01396, audio_tagging_loss=0.008935, over 3054299.04 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 18:02:50,779 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.302e+01 8.966e+01 9.575e+01 1.248e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-23 18:02:56,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=12.0
2023-11-23 18:02:56,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2482500.0, ans=0.1
2023-11-23 18:02:56,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2482500.0, ans=0.125
2023-11-23 18:03:01,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2482566.6666666665, ans=0.09899494936611666
2023-11-23 18:03:07,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2482566.6666666665, ans=0.0
2023-11-23 18:03:08,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2482566.6666666665, ans=0.0
2023-11-23 18:03:15,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2482633.3333333335, ans=0.07
2023-11-23 18:03:15,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2482633.3333333335, ans=0.125
2023-11-23 18:03:18,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372400
2023-11-23 18:03:40,970 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11700, loss[loss=0.09471, simple_loss=0.1339, pruned_loss=0.02245, audio_tagging_loss=0.005324, over 14690.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09281, pruned_loss=0.01392, audio_tagging_loss=0.008898, over 3050370.38 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 18:04:10,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0
2023-11-23 18:04:22,554 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372450
2023-11-23 18:04:27,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2482966.6666666665, ans=0.0
2023-11-23 18:04:30,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0
2023-11-23 18:04:42,736 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11750, loss[loss=0.06084, simple_loss=0.07974, pruned_loss=0.01066, audio_tagging_loss=0.01031, over 15814.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09267, pruned_loss=0.01394, audio_tagging_loss=0.008939, over 3052052.84 frames. ], batch size: 61, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 18:04:54,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.352e+01 9.061e+01 9.652e+01 1.174e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-23 18:04:55,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2483166.6666666665, ans=0.2
2023-11-23 18:05:19,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5
2023-11-23 18:05:23,279 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372500
2023-11-23 18:05:44,258 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11800, loss[loss=0.0783, simple_loss=0.1074, pruned_loss=0.01531, audio_tagging_loss=0.009281, over 14624.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09344, pruned_loss=0.0141, audio_tagging_loss=0.008881, over 3048620.82 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 16.0
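[Editor's note: every ScheduledFloat line reports a value (`ans`) looked up against batch_count; by this point in training the *.out_proj.dropout_p entries have settled at their final 0.1 and the *_skip_rate entries at 0.0, consistent with a piecewise-linear schedule that has passed its last breakpoint. A minimal sketch of such a schedule; the breakpoints below are made up for illustration:

def scheduled_float(batch_count: float, schedule) -> float:
    # schedule: [(batch_count_0, value_0), (batch_count_1, value_1), ...]
    (x0, y0) = schedule[0]
    if batch_count <= x0:
        return y0
    for (x1, y1) in schedule[1:]:
        if batch_count <= x1:
            # linear interpolation between the surrounding breakpoints
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint: hold the final value

dropout_p = [(0.0, 0.3), (20000.0, 0.1)]      # illustrative breakpoints only
print(scheduled_float(2482500.0, dropout_p))  # -> 0.1, as in the log]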
2023-11-23 18:05:57,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2483500.0, ans=0.2
2023-11-23 18:06:00,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2483500.0, ans=0.0
2023-11-23 18:06:01,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2483500.0, ans=0.05
2023-11-23 18:06:08,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2483566.6666666665, ans=0.125
2023-11-23 18:06:25,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372550
2023-11-23 18:06:46,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2483766.6666666665, ans=0.125
2023-11-23 18:06:46,832 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11850, loss[loss=0.06983, simple_loss=0.09662, pruned_loss=0.01136, audio_tagging_loss=0.01016, over 15351.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09253, pruned_loss=0.01392, audio_tagging_loss=0.00904, over 3043855.87 frames. ], batch size: 58, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 18:06:50,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2483766.6666666665, ans=0.1
2023-11-23 18:06:53,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2023-11-23 18:06:59,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.390e+01 9.057e+01 9.896e+01 1.267e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-23 18:07:02,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2483833.3333333335, ans=0.125
2023-11-23 18:07:05,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2483833.3333333335, ans=0.2
2023-11-23 18:07:18,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2483900.0, ans=0.125
2023-11-23 18:07:28,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372600
2023-11-23 18:07:39,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2484033.3333333335, ans=0.0
2023-11-23 18:07:44,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2484033.3333333335, ans=0.125
2023-11-23 18:07:45,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=2484033.3333333335, ans=0.02
2023-11-23 18:07:49,515 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11900, loss[loss=0.06713, simple_loss=0.08974, pruned_loss=0.01325, audio_tagging_loss=0.009009, over 15056.00 frames. ], tot_loss[loss=0.06941, simple_loss=0.09265, pruned_loss=0.01393, audio_tagging_loss=0.009148, over 3049639.46 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 18:07:56,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2484100.0, ans=0.0
2023-11-23 18:08:09,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=12.0
2023-11-23 18:08:20,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2484233.3333333335, ans=0.125
2023-11-23 18:08:29,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2484300.0, ans=0.1
2023-11-23 18:08:30,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372650
2023-11-23 18:08:34,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=12.0
2023-11-23 18:08:43,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2484366.6666666665, ans=0.0
2023-11-23 18:08:51,638 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 11950, loss[loss=0.07771, simple_loss=0.1013, pruned_loss=0.01814, audio_tagging_loss=0.008914, over 15925.00 frames. ], tot_loss[loss=0.07067, simple_loss=0.09447, pruned_loss=0.01434, audio_tagging_loss=0.009094, over 3053450.46 frames. ], batch size: 60, lr: 2.17e-03, grad_scale: 16.0
2023-11-23 18:09:04,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.358e+01 9.149e+01 9.907e+01 1.513e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-23 18:09:08,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0
2023-11-23 18:09:14,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.81 vs. limit=5.0
2023-11-23 18:09:20,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2484566.6666666665, ans=0.125
2023-11-23 18:09:29,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2484633.3333333335, ans=0.1
2023-11-23 18:09:31,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372700
2023-11-23 18:09:45,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=15.0
2023-11-23 18:09:46,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2484700.0, ans=0.125
2023-11-23 18:09:50,717 INFO [train_asr.py:1221] (1/4) Epoch 31, batch 12000, loss[loss=0.06659, simple_loss=0.0926, pruned_loss=0.01065, audio_tagging_loss=0.009641, over 15589.00 frames. ], tot_loss[loss=0.06977, simple_loss=0.09279, pruned_loss=0.01408, audio_tagging_loss=0.009298, over 3048219.87 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 32.0
2023-11-23 18:09:50,718 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-23 18:10:31,547 INFO [train_asr.py:1253] (1/4) Epoch 31, validation: loss=0.05782, simple_loss=0.05111, pruned_loss=0.00519, audio_tagging_loss=0.02708, over 4681554.00 frames.
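[Editor's note: at the 12000-batch mark the loop pauses to compute validation loss over the full dev set (4681554 frames) and then logs peak GPU memory. The "Maximum memory allocated" figure that follows is what torch.cuda.max_memory_allocated reports, converted to MB; a sketch:

import torch

def log_peak_memory(device: torch.device) -> None:
    # Reproduces lines like "Maximum memory allocated so far is 25607MB".
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")

Since the counter is cumulative for the process, the value can only grow; here it holds steady at 25607MB across the epoch boundary, indicating no new peak.]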
2023-11-23 18:10:31,547 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-23 18:10:44,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2484833.3333333335, ans=0.125
2023-11-23 18:11:33,925 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 0, loss[loss=0.08041, simple_loss=0.0835, pruned_loss=0.01309, audio_tagging_loss=0.02557, over 15270.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.0835, pruned_loss=0.01309, audio_tagging_loss=0.02557, over 15270.00 frames. ], batch size: 57, lr: 2.14e-03, grad_scale: 32.0
2023-11-23 18:11:33,926 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-23 18:12:09,581 INFO [train_asr.py:1253] (1/4) Epoch 32, validation: loss=0.05806, simple_loss=0.0511, pruned_loss=0.005184, audio_tagging_loss=0.02732, over 4681554.00 frames.
2023-11-23 18:12:09,582 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-23 18:12:09,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2484926.6666666665, ans=0.125
2023-11-23 18:12:20,790 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372750
2023-11-23 18:12:26,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2484993.3333333335, ans=0.1
2023-11-23 18:12:32,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2484993.3333333335, ans=0.1
2023-11-23 18:12:35,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2485060.0, ans=0.125
2023-11-23 18:12:49,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2485126.6666666665, ans=0.125
2023-11-23 18:12:52,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2485126.6666666665, ans=0.1
2023-11-23 18:12:54,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.631e+01 9.416e+01 1.070e+02 1.612e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-23 18:13:06,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2485193.3333333335, ans=0.07
2023-11-23 18:13:11,368 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 50, loss[loss=0.06391, simple_loss=0.06757, pruned_loss=0.009886, audio_tagging_loss=0.02024, over 14993.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09472, pruned_loss=0.01391, audio_tagging_loss=0.01759, over 690960.36 frames. ], batch size: 56, lr: 2.14e-03, grad_scale: 32.0
2023-11-23 18:13:14,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2485260.0, ans=0.2
2023-11-23 18:13:22,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372800
2023-11-23 18:13:35,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2485393.3333333335, ans=0.125
2023-11-23 18:14:13,827 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 100, loss[loss=0.0862, simple_loss=0.108, pruned_loss=0.01763, audio_tagging_loss=0.01459, over 14947.00 frames. ], tot_loss[loss=0.07733, simple_loss=0.09274, pruned_loss=0.01419, audio_tagging_loss=0.01677, over 1210058.35 frames. ], batch size: 53, lr: 2.14e-03, grad_scale: 32.0
2023-11-23 18:14:21,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2485593.3333333335, ans=0.0
2023-11-23 18:14:25,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372850
2023-11-23 18:14:30,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2485660.0, ans=0.2
2023-11-23 18:14:34,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.50 vs. limit=15.0
2023-11-23 18:14:38,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2485726.6666666665, ans=0.1
2023-11-23 18:14:44,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2485726.6666666665, ans=0.05
2023-11-23 18:14:52,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2485793.3333333335, ans=0.04949747468305833
2023-11-23 18:14:59,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.892e+01 9.428e+01 1.026e+02 1.429e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-23 18:14:59,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2485793.3333333335, ans=0.125
2023-11-23 18:15:16,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0
2023-11-23 18:15:17,199 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 150, loss[loss=0.07759, simple_loss=0.1003, pruned_loss=0.01516, audio_tagging_loss=0.01228, over 15185.00 frames. ], tot_loss[loss=0.07573, simple_loss=0.09373, pruned_loss=0.01401, audio_tagging_loss=0.01485, over 1609727.44 frames. ], batch size: 58, lr: 2.14e-03, grad_scale: 32.0
2023-11-23 18:15:28,451 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372900
2023-11-23 18:15:34,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2485993.3333333335, ans=0.07
2023-11-23 18:15:54,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2486126.6666666665, ans=0.0
2023-11-23 18:15:56,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2486126.6666666665, ans=0.125
2023-11-23 18:16:05,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2486193.3333333335, ans=0.125
2023-11-23 18:16:07,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0
2023-11-23 18:16:18,929 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 200, loss[loss=0.04729, simple_loss=0.04871, pruned_loss=0.01148, audio_tagging_loss=0.01145, over 14799.00 frames. ], tot_loss[loss=0.07483, simple_loss=0.09449, pruned_loss=0.01435, audio_tagging_loss=0.01323, over 1930588.41 frames. ], batch size: 58, lr: 2.14e-03, grad_scale: 16.0
], batch size: 58, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:16:21,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2486260.0, ans=0.125 2023-11-23 18:16:25,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2486260.0, ans=0.07 2023-11-23 18:16:30,262 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 372950 2023-11-23 18:16:46,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2486393.3333333335, ans=0.2 2023-11-23 18:16:47,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2486393.3333333335, ans=0.125 2023-11-23 18:16:47,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2486393.3333333335, ans=0.0 2023-11-23 18:17:04,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.534e+01 9.090e+01 9.831e+01 1.209e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-23 18:17:05,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2486460.0, ans=0.125 2023-11-23 18:17:20,649 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 250, loss[loss=0.0574, simple_loss=0.07535, pruned_loss=0.01029, audio_tagging_loss=0.009433, over 15694.00 frames. ], tot_loss[loss=0.07345, simple_loss=0.0943, pruned_loss=0.01429, audio_tagging_loss=0.01201, over 2187787.83 frames. ], batch size: 61, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:17:31,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373000 2023-11-23 18:17:46,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2486726.6666666665, ans=0.2 2023-11-23 18:18:01,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2486793.3333333335, ans=0.1 2023-11-23 18:18:06,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2486793.3333333335, ans=0.1 2023-11-23 18:18:22,652 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 300, loss[loss=0.05947, simple_loss=0.07712, pruned_loss=0.01296, audio_tagging_loss=0.007954, over 15855.00 frames. ], tot_loss[loss=0.07253, simple_loss=0.09405, pruned_loss=0.01434, audio_tagging_loss=0.01116, over 2377292.43 frames. ], batch size: 60, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:18:31,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. 
limit=6.0 2023-11-23 18:18:34,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373050 2023-11-23 18:18:49,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2487060.0, ans=0.0 2023-11-23 18:18:57,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2487060.0, ans=0.125 2023-11-23 18:19:08,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.767e+01 9.315e+01 9.987e+01 1.181e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-23 18:19:24,679 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 350, loss[loss=0.072, simple_loss=0.1006, pruned_loss=0.01329, audio_tagging_loss=0.008414, over 15099.00 frames. ], tot_loss[loss=0.07134, simple_loss=0.09349, pruned_loss=0.01409, audio_tagging_loss=0.01051, over 2536329.40 frames. ], batch size: 55, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:19:36,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373100 2023-11-23 18:19:39,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2023-11-23 18:20:00,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2487460.0, ans=0.125 2023-11-23 18:20:06,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2487460.0, ans=0.04949747468305833 2023-11-23 18:20:17,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2487526.6666666665, ans=0.0 2023-11-23 18:20:26,834 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 400, loss[loss=0.063, simple_loss=0.08337, pruned_loss=0.0114, audio_tagging_loss=0.009919, over 15655.00 frames. ], tot_loss[loss=0.0711, simple_loss=0.09366, pruned_loss=0.0141, audio_tagging_loss=0.01017, over 2649866.95 frames. ], batch size: 58, lr: 2.14e-03, grad_scale: 32.0 2023-11-23 18:20:34,037 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 18:20:37,962 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373150 2023-11-23 18:20:57,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2487726.6666666665, ans=0.0 2023-11-23 18:21:00,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2487726.6666666665, ans=0.125 2023-11-23 18:21:08,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=12.0 2023-11-23 18:21:12,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.510e+01 9.091e+01 9.716e+01 1.253e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-23 18:21:28,543 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 450, loss[loss=0.06073, simple_loss=0.08409, pruned_loss=0.01053, audio_tagging_loss=0.00816, over 14640.00 frames. ], tot_loss[loss=0.07047, simple_loss=0.09283, pruned_loss=0.01416, audio_tagging_loss=0.00989, over 2741480.02 frames. 
], batch size: 57, lr: 2.14e-03, grad_scale: 32.0 2023-11-23 18:21:39,995 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373200 2023-11-23 18:21:46,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=15.0 2023-11-23 18:21:50,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2487993.3333333335, ans=0.125 2023-11-23 18:22:13,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2023-11-23 18:22:14,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2488126.6666666665, ans=0.1 2023-11-23 18:22:23,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2488193.3333333335, ans=0.125 2023-11-23 18:22:31,178 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 500, loss[loss=0.07971, simple_loss=0.1182, pruned_loss=0.01486, audio_tagging_loss=0.005764, over 15976.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.09226, pruned_loss=0.01404, audio_tagging_loss=0.009712, over 2818684.19 frames. ], batch size: 56, lr: 2.14e-03, grad_scale: 32.0 2023-11-23 18:22:33,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2488260.0, ans=0.5 2023-11-23 18:22:42,956 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373250 2023-11-23 18:22:53,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2488326.6666666665, ans=0.0 2023-11-23 18:23:15,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=12.0 2023-11-23 18:23:17,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.516e+01 9.137e+01 9.960e+01 1.270e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-23 18:23:32,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2023-11-23 18:23:34,139 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 550, loss[loss=0.06918, simple_loss=0.08536, pruned_loss=0.01568, audio_tagging_loss=0.01082, over 14092.00 frames. ], tot_loss[loss=0.06963, simple_loss=0.09229, pruned_loss=0.01398, audio_tagging_loss=0.009506, over 2867357.93 frames. 
], batch size: 53, lr: 2.14e-03, grad_scale: 32.0 2023-11-23 18:23:35,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2488593.3333333335, ans=0.0 2023-11-23 18:23:44,838 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373300 2023-11-23 18:23:47,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2488660.0, ans=0.125 2023-11-23 18:23:52,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2488660.0, ans=0.125 2023-11-23 18:24:12,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2488793.3333333335, ans=0.125 2023-11-23 18:24:14,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2488793.3333333335, ans=0.125 2023-11-23 18:24:18,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2488793.3333333335, ans=0.02 2023-11-23 18:24:22,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2488860.0, ans=0.0 2023-11-23 18:24:25,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2488860.0, ans=0.1 2023-11-23 18:24:36,005 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 600, loss[loss=0.08317, simple_loss=0.1121, pruned_loss=0.01946, audio_tagging_loss=0.007659, over 14730.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09262, pruned_loss=0.01407, audio_tagging_loss=0.009393, over 2907028.84 frames. ], batch size: 55, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:24:46,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373350 2023-11-23 18:24:59,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2489060.0, ans=0.125 2023-11-23 18:25:01,024 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 18:25:06,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2489060.0, ans=0.125 2023-11-23 18:25:20,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2489126.6666666665, ans=0.125 2023-11-23 18:25:23,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.713e+01 9.332e+01 1.014e+02 1.298e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-23 18:25:29,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2023-11-23 18:25:38,131 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 650, loss[loss=0.0666, simple_loss=0.09407, pruned_loss=0.01077, audio_tagging_loss=0.008794, over 15649.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.09308, pruned_loss=0.01404, audio_tagging_loss=0.009227, over 2946768.31 frames. 
], batch size: 61, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:25:48,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2489260.0, ans=0.0 2023-11-23 18:25:49,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373400 2023-11-23 18:26:05,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2489393.3333333335, ans=0.0 2023-11-23 18:26:10,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2489393.3333333335, ans=0.125 2023-11-23 18:26:14,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2489460.0, ans=0.0 2023-11-23 18:26:34,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2489526.6666666665, ans=0.1 2023-11-23 18:26:41,070 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 700, loss[loss=0.07246, simple_loss=0.1054, pruned_loss=0.01157, audio_tagging_loss=0.008207, over 15282.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.09324, pruned_loss=0.01403, audio_tagging_loss=0.009223, over 2964763.23 frames. ], batch size: 58, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:26:41,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2023-11-23 18:26:42,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2489593.3333333335, ans=0.125 2023-11-23 18:26:51,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373450 2023-11-23 18:27:18,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2489793.3333333335, ans=0.125 2023-11-23 18:27:20,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2489793.3333333335, ans=0.025 2023-11-23 18:27:29,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.089e+01 8.956e+01 9.594e+01 1.307e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 18:27:42,838 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 750, loss[loss=0.08405, simple_loss=0.1155, pruned_loss=0.01871, audio_tagging_loss=0.007586, over 15540.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09377, pruned_loss=0.01408, audio_tagging_loss=0.009247, over 2988109.18 frames. 
], batch size: 55, lr: 2.14e-03, grad_scale: 8.0 2023-11-23 18:27:53,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373500 2023-11-23 18:28:04,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2489993.3333333335, ans=0.0 2023-11-23 18:28:05,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2489993.3333333335, ans=0.0 2023-11-23 18:28:14,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2490060.0, ans=0.2 2023-11-23 18:28:24,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2490126.6666666665, ans=0.1 2023-11-23 18:28:30,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2490126.6666666665, ans=0.2 2023-11-23 18:28:41,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2490193.3333333335, ans=0.1 2023-11-23 18:28:41,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2490193.3333333335, ans=0.2 2023-11-23 18:28:42,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2023-11-23 18:28:45,047 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 800, loss[loss=0.07494, simple_loss=0.1144, pruned_loss=0.01241, audio_tagging_loss=0.005342, over 15041.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09236, pruned_loss=0.0138, audio_tagging_loss=0.009343, over 3004802.02 frames. ], batch size: 55, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:28:55,836 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373550 2023-11-23 18:29:05,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2490326.6666666665, ans=0.2 2023-11-23 18:29:09,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2490393.3333333335, ans=0.125 2023-11-23 18:29:12,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2490393.3333333335, ans=0.125 2023-11-23 18:29:13,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2490393.3333333335, ans=0.125 2023-11-23 18:29:18,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0 2023-11-23 18:29:33,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.422e+01 8.967e+01 9.780e+01 1.208e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-23 18:29:36,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2490526.6666666665, ans=0.0 2023-11-23 18:29:46,992 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 850, loss[loss=0.08534, simple_loss=0.119, pruned_loss=0.01951, audio_tagging_loss=0.006318, over 14918.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.09313, pruned_loss=0.01397, audio_tagging_loss=0.009267, over 3017933.50 frames. 
], batch size: 54, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:29:55,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2490593.3333333335, ans=0.0 2023-11-23 18:29:57,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2490593.3333333335, ans=0.125 2023-11-23 18:29:58,481 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373600 2023-11-23 18:30:34,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2490793.3333333335, ans=0.125 2023-11-23 18:30:43,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2023-11-23 18:30:49,691 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 900, loss[loss=0.07649, simple_loss=0.09068, pruned_loss=0.02028, audio_tagging_loss=0.01087, over 14060.00 frames. ], tot_loss[loss=0.07, simple_loss=0.09353, pruned_loss=0.01399, audio_tagging_loss=0.009251, over 3017450.06 frames. ], batch size: 53, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:30:58,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2490926.6666666665, ans=0.125 2023-11-23 18:31:01,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373650 2023-11-23 18:31:05,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.49 vs. limit=22.5 2023-11-23 18:31:37,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2491126.6666666665, ans=0.07 2023-11-23 18:31:38,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.291e+01 8.989e+01 9.605e+01 1.191e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-23 18:31:47,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=22.5 2023-11-23 18:31:49,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2491193.3333333335, ans=0.015 2023-11-23 18:31:49,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=22.5 2023-11-23 18:31:51,780 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 950, loss[loss=0.07034, simple_loss=0.09499, pruned_loss=0.01329, audio_tagging_loss=0.009554, over 15541.00 frames. ], tot_loss[loss=0.07019, simple_loss=0.09373, pruned_loss=0.01406, audio_tagging_loss=0.009269, over 3028129.08 frames. 
], batch size: 58, lr: 2.14e-03, grad_scale: 16.0 2023-11-23 18:32:02,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2491260.0, ans=0.05 2023-11-23 18:32:03,021 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373700 2023-11-23 18:32:42,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2491526.6666666665, ans=0.0 2023-11-23 18:32:45,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2491526.6666666665, ans=0.2 2023-11-23 18:32:53,758 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1000, loss[loss=0.08912, simple_loss=0.1109, pruned_loss=0.02211, audio_tagging_loss=0.01157, over 15348.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09266, pruned_loss=0.01385, audio_tagging_loss=0.009106, over 3027608.55 frames. ], batch size: 55, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:32:57,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2491593.3333333335, ans=0.2 2023-11-23 18:32:58,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2491593.3333333335, ans=0.1 2023-11-23 18:33:04,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2491593.3333333335, ans=0.125 2023-11-23 18:33:05,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373750 2023-11-23 18:33:18,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2491726.6666666665, ans=0.125 2023-11-23 18:33:20,535 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 18:33:29,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2491726.6666666665, ans=0.2 2023-11-23 18:33:29,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2491726.6666666665, ans=0.5 2023-11-23 18:33:40,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2491793.3333333335, ans=0.125 2023-11-23 18:33:42,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.316e+01 8.954e+01 9.512e+01 1.152e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 18:33:55,923 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1050, loss[loss=0.05717, simple_loss=0.07584, pruned_loss=0.01094, audio_tagging_loss=0.008316, over 14531.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.0926, pruned_loss=0.01368, audio_tagging_loss=0.009078, over 3027582.88 frames. 
], batch size: 54, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:34:04,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2491926.6666666665, ans=0.0 2023-11-23 18:34:04,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2491926.6666666665, ans=0.09899494936611666 2023-11-23 18:34:07,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373800 2023-11-23 18:34:09,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2023-11-23 18:34:12,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-11-23 18:34:19,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2491993.3333333335, ans=0.0 2023-11-23 18:34:19,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2023-11-23 18:34:27,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2492060.0, ans=0.125 2023-11-23 18:34:32,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2492060.0, ans=0.125 2023-11-23 18:34:33,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2492126.6666666665, ans=0.1 2023-11-23 18:34:38,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2492126.6666666665, ans=0.0 2023-11-23 18:34:42,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2492126.6666666665, ans=0.0 2023-11-23 18:34:59,569 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1100, loss[loss=0.06192, simple_loss=0.09008, pruned_loss=0.009672, audio_tagging_loss=0.007203, over 15418.00 frames. ], tot_loss[loss=0.06871, simple_loss=0.09198, pruned_loss=0.0137, audio_tagging_loss=0.009024, over 3028348.30 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:35:02,529 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 18:35:10,858 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373850 2023-11-23 18:35:36,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2492460.0, ans=0.0 2023-11-23 18:35:44,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2492460.0, ans=0.09899494936611666 2023-11-23 18:35:48,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.110e+01 8.335e+01 9.013e+01 9.723e+01 1.742e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-23 18:35:54,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2492526.6666666665, ans=10.0 2023-11-23 18:36:01,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2492593.3333333335, ans=0.125 2023-11-23 18:36:01,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2492593.3333333335, ans=0.125 2023-11-23 18:36:02,006 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1150, loss[loss=0.04316, simple_loss=0.04739, pruned_loss=0.006414, audio_tagging_loss=0.01305, over 14002.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09197, pruned_loss=0.01366, audio_tagging_loss=0.008919, over 3029524.68 frames. ], batch size: 55, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:36:13,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373900 2023-11-23 18:36:17,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2492660.0, ans=0.1 2023-11-23 18:36:26,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2492726.6666666665, ans=0.1 2023-11-23 18:36:47,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2492793.3333333335, ans=0.2 2023-11-23 18:36:50,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2492860.0, ans=0.125 2023-11-23 18:36:50,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2492860.0, ans=0.125 2023-11-23 18:36:51,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2492860.0, ans=0.125 2023-11-23 18:37:04,181 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1200, loss[loss=0.07185, simple_loss=0.1012, pruned_loss=0.01496, audio_tagging_loss=0.006287, over 14782.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.09242, pruned_loss=0.01363, audio_tagging_loss=0.008884, over 3034604.86 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:37:16,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 373950 2023-11-23 18:37:52,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.465e+01 8.983e+01 9.576e+01 1.363e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 18:38:06,957 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1250, loss[loss=0.06335, simple_loss=0.07369, pruned_loss=0.01097, audio_tagging_loss=0.01554, over 13593.00 frames. 
], tot_loss[loss=0.06861, simple_loss=0.09219, pruned_loss=0.01367, audio_tagging_loss=0.008853, over 3033751.08 frames. ], batch size: 53, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:38:07,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. limit=6.0 2023-11-23 18:38:16,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2493260.0, ans=0.2 2023-11-23 18:38:18,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374000 2023-11-23 18:38:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2493393.3333333335, ans=0.1 2023-11-23 18:38:42,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=22.5 2023-11-23 18:38:44,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2493460.0, ans=0.125 2023-11-23 18:38:48,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2493460.0, ans=0.0 2023-11-23 18:38:53,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2493460.0, ans=10.0 2023-11-23 18:39:00,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2493526.6666666665, ans=0.125 2023-11-23 18:39:09,325 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1300, loss[loss=0.07886, simple_loss=0.109, pruned_loss=0.0161, audio_tagging_loss=0.008273, over 15251.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09211, pruned_loss=0.01374, audio_tagging_loss=0.00881, over 3028566.32 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:39:11,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2493593.3333333335, ans=0.125 2023-11-23 18:39:20,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374050 2023-11-23 18:39:35,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2493726.6666666665, ans=0.1 2023-11-23 18:39:59,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.471e+01 8.997e+01 9.654e+01 1.238e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-23 18:40:11,611 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1350, loss[loss=0.06711, simple_loss=0.08012, pruned_loss=0.01274, audio_tagging_loss=0.01431, over 15385.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09246, pruned_loss=0.01376, audio_tagging_loss=0.008756, over 3034987.56 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:40:15,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2493926.6666666665, ans=0.5 2023-11-23 18:40:19,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. 
limit=15.0 2023-11-23 18:40:22,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374100 2023-11-23 18:40:23,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2493993.3333333335, ans=0.125 2023-11-23 18:40:25,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2493993.3333333335, ans=0.1 2023-11-23 18:40:26,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-11-23 18:40:43,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2494060.0, ans=0.0 2023-11-23 18:40:44,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2494060.0, ans=0.0 2023-11-23 18:40:48,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2494126.6666666665, ans=10.0 2023-11-23 18:40:53,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2494126.6666666665, ans=0.0 2023-11-23 18:40:55,034 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 18:41:10,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2494193.3333333335, ans=0.0 2023-11-23 18:41:13,472 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1400, loss[loss=0.05978, simple_loss=0.0799, pruned_loss=0.009741, audio_tagging_loss=0.01009, over 14572.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09224, pruned_loss=0.01367, audio_tagging_loss=0.008855, over 3044141.14 frames. ], batch size: 53, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:41:16,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2494260.0, ans=0.95 2023-11-23 18:41:23,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2494260.0, ans=0.1 2023-11-23 18:41:25,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374150 2023-11-23 18:41:40,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2494393.3333333335, ans=0.2 2023-11-23 18:41:47,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2494393.3333333335, ans=0.07 2023-11-23 18:41:51,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.32 vs. 
limit=22.5 2023-11-23 18:42:03,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.437e+01 8.832e+01 9.783e+01 1.644e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-23 18:42:03,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2494526.6666666665, ans=0.0 2023-11-23 18:42:15,818 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1450, loss[loss=0.0836, simple_loss=0.1164, pruned_loss=0.01721, audio_tagging_loss=0.008183, over 16036.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.09233, pruned_loss=0.01358, audio_tagging_loss=0.008918, over 3041450.99 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:42:16,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2494593.3333333335, ans=0.1 2023-11-23 18:42:27,306 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374200 2023-11-23 18:42:44,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2494726.6666666665, ans=0.2 2023-11-23 18:42:47,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2494726.6666666665, ans=0.0 2023-11-23 18:42:55,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2494793.3333333335, ans=0.125 2023-11-23 18:42:58,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2494793.3333333335, ans=0.125 2023-11-23 18:43:10,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2494860.0, ans=0.125 2023-11-23 18:43:17,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2494926.6666666665, ans=0.125 2023-11-23 18:43:18,566 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1500, loss[loss=0.07105, simple_loss=0.1, pruned_loss=0.01121, audio_tagging_loss=0.009829, over 14842.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09283, pruned_loss=0.01379, audio_tagging_loss=0.00897, over 3037099.23 frames. ], batch size: 54, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:43:19,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2494926.6666666665, ans=0.0 2023-11-23 18:43:29,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374250 2023-11-23 18:43:32,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2494993.3333333335, ans=0.125 2023-11-23 18:43:35,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-23 18:43:51,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=2495060.0, ans=15.0 2023-11-23 18:44:08,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.386e+01 9.077e+01 9.732e+01 1.149e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-23 18:44:21,217 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1550, loss[loss=0.06581, simple_loss=0.0862, pruned_loss=0.01134, audio_tagging_loss=0.01138, over 15287.00 frames. 
], tot_loss[loss=0.06878, simple_loss=0.09206, pruned_loss=0.01364, audio_tagging_loss=0.009103, over 3031749.78 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 18:44:22,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2495260.0, ans=0.0 2023-11-23 18:44:27,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2495260.0, ans=0.0 2023-11-23 18:44:32,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374300 2023-11-23 18:44:36,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2495326.6666666665, ans=0.05 2023-11-23 18:44:38,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2495326.6666666665, ans=0.125 2023-11-23 18:44:50,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2495393.3333333335, ans=0.0 2023-11-23 18:44:59,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2495460.0, ans=0.0 2023-11-23 18:45:05,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2023-11-23 18:45:09,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2495460.0, ans=0.125 2023-11-23 18:45:23,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2495593.3333333335, ans=0.2 2023-11-23 18:45:24,458 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1600, loss[loss=0.0628, simple_loss=0.08686, pruned_loss=0.009692, audio_tagging_loss=0.009675, over 16184.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09255, pruned_loss=0.01385, audio_tagging_loss=0.009137, over 3039069.50 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:45:25,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2495593.3333333335, ans=0.125 2023-11-23 18:45:35,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374350 2023-11-23 18:45:46,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2495660.0, ans=0.1 2023-11-23 18:45:47,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2495726.6666666665, ans=0.95 2023-11-23 18:46:03,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2495793.3333333335, ans=0.0 2023-11-23 18:46:14,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.412e+01 9.043e+01 9.774e+01 1.173e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 18:46:25,938 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1650, loss[loss=0.04863, simple_loss=0.06407, pruned_loss=0.006426, audio_tagging_loss=0.01017, over 14457.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09216, pruned_loss=0.01388, audio_tagging_loss=0.009258, over 3039621.68 frames. 
], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:46:37,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374400 2023-11-23 18:47:09,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2023-11-23 18:47:22,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2496193.3333333335, ans=0.2 2023-11-23 18:47:28,982 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1700, loss[loss=0.07407, simple_loss=0.09705, pruned_loss=0.01699, audio_tagging_loss=0.008561, over 14897.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09264, pruned_loss=0.01385, audio_tagging_loss=0.009336, over 3044882.85 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:47:39,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374450 2023-11-23 18:48:07,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2496460.0, ans=0.125 2023-11-23 18:48:18,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.417e+01 9.193e+01 9.971e+01 1.264e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-23 18:48:20,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2496526.6666666665, ans=0.0 2023-11-23 18:48:31,081 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1750, loss[loss=0.06146, simple_loss=0.08988, pruned_loss=0.009862, audio_tagging_loss=0.006658, over 14656.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09215, pruned_loss=0.01383, audio_tagging_loss=0.009248, over 3039762.10 frames. ], batch size: 52, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:48:42,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374500 2023-11-23 18:48:46,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2496660.0, ans=0.2 2023-11-23 18:48:51,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=22.5 2023-11-23 18:48:52,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2496660.0, ans=0.0 2023-11-23 18:49:00,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.77 vs. limit=15.0 2023-11-23 18:49:17,978 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 18:49:26,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2496860.0, ans=0.0 2023-11-23 18:49:33,570 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1800, loss[loss=0.052, simple_loss=0.06696, pruned_loss=0.008027, audio_tagging_loss=0.01049, over 15192.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09264, pruned_loss=0.01381, audio_tagging_loss=0.00913, over 3040384.01 frames. 
], batch size: 60, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:49:38,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2496926.6666666665, ans=0.0 2023-11-23 18:49:45,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374550 2023-11-23 18:49:49,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2496993.3333333335, ans=0.125 2023-11-23 18:50:09,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2497060.0, ans=0.125 2023-11-23 18:50:23,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.228e+01 8.763e+01 9.519e+01 1.190e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-23 18:50:35,993 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1850, loss[loss=0.07755, simple_loss=0.1119, pruned_loss=0.01361, audio_tagging_loss=0.007978, over 15509.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09185, pruned_loss=0.0137, audio_tagging_loss=0.009044, over 3044934.17 frames. ], batch size: 60, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:50:47,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374600 2023-11-23 18:51:26,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2023-11-23 18:51:35,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=12.0 2023-11-23 18:51:38,704 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1900, loss[loss=0.06971, simple_loss=0.09102, pruned_loss=0.01529, audio_tagging_loss=0.008911, over 15150.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09214, pruned_loss=0.01378, audio_tagging_loss=0.009082, over 3036958.47 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:51:50,226 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374650 2023-11-23 18:51:51,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2023-11-23 18:52:17,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0 2023-11-23 18:52:29,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.306e+01 8.793e+01 9.456e+01 1.356e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-23 18:52:33,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-11-23 18:52:35,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-23 18:52:41,801 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 1950, loss[loss=0.05778, simple_loss=0.08496, pruned_loss=0.006892, audio_tagging_loss=0.008413, over 13983.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.09343, pruned_loss=0.0139, audio_tagging_loss=0.008941, over 3039456.50 frames. 
], batch size: 54, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:52:51,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2497926.6666666665, ans=0.0 2023-11-23 18:52:53,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374700 2023-11-23 18:53:01,289 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 18:53:15,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2498060.0, ans=0.125 2023-11-23 18:53:22,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2023-11-23 18:53:43,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2498193.3333333335, ans=0.0 2023-11-23 18:53:45,425 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2000, loss[loss=0.05584, simple_loss=0.0788, pruned_loss=0.008468, audio_tagging_loss=0.007974, over 15824.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09212, pruned_loss=0.01392, audio_tagging_loss=0.008976, over 3035339.95 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:53:48,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2498260.0, ans=0.2 2023-11-23 18:53:56,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374750 2023-11-23 18:53:57,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2498326.6666666665, ans=0.125 2023-11-23 18:54:35,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.441e+01 9.066e+01 9.690e+01 1.160e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-23 18:54:48,065 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2050, loss[loss=0.0685, simple_loss=0.09873, pruned_loss=0.01452, audio_tagging_loss=0.004613, over 15118.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09144, pruned_loss=0.01377, audio_tagging_loss=0.008996, over 3028501.12 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:54:48,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2498593.3333333335, ans=0.125 2023-11-23 18:54:54,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2498593.3333333335, ans=0.09899494936611666 2023-11-23 18:54:57,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.92 vs. limit=10.0 2023-11-23 18:54:59,437 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374800 2023-11-23 18:54:59,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2498660.0, ans=0.0 2023-11-23 18:55:33,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2498793.3333333335, ans=0.0 2023-11-23 18:55:50,645 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2100, loss[loss=0.06116, simple_loss=0.07378, pruned_loss=0.01289, audio_tagging_loss=0.01138, over 16591.00 frames. 
], tot_loss[loss=0.06877, simple_loss=0.09189, pruned_loss=0.01388, audio_tagging_loss=0.008941, over 3028610.48 frames. ], batch size: 62, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:56:01,928 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374850 2023-11-23 18:56:29,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2499126.6666666665, ans=0.125 2023-11-23 18:56:39,987 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.878e+01 8.622e+01 9.048e+01 9.766e+01 1.225e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 18:56:41,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. limit=10.0 2023-11-23 18:56:52,559 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2150, loss[loss=0.06857, simple_loss=0.09328, pruned_loss=0.01267, audio_tagging_loss=0.009254, over 15369.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09231, pruned_loss=0.01405, audio_tagging_loss=0.008972, over 3032231.95 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:57:04,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374900 2023-11-23 18:57:14,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2499326.6666666665, ans=0.2 2023-11-23 18:57:15,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2499326.6666666665, ans=0.125 2023-11-23 18:57:21,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2499393.3333333335, ans=0.125 2023-11-23 18:57:24,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2499393.3333333335, ans=0.1 2023-11-23 18:57:28,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2499460.0, ans=0.0 2023-11-23 18:57:29,789 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 18:57:46,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2499526.6666666665, ans=0.125 2023-11-23 18:57:54,951 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2200, loss[loss=0.08036, simple_loss=0.1195, pruned_loss=0.01284, audio_tagging_loss=0.007771, over 15411.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09252, pruned_loss=0.01394, audio_tagging_loss=0.00898, over 3032967.94 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:58:00,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.52 vs. 
limit=15.0 2023-11-23 18:58:02,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2499593.3333333335, ans=0.0 2023-11-23 18:58:06,037 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 374950 2023-11-23 18:58:13,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2023-11-23 18:58:14,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2499660.0, ans=0.125 2023-11-23 18:58:37,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2499793.3333333335, ans=0.0 2023-11-23 18:58:44,340 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.395e+01 9.074e+01 9.645e+01 1.441e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-23 18:58:49,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2499860.0, ans=0.07 2023-11-23 18:58:55,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2499926.6666666665, ans=0.1 2023-11-23 18:58:56,706 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2250, loss[loss=0.08249, simple_loss=0.1068, pruned_loss=0.02059, audio_tagging_loss=0.008475, over 14405.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.09283, pruned_loss=0.01398, audio_tagging_loss=0.008959, over 3033920.82 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 18:59:00,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2023-11-23 18:59:01,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2499926.6666666665, ans=0.125 2023-11-23 18:59:07,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375000 2023-11-23 18:59:36,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2500126.6666666665, ans=0.125 2023-11-23 18:59:38,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2500126.6666666665, ans=0.0 2023-11-23 18:59:41,673 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 18:59:58,407 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2300, loss[loss=0.08159, simple_loss=0.114, pruned_loss=0.01841, audio_tagging_loss=0.006157, over 15572.00 frames. ], tot_loss[loss=0.06931, simple_loss=0.09271, pruned_loss=0.01392, audio_tagging_loss=0.009035, over 3043654.18 frames. 
], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:00:10,242 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375050 2023-11-23 19:00:11,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2500326.6666666665, ans=0.1 2023-11-23 19:00:11,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2500326.6666666665, ans=0.2 2023-11-23 19:00:21,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2500326.6666666665, ans=0.0 2023-11-23 19:00:28,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2500393.3333333335, ans=0.2 2023-11-23 19:00:40,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2500460.0, ans=0.2 2023-11-23 19:00:44,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2500460.0, ans=0.125 2023-11-23 19:00:45,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2500460.0, ans=0.125 2023-11-23 19:00:47,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.536e+01 8.606e+01 9.166e+01 9.861e+01 1.207e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-23 19:00:52,507 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:01:00,770 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2350, loss[loss=0.08413, simple_loss=0.1188, pruned_loss=0.01891, audio_tagging_loss=0.005834, over 15501.00 frames. ], tot_loss[loss=0.06968, simple_loss=0.09292, pruned_loss=0.01408, audio_tagging_loss=0.009143, over 3048965.60 frames. ], batch size: 60, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:01:07,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2500593.3333333335, ans=0.1 2023-11-23 19:01:12,038 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375100 2023-11-23 19:01:14,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2500660.0, ans=0.125 2023-11-23 19:01:16,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2500660.0, ans=0.125 2023-11-23 19:01:43,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2500793.3333333335, ans=0.125 2023-11-23 19:01:58,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2500860.0, ans=0.0 2023-11-23 19:02:03,150 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2400, loss[loss=0.09584, simple_loss=0.1347, pruned_loss=0.02058, audio_tagging_loss=0.007926, over 15005.00 frames. 
], tot_loss[loss=0.06984, simple_loss=0.09299, pruned_loss=0.01417, audio_tagging_loss=0.009172, over 3050280.46 frames. ], batch size: 55, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:02:13,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375150 2023-11-23 19:02:18,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2500993.3333333335, ans=0.125 2023-11-23 19:02:21,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2500993.3333333335, ans=0.125 2023-11-23 19:02:30,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-23 19:02:38,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.37 vs. limit=22.5 2023-11-23 19:02:41,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2501126.6666666665, ans=0.0 2023-11-23 19:02:53,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2501193.3333333335, ans=0.125 2023-11-23 19:02:54,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.466e+01 8.908e+01 9.614e+01 2.076e+02, threshold=1.782e+02, percent-clipped=1.0 2023-11-23 19:02:58,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2501193.3333333335, ans=0.07 2023-11-23 19:03:05,680 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2450, loss[loss=0.09056, simple_loss=0.1318, pruned_loss=0.01856, audio_tagging_loss=0.006105, over 16156.00 frames. ], tot_loss[loss=0.06993, simple_loss=0.09319, pruned_loss=0.01412, audio_tagging_loss=0.009212, over 3046017.14 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:03:16,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375200 2023-11-23 19:03:17,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2501326.6666666665, ans=0.125 2023-11-23 19:03:53,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2023-11-23 19:03:55,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2501526.6666666665, ans=0.2 2023-11-23 19:03:56,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2023-11-23 19:03:58,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2023-11-23 19:04:07,613 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2500, loss[loss=0.0656, simple_loss=0.08351, pruned_loss=0.01378, audio_tagging_loss=0.01006, over 14726.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09296, pruned_loss=0.01396, audio_tagging_loss=0.009223, over 3048204.43 frames. 
], batch size: 54, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:04:09,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2501593.3333333335, ans=0.125 2023-11-23 19:04:19,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375250 2023-11-23 19:04:20,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2501660.0, ans=0.125 2023-11-23 19:04:37,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2501726.6666666665, ans=0.2 2023-11-23 19:04:57,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2501860.0, ans=0.0 2023-11-23 19:05:01,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.437e+01 9.142e+01 9.814e+01 1.303e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-23 19:05:10,379 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2550, loss[loss=0.07753, simple_loss=0.1044, pruned_loss=0.01563, audio_tagging_loss=0.009675, over 15245.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09251, pruned_loss=0.01385, audio_tagging_loss=0.009152, over 3041806.06 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 8.0 2023-11-23 19:05:21,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375300 2023-11-23 19:06:08,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2502193.3333333335, ans=0.1 2023-11-23 19:06:12,335 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2600, loss[loss=0.06776, simple_loss=0.09713, pruned_loss=0.01121, audio_tagging_loss=0.007983, over 15018.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09158, pruned_loss=0.01359, audio_tagging_loss=0.009028, over 3041161.66 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 8.0 2023-11-23 19:06:23,674 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375350 2023-11-23 19:06:23,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2502326.6666666665, ans=0.125 2023-11-23 19:06:36,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2502393.3333333335, ans=0.0 2023-11-23 19:07:06,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.376e+01 8.898e+01 9.967e+01 2.098e+02, threshold=1.780e+02, percent-clipped=2.0 2023-11-23 19:07:07,128 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:07:13,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2502526.6666666665, ans=0.07 2023-11-23 19:07:15,731 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2650, loss[loss=0.04271, simple_loss=0.0522, pruned_loss=0.006943, audio_tagging_loss=0.009669, over 15500.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09116, pruned_loss=0.01353, audio_tagging_loss=0.009071, over 3041132.71 frames. 
], batch size: 62, lr: 2.13e-03, grad_scale: 8.0 2023-11-23 19:07:26,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375400 2023-11-23 19:07:39,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2502726.6666666665, ans=0.5 2023-11-23 19:07:44,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2502726.6666666665, ans=0.125 2023-11-23 19:07:46,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2502726.6666666665, ans=0.5 2023-11-23 19:08:02,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2502793.3333333335, ans=0.04949747468305833 2023-11-23 19:08:06,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2502860.0, ans=0.0 2023-11-23 19:08:15,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2502860.0, ans=0.125 2023-11-23 19:08:17,543 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2700, loss[loss=0.04115, simple_loss=0.05267, pruned_loss=0.006735, audio_tagging_loss=0.008077, over 14804.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.09201, pruned_loss=0.01368, audio_tagging_loss=0.008973, over 3040757.83 frames. ], batch size: 59, lr: 2.13e-03, grad_scale: 8.0 2023-11-23 19:08:25,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2502926.6666666665, ans=0.2 2023-11-23 19:08:28,747 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375450 2023-11-23 19:08:51,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2503060.0, ans=0.125 2023-11-23 19:08:52,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2503060.0, ans=0.05 2023-11-23 19:08:52,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2503060.0, ans=0.0 2023-11-23 19:09:04,475 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:09:07,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2503193.3333333335, ans=0.1 2023-11-23 19:09:10,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.867e+01 8.343e+01 9.045e+01 9.938e+01 1.315e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 19:09:18,821 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2750, loss[loss=0.07394, simple_loss=0.09714, pruned_loss=0.01645, audio_tagging_loss=0.008924, over 15228.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09207, pruned_loss=0.01364, audio_tagging_loss=0.008915, over 3045753.76 frames. ], batch size: 55, lr: 2.13e-03, grad_scale: 8.0 2023-11-23 19:09:30,067 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375500 2023-11-23 19:10:01,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2503460.0, ans=0.0 2023-11-23 19:10:11,504 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:10:13,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-23 19:10:14,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2503526.6666666665, ans=0.1 2023-11-23 19:10:16,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2503526.6666666665, ans=0.125 2023-11-23 19:10:19,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2503593.3333333335, ans=0.1 2023-11-23 19:10:20,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2503593.3333333335, ans=0.0 2023-11-23 19:10:20,905 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2800, loss[loss=0.04537, simple_loss=0.04579, pruned_loss=0.009119, audio_tagging_loss=0.01336, over 14665.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.09108, pruned_loss=0.01362, audio_tagging_loss=0.009001, over 3050825.78 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:10:32,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375550 2023-11-23 19:10:38,827 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:10:48,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2503726.6666666665, ans=0.0 2023-11-23 19:11:02,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2503793.3333333335, ans=0.0 2023-11-23 19:11:05,794 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:11:14,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.662e+01 8.252e+01 8.866e+01 9.555e+01 1.297e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-23 19:11:16,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2503860.0, ans=0.0 2023-11-23 19:11:22,775 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2850, loss[loss=0.0724, simple_loss=0.1009, pruned_loss=0.01441, audio_tagging_loss=0.007552, over 15155.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09127, pruned_loss=0.01357, audio_tagging_loss=0.008809, over 3046527.95 frames. 
], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:11:24,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2503926.6666666665, ans=0.1 2023-11-23 19:11:27,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2503926.6666666665, ans=0.125 2023-11-23 19:11:34,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375600 2023-11-23 19:11:34,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2023-11-23 19:11:36,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2503993.3333333335, ans=0.07 2023-11-23 19:12:05,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-23 19:12:26,389 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2900, loss[loss=0.07342, simple_loss=0.09823, pruned_loss=0.01611, audio_tagging_loss=0.008202, over 16659.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09259, pruned_loss=0.01378, audio_tagging_loss=0.008841, over 3054314.17 frames. ], batch size: 59, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:12:29,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=22.5 2023-11-23 19:12:35,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2504260.0, ans=0.0 2023-11-23 19:12:36,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2504260.0, ans=0.2 2023-11-23 19:12:37,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2023-11-23 19:12:37,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375650 2023-11-23 19:13:06,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2504460.0, ans=0.04949747468305833 2023-11-23 19:13:09,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2504460.0, ans=0.1 2023-11-23 19:13:20,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.540e+01 9.133e+01 9.789e+01 1.546e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-23 19:13:20,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2504526.6666666665, ans=0.1 2023-11-23 19:13:25,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2504526.6666666665, ans=0.125 2023-11-23 19:13:28,622 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 2950, loss[loss=0.06826, simple_loss=0.08202, pruned_loss=0.01478, audio_tagging_loss=0.01248, over 14634.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09249, pruned_loss=0.01377, audio_tagging_loss=0.008935, over 3053230.83 frames. 
], batch size: 56, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:13:36,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2504593.3333333335, ans=0.2 2023-11-23 19:13:39,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2504593.3333333335, ans=0.1 2023-11-23 19:13:40,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375700 2023-11-23 19:13:51,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2504660.0, ans=0.05 2023-11-23 19:14:04,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2504726.6666666665, ans=0.0 2023-11-23 19:14:16,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2504793.3333333335, ans=0.125 2023-11-23 19:14:31,238 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3000, loss[loss=0.06227, simple_loss=0.08581, pruned_loss=0.01221, audio_tagging_loss=0.007153, over 15954.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09277, pruned_loss=0.01384, audio_tagging_loss=0.008947, over 3047378.83 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:14:31,239 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 19:14:50,850 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4121, 1.9518, 3.1139, 3.2333, 3.0190, 2.9838, 2.8834, 3.1521], device='cuda:1') 2023-11-23 19:14:56,260 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7995, 4.9475, 5.0934, 4.8896], device='cuda:1') 2023-11-23 19:15:09,801 INFO [train_asr.py:1253] (1/4) Epoch 32, validation: loss=0.05818, simple_loss=0.05104, pruned_loss=0.005158, audio_tagging_loss=0.0275, over 4681554.00 frames. 2023-11-23 19:15:09,802 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 19:15:14,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=12.0 2023-11-23 19:15:21,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375750 2023-11-23 19:15:42,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2505060.0, ans=0.125 2023-11-23 19:15:47,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2505126.6666666665, ans=0.0 2023-11-23 19:15:54,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2023-11-23 19:16:03,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.656e+01 9.141e+01 9.887e+01 1.235e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-23 19:16:11,777 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3050, loss[loss=0.06845, simple_loss=0.08522, pruned_loss=0.01699, audio_tagging_loss=0.00885, over 15130.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09209, pruned_loss=0.01387, audio_tagging_loss=0.009059, over 3047493.17 frames. 
], batch size: 58, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:16:23,131 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375800 2023-11-23 19:16:23,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2505326.6666666665, ans=0.2 2023-11-23 19:16:31,282 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:16:33,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-23 19:16:35,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2505393.3333333335, ans=0.0 2023-11-23 19:16:43,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2505393.3333333335, ans=0.0 2023-11-23 19:16:48,175 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:16:48,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2505460.0, ans=0.2 2023-11-23 19:16:49,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0 2023-11-23 19:17:10,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. limit=10.0 2023-11-23 19:17:13,713 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3100, loss[loss=0.07052, simple_loss=0.09752, pruned_loss=0.01275, audio_tagging_loss=0.009004, over 15880.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09118, pruned_loss=0.01356, audio_tagging_loss=0.009132, over 3047555.64 frames. ], batch size: 62, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:17:19,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2505593.3333333335, ans=0.015 2023-11-23 19:17:25,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375850 2023-11-23 19:17:25,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2505660.0, ans=0.125 2023-11-23 19:18:06,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2505860.0, ans=0.125 2023-11-23 19:18:07,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.481e+01 9.058e+01 9.503e+01 1.492e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-23 19:18:15,790 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3150, loss[loss=0.06585, simple_loss=0.08355, pruned_loss=0.01338, audio_tagging_loss=0.0107, over 15247.00 frames. ], tot_loss[loss=0.0686, simple_loss=0.0914, pruned_loss=0.01364, audio_tagging_loss=0.009259, over 3049008.33 frames. 
], batch size: 56, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:18:17,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-23 19:18:27,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375900 2023-11-23 19:18:27,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2505993.3333333335, ans=0.125 2023-11-23 19:18:30,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2505993.3333333335, ans=0.1 2023-11-23 19:18:36,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2505993.3333333335, ans=0.125 2023-11-23 19:18:43,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-11-23 19:19:06,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2023-11-23 19:19:14,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0 2023-11-23 19:19:18,399 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3200, loss[loss=0.07618, simple_loss=0.09278, pruned_loss=0.01846, audio_tagging_loss=0.01133, over 14712.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09142, pruned_loss=0.01357, audio_tagging_loss=0.00932, over 3048881.85 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:19:27,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=12.0 2023-11-23 19:19:29,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 375950 2023-11-23 19:19:32,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2506326.6666666665, ans=0.0 2023-11-23 19:20:03,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2506460.0, ans=0.1 2023-11-23 19:20:05,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2506460.0, ans=0.0 2023-11-23 19:20:13,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.564e+01 8.152e+01 8.777e+01 9.495e+01 2.540e+02, threshold=1.755e+02, percent-clipped=1.0 2023-11-23 19:20:20,384 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3250, loss[loss=0.08672, simple_loss=0.1066, pruned_loss=0.02403, audio_tagging_loss=0.009365, over 15199.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09127, pruned_loss=0.0134, audio_tagging_loss=0.009406, over 3046796.46 frames. 
], batch size: 55, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:20:27,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2506593.3333333335, ans=0.95 2023-11-23 19:20:31,839 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376000 2023-11-23 19:20:38,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2023-11-23 19:20:43,633 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:20:49,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2506726.6666666665, ans=0.125 2023-11-23 19:20:57,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2506726.6666666665, ans=0.0 2023-11-23 19:21:00,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2506793.3333333335, ans=0.125 2023-11-23 19:21:26,143 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3300, loss[loss=0.07237, simple_loss=0.09936, pruned_loss=0.01337, audio_tagging_loss=0.009324, over 15712.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09131, pruned_loss=0.01338, audio_tagging_loss=0.009408, over 3049844.55 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:21:37,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376050 2023-11-23 19:21:47,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2506993.3333333335, ans=0.0 2023-11-23 19:22:18,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2507193.3333333335, ans=0.125 2023-11-23 19:22:19,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2507193.3333333335, ans=0.2 2023-11-23 19:22:20,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.685e+01 9.326e+01 1.032e+02 1.222e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-23 19:22:21,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2507193.3333333335, ans=0.125 2023-11-23 19:22:24,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2507193.3333333335, ans=0.0 2023-11-23 19:22:28,027 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3350, loss[loss=0.07687, simple_loss=0.1131, pruned_loss=0.01447, audio_tagging_loss=0.005854, over 15707.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09067, pruned_loss=0.01325, audio_tagging_loss=0.009367, over 3053354.92 frames. ], batch size: 59, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:22:34,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=22.5 2023-11-23 19:22:39,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376100 2023-11-23 19:23:00,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. 
limit=8.0 2023-11-23 19:23:06,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2507460.0, ans=0.2 2023-11-23 19:23:12,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2507460.0, ans=10.0 2023-11-23 19:23:23,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.23 vs. limit=10.0 2023-11-23 19:23:26,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2023-11-23 19:23:30,752 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3400, loss[loss=0.05975, simple_loss=0.0749, pruned_loss=0.01252, audio_tagging_loss=0.009783, over 14841.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09048, pruned_loss=0.01331, audio_tagging_loss=0.009224, over 3042752.87 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:23:35,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2507593.3333333335, ans=0.125 2023-11-23 19:23:41,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376150 2023-11-23 19:23:47,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2507660.0, ans=0.0 2023-11-23 19:24:15,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2507793.3333333335, ans=0.0 2023-11-23 19:24:16,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2507793.3333333335, ans=0.95 2023-11-23 19:24:24,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.627e+01 8.340e+01 8.680e+01 9.645e+01 1.165e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-23 19:24:32,683 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3450, loss[loss=0.07038, simple_loss=0.09829, pruned_loss=0.01367, audio_tagging_loss=0.007555, over 15236.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09054, pruned_loss=0.01343, audio_tagging_loss=0.009118, over 3047232.43 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:24:43,415 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376200 2023-11-23 19:24:59,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2508060.0, ans=0.1 2023-11-23 19:24:59,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2508060.0, ans=0.125 2023-11-23 19:25:26,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2508193.3333333335, ans=0.125 2023-11-23 19:25:33,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2508193.3333333335, ans=0.125 2023-11-23 19:25:35,025 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3500, loss[loss=0.0707, simple_loss=0.09308, pruned_loss=0.01549, audio_tagging_loss=0.008673, over 14469.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09123, pruned_loss=0.01355, audio_tagging_loss=0.008982, over 3053476.23 frames. 
], batch size: 54, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:25:37,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2508260.0, ans=0.2 2023-11-23 19:25:46,591 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376250 2023-11-23 19:25:50,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2508326.6666666665, ans=0.125 2023-11-23 19:26:06,583 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:26:09,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2508393.3333333335, ans=0.2 2023-11-23 19:26:09,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2508393.3333333335, ans=0.0 2023-11-23 19:26:21,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2508460.0, ans=0.0 2023-11-23 19:26:23,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2023-11-23 19:26:29,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.248e+01 9.005e+01 9.777e+01 1.238e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 19:26:38,185 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3550, loss[loss=0.07372, simple_loss=0.09925, pruned_loss=0.01478, audio_tagging_loss=0.009313, over 14124.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09163, pruned_loss=0.01362, audio_tagging_loss=0.008933, over 3054918.36 frames. ], batch size: 54, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:26:49,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376300 2023-11-23 19:26:59,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2508660.0, ans=0.04949747468305833 2023-11-23 19:27:13,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2508726.6666666665, ans=0.2 2023-11-23 19:27:17,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-11-23 19:27:22,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2508793.3333333335, ans=0.125 2023-11-23 19:27:22,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. 
limit=15.0 2023-11-23 19:27:28,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2508860.0, ans=0.125 2023-11-23 19:27:32,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2508860.0, ans=0.09899494936611666 2023-11-23 19:27:36,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2508860.0, ans=0.0 2023-11-23 19:27:40,976 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3600, loss[loss=0.05534, simple_loss=0.07349, pruned_loss=0.01006, audio_tagging_loss=0.008537, over 14251.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09139, pruned_loss=0.01351, audio_tagging_loss=0.008921, over 3059314.24 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:27:41,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2508926.6666666665, ans=0.0 2023-11-23 19:27:51,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376350 2023-11-23 19:27:56,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2508993.3333333335, ans=0.1 2023-11-23 19:27:57,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0 2023-11-23 19:28:13,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2509060.0, ans=0.125 2023-11-23 19:28:35,122 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.359e+01 9.090e+01 9.890e+01 1.396e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-23 19:28:41,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2509260.0, ans=0.0 2023-11-23 19:28:42,873 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3650, loss[loss=0.08086, simple_loss=0.111, pruned_loss=0.0171, audio_tagging_loss=0.008243, over 14237.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09163, pruned_loss=0.01351, audio_tagging_loss=0.008967, over 3051385.39 frames. ], batch size: 55, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:28:48,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.01 vs. limit=22.5 2023-11-23 19:28:54,284 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376400 2023-11-23 19:28:54,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2509326.6666666665, ans=0.0 2023-11-23 19:29:00,738 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:29:05,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2509326.6666666665, ans=0.125 2023-11-23 19:29:44,986 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3700, loss[loss=0.07236, simple_loss=0.1037, pruned_loss=0.01264, audio_tagging_loss=0.007861, over 15240.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09324, pruned_loss=0.01388, audio_tagging_loss=0.008859, over 3055651.86 frames. 
], batch size: 57, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:29:56,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376450 2023-11-23 19:30:08,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-23 19:30:11,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2509726.6666666665, ans=0.125 2023-11-23 19:30:16,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2509726.6666666665, ans=0.04949747468305833 2023-11-23 19:30:34,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2509860.0, ans=0.0 2023-11-23 19:30:40,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.395e+01 9.094e+01 9.680e+01 1.215e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-23 19:30:43,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2509860.0, ans=0.09899494936611666 2023-11-23 19:30:48,256 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3750, loss[loss=0.07884, simple_loss=0.1045, pruned_loss=0.01722, audio_tagging_loss=0.00938, over 15716.00 frames. ], tot_loss[loss=0.06934, simple_loss=0.09322, pruned_loss=0.01384, audio_tagging_loss=0.008892, over 3063217.60 frames. ], batch size: 59, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:30:57,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2509926.6666666665, ans=0.0 2023-11-23 19:30:59,310 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376500 2023-11-23 19:31:14,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2510060.0, ans=0.125 2023-11-23 19:31:30,739 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:31:32,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2510126.6666666665, ans=0.125 2023-11-23 19:31:50,356 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3800, loss[loss=0.06291, simple_loss=0.0858, pruned_loss=0.01026, audio_tagging_loss=0.009752, over 14825.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09341, pruned_loss=0.01405, audio_tagging_loss=0.008955, over 3062230.45 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:31:52,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.28 vs. limit=15.0 2023-11-23 19:31:53,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.27 vs. 
limit=15.0 2023-11-23 19:31:55,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2510260.0, ans=0.1 2023-11-23 19:32:01,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376550 2023-11-23 19:32:26,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5 2023-11-23 19:32:30,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2510460.0, ans=0.5 2023-11-23 19:32:36,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2510460.0, ans=0.125 2023-11-23 19:32:45,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.658e+01 9.226e+01 1.003e+02 1.868e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-23 19:32:52,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2510593.3333333335, ans=0.125 2023-11-23 19:32:52,907 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3850, loss[loss=0.07603, simple_loss=0.1061, pruned_loss=0.01496, audio_tagging_loss=0.008002, over 15978.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.09347, pruned_loss=0.01412, audio_tagging_loss=0.009081, over 3066910.67 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:32:57,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2510593.3333333335, ans=0.0 2023-11-23 19:33:01,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2510593.3333333335, ans=0.1 2023-11-23 19:33:03,582 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376600 2023-11-23 19:33:12,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2023-11-23 19:33:35,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2510793.3333333335, ans=0.0 2023-11-23 19:33:54,687 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3900, loss[loss=0.07974, simple_loss=0.1049, pruned_loss=0.01554, audio_tagging_loss=0.01175, over 15503.00 frames. ], tot_loss[loss=0.0702, simple_loss=0.09405, pruned_loss=0.01407, audio_tagging_loss=0.0091, over 3055619.72 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:34:05,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376650 2023-11-23 19:34:17,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2510993.3333333335, ans=0.1 2023-11-23 19:34:49,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.099e+01 8.305e+01 8.896e+01 9.680e+01 1.276e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 19:34:51,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2511193.3333333335, ans=15.0 2023-11-23 19:34:56,881 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 3950, loss[loss=0.05985, simple_loss=0.07679, pruned_loss=0.01201, audio_tagging_loss=0.009447, over 15689.00 frames. 
], tot_loss[loss=0.06935, simple_loss=0.09282, pruned_loss=0.01378, audio_tagging_loss=0.009153, over 3048787.06 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:35:08,301 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376700 2023-11-23 19:35:15,099 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:35:29,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2511393.3333333335, ans=0.125 2023-11-23 19:35:31,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2023-11-23 19:35:42,002 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:35:52,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2511526.6666666665, ans=0.1 2023-11-23 19:35:59,032 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4000, loss[loss=0.08254, simple_loss=0.1149, pruned_loss=0.01728, audio_tagging_loss=0.007803, over 15746.00 frames. ], tot_loss[loss=0.06961, simple_loss=0.09319, pruned_loss=0.01378, audio_tagging_loss=0.009238, over 3048165.80 frames. ], batch size: 55, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:36:10,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376750 2023-11-23 19:36:18,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2023-11-23 19:36:19,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2511660.0, ans=0.125 2023-11-23 19:36:53,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-23 19:36:53,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.509e+01 8.978e+01 9.607e+01 1.233e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-23 19:37:00,842 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4050, loss[loss=0.08211, simple_loss=0.109, pruned_loss=0.01929, audio_tagging_loss=0.008304, over 15984.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09294, pruned_loss=0.01382, audio_tagging_loss=0.009297, over 3044850.43 frames. ], batch size: 59, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:37:03,263 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:37:06,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.61 vs. 
limit=10.0 2023-11-23 19:37:07,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2511926.6666666665, ans=0.125 2023-11-23 19:37:12,622 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376800 2023-11-23 19:37:12,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2511993.3333333335, ans=0.125 2023-11-23 19:37:40,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2512126.6666666665, ans=0.5 2023-11-23 19:37:53,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2512193.3333333335, ans=0.125 2023-11-23 19:38:03,251 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4100, loss[loss=0.07688, simple_loss=0.108, pruned_loss=0.01554, audio_tagging_loss=0.007318, over 15086.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09341, pruned_loss=0.01396, audio_tagging_loss=0.009288, over 3038301.57 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:38:05,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2512260.0, ans=0.0 2023-11-23 19:38:14,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376850 2023-11-23 19:38:39,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2512460.0, ans=0.0 2023-11-23 19:38:39,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2512460.0, ans=0.0 2023-11-23 19:38:42,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2512460.0, ans=0.1 2023-11-23 19:38:46,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2512460.0, ans=0.04949747468305833 2023-11-23 19:38:59,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.590e+01 9.258e+01 1.002e+02 1.554e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-23 19:39:05,549 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4150, loss[loss=0.05852, simple_loss=0.07451, pruned_loss=0.01072, audio_tagging_loss=0.01054, over 14910.00 frames. ], tot_loss[loss=0.06975, simple_loss=0.09337, pruned_loss=0.01394, audio_tagging_loss=0.009131, over 3044232.96 frames. ], batch size: 59, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:39:15,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2512593.3333333335, ans=0.2 2023-11-23 19:39:16,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376900 2023-11-23 19:39:19,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2512660.0, ans=0.1 2023-11-23 19:39:28,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-23 19:39:31,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. 
limit=15.0 2023-11-23 19:39:33,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2023-11-23 19:39:41,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2512793.3333333335, ans=0.2 2023-11-23 19:39:49,388 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:40:07,871 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4200, loss[loss=0.0514, simple_loss=0.06149, pruned_loss=0.008867, audio_tagging_loss=0.01179, over 16881.00 frames. ], tot_loss[loss=0.06953, simple_loss=0.0933, pruned_loss=0.01392, audio_tagging_loss=0.008961, over 3049999.97 frames. ], batch size: 64, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:40:19,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 376950 2023-11-23 19:40:19,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2512993.3333333335, ans=0.125 2023-11-23 19:40:35,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-23 19:40:38,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2513060.0, ans=0.125 2023-11-23 19:40:41,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.76 vs. limit=10.0 2023-11-23 19:40:54,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2513126.6666666665, ans=0.125 2023-11-23 19:41:03,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2513193.3333333335, ans=0.0 2023-11-23 19:41:03,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2513193.3333333335, ans=0.0 2023-11-23 19:41:04,171 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.429e+01 8.916e+01 9.542e+01 1.145e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 19:41:10,207 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4250, loss[loss=0.05881, simple_loss=0.0741, pruned_loss=0.0109, audio_tagging_loss=0.01086, over 14866.00 frames. ], tot_loss[loss=0.06965, simple_loss=0.09364, pruned_loss=0.01393, audio_tagging_loss=0.008903, over 3049066.25 frames. 
], batch size: 56, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:41:22,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377000 2023-11-23 19:41:29,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2513326.6666666665, ans=0.1 2023-11-23 19:41:35,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2513393.3333333335, ans=0.2 2023-11-23 19:41:50,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2513460.0, ans=0.125 2023-11-23 19:41:57,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-23 19:42:09,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2513526.6666666665, ans=0.125 2023-11-23 19:42:13,220 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4300, loss[loss=0.1014, simple_loss=0.1343, pruned_loss=0.02771, audio_tagging_loss=0.006511, over 16468.00 frames. ], tot_loss[loss=0.06987, simple_loss=0.09402, pruned_loss=0.01407, audio_tagging_loss=0.008786, over 3048192.00 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:42:13,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2513593.3333333335, ans=0.125 2023-11-23 19:42:24,750 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377050 2023-11-23 19:42:46,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2513726.6666666665, ans=10.0 2023-11-23 19:43:09,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.254e+01 8.867e+01 9.585e+01 1.161e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-23 19:43:15,452 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4350, loss[loss=0.05726, simple_loss=0.08294, pruned_loss=0.006685, audio_tagging_loss=0.009105, over 15455.00 frames. ], tot_loss[loss=0.0701, simple_loss=0.09471, pruned_loss=0.01404, audio_tagging_loss=0.008709, over 3044804.46 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2023-11-23 19:43:26,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377100 2023-11-23 19:43:31,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2513993.3333333335, ans=0.04949747468305833 2023-11-23 19:43:44,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2514060.0, ans=0.125 2023-11-23 19:43:53,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2514126.6666666665, ans=0.125 2023-11-23 19:44:00,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2514126.6666666665, ans=0.125 2023-11-23 19:44:01,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2514126.6666666665, ans=0.125 2023-11-23 19:44:17,157 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4400, loss[loss=0.06243, simple_loss=0.08485, pruned_loss=0.0108, audio_tagging_loss=0.009204, over 14879.00 frames. 
], tot_loss[loss=0.07009, simple_loss=0.09457, pruned_loss=0.0141, audio_tagging_loss=0.008705, over 3053478.93 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:44:28,438 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377150 2023-11-23 19:44:31,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2514326.6666666665, ans=0.0 2023-11-23 19:44:37,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2514326.6666666665, ans=0.0 2023-11-23 19:45:13,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.470e+01 9.052e+01 9.834e+01 1.276e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 19:45:20,342 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4450, loss[loss=0.07138, simple_loss=0.09756, pruned_loss=0.01296, audio_tagging_loss=0.009638, over 15223.00 frames. ], tot_loss[loss=0.07006, simple_loss=0.09438, pruned_loss=0.0142, audio_tagging_loss=0.008674, over 3055137.37 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:45:22,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2023-11-23 19:45:31,747 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377200 2023-11-23 19:45:41,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2514660.0, ans=0.125 2023-11-23 19:46:00,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2514793.3333333335, ans=0.2 2023-11-23 19:46:10,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-23 19:46:22,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2514926.6666666665, ans=0.0 2023-11-23 19:46:23,427 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4500, loss[loss=0.06157, simple_loss=0.08872, pruned_loss=0.009475, audio_tagging_loss=0.007736, over 15059.00 frames. ], tot_loss[loss=0.06945, simple_loss=0.09358, pruned_loss=0.01394, audio_tagging_loss=0.008724, over 3055393.76 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2023-11-23 19:46:34,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377250 2023-11-23 19:46:37,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2023-11-23 19:47:11,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2515126.6666666665, ans=0.2 2023-11-23 19:47:15,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2515193.3333333335, ans=0.2 2023-11-23 19:47:18,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.414e+01 9.190e+01 9.821e+01 1.226e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-23 19:47:25,478 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4550, loss[loss=0.09186, simple_loss=0.131, pruned_loss=0.01891, audio_tagging_loss=0.007455, over 15657.00 frames. 
], tot_loss[loss=0.06924, simple_loss=0.09321, pruned_loss=0.01387, audio_tagging_loss=0.008756, over 3056304.49 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 19:47:29,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2515260.0, ans=0.0 2023-11-23 19:47:34,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-23 19:47:36,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377300 2023-11-23 19:47:42,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2515326.6666666665, ans=0.1 2023-11-23 19:47:44,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2515326.6666666665, ans=10.0 2023-11-23 19:47:44,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2515326.6666666665, ans=0.0 2023-11-23 19:47:48,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2515326.6666666665, ans=0.125 2023-11-23 19:47:49,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2515393.3333333335, ans=0.125 2023-11-23 19:48:02,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-23 19:48:11,895 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 19:48:27,999 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4600, loss[loss=0.05928, simple_loss=0.08616, pruned_loss=0.009265, audio_tagging_loss=0.006934, over 14940.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09248, pruned_loss=0.01396, audio_tagging_loss=0.008967, over 3051040.01 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 19:48:28,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2515593.3333333335, ans=0.0 2023-11-23 19:48:29,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2515593.3333333335, ans=0.0 2023-11-23 19:48:39,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377350 2023-11-23 19:48:45,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2515660.0, ans=0.125 2023-11-23 19:48:52,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.79 vs. 
limit=22.5 2023-11-23 19:48:59,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2515726.6666666665, ans=0.125 2023-11-23 19:49:08,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2023-11-23 19:49:11,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2515793.3333333335, ans=0.125 2023-11-23 19:49:23,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.494e+01 9.080e+01 9.829e+01 1.199e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-23 19:49:27,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2515860.0, ans=0.0 2023-11-23 19:49:30,636 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4650, loss[loss=0.08746, simple_loss=0.118, pruned_loss=0.01779, audio_tagging_loss=0.01068, over 17772.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09154, pruned_loss=0.01388, audio_tagging_loss=0.009092, over 3056186.84 frames. ], batch size: 63, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 19:49:33,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2515926.6666666665, ans=0.0 2023-11-23 19:49:41,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377400 2023-11-23 19:49:51,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=2515993.3333333335, ans=15.0 2023-11-23 19:50:11,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2516126.6666666665, ans=0.0 2023-11-23 19:50:33,075 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4700, loss[loss=0.06873, simple_loss=0.08432, pruned_loss=0.01899, audio_tagging_loss=0.00758, over 15289.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09152, pruned_loss=0.01389, audio_tagging_loss=0.009073, over 3052565.26 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 19:50:42,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-23 19:50:43,753 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377450 2023-11-23 19:50:44,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2023-11-23 19:51:08,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2516460.0, ans=0.0 2023-11-23 19:51:28,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.521e+01 9.088e+01 9.578e+01 1.245e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-23 19:51:29,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2516526.6666666665, ans=0.0 2023-11-23 19:51:34,068 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4750, loss[loss=0.06867, simple_loss=0.09086, pruned_loss=0.01269, audio_tagging_loss=0.01054, over 14872.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09259, pruned_loss=0.01405, audio_tagging_loss=0.009157, over 3051896.37 frames. 
], batch size: 56, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 19:51:45,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377500 2023-11-23 19:51:46,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2516660.0, ans=0.125 2023-11-23 19:51:49,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2516660.0, ans=0.125 2023-11-23 19:51:54,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2023-11-23 19:51:56,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2516660.0, ans=0.125 2023-11-23 19:51:58,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-23 19:52:09,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2516726.6666666665, ans=0.0 2023-11-23 19:52:28,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2516860.0, ans=0.125 2023-11-23 19:52:28,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2516860.0, ans=0.1 2023-11-23 19:52:36,085 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4800, loss[loss=0.07037, simple_loss=0.09953, pruned_loss=0.01223, audio_tagging_loss=0.008379, over 14518.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09146, pruned_loss=0.0138, audio_tagging_loss=0.009391, over 3042700.40 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 19:52:47,960 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377550 2023-11-23 19:53:00,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2517060.0, ans=0.0 2023-11-23 19:53:06,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-23 19:53:08,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2023-11-23 19:53:09,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-23 19:53:17,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2023-11-23 19:53:22,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2517126.6666666665, ans=0.1 2023-11-23 19:53:24,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. 
limit=10.0 2023-11-23 19:53:27,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2517193.3333333335, ans=0.0 2023-11-23 19:53:33,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.424e+01 8.895e+01 9.657e+01 1.306e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 19:53:36,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2517193.3333333335, ans=0.125 2023-11-23 19:53:38,424 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4850, loss[loss=0.06207, simple_loss=0.08017, pruned_loss=0.01271, audio_tagging_loss=0.009277, over 14659.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.09175, pruned_loss=0.01386, audio_tagging_loss=0.009374, over 3041172.27 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 19:53:45,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2517260.0, ans=0.1 2023-11-23 19:53:45,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2517260.0, ans=0.125 2023-11-23 19:53:46,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2517260.0, ans=0.0 2023-11-23 19:53:48,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0 2023-11-23 19:53:49,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377600 2023-11-23 19:53:51,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2517326.6666666665, ans=0.0 2023-11-23 19:54:12,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-23 19:54:19,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-23 19:54:33,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2517526.6666666665, ans=0.0 2023-11-23 19:54:40,500 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4900, loss[loss=0.07253, simple_loss=0.09309, pruned_loss=0.01308, audio_tagging_loss=0.01291, over 14786.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.09183, pruned_loss=0.01385, audio_tagging_loss=0.00933, over 3041100.68 frames. 
], batch size: 57, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 19:54:51,784 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377650 2023-11-23 19:54:56,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2517660.0, ans=0.125 2023-11-23 19:55:09,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2517726.6666666665, ans=0.0 2023-11-23 19:55:15,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2517726.6666666665, ans=0.0 2023-11-23 19:55:31,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2517860.0, ans=0.125 2023-11-23 19:55:32,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2517860.0, ans=0.125 2023-11-23 19:55:38,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.557e+01 8.293e+01 8.753e+01 9.560e+01 1.963e+02, threshold=1.751e+02, percent-clipped=1.0 2023-11-23 19:55:43,392 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 4950, loss[loss=0.05466, simple_loss=0.07124, pruned_loss=0.0104, audio_tagging_loss=0.008638, over 15055.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09259, pruned_loss=0.01394, audio_tagging_loss=0.009166, over 3039271.33 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 19:55:54,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377700 2023-11-23 19:56:01,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2517993.3333333335, ans=0.2 2023-11-23 19:56:06,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2517993.3333333335, ans=0.0 2023-11-23 19:56:13,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2023-11-23 19:56:24,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2518126.6666666665, ans=0.2 2023-11-23 19:56:37,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2518193.3333333335, ans=0.1 2023-11-23 19:56:42,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2518193.3333333335, ans=0.125 2023-11-23 19:56:45,962 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5000, loss[loss=0.0905, simple_loss=0.1292, pruned_loss=0.01898, audio_tagging_loss=0.006928, over 15182.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.0915, pruned_loss=0.01367, audio_tagging_loss=0.009083, over 3051207.72 frames. 
], batch size: 58, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 19:56:58,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377750 2023-11-23 19:57:01,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2518326.6666666665, ans=0.125 2023-11-23 19:57:20,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2518393.3333333335, ans=0.125 2023-11-23 19:57:35,065 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 19:57:36,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2518526.6666666665, ans=0.125 2023-11-23 19:57:43,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.445e+01 8.514e+01 9.045e+01 9.813e+01 1.165e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 19:57:48,790 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5050, loss[loss=0.05713, simple_loss=0.07648, pruned_loss=0.007676, audio_tagging_loss=0.01122, over 14889.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.0911, pruned_loss=0.01363, audio_tagging_loss=0.00911, over 3048703.26 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 19:58:00,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377800 2023-11-23 19:58:43,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2518860.0, ans=0.2 2023-11-23 19:58:50,375 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5100, loss[loss=0.07514, simple_loss=0.09974, pruned_loss=0.01856, audio_tagging_loss=0.006711, over 14265.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09221, pruned_loss=0.01394, audio_tagging_loss=0.009002, over 3044619.01 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 19:58:55,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2518926.6666666665, ans=0.125 2023-11-23 19:59:01,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377850 2023-11-23 19:59:05,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2518993.3333333335, ans=0.125 2023-11-23 19:59:16,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2519060.0, ans=0.1 2023-11-23 19:59:28,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.32 vs. 
limit=22.5 2023-11-23 19:59:29,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2519126.6666666665, ans=0.0 2023-11-23 19:59:42,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2519193.3333333335, ans=0.125 2023-11-23 19:59:43,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2519193.3333333335, ans=10.0 2023-11-23 19:59:47,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.550e+01 9.235e+01 1.014e+02 1.258e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-23 19:59:52,258 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5150, loss[loss=0.06429, simple_loss=0.08579, pruned_loss=0.01123, audio_tagging_loss=0.01016, over 14933.00 frames. ], tot_loss[loss=0.06868, simple_loss=0.09174, pruned_loss=0.01383, audio_tagging_loss=0.008985, over 3041493.03 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 20:00:03,384 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377900 2023-11-23 20:00:03,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=15.0 2023-11-23 20:00:06,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2519326.6666666665, ans=0.125 2023-11-23 20:00:33,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2519460.0, ans=0.1 2023-11-23 20:00:34,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.85 vs. limit=10.0 2023-11-23 20:00:34,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2519460.0, ans=0.2 2023-11-23 20:00:47,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2519526.6666666665, ans=0.125 2023-11-23 20:00:54,658 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5200, loss[loss=0.05596, simple_loss=0.06935, pruned_loss=0.01194, audio_tagging_loss=0.009344, over 15538.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09161, pruned_loss=0.0138, audio_tagging_loss=0.008966, over 3035435.78 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:00:59,703 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 20:01:05,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 377950 2023-11-23 20:01:18,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-23 20:01:31,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2519793.3333333335, ans=0.125 2023-11-23 20:01:32,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.92 vs. 
limit=22.5 2023-11-23 20:01:51,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.438e+01 9.195e+01 9.999e+01 1.195e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-23 20:01:52,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2519860.0, ans=0.2 2023-11-23 20:01:56,763 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5250, loss[loss=0.04313, simple_loss=0.04588, pruned_loss=0.007334, audio_tagging_loss=0.01286, over 15359.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09231, pruned_loss=0.01386, audio_tagging_loss=0.008905, over 3040183.36 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:01:58,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2519926.6666666665, ans=0.125 2023-11-23 20:02:01,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2519926.6666666665, ans=0.0 2023-11-23 20:02:07,998 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378000 2023-11-23 20:02:08,158 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 20:02:16,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2519993.3333333335, ans=0.125 2023-11-23 20:02:22,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2023-11-23 20:02:23,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2520060.0, ans=0.125 2023-11-23 20:02:34,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=22.5 2023-11-23 20:02:40,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2520126.6666666665, ans=0.0 2023-11-23 20:02:59,107 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5300, loss[loss=0.06824, simple_loss=0.09908, pruned_loss=0.011, audio_tagging_loss=0.0077, over 14860.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.0924, pruned_loss=0.01384, audio_tagging_loss=0.008788, over 3040778.75 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:03:10,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378050 2023-11-23 20:03:42,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2520460.0, ans=0.1 2023-11-23 20:03:56,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.458e+01 9.027e+01 9.985e+01 1.662e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 20:03:57,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2520526.6666666665, ans=0.125 2023-11-23 20:04:01,553 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5350, loss[loss=0.07927, simple_loss=0.1025, pruned_loss=0.01663, audio_tagging_loss=0.01139, over 14729.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09289, pruned_loss=0.01391, audio_tagging_loss=0.008861, over 3042930.71 frames. 
], batch size: 55, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:04:13,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378100 2023-11-23 20:04:17,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-23 20:04:27,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-23 20:04:40,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2520793.3333333335, ans=0.125 2023-11-23 20:04:57,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2520860.0, ans=0.125 2023-11-23 20:05:04,833 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5400, loss[loss=0.04123, simple_loss=0.04948, pruned_loss=0.004578, audio_tagging_loss=0.01192, over 13898.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09232, pruned_loss=0.01379, audio_tagging_loss=0.008961, over 3032910.03 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:05:08,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2520926.6666666665, ans=0.1 2023-11-23 20:05:15,558 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378150 2023-11-23 20:05:17,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2520993.3333333335, ans=0.125 2023-11-23 20:05:28,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2521060.0, ans=0.0 2023-11-23 20:05:52,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2521126.6666666665, ans=0.125 2023-11-23 20:06:01,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.400e+01 9.032e+01 9.850e+01 1.216e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-23 20:06:05,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2521260.0, ans=0.125 2023-11-23 20:06:06,745 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5450, loss[loss=0.09328, simple_loss=0.1367, pruned_loss=0.02043, audio_tagging_loss=0.004525, over 15633.00 frames. ], tot_loss[loss=0.06882, simple_loss=0.09227, pruned_loss=0.01374, audio_tagging_loss=0.008942, over 3039410.35 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:06:17,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378200 2023-11-23 20:06:23,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2521326.6666666665, ans=0.0 2023-11-23 20:06:27,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.99 vs. 
limit=15.0 2023-11-23 20:06:53,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2521460.0, ans=0.1 2023-11-23 20:07:08,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2521593.3333333335, ans=0.125 2023-11-23 20:07:09,666 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5500, loss[loss=0.06325, simple_loss=0.08252, pruned_loss=0.0113, audio_tagging_loss=0.0107, over 14772.00 frames. ], tot_loss[loss=0.06879, simple_loss=0.09223, pruned_loss=0.01368, audio_tagging_loss=0.009, over 3045792.33 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:07:20,552 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378250 2023-11-23 20:07:21,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2521660.0, ans=0.0 2023-11-23 20:07:22,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2023-11-23 20:07:41,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2521726.6666666665, ans=0.0 2023-11-23 20:07:47,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2521793.3333333335, ans=0.0 2023-11-23 20:07:59,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2521860.0, ans=0.0 2023-11-23 20:08:06,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.320e+01 8.896e+01 9.871e+01 1.353e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 20:08:12,055 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5550, loss[loss=0.06508, simple_loss=0.08528, pruned_loss=0.01286, audio_tagging_loss=0.009582, over 14687.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.09259, pruned_loss=0.01364, audio_tagging_loss=0.0091, over 3051106.30 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:08:23,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378300 2023-11-23 20:08:43,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2522060.0, ans=0.125 2023-11-23 20:08:49,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2522126.6666666665, ans=0.0 2023-11-23 20:09:13,829 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5600, loss[loss=0.07493, simple_loss=0.111, pruned_loss=0.008605, audio_tagging_loss=0.01084, over 15381.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.09208, pruned_loss=0.01343, audio_tagging_loss=0.009221, over 3049778.31 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 32.0 2023-11-23 20:09:25,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378350 2023-11-23 20:09:57,319 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 20:09:59,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2522460.0, ans=0.0 2023-11-23 20:10:11,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.330e+01 9.147e+01 9.848e+01 1.519e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-23 20:10:15,354 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5650, loss[loss=0.0696, simple_loss=0.09037, pruned_loss=0.01255, audio_tagging_loss=0.01186, over 15082.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09223, pruned_loss=0.01368, audio_tagging_loss=0.009336, over 3053095.02 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 20:10:26,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378400 2023-11-23 20:10:26,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2522660.0, ans=0.1 2023-11-23 20:10:28,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=12.0 2023-11-23 20:10:32,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2522660.0, ans=0.0 2023-11-23 20:11:15,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2522860.0, ans=0.125 2023-11-23 20:11:17,840 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5700, loss[loss=0.07644, simple_loss=0.09465, pruned_loss=0.01833, audio_tagging_loss=0.01079, over 15561.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09229, pruned_loss=0.01382, audio_tagging_loss=0.009293, over 3049352.93 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 20:11:20,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2522926.6666666665, ans=0.025 2023-11-23 20:11:23,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2522926.6666666665, ans=0.0 2023-11-23 20:11:26,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2522926.6666666665, ans=0.0 2023-11-23 20:11:26,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2522926.6666666665, ans=0.0 2023-11-23 20:11:28,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378450 2023-11-23 20:11:45,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-23 20:12:10,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=22.5 2023-11-23 20:12:14,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.343e+01 8.908e+01 9.732e+01 1.345e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-23 20:12:15,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2523193.3333333335, ans=0.125 2023-11-23 20:12:18,282 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5750, loss[loss=0.06224, simple_loss=0.08807, pruned_loss=0.01199, audio_tagging_loss=0.006216, over 15000.00 frames. ], tot_loss[loss=0.06907, simple_loss=0.09202, pruned_loss=0.01389, audio_tagging_loss=0.009174, over 3044119.92 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 20:12:29,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378500 2023-11-23 20:13:08,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2523526.6666666665, ans=0.1 2023-11-23 20:13:20,185 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5800, loss[loss=0.05873, simple_loss=0.06909, pruned_loss=0.0144, audio_tagging_loss=0.009788, over 14340.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09193, pruned_loss=0.01386, audio_tagging_loss=0.009123, over 3039744.29 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 20:13:31,210 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378550 2023-11-23 20:13:40,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2523660.0, ans=0.125 2023-11-23 20:13:44,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2523726.6666666665, ans=0.0 2023-11-23 20:13:46,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2523726.6666666665, ans=0.125 2023-11-23 20:14:13,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2523860.0, ans=0.125 2023-11-23 20:14:19,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.343e+01 9.004e+01 9.818e+01 1.259e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 20:14:21,782 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5850, loss[loss=0.05213, simple_loss=0.06241, pruned_loss=0.01002, audio_tagging_loss=0.0109, over 15284.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09106, pruned_loss=0.01366, audio_tagging_loss=0.009012, over 3037111.62 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 8.0 2023-11-23 20:14:22,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2523926.6666666665, ans=0.0 2023-11-23 20:14:24,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2523926.6666666665, ans=0.1 2023-11-23 20:14:33,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378600 2023-11-23 20:14:55,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2524060.0, ans=6.0 2023-11-23 20:15:24,460 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5900, loss[loss=0.06742, simple_loss=0.08998, pruned_loss=0.01378, audio_tagging_loss=0.008652, over 15373.00 frames. 
], tot_loss[loss=0.06832, simple_loss=0.09137, pruned_loss=0.01367, audio_tagging_loss=0.008964, over 3039690.49 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 8.0 2023-11-23 20:15:35,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378650 2023-11-23 20:16:09,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-23 20:16:11,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2524460.0, ans=0.125 2023-11-23 20:16:23,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.483e+01 8.986e+01 9.661e+01 2.610e+02, threshold=1.797e+02, percent-clipped=1.0 2023-11-23 20:16:26,889 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 5950, loss[loss=0.06099, simple_loss=0.08125, pruned_loss=0.01016, audio_tagging_loss=0.01021, over 14817.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09192, pruned_loss=0.01388, audio_tagging_loss=0.008969, over 3043786.11 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 8.0 2023-11-23 20:16:27,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2524593.3333333335, ans=0.2 2023-11-23 20:16:38,063 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378700 2023-11-23 20:16:43,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2524660.0, ans=0.125 2023-11-23 20:16:54,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2524726.6666666665, ans=0.95 2023-11-23 20:17:01,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2524726.6666666665, ans=0.125 2023-11-23 20:17:17,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2524860.0, ans=0.125 2023-11-23 20:17:28,286 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6000, loss[loss=0.06581, simple_loss=0.08185, pruned_loss=0.01565, audio_tagging_loss=0.009237, over 14668.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09237, pruned_loss=0.01384, audio_tagging_loss=0.008994, over 3039152.57 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 20:17:28,287 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 20:17:46,144 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4749, 3.4766, 4.2825, 4.2263, 4.0875, 4.2121, 4.0415, 4.1932], device='cuda:1') 2023-11-23 20:18:07,444 INFO [train_asr.py:1253] (1/4) Epoch 32, validation: loss=0.05807, simple_loss=0.05104, pruned_loss=0.005144, audio_tagging_loss=0.02741, over 4681554.00 frames. 
2023-11-23 20:18:07,445 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 20:18:15,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2524926.6666666665, ans=0.05 2023-11-23 20:18:18,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378750 2023-11-23 20:18:30,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2524993.3333333335, ans=0.125 2023-11-23 20:18:33,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2525060.0, ans=0.125 2023-11-23 20:18:52,183 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 20:19:06,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.325e+01 9.027e+01 9.614e+01 1.325e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-23 20:19:09,998 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6050, loss[loss=0.1076, simple_loss=0.1459, pruned_loss=0.02822, audio_tagging_loss=0.006489, over 16194.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09262, pruned_loss=0.01394, audio_tagging_loss=0.008981, over 3044931.65 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0 2023-11-23 20:19:18,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=12.0 2023-11-23 20:19:21,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378800 2023-11-23 20:19:24,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2525326.6666666665, ans=0.125 2023-11-23 20:19:24,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2525326.6666666665, ans=0.0 2023-11-23 20:19:30,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2525326.6666666665, ans=0.0 2023-11-23 20:19:33,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2525326.6666666665, ans=0.1 2023-11-23 20:19:34,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2525393.3333333335, ans=0.125 2023-11-23 20:20:07,010 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 20:20:12,630 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6100, loss[loss=0.07061, simple_loss=0.09741, pruned_loss=0.01503, audio_tagging_loss=0.006874, over 14970.00 frames. ], tot_loss[loss=0.06968, simple_loss=0.0933, pruned_loss=0.01412, audio_tagging_loss=0.008908, over 3051939.35 frames. 
2023-11-23 20:20:24,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378850
2023-11-23 20:20:40,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2525726.6666666665, ans=0.2
2023-11-23 20:20:49,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2525793.3333333335, ans=0.0
2023-11-23 20:20:50,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2525793.3333333335, ans=0.0
2023-11-23 20:21:00,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2525793.3333333335, ans=0.125
2023-11-23 20:21:09,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2525860.0, ans=0.125
2023-11-23 20:21:12,674 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.468e+01 9.222e+01 1.005e+02 1.157e+02, threshold=1.844e+02, percent-clipped=0.0
2023-11-23 20:21:15,191 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6150, loss[loss=0.06576, simple_loss=0.08986, pruned_loss=0.01395, audio_tagging_loss=0.006884, over 14727.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.09295, pruned_loss=0.01395, audio_tagging_loss=0.008928, over 3051763.35 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:21:26,633 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378900
2023-11-23 20:21:42,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2023-11-23 20:22:03,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2526193.3333333335, ans=0.0
2023-11-23 20:22:06,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2526193.3333333335, ans=0.125
2023-11-23 20:22:16,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2526260.0, ans=0.125
2023-11-23 20:22:17,095 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6200, loss[loss=0.0788, simple_loss=0.1065, pruned_loss=0.01735, audio_tagging_loss=0.00819, over 15746.00 frames. ], tot_loss[loss=0.06871, simple_loss=0.09173, pruned_loss=0.01383, audio_tagging_loss=0.009011, over 3049612.45 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:22:18,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0
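In every optim.py:476 entry in this run, the logged threshold equals Clipping_scale times the median of the recent gradient norms (here 2.0 × 9.222e+01 = 1.844e+02), and the five "quartiles" are min/25%/median/75%/max. A hedged sketch of that bookkeeping; the buffer length and the exact clipping mechanics are assumptions, not icefall's implementation:

```python
import torch

# Hedged sketch of median-based gradient clipping consistent with the
# optim.py:476 entries: threshold = clipping_scale * median(recent norms).
class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 100):
        self.clipping_scale = clipping_scale
        self.history = history            # assumed buffer length
        self.norms: list[float] = []

    def step(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.history:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # q holds the logged "grad-norm quartiles" min/25%/median/75%/max
        threshold = self.clipping_scale * q[2].item()
        if norm > threshold:  # this batch counts toward percent-clipped
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```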
2023-11-23 20:22:29,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 378950
2023-11-23 20:22:40,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2526326.6666666665, ans=0.125
2023-11-23 20:22:43,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=2526393.3333333335, ans=0.02
2023-11-23 20:23:15,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2526526.6666666665, ans=0.0
2023-11-23 20:23:17,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.457e+01 9.040e+01 9.939e+01 1.355e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-23 20:23:20,351 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6250, loss[loss=0.05405, simple_loss=0.06714, pruned_loss=0.01237, audio_tagging_loss=0.008107, over 14298.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09201, pruned_loss=0.01392, audio_tagging_loss=0.009069, over 3048844.34 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:23:30,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2526593.3333333335, ans=0.125
2023-11-23 20:23:31,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379000
2023-11-23 20:23:31,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2526660.0, ans=0.2
2023-11-23 20:23:33,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0
2023-11-23 20:23:48,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2526726.6666666665, ans=0.125
2023-11-23 20:23:56,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2526793.3333333335, ans=0.125
2023-11-23 20:24:22,787 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6300, loss[loss=0.05771, simple_loss=0.07524, pruned_loss=0.01138, audio_tagging_loss=0.00871, over 15045.00 frames. ], tot_loss[loss=0.06941, simple_loss=0.09267, pruned_loss=0.01389, audio_tagging_loss=0.009189, over 3046640.01 frames. ], batch size: 61, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:24:34,250 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379050
2023-11-23 20:24:44,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0
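The scaling.py:213 entries report ScheduledFloat values: scalar hyperparameters (dropout probabilities, skip rates, balancer probs, ...) that are interpolated against the global batch_count instead of being fixed. A minimal reimplementation of the idea, with illustrative schedule points rather than the ones this run actually uses:

```python
# Hedged sketch of a ScheduledFloat: piecewise-linear interpolation of a
# scalar hyperparameter over (batch_count, value) points. The points below
# are illustrative; this run is far past any schedule's last point.
class ScheduledFloat:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation inside [x0, x1]
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]          # clamp past the final point

skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(2526793.3333333335))  # -> 0.0, long past the end
```

This also explains why most logged `ans=` values sit at round endpoints (0.0, 0.1, 0.125, 0.2): at batch_count ≈ 2.5M every schedule has reached its final value.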
2023-11-23 20:24:48,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2527060.0, ans=0.125
2023-11-23 20:24:57,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2527060.0, ans=0.015
2023-11-23 20:25:17,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2527193.3333333335, ans=0.125
2023-11-23 20:25:21,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.520e+01 9.067e+01 9.897e+01 1.384e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-23 20:25:24,461 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6350, loss[loss=0.07871, simple_loss=0.106, pruned_loss=0.01897, audio_tagging_loss=0.006769, over 15163.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.09226, pruned_loss=0.01378, audio_tagging_loss=0.009285, over 3047756.75 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:25:35,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379100
2023-11-23 20:25:59,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2527393.3333333335, ans=0.125
2023-11-23 20:26:26,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2527593.3333333335, ans=0.1
2023-11-23 20:26:27,031 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6400, loss[loss=0.06536, simple_loss=0.0882, pruned_loss=0.01357, audio_tagging_loss=0.007687, over 15100.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.0926, pruned_loss=0.01384, audio_tagging_loss=0.009303, over 3050344.61 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:26:28,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5
2023-11-23 20:26:35,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2527593.3333333335, ans=0.1
2023-11-23 20:26:38,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379150
2023-11-23 20:27:02,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2527726.6666666665, ans=0.0
2023-11-23 20:27:06,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2527793.3333333335, ans=0.2
2023-11-23 20:27:26,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2527860.0, ans=0.1
2023-11-23 20:27:27,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.271e+01 8.849e+01 9.635e+01 1.697e+02, threshold=1.770e+02, percent-clipped=0.0
2023-11-23 20:27:29,879 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6450, loss[loss=0.06189, simple_loss=0.08141, pruned_loss=0.01034, audio_tagging_loss=0.01084, over 15182.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.0922, pruned_loss=0.0138, audio_tagging_loss=0.009348, over 3045771.75 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0
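Note grad_scale doubling from 16.0 to 32.0 at batch 6400 above (it was 8.0 around batch 5950). This is the classic dynamic loss-scaling pattern of mixed-precision training: the scale doubles after a run of overflow-free steps and halves when gradients overflow. A hedged sketch of that behaviour in the style of torch.cuda.amp.GradScaler; the growth interval is an assumption:

```python
# Hedged sketch of fp16 dynamic loss scaling consistent with the logged
# grad_scale progression 8.0 -> 16.0 -> 32.0. Not icefall's exact code.
class DynamicGradScale:
    def __init__(self, init_scale: float = 8.0, growth_interval: int = 2000):
        self.scale = init_scale
        self.growth_interval = growth_interval  # assumed
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale /= 2.0        # back off on overflow; step is skipped
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= 2.0    # e.g. 16.0 -> 32.0, as around batch 6400
        return self.scale
```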
2023-11-23 20:27:35,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2527926.6666666665, ans=0.125
2023-11-23 20:27:41,497 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379200
2023-11-23 20:27:45,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2527993.3333333335, ans=0.2
2023-11-23 20:27:51,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2527993.3333333335, ans=15.0
2023-11-23 20:27:53,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2528060.0, ans=0.125
2023-11-23 20:27:55,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0
2023-11-23 20:27:56,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2528060.0, ans=0.1
2023-11-23 20:28:03,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2528060.0, ans=0.2
2023-11-23 20:28:15,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2528126.6666666665, ans=0.0
2023-11-23 20:28:20,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0
2023-11-23 20:28:28,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2528193.3333333335, ans=0.0
2023-11-23 20:28:32,305 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6500, loss[loss=0.06138, simple_loss=0.07927, pruned_loss=0.01273, audio_tagging_loss=0.009013, over 15455.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09198, pruned_loss=0.01386, audio_tagging_loss=0.009272, over 3038172.93 frames. ], batch size: 59, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:28:33,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2528260.0, ans=0.125
2023-11-23 20:28:36,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2528260.0, ans=0.125
2023-11-23 20:28:43,259 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379250
2023-11-23 20:28:46,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2528326.6666666665, ans=0.1
2023-11-23 20:29:18,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0
2023-11-23 20:29:20,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2528460.0, ans=0.0
2023-11-23 20:29:31,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0
2023-11-23 20:29:32,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.606e+01 9.202e+01 9.720e+01 1.328e+02, threshold=1.840e+02, percent-clipped=0.0
2023-11-23 20:29:35,065 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6550, loss[loss=0.09531, simple_loss=0.1401, pruned_loss=0.02001, audio_tagging_loss=0.005233, over 16307.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.0924, pruned_loss=0.01371, audio_tagging_loss=0.009048, over 3039681.79 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:29:43,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2023-11-23 20:29:45,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0
2023-11-23 20:29:46,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379300
2023-11-23 20:29:48,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2528660.0, ans=0.1
2023-11-23 20:29:49,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2528660.0, ans=0.125
2023-11-23 20:29:55,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0
2023-11-23 20:30:04,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2528726.6666666665, ans=0.0
2023-11-23 20:30:04,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2528726.6666666665, ans=0.1
2023-11-23 20:30:05,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2528726.6666666665, ans=0.125
2023-11-23 20:30:13,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2528793.3333333335, ans=0.125
2023-11-23 20:30:16,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0
2023-11-23 20:30:27,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0
2023-11-23 20:30:37,589 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6600, loss[loss=0.07065, simple_loss=0.09691, pruned_loss=0.01438, audio_tagging_loss=0.007813, over 14852.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.0921, pruned_loss=0.01364, audio_tagging_loss=0.00887, over 3043683.54 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:30:44,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0
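The scaling.py:1022 "Whitening" entries compare a per-module statistic against a limit; a penalty would only engage when the metric exceeds it. A toy illustration (not the icefall implementation) of the kind of quantity such a metric can measure: how far the channel covariance of an activation is from a multiple of the identity. It is 1.0 for perfectly white features and grows with anisotropy:

```python
import torch

# Toy whiteness metric: ratio of the mean squared eigenvalue of the channel
# covariance to the squared mean eigenvalue. Equal eigenvalues (white
# features) give 1.0; a few dominant directions push it well above 1.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into num_groups groups
    metrics = []
    for c in x.chunk(num_groups, dim=1):
        c = c - c.mean(dim=0, keepdim=True)
        cov = (c.T @ c) / c.shape[0]
        eigs = torch.linalg.eigvalsh(cov).clamp(min=1e-20)
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return sum(metrics) / len(metrics)

x = torch.randn(1000, 512) @ torch.randn(512, 512)  # strongly non-white
print(whitening_metric(x), "vs. limit=15.0")
```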
2023-11-23 20:30:44,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2528926.6666666665, ans=0.0
2023-11-23 20:30:49,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379350
2023-11-23 20:31:00,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0
2023-11-23 20:31:12,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2529060.0, ans=0.125
2023-11-23 20:31:37,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.263e+01 8.994e+01 9.633e+01 1.189e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-23 20:31:40,323 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6650, loss[loss=0.07607, simple_loss=0.1091, pruned_loss=0.01536, audio_tagging_loss=0.006159, over 15671.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09237, pruned_loss=0.0136, audio_tagging_loss=0.008776, over 3047179.92 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:31:41,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2529260.0, ans=0.125
2023-11-23 20:31:50,999 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379400
2023-11-23 20:31:54,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2529326.6666666665, ans=0.125
2023-11-23 20:32:00,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.55 vs. limit=22.5
2023-11-23 20:32:42,228 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6700, loss[loss=0.08732, simple_loss=0.1214, pruned_loss=0.01892, audio_tagging_loss=0.007678, over 16239.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09218, pruned_loss=0.01364, audio_tagging_loss=0.008821, over 3042459.87 frames. ], batch size: 59, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:32:42,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2529593.3333333335, ans=0.125
2023-11-23 20:32:53,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379450
2023-11-23 20:32:56,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2529660.0, ans=0.125
2023-11-23 20:33:20,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2529793.3333333335, ans=0.125
2023-11-23 20:33:42,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.739e+01 8.296e+01 8.825e+01 9.532e+01 1.402e+02, threshold=1.765e+02, percent-clipped=0.0
2023-11-23 20:33:42,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=2529860.0, ans=12.0
2023-11-23 20:33:45,223 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6750, loss[loss=0.07269, simple_loss=0.1053, pruned_loss=0.01272, audio_tagging_loss=0.007313, over 15883.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09229, pruned_loss=0.01367, audio_tagging_loss=0.008853, over 3039646.90 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:33:56,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379500
2023-11-23 20:34:12,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2530060.0, ans=0.1
2023-11-23 20:34:19,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2530060.0, ans=0.0
2023-11-23 20:34:35,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0
2023-11-23 20:34:42,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2530193.3333333335, ans=0.07
2023-11-23 20:34:44,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0
2023-11-23 20:34:48,045 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6800, loss[loss=0.05302, simple_loss=0.0649, pruned_loss=0.01059, audio_tagging_loss=0.00998, over 14935.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09152, pruned_loss=0.01366, audio_tagging_loss=0.008871, over 3040909.53 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:34:58,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379550
2023-11-23 20:35:01,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2530326.6666666665, ans=0.0
2023-11-23 20:35:24,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2530460.0, ans=0.125
2023-11-23 20:35:28,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2530460.0, ans=0.1
2023-11-23 20:35:45,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0
2023-11-23 20:35:48,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.273e+01 8.968e+01 9.529e+01 1.211e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-23 20:35:49,316 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6850, loss[loss=0.06235, simple_loss=0.08571, pruned_loss=0.009193, audio_tagging_loss=0.0103, over 16459.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09102, pruned_loss=0.01339, audio_tagging_loss=0.008935, over 3034345.73 frames. ], batch size: 61, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:35:57,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2530593.3333333335, ans=0.0
2023-11-23 20:36:00,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379600
2023-11-23 20:36:51,613 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6900, loss[loss=0.06706, simple_loss=0.09932, pruned_loss=0.01104, audio_tagging_loss=0.006364, over 15569.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09112, pruned_loss=0.01342, audio_tagging_loss=0.008937, over 3036452.11 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0
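The fractional frame totals in the tot_loss entries (e.g. "over 3040909.53 frames" above) suggest the running statistics are not plain sums over an integer number of frames but exponentially decayed sums of both the loss and the frame count, reported as their ratio. A hedged sketch of one plausible mechanism; the decay constant, tied here to a 200-batch reset interval, is a guess:

```python
# Hedged sketch of a decayed loss tracker that would produce fractional
# "over N frames" totals like those in the tot_loss entries. This is one
# plausible mechanism, not icefall's MetricsTracker implementation.
class DecayedTracker:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):  # assumed constant
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames  # the per-frame tot_loss value
```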
2023-11-23 20:37:02,956 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379650
2023-11-23 20:37:38,831 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 20:37:52,181 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.634e+01 8.329e+01 8.996e+01 9.675e+01 1.152e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-23 20:37:53,380 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 6950, loss[loss=0.06704, simple_loss=0.08164, pruned_loss=0.01553, audio_tagging_loss=0.0107, over 14786.00 frames. ], tot_loss[loss=0.0686, simple_loss=0.09233, pruned_loss=0.01362, audio_tagging_loss=0.008821, over 3039688.87 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:37:57,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2531260.0, ans=0.125
2023-11-23 20:38:02,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2531260.0, ans=0.0
2023-11-23 20:38:04,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379700
2023-11-23 20:38:16,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2531326.6666666665, ans=0.125
2023-11-23 20:38:19,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2531393.3333333335, ans=0.0
2023-11-23 20:38:21,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0
2023-11-23 20:38:21,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2531393.3333333335, ans=0.125
2023-11-23 20:38:24,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2531393.3333333335, ans=0.2
2023-11-23 20:38:27,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2531393.3333333335, ans=0.0
2023-11-23 20:38:48,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2531526.6666666665, ans=0.125
2023-11-23 20:38:55,303 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7000, loss[loss=0.07433, simple_loss=0.09065, pruned_loss=0.01706, audio_tagging_loss=0.01195, over 14742.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09186, pruned_loss=0.0134, audio_tagging_loss=0.008789, over 3039224.15 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0
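The "Exclude cut" warnings above all follow the same pattern: a 100-frame AudioSet dummy cut shrinks to 23 frames after 4x subsampling, which is fewer than its 24 BPE tokens, so the transducer cannot align it and the cut is dropped. A hedged sketch of that filter; the subsampling arithmetic ((T - 7) // 2 + 1) // 2 reproduces 100 -> 23 here, but it is an assumption about the encoder frontend, not code lifted from train_asr.py:

```python
# Hedged sketch of the cut filter behind the "Exclude cut" warnings.
def frames_after_subsampling(num_frames: int) -> int:
    # assumed frontend arithmetic; reproduces 100 -> 23 for these cuts
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one encoder frame per token,
    # so cuts whose subsampled length is below the token count are dropped
    # (23 frames vs. 24 tokens for the dummy AudioSet placeholders).
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False
```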
2023-11-23 20:39:06,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379750
2023-11-23 20:39:10,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2531660.0, ans=0.0
2023-11-23 20:39:18,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2531726.6666666665, ans=0.2
2023-11-23 20:39:18,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2531726.6666666665, ans=0.125
2023-11-23 20:39:30,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2531726.6666666665, ans=0.0
2023-11-23 20:39:37,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2531793.3333333335, ans=0.125
2023-11-23 20:39:52,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2531860.0, ans=0.1
2023-11-23 20:39:54,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2531860.0, ans=0.1
2023-11-23 20:39:55,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.253e+01 8.867e+01 9.647e+01 1.261e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-23 20:39:56,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2531926.6666666665, ans=0.125
2023-11-23 20:39:57,193 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7050, loss[loss=0.06597, simple_loss=0.0875, pruned_loss=0.01142, audio_tagging_loss=0.0108, over 15685.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09295, pruned_loss=0.01359, audio_tagging_loss=0.008824, over 3041392.47 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:40:02,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0
2023-11-23 20:40:08,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379800
2023-11-23 20:40:12,549 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 20:40:32,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2532060.0, ans=0.0
2023-11-23 20:40:35,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2532126.6666666665, ans=0.0
2023-11-23 20:40:41,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2532126.6666666665, ans=0.125
2023-11-23 20:40:50,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2532193.3333333335, ans=0.1
2023-11-23 20:40:59,236 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7100, loss[loss=0.07698, simple_loss=0.09891, pruned_loss=0.01852, audio_tagging_loss=0.008999, over 15261.00 frames. ], tot_loss[loss=0.06907, simple_loss=0.09313, pruned_loss=0.01358, audio_tagging_loss=0.008921, over 3042622.24 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:41:04,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2532260.0, ans=0.1
2023-11-23 20:41:11,198 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379850
2023-11-23 20:41:18,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2532326.6666666665, ans=0.125
2023-11-23 20:42:01,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.887e+01 8.490e+01 9.157e+01 9.882e+01 1.439e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-23 20:42:01,954 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7150, loss[loss=0.07729, simple_loss=0.106, pruned_loss=0.01612, audio_tagging_loss=0.008191, over 14894.00 frames. ], tot_loss[loss=0.06876, simple_loss=0.09251, pruned_loss=0.01352, audio_tagging_loss=0.008982, over 3043017.04 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:42:02,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2532593.3333333335, ans=0.2
2023-11-23 20:42:13,326 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379900
2023-11-23 20:42:19,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2532660.0, ans=0.2
2023-11-23 20:42:38,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2532793.3333333335, ans=0.0
2023-11-23 20:43:04,343 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7200, loss[loss=0.04215, simple_loss=0.05364, pruned_loss=0.005419, audio_tagging_loss=0.009908, over 15557.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09243, pruned_loss=0.01354, audio_tagging_loss=0.009084, over 3047194.75 frames. ], batch size: 61, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:43:15,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 379950
2023-11-23 20:43:15,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2532993.3333333335, ans=0.125
2023-11-23 20:43:19,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2532993.3333333335, ans=0.125
2023-11-23 20:43:29,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2533060.0, ans=0.0
2023-11-23 20:43:46,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533126.6666666665, ans=0.1
2023-11-23 20:43:58,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2533193.3333333335, ans=0.025
2023-11-23 20:44:05,858 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7250, loss[loss=0.07416, simple_loss=0.1003, pruned_loss=0.01355, audio_tagging_loss=0.01047, over 15800.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09197, pruned_loss=0.01342, audio_tagging_loss=0.009164, over 3041906.80 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:44:06,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.202e+01 8.739e+01 9.383e+01 1.704e+02, threshold=1.748e+02, percent-clipped=0.0
2023-11-23 20:44:15,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0
2023-11-23 20:44:17,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380000
2023-11-23 20:44:26,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2533326.6666666665, ans=0.125
2023-11-23 20:44:28,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2533326.6666666665, ans=0.125
2023-11-23 20:44:50,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=12.0
2023-11-23 20:45:10,595 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7300, loss[loss=0.06324, simple_loss=0.08456, pruned_loss=0.01239, audio_tagging_loss=0.008569, over 15270.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.09185, pruned_loss=0.01334, audio_tagging_loss=0.009114, over 3042261.36 frames. ], batch size: 59, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:45:17,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2533593.3333333335, ans=0.0
2023-11-23 20:45:22,462 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380050
2023-11-23 20:45:23,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2533660.0, ans=0.0
2023-11-23 20:45:23,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2533660.0, ans=0.0
2023-11-23 20:45:38,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2533726.6666666665, ans=0.125
2023-11-23 20:45:42,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2533726.6666666665, ans=0.125
2023-11-23 20:45:44,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533726.6666666665, ans=0.1
2023-11-23 20:45:52,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.17 vs. limit=10.0
2023-11-23 20:46:00,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2533860.0, ans=0.125
2023-11-23 20:46:07,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533860.0, ans=0.1
2023-11-23 20:46:14,066 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7350, loss[loss=0.07328, simple_loss=0.1037, pruned_loss=0.01136, audio_tagging_loss=0.01009, over 15859.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09165, pruned_loss=0.01345, audio_tagging_loss=0.008977, over 3042130.34 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:46:15,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.654e+01 8.465e+01 9.022e+01 1.001e+02 1.675e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-23 20:46:15,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2533926.6666666665, ans=0.0
2023-11-23 20:46:25,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380100
2023-11-23 20:46:35,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2533993.3333333335, ans=0.2
2023-11-23 20:46:46,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2534060.0, ans=0.035
2023-11-23 20:46:54,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2534126.6666666665, ans=0.2
2023-11-23 20:46:56,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2534126.6666666665, ans=0.0
2023-11-23 20:47:10,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2534193.3333333335, ans=0.2
2023-11-23 20:47:11,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2534193.3333333335, ans=0.2
2023-11-23 20:47:16,152 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7400, loss[loss=0.07811, simple_loss=0.1103, pruned_loss=0.01234, audio_tagging_loss=0.01063, over 14503.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09163, pruned_loss=0.01356, audio_tagging_loss=0.008969, over 3039725.61 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:47:22,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2534260.0, ans=0.0
2023-11-23 20:47:27,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380150
2023-11-23 20:47:34,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2534326.6666666665, ans=0.2
2023-11-23 20:47:34,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2534326.6666666665, ans=0.125
2023-11-23 20:47:40,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2534393.3333333335, ans=0.125
2023-11-23 20:48:04,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2534526.6666666665, ans=0.2
2023-11-23 20:48:16,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2534526.6666666665, ans=0.07
2023-11-23 20:48:17,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0
2023-11-23 20:48:18,438 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7450, loss[loss=0.06739, simple_loss=0.08985, pruned_loss=0.01523, audio_tagging_loss=0.007233, over 13939.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09174, pruned_loss=0.01354, audio_tagging_loss=0.008847, over 3045490.25 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:48:19,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 8.435e+01 9.167e+01 9.796e+01 1.283e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-23 20:48:22,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2534593.3333333335, ans=0.2
2023-11-23 20:48:23,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5
2023-11-23 20:48:29,288 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380200
2023-11-23 20:48:45,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2534726.6666666665, ans=0.125
2023-11-23 20:49:17,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2534860.0, ans=0.1
2023-11-23 20:49:20,992 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7500, loss[loss=0.07865, simple_loss=0.0975, pruned_loss=0.02044, audio_tagging_loss=0.009464, over 16147.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.0918, pruned_loss=0.01365, audio_tagging_loss=0.008869, over 3045377.84 frames. ], batch size: 62, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:49:32,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380250
2023-11-23 20:49:37,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2534993.3333333335, ans=0.0
2023-11-23 20:49:52,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=12.0
2023-11-23 20:49:58,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2535126.6666666665, ans=0.125
2023-11-23 20:49:59,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-23 20:50:23,269 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7550, loss[loss=0.09337, simple_loss=0.1283, pruned_loss=0.02199, audio_tagging_loss=0.007221, over 15277.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09182, pruned_loss=0.0136, audio_tagging_loss=0.008873, over 3048889.84 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 16.0
2023-11-23 20:50:24,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.259e+01 8.879e+01 9.874e+01 1.226e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-23 20:50:34,018 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380300
2023-11-23 20:50:34,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2535326.6666666665, ans=0.0
2023-11-23 20:50:40,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2535326.6666666665, ans=0.1
2023-11-23 20:50:45,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2535326.6666666665, ans=0.0
2023-11-23 20:50:51,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2535393.3333333335, ans=0.0
2023-11-23 20:50:56,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2535393.3333333335, ans=0.125
2023-11-23 20:51:09,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2535460.0, ans=0.125
2023-11-23 20:51:25,737 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7600, loss[loss=0.06535, simple_loss=0.08144, pruned_loss=0.01192, audio_tagging_loss=0.01271, over 15832.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09185, pruned_loss=0.01366, audio_tagging_loss=0.008965, over 3053145.62 frames. ], batch size: 59, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:51:30,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2535593.3333333335, ans=0.1
2023-11-23 20:51:36,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380350
2023-11-23 20:52:02,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2535793.3333333335, ans=0.0
2023-11-23 20:52:02,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2535793.3333333335, ans=0.0
2023-11-23 20:52:03,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2535793.3333333335, ans=0.0
2023-11-23 20:52:18,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0
2023-11-23 20:52:21,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2535860.0, ans=0.1
2023-11-23 20:52:27,483 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7650, loss[loss=0.07558, simple_loss=0.09288, pruned_loss=0.0214, audio_tagging_loss=0.007746, over 14942.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09117, pruned_loss=0.01365, audio_tagging_loss=0.008934, over 3044192.72 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:52:29,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.957e+01 8.327e+01 8.912e+01 9.610e+01 1.311e+02, threshold=1.782e+02, percent-clipped=0.0
2023-11-23 20:52:39,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380400
2023-11-23 20:52:41,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=22.5
2023-11-23 20:52:42,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2535993.3333333335, ans=0.125
2023-11-23 20:53:08,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2536126.6666666665, ans=0.125
2023-11-23 20:53:16,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2536193.3333333335, ans=0.125
2023-11-23 20:53:18,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0
2023-11-23 20:53:21,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2536193.3333333335, ans=0.125
2023-11-23 20:53:31,356 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7700, loss[loss=0.05971, simple_loss=0.06917, pruned_loss=0.01292, audio_tagging_loss=0.0122, over 14232.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09178, pruned_loss=0.01381, audio_tagging_loss=0.008917, over 3041478.89 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:53:42,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380450
2023-11-23 20:53:56,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0
2023-11-23 20:54:01,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2536393.3333333335, ans=0.1
2023-11-23 20:54:07,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=15.0
2023-11-23 20:54:10,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0
2023-11-23 20:54:32,980 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7750, loss[loss=0.06166, simple_loss=0.08297, pruned_loss=0.0101, audio_tagging_loss=0.01007, over 15696.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09127, pruned_loss=0.0136, audio_tagging_loss=0.009013, over 3038615.80 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:54:34,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.446e+01 9.126e+01 9.765e+01 1.172e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-23 20:54:44,382 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380500
2023-11-23 20:55:04,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2536726.6666666665, ans=0.1
2023-11-23 20:55:16,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2536793.3333333335, ans=0.125
2023-11-23 20:55:31,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2536860.0, ans=0.125
2023-11-23 20:55:34,686 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7800, loss[loss=0.0588, simple_loss=0.0794, pruned_loss=0.009143, audio_tagging_loss=0.009961, over 15349.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.091, pruned_loss=0.01349, audio_tagging_loss=0.009048, over 3033134.74 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:55:46,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380550
2023-11-23 20:56:07,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0
2023-11-23 20:56:08,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5
2023-11-23 20:56:10,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2537060.0, ans=0.125
2023-11-23 20:56:10,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2537060.0, ans=0.125
2023-11-23 20:56:37,562 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7850, loss[loss=0.07295, simple_loss=0.1039, pruned_loss=0.01236, audio_tagging_loss=0.008663, over 15211.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.09171, pruned_loss=0.01361, audio_tagging_loss=0.009118, over 3035680.28 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:56:38,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 8.376e+01 9.070e+01 9.715e+01 1.480e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-23 20:56:39,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2537260.0, ans=0.125
2023-11-23 20:56:49,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380600
2023-11-23 20:56:55,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2537326.6666666665, ans=0.2
2023-11-23 20:56:57,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2537326.6666666665, ans=0.1
2023-11-23 20:57:14,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2537460.0, ans=0.125
2023-11-23 20:57:19,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0
2023-11-23 20:57:39,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.59 vs. limit=22.5
2023-11-23 20:57:40,335 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7900, loss[loss=0.04659, simple_loss=0.05801, pruned_loss=0.007944, audio_tagging_loss=0.009647, over 15408.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.09253, pruned_loss=0.01374, audio_tagging_loss=0.009128, over 3036035.35 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:57:41,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2537593.3333333335, ans=0.04949747468305833
2023-11-23 20:57:51,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380650
2023-11-23 20:57:55,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2537660.0, ans=0.0
2023-11-23 20:58:20,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2537793.3333333335, ans=0.125
2023-11-23 20:58:25,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0
2023-11-23 20:58:42,634 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 7950, loss[loss=0.06912, simple_loss=0.09288, pruned_loss=0.01213, audio_tagging_loss=0.01055, over 15410.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09246, pruned_loss=0.01377, audio_tagging_loss=0.009259, over 3031035.93 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:58:43,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.654e+01 8.414e+01 9.153e+01 9.684e+01 1.303e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-23 20:58:54,039 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380700
2023-11-23 20:58:57,530 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 20:58:58,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2537993.3333333335, ans=0.1
2023-11-23 20:59:27,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2538126.6666666665, ans=0.0
2023-11-23 20:59:37,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2538193.3333333335, ans=0.0
2023-11-23 20:59:44,780 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8000, loss[loss=0.06929, simple_loss=0.0956, pruned_loss=0.01351, audio_tagging_loss=0.007979, over 15533.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09198, pruned_loss=0.01369, audio_tagging_loss=0.009346, over 3033397.37 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 20:59:56,911 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380750
2023-11-23 20:59:59,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2538326.6666666665, ans=0.125
2023-11-23 21:00:47,869 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8050, loss[loss=0.06523, simple_loss=0.08406, pruned_loss=0.01405, audio_tagging_loss=0.009149, over 15041.00 frames. ], tot_loss[loss=0.06878, simple_loss=0.09154, pruned_loss=0.01364, audio_tagging_loss=0.009372, over 3037521.85 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0
2023-11-23 21:00:48,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.478e+01 9.043e+01 9.667e+01 1.192e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-23 21:00:49,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2538593.3333333335, ans=0.2
2023-11-23 21:00:59,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380800
2023-11-23 21:01:04,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0
2023-11-23 21:01:21,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2538726.6666666665, ans=0.0
2023-11-23 21:01:30,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2538793.3333333335, ans=0.0
2023-11-23 21:01:40,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2538860.0, ans=0.5
2023-11-23 21:01:41,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2538860.0, ans=0.125
2023-11-23 21:01:50,364 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8100, loss[loss=0.05929, simple_loss=0.07918, pruned_loss=0.01026, audio_tagging_loss=0.009447, over 14821.00 frames. ], tot_loss[loss=0.06868, simple_loss=0.09148, pruned_loss=0.01361, audio_tagging_loss=0.00933, over 3037549.07 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 16.0
2023-11-23 21:01:57,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=12.0
2023-11-23 21:02:01,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380850
2023-11-23 21:02:04,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2538993.3333333335, ans=0.0
2023-11-23 21:02:20,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2539060.0, ans=0.0
2023-11-23 21:02:21,133 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 21:02:50,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0
2023-11-23 21:02:52,169 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8150, loss[loss=0.06632, simple_loss=0.0839, pruned_loss=0.01499, audio_tagging_loss=0.009377, over 13788.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09174, pruned_loss=0.01354, audio_tagging_loss=0.009177, over 3043550.19 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 16.0
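The learning rate ticks from 2.12e-03 down to 2.11e-03 at batch 8100 above (global batch idx around 381000). This slow decay is consistent with icefall's Eden-style schedule, which decays with both the batch and epoch counters; a hedged sketch, where the constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) are assumptions about this run's configuration:

```python
# Hedged sketch of an Eden-style LR schedule consistent with the logged
# values; the formula and constants are assumptions, not read from optim.py.
def eden_lr(batch: int, epoch: float, base_lr: float = 0.045,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(380900, 32):.2e}")  # ~2.1e-03, close to the logged lr
```

At this depth into training both factors change very slowly, which is why the logged lr stays pinned at 2.12e-03 for thousands of batches before moving.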
], tot_loss[loss=0.06859, simple_loss=0.09174, pruned_loss=0.01354, audio_tagging_loss=0.009177, over 3043550.19 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:02:52,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2539260.0, ans=0.2 2023-11-23 21:02:54,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.341e+01 9.005e+01 9.405e+01 1.221e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 21:03:03,686 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380900 2023-11-23 21:03:09,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2539326.6666666665, ans=0.125 2023-11-23 21:03:54,163 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8200, loss[loss=0.07294, simple_loss=0.09769, pruned_loss=0.01467, audio_tagging_loss=0.009426, over 14825.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09214, pruned_loss=0.01336, audio_tagging_loss=0.009036, over 3043021.92 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:03:55,302 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 21:04:05,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 380950 2023-11-23 21:04:08,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2539660.0, ans=0.1 2023-11-23 21:04:12,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2539660.0, ans=0.0 2023-11-23 21:04:13,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2539660.0, ans=0.2 2023-11-23 21:04:20,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2539726.6666666665, ans=0.125 2023-11-23 21:04:22,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2539726.6666666665, ans=0.125 2023-11-23 21:04:56,982 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8250, loss[loss=0.06528, simple_loss=0.08741, pruned_loss=0.0122, audio_tagging_loss=0.009376, over 15304.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09167, pruned_loss=0.01332, audio_tagging_loss=0.009083, over 3050092.37 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:04:59,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.237e+01 8.988e+01 9.644e+01 1.224e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-23 21:05:07,903 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381000 2023-11-23 21:05:09,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2539993.3333333335, ans=0.1 2023-11-23 21:05:20,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.51 vs. 
limit=10.0 2023-11-23 21:05:22,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2540060.0, ans=0.1 2023-11-23 21:05:38,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2540126.6666666665, ans=0.1 2023-11-23 21:05:40,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2540126.6666666665, ans=0.125 2023-11-23 21:05:47,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2540193.3333333335, ans=0.0 2023-11-23 21:05:54,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2540193.3333333335, ans=0.1 2023-11-23 21:05:59,291 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8300, loss[loss=0.08306, simple_loss=0.1149, pruned_loss=0.01633, audio_tagging_loss=0.00927, over 15820.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.0913, pruned_loss=0.01312, audio_tagging_loss=0.009056, over 3051337.04 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:06:10,714 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381050 2023-11-23 21:06:15,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2540326.6666666665, ans=0.125 2023-11-23 21:06:16,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2540326.6666666665, ans=0.125 2023-11-23 21:06:23,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2540393.3333333335, ans=0.2 2023-11-23 21:06:30,904 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:06:35,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2540460.0, ans=0.0 2023-11-23 21:07:01,058 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8350, loss[loss=0.0661, simple_loss=0.08959, pruned_loss=0.01048, audio_tagging_loss=0.01082, over 16960.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09132, pruned_loss=0.01318, audio_tagging_loss=0.009015, over 3053152.27 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:07:03,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.724e+01 8.451e+01 9.185e+01 9.824e+01 1.570e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-23 21:07:06,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.59 vs. 
limit=22.5 2023-11-23 21:07:10,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2540593.3333333335, ans=0.0 2023-11-23 21:07:11,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381100 2023-11-23 21:07:11,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2540660.0, ans=0.1 2023-11-23 21:07:18,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2540660.0, ans=0.2 2023-11-23 21:07:27,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2540726.6666666665, ans=0.2 2023-11-23 21:07:55,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2540860.0, ans=0.125 2023-11-23 21:08:02,553 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8400, loss[loss=0.0578, simple_loss=0.07974, pruned_loss=0.009977, audio_tagging_loss=0.007949, over 15136.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09115, pruned_loss=0.01332, audio_tagging_loss=0.008974, over 3044310.73 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:08:06,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2540926.6666666665, ans=0.0 2023-11-23 21:08:13,835 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381150 2023-11-23 21:08:13,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2540993.3333333335, ans=0.0 2023-11-23 21:08:27,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2541060.0, ans=0.125 2023-11-23 21:08:31,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2541060.0, ans=0.125 2023-11-23 21:08:36,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2541060.0, ans=0.1 2023-11-23 21:08:37,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2541060.0, ans=0.1 2023-11-23 21:08:37,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2541060.0, ans=0.1 2023-11-23 21:08:43,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2541126.6666666665, ans=0.125 2023-11-23 21:09:04,787 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8450, loss[loss=0.06948, simple_loss=0.09743, pruned_loss=0.01414, audio_tagging_loss=0.006621, over 13218.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09121, pruned_loss=0.01333, audio_tagging_loss=0.008992, over 3046623.96 frames. 
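The ubiquitous `scaling.py:213` entries each print one ScheduledFloat: a scalar hyperparameter (dropout, skip rate, balancer probability, bypass scale, ...) whose value is a function of the global batch count. By this point in training the `ans=` values have settled: skip rates at 0.0, dropout at 0.1, balancer probabilities at 0.125, `scale_min` at 0.2. A minimal sketch of the assumed interface, with illustrative breakpoints (the real schedules for this run are not shown in the log):

```python
class ScheduledFloat:
    # Sketch (assumed behavior): piecewise-linear in the global batch count,
    # clamped to the first/last value outside the breakpoint range.
    def __init__(self, *points):  # points: (batch_count, value) pairs
        self.points = sorted(points)
        self.batch_count = 0.0    # advanced by the training loop

    def __float__(self) -> float:
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return float(pts[0][1])
        if self.batch_count >= pts[-1][0]:
            return float(pts[-1][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))

skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
skip_rate.batch_count = 2_540_726.0   # roughly the batch_count logged here
print(float(skip_rate))               # 0.0, matching the ans=0.0 entries
```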
], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:09:08,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.427e+01 8.936e+01 9.652e+01 1.220e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-23 21:09:15,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381200 2023-11-23 21:09:21,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.93 vs. limit=15.0 2023-11-23 21:09:24,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2541326.6666666665, ans=0.0 2023-11-23 21:09:25,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2541326.6666666665, ans=0.125 2023-11-23 21:09:35,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2541393.3333333335, ans=0.125 2023-11-23 21:09:46,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2541460.0, ans=0.2 2023-11-23 21:09:52,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2023-11-23 21:10:07,116 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8500, loss[loss=0.0793, simple_loss=0.1039, pruned_loss=0.01757, audio_tagging_loss=0.009757, over 15182.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09196, pruned_loss=0.01358, audio_tagging_loss=0.008979, over 3051371.25 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:10:15,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2541593.3333333335, ans=0.125 2023-11-23 21:10:18,031 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381250 2023-11-23 21:10:30,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2541726.6666666665, ans=0.05 2023-11-23 21:10:42,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2541726.6666666665, ans=0.125 2023-11-23 21:10:48,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2541793.3333333335, ans=0.125 2023-11-23 21:11:06,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2541860.0, ans=0.04949747468305833 2023-11-23 21:11:08,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2541926.6666666665, ans=0.125 2023-11-23 21:11:09,329 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8550, loss[loss=0.05956, simple_loss=0.06816, pruned_loss=0.01112, audio_tagging_loss=0.01437, over 14975.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.0919, pruned_loss=0.01348, audio_tagging_loss=0.008996, over 3054553.46 frames. 
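The `scaling.py:1022` Whitening lines monitor how isotropic ("white") each module's output is. The metric appears to be mean(λ²) / mean(λ)² over the eigenvalues λ of the channel covariance: exactly 1.0 when the covariance is a multiple of the identity, and growing as variance concentrates in fewer directions. When it exceeds the limit a penalty gradient is applied, so entries like `metric=14.93 vs. limit=15.0` above show a module sitting just under its trigger (and some limits are themselves ScheduledFloats, as a later `whitening_limit ... ans=15.0` entry shows). A hedged reconstruction:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # Sketch: anisotropy of the channel covariance per channel group;
    # 1.0 for perfectly "white" features, larger otherwise.
    x = x.reshape(-1, x.shape[-1])            # (frames, channels)
    metrics = []
    for g in x.chunk(num_groups, dim=-1):     # num_groups=1 in these logs
        cov = g.T @ g / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)     # eigenvalues of the covariance
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return torch.stack(metrics).mean()
```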
], batch size: 58, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:11:10,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2541926.6666666665, ans=0.125 2023-11-23 21:11:12,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.400e+01 9.293e+01 9.776e+01 1.237e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-23 21:11:13,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-23 21:11:20,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381300 2023-11-23 21:11:20,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2541993.3333333335, ans=0.0 2023-11-23 21:11:23,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2541993.3333333335, ans=0.125 2023-11-23 21:11:30,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2541993.3333333335, ans=0.125 2023-11-23 21:11:36,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2023-11-23 21:12:00,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2542193.3333333335, ans=0.2 2023-11-23 21:12:05,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2542193.3333333335, ans=0.125 2023-11-23 21:12:10,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2542260.0, ans=0.125 2023-11-23 21:12:11,436 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8600, loss[loss=0.07839, simple_loss=0.1036, pruned_loss=0.01691, audio_tagging_loss=0.009672, over 15626.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09281, pruned_loss=0.01371, audio_tagging_loss=0.00902, over 3051228.13 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:12:16,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2542260.0, ans=0.125 2023-11-23 21:12:22,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381350 2023-11-23 21:12:22,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2542326.6666666665, ans=0.1 2023-11-23 21:12:23,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0 2023-11-23 21:12:24,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. 
limit=15.0 2023-11-23 21:12:32,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2542326.6666666665, ans=0.0 2023-11-23 21:12:37,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2542393.3333333335, ans=0.2 2023-11-23 21:12:37,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2542393.3333333335, ans=0.0 2023-11-23 21:12:44,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2542393.3333333335, ans=0.125 2023-11-23 21:12:51,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2542460.0, ans=0.125 2023-11-23 21:12:54,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2023-11-23 21:12:57,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2542460.0, ans=0.035 2023-11-23 21:13:13,091 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8650, loss[loss=0.05465, simple_loss=0.07275, pruned_loss=0.007726, audio_tagging_loss=0.01055, over 15615.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09241, pruned_loss=0.01354, audio_tagging_loss=0.009092, over 3049186.47 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:13:16,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 8.560e+01 9.209e+01 9.798e+01 1.197e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-23 21:13:23,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381400 2023-11-23 21:13:28,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0 2023-11-23 21:13:35,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2542660.0, ans=0.1 2023-11-23 21:13:59,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2542793.3333333335, ans=0.05 2023-11-23 21:14:15,420 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8700, loss[loss=0.08895, simple_loss=0.1217, pruned_loss=0.02088, audio_tagging_loss=0.007203, over 14590.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.09312, pruned_loss=0.01355, audio_tagging_loss=0.00903, over 3054489.26 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:14:24,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2542926.6666666665, ans=0.125 2023-11-23 21:14:26,563 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381450 2023-11-23 21:14:33,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2023-11-23 21:14:43,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.72 vs. 
limit=22.5 2023-11-23 21:14:52,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2023-11-23 21:15:06,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2543193.3333333335, ans=0.1 2023-11-23 21:15:10,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2543193.3333333335, ans=0.1 2023-11-23 21:15:13,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2543193.3333333335, ans=0.125 2023-11-23 21:15:14,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2543193.3333333335, ans=0.125 2023-11-23 21:15:17,575 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8750, loss[loss=0.0707, simple_loss=0.09416, pruned_loss=0.01479, audio_tagging_loss=0.008829, over 15656.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09218, pruned_loss=0.01347, audio_tagging_loss=0.009112, over 3051619.06 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:15:21,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.453e+01 8.955e+01 9.752e+01 1.523e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-23 21:15:29,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381500 2023-11-23 21:15:39,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2543326.6666666665, ans=0.125 2023-11-23 21:16:01,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2543460.0, ans=0.1 2023-11-23 21:16:13,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2543526.6666666665, ans=0.1 2023-11-23 21:16:20,095 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8800, loss[loss=0.07493, simple_loss=0.1015, pruned_loss=0.01322, audio_tagging_loss=0.01093, over 17176.00 frames. ], tot_loss[loss=0.06862, simple_loss=0.09193, pruned_loss=0.01346, audio_tagging_loss=0.009191, over 3053399.43 frames. ], batch size: 65, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:16:23,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2543593.3333333335, ans=0.04949747468305833 2023-11-23 21:16:31,273 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381550 2023-11-23 21:16:48,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2543726.6666666665, ans=0.0 2023-11-23 21:17:22,304 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8850, loss[loss=0.06368, simple_loss=0.08098, pruned_loss=0.01313, audio_tagging_loss=0.01006, over 14041.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09189, pruned_loss=0.01363, audio_tagging_loss=0.009235, over 3045087.67 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:17:23,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. 
limit=15.0 2023-11-23 21:17:25,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.726e+01 8.519e+01 9.142e+01 9.754e+01 1.117e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-23 21:17:31,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2543926.6666666665, ans=0.025 2023-11-23 21:17:33,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381600 2023-11-23 21:17:35,044 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 21:17:42,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2543993.3333333335, ans=0.125 2023-11-23 21:17:54,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2544060.0, ans=0.0 2023-11-23 21:18:18,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2544193.3333333335, ans=0.05 2023-11-23 21:18:25,679 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8900, loss[loss=0.09495, simple_loss=0.1264, pruned_loss=0.0214, audio_tagging_loss=0.01037, over 15250.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09235, pruned_loss=0.01363, audio_tagging_loss=0.009051, over 3041847.51 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:18:32,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2544260.0, ans=0.1 2023-11-23 21:18:33,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2544260.0, ans=0.125 2023-11-23 21:18:36,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381650 2023-11-23 21:19:04,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=12.0 2023-11-23 21:19:05,854 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:19:15,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-11-23 21:19:16,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2544526.6666666665, ans=0.2 2023-11-23 21:19:24,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2023-11-23 21:19:27,554 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 8950, loss[loss=0.08104, simple_loss=0.1191, pruned_loss=0.01623, audio_tagging_loss=0.005261, over 14788.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09188, pruned_loss=0.01361, audio_tagging_loss=0.008901, over 3042078.58 frames. 
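The recurring `Exclude cut` warnings are a sanity filter, and the logged numbers explain the trigger: a transducer alignment needs at least one encoder frame per output token, but these one-second AudioSet clips yield 23 frames after roughly 4x subsampling ((100 - 7) // 4 = 23, matching the logged before/after counts), while their placeholder transcript encodes to 24 BPE tokens. With T < U the pruned RNN-T loss is undefined, so the cut is skipped. A hedged re-creation of such a filter (the exact frontend arithmetic is an assumption):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe_500/bpe.model")

def keep_cut(cut, subsampling_factor: int = 4) -> bool:
    # Sketch: drop cuts whose post-subsampling frame count T cannot
    # cover the token count U; T >= U is required for an RNN-T alignment.
    T = (cut.num_frames - 7) // subsampling_factor  # assumed conv frontend
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    return T >= len(tokens)
```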
], batch size: 54, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:19:30,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2544593.3333333335, ans=0.125 2023-11-23 21:19:32,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.310e+01 8.895e+01 9.557e+01 1.131e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 21:19:39,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381700 2023-11-23 21:19:43,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2544660.0, ans=0.125 2023-11-23 21:19:53,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2544726.6666666665, ans=0.1 2023-11-23 21:19:59,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2544726.6666666665, ans=0.125 2023-11-23 21:20:11,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2544793.3333333335, ans=0.125 2023-11-23 21:20:30,423 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9000, loss[loss=0.05195, simple_loss=0.06776, pruned_loss=0.01009, audio_tagging_loss=0.007981, over 16180.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09162, pruned_loss=0.01349, audio_tagging_loss=0.00888, over 3046807.52 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:20:30,423 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 21:21:10,440 INFO [train_asr.py:1253] (1/4) Epoch 32, validation: loss=0.05909, simple_loss=0.05099, pruned_loss=0.005136, audio_tagging_loss=0.02845, over 4681554.00 frames. 2023-11-23 21:21:10,441 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 21:21:12,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2544926.6666666665, ans=0.0 2023-11-23 21:21:20,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2544926.6666666665, ans=0.125 2023-11-23 21:21:21,702 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381750 2023-11-23 21:21:43,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2545060.0, ans=0.04949747468305833 2023-11-23 21:21:53,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2545126.6666666665, ans=0.125 2023-11-23 21:21:58,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2545126.6666666665, ans=0.0 2023-11-23 21:22:04,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2545193.3333333335, ans=0.0 2023-11-23 21:22:06,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2545193.3333333335, ans=0.04949747468305833 2023-11-23 21:22:12,915 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9050, loss[loss=0.04961, simple_loss=0.06133, pruned_loss=0.008851, audio_tagging_loss=0.0101, over 15928.00 frames. ], tot_loss[loss=0.06878, simple_loss=0.09254, pruned_loss=0.01369, audio_tagging_loss=0.008819, over 3052080.26 frames. 
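The validation entry above decomposes with the same weighting as the training loss: 0.5 * 0.05099 + 0.005136 + 0.02845 = 0.05908, matching the logged 0.05909 up to rounding. Both ASR terms are lower at validation than in the recent training averages, while the audio-tagging term is roughly three times higher (0.02845 vs. ~0.009), presumably reflecting a different mix of ASR and audio-tagging data in the held-out set.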
], batch size: 60, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:22:16,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2545260.0, ans=0.0 2023-11-23 21:22:17,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.661e+01 9.152e+01 9.951e+01 1.252e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-23 21:22:17,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2545260.0, ans=0.125 2023-11-23 21:22:24,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381800 2023-11-23 21:22:40,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2545393.3333333335, ans=0.125 2023-11-23 21:22:42,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2545393.3333333335, ans=0.125 2023-11-23 21:23:02,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2545526.6666666665, ans=0.0 2023-11-23 21:23:15,527 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9100, loss[loss=0.06447, simple_loss=0.08584, pruned_loss=0.01217, audio_tagging_loss=0.009376, over 14125.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09194, pruned_loss=0.0136, audio_tagging_loss=0.00882, over 3056826.11 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:23:23,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2023-11-23 21:23:26,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381850 2023-11-23 21:24:01,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-23 21:24:02,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-11-23 21:24:05,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2545860.0, ans=0.125 2023-11-23 21:24:07,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0 2023-11-23 21:24:17,817 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9150, loss[loss=0.08026, simple_loss=0.1146, pruned_loss=0.01349, audio_tagging_loss=0.009466, over 15951.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09209, pruned_loss=0.01349, audio_tagging_loss=0.008818, over 3055110.80 frames. 
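The learning rate creeps down very slowly here (2.12e-03 earlier in the epoch, 2.11e-03 now), which is consistent with icefall's Eden schedule: inverse-fourth-root decay terms in both batch count and epoch. A hedged sketch of the assumed form (any reference-duration rescaling of the batch count is omitted):

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Sketch of the assumed Eden schedule.
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

# At roughly batch 381,800 of epoch 32, with this run's base LR of 0.045,
# this gives ~2.08e-03, within about 2% of the logged 2.11e-03.
print(eden_lr(0.045, 381_800, 32))
```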
], batch size: 58, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:24:23,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.321e+01 8.921e+01 9.557e+01 1.166e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 21:24:24,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2545926.6666666665, ans=0.1 2023-11-23 21:24:29,252 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381900 2023-11-23 21:24:31,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2545993.3333333335, ans=0.0 2023-11-23 21:24:32,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=8.0 2023-11-23 21:24:43,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2546060.0, ans=0.125 2023-11-23 21:24:43,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=12.0 2023-11-23 21:24:47,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2546060.0, ans=0.125 2023-11-23 21:24:54,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2546126.6666666665, ans=0.125 2023-11-23 21:25:08,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2546193.3333333335, ans=0.125 2023-11-23 21:25:20,076 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9200, loss[loss=0.07419, simple_loss=0.1007, pruned_loss=0.01498, audio_tagging_loss=0.008861, over 15602.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09104, pruned_loss=0.01334, audio_tagging_loss=0.008945, over 3055831.20 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:25:25,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2546260.0, ans=0.125 2023-11-23 21:25:28,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2546260.0, ans=0.125 2023-11-23 21:25:31,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 381950 2023-11-23 21:25:31,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2546326.6666666665, ans=0.1 2023-11-23 21:25:35,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2546326.6666666665, ans=0.2 2023-11-23 21:25:40,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-23 21:25:43,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2546393.3333333335, ans=0.1 2023-11-23 21:26:03,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.95 vs. 
limit=22.5 2023-11-23 21:26:22,409 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9250, loss[loss=0.07553, simple_loss=0.0965, pruned_loss=0.01846, audio_tagging_loss=0.00882, over 15821.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09102, pruned_loss=0.01334, audio_tagging_loss=0.008922, over 3062362.63 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:26:26,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2546593.3333333335, ans=0.1 2023-11-23 21:26:28,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-23 21:26:28,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.321e+01 8.948e+01 9.831e+01 1.146e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-23 21:26:29,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2546593.3333333335, ans=0.2 2023-11-23 21:26:33,956 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382000 2023-11-23 21:26:44,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2546660.0, ans=0.125 2023-11-23 21:26:46,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2546660.0, ans=0.125 2023-11-23 21:27:06,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2546793.3333333335, ans=0.125 2023-11-23 21:27:07,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2546793.3333333335, ans=15.0 2023-11-23 21:27:18,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2546860.0, ans=0.05 2023-11-23 21:27:25,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.53 vs. limit=22.5 2023-11-23 21:27:25,765 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9300, loss[loss=0.07431, simple_loss=0.1052, pruned_loss=0.01404, audio_tagging_loss=0.007678, over 16347.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09184, pruned_loss=0.01352, audio_tagging_loss=0.008849, over 3066268.12 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:27:36,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382050 2023-11-23 21:27:46,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2023-11-23 21:27:54,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=2547060.0, ans=15.0 2023-11-23 21:28:11,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2547126.6666666665, ans=0.1 2023-11-23 21:28:21,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.72 vs. 
limit=15.0 2023-11-23 21:28:27,239 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9350, loss[loss=0.05122, simple_loss=0.05397, pruned_loss=0.01198, audio_tagging_loss=0.01226, over 15787.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09106, pruned_loss=0.01344, audio_tagging_loss=0.008939, over 3060056.27 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:28:32,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2547260.0, ans=0.0 2023-11-23 21:28:33,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.368e+01 8.391e+01 8.927e+01 9.865e+01 1.174e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-23 21:28:38,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382100 2023-11-23 21:28:46,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2547326.6666666665, ans=0.2 2023-11-23 21:28:58,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2547393.3333333335, ans=0.1 2023-11-23 21:29:19,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2547526.6666666665, ans=0.0 2023-11-23 21:29:27,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2547526.6666666665, ans=0.125 2023-11-23 21:29:29,303 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9400, loss[loss=0.06744, simple_loss=0.0868, pruned_loss=0.01436, audio_tagging_loss=0.009676, over 16303.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09175, pruned_loss=0.01366, audio_tagging_loss=0.008969, over 3057160.59 frames. ], batch size: 61, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:29:33,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2547593.3333333335, ans=0.2 2023-11-23 21:29:35,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=2547593.3333333335, ans=0.02 2023-11-23 21:29:40,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382150 2023-11-23 21:29:42,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2547660.0, ans=0.125 2023-11-23 21:29:45,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2547660.0, ans=0.125 2023-11-23 21:29:58,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2547726.6666666665, ans=0.125 2023-11-23 21:30:29,122 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 21:30:31,437 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9450, loss[loss=0.07926, simple_loss=0.09313, pruned_loss=0.02071, audio_tagging_loss=0.01198, over 15898.00 frames. 
], tot_loss[loss=0.06853, simple_loss=0.09169, pruned_loss=0.0136, audio_tagging_loss=0.009085, over 3052381.07 frames. ], batch size: 61, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:30:37,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.414e+01 9.183e+01 9.854e+01 1.204e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-23 21:30:43,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382200 2023-11-23 21:30:43,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2547993.3333333335, ans=0.0 2023-11-23 21:31:15,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2548126.6666666665, ans=0.2 2023-11-23 21:31:16,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2548126.6666666665, ans=0.0 2023-11-23 21:31:24,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2548193.3333333335, ans=0.1 2023-11-23 21:31:34,401 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9500, loss[loss=0.05338, simple_loss=0.07289, pruned_loss=0.007819, audio_tagging_loss=0.009119, over 14937.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.09269, pruned_loss=0.0138, audio_tagging_loss=0.009107, over 3062515.75 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:31:34,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2548260.0, ans=0.5 2023-11-23 21:31:35,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2023-11-23 21:31:41,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2023-11-23 21:31:45,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382250 2023-11-23 21:31:48,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2548326.6666666665, ans=0.2 2023-11-23 21:32:08,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2548393.3333333335, ans=0.125 2023-11-23 21:32:13,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2548460.0, ans=0.1 2023-11-23 21:32:17,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2548460.0, ans=0.1 2023-11-23 21:32:25,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2548526.6666666665, ans=0.125 2023-11-23 21:32:34,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2548526.6666666665, ans=0.05 2023-11-23 21:32:36,310 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9550, loss[loss=0.04704, simple_loss=0.0623, pruned_loss=0.007106, audio_tagging_loss=0.00878, over 14576.00 frames. ], tot_loss[loss=0.06963, simple_loss=0.09321, pruned_loss=0.01392, audio_tagging_loss=0.009112, over 3061290.16 frames. 
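`grad_scale` in the tot_loss lines is the fp16 loss scale, and its movements are the signature of a dynamic scaler: through this stretch it sits at 16.0, doubles back to 32.0 at batch 9600 just below, and is halved to 16.0 again by batch 9900. PyTorch's GradScaler behaves this way, halving on a step with inf/NaN gradients and doubling after a run of clean steps. A sketch with assumed settings (the run's actual scaler configuration is not shown in the log):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # assumed starting scale to match the log
    backoff_factor=0.5,   # 32.0 -> 16.0 on an inf/NaN step
    growth_factor=2.0,    # 16.0 -> 32.0 after growth_interval clean steps
    growth_interval=2000,
)

# Typical step:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)   # hypothetical helper
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
```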
], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:32:42,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.598e+01 9.160e+01 9.932e+01 1.326e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-23 21:32:47,118 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382300 2023-11-23 21:32:47,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-11-23 21:32:50,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0 2023-11-23 21:32:50,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2548660.0, ans=0.0 2023-11-23 21:32:52,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2548660.0, ans=0.0 2023-11-23 21:32:59,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2548660.0, ans=0.0 2023-11-23 21:33:22,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2548793.3333333335, ans=0.125 2023-11-23 21:33:30,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0 2023-11-23 21:33:37,960 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9600, loss[loss=0.08079, simple_loss=0.1118, pruned_loss=0.01571, audio_tagging_loss=0.0092, over 14780.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09261, pruned_loss=0.01377, audio_tagging_loss=0.009159, over 3051273.60 frames. ], batch size: 55, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:33:41,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2548926.6666666665, ans=0.125 2023-11-23 21:33:45,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2548926.6666666665, ans=0.2 2023-11-23 21:33:50,059 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382350 2023-11-23 21:33:50,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2023-11-23 21:33:52,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2548993.3333333335, ans=0.125 2023-11-23 21:33:52,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2548993.3333333335, ans=0.1 2023-11-23 21:33:53,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2548993.3333333335, ans=0.0 2023-11-23 21:33:55,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.76 vs. 
limit=22.5 2023-11-23 21:34:08,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2549060.0, ans=0.0 2023-11-23 21:34:31,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2549193.3333333335, ans=0.125 2023-11-23 21:34:39,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2549193.3333333335, ans=0.1 2023-11-23 21:34:41,429 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9650, loss[loss=0.08738, simple_loss=0.1114, pruned_loss=0.02199, audio_tagging_loss=0.009705, over 13983.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09239, pruned_loss=0.01381, audio_tagging_loss=0.009253, over 3046961.45 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:34:47,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.434e+01 9.003e+01 9.684e+01 1.226e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 21:34:52,748 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382400 2023-11-23 21:35:03,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2549326.6666666665, ans=0.125 2023-11-23 21:35:21,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2023-11-23 21:35:24,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2549460.0, ans=0.125 2023-11-23 21:35:44,315 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9700, loss[loss=0.07898, simple_loss=0.1078, pruned_loss=0.01733, audio_tagging_loss=0.007737, over 15248.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09239, pruned_loss=0.01387, audio_tagging_loss=0.009063, over 3048244.67 frames. ], batch size: 55, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:35:55,005 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382450 2023-11-23 21:35:55,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-11-23 21:36:08,608 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:36:45,375 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9750, loss[loss=0.06351, simple_loss=0.09247, pruned_loss=0.008527, audio_tagging_loss=0.008752, over 15663.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.09241, pruned_loss=0.01385, audio_tagging_loss=0.008896, over 3045160.50 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:36:45,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2549926.6666666665, ans=0.0 2023-11-23 21:36:51,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.706e+01 8.362e+01 8.986e+01 9.723e+01 1.186e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 21:36:54,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. 
limit=5.0 2023-11-23 21:36:56,700 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382500 2023-11-23 21:37:10,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2550060.0, ans=0.1 2023-11-23 21:37:27,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2550126.6666666665, ans=0.125 2023-11-23 21:37:45,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2550193.3333333335, ans=0.0 2023-11-23 21:37:47,723 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9800, loss[loss=0.07458, simple_loss=0.1024, pruned_loss=0.01697, audio_tagging_loss=0.006404, over 15186.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09242, pruned_loss=0.01382, audio_tagging_loss=0.008885, over 3042515.89 frames. ], batch size: 55, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:37:49,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2550260.0, ans=0.125 2023-11-23 21:37:49,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2550260.0, ans=0.125 2023-11-23 21:37:52,076 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:37:59,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382550 2023-11-23 21:38:17,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2550393.3333333335, ans=0.125 2023-11-23 21:38:23,665 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:38:43,018 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 21:38:50,708 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9850, loss[loss=0.07242, simple_loss=0.1105, pruned_loss=0.0107, audio_tagging_loss=0.00648, over 15717.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09217, pruned_loss=0.01385, audio_tagging_loss=0.008864, over 3046466.24 frames. 
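The logged "batch size" wanders between the low 50s and mid 60s while the per-batch frame counts stay in a band of roughly 13k-17k: batches are evidently assembled under a total-duration budget rather than a fixed cut count. A hedged lhotse-style sketch (sampler class, budget, and path are all assumptions, not read from this log):

```python
from lhotse import CutSet
from lhotse.dataset import SimpleCutSampler

cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # hypothetical path
sampler = SimpleCutSampler(cuts, max_duration=1000.0,      # seconds of audio
                           shuffle=True, drop_last=True)
for batch_cuts in sampler:
    # however many cuts fit in the duration budget -- hence the
    # fluctuating "batch size" in the log
    pass
```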
], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:38:56,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.850e+01 8.500e+01 9.067e+01 9.984e+01 1.563e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-23 21:39:00,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2550593.3333333335, ans=0.125 2023-11-23 21:39:01,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382600 2023-11-23 21:39:06,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2550660.0, ans=0.125 2023-11-23 21:39:10,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2550660.0, ans=0.0 2023-11-23 21:39:10,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-11-23 21:39:14,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5 2023-11-23 21:39:20,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2550726.6666666665, ans=0.0 2023-11-23 21:39:27,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2550793.3333333335, ans=0.0 2023-11-23 21:39:37,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2550793.3333333335, ans=0.125 2023-11-23 21:39:44,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2550860.0, ans=0.0 2023-11-23 21:39:52,825 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9900, loss[loss=0.06412, simple_loss=0.08843, pruned_loss=0.01282, audio_tagging_loss=0.007091, over 15118.00 frames. ], tot_loss[loss=0.06966, simple_loss=0.09353, pruned_loss=0.01412, audio_tagging_loss=0.008779, over 3053292.78 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:40:04,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382650 2023-11-23 21:40:26,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2551060.0, ans=0.05 2023-11-23 21:40:36,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-23 21:40:55,638 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 9950, loss[loss=0.0632, simple_loss=0.08511, pruned_loss=0.01321, audio_tagging_loss=0.007433, over 14959.00 frames. ], tot_loss[loss=0.06965, simple_loss=0.09329, pruned_loss=0.01413, audio_tagging_loss=0.008875, over 3040948.32 frames. 
2023-11-23 21:41:03,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.223e+01 9.080e+01 9.967e+01 1.331e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-23 21:41:07,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382700
2023-11-23 21:41:09,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2551326.6666666665, ans=0.125
2023-11-23 21:41:09,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2551326.6666666665, ans=0.1
2023-11-23 21:41:41,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2551460.0, ans=0.0
2023-11-23 21:41:43,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0
2023-11-23 21:41:44,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2551526.6666666665, ans=0.2
2023-11-23 21:41:46,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2551526.6666666665, ans=0.125
2023-11-23 21:41:52,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2551526.6666666665, ans=0.0
2023-11-23 21:41:59,256 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10000, loss[loss=0.07275, simple_loss=0.1008, pruned_loss=0.01488, audio_tagging_loss=0.007449, over 17218.00 frames. ], tot_loss[loss=0.06878, simple_loss=0.09202, pruned_loss=0.01378, audio_tagging_loss=0.008983, over 3050192.82 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 32.0
2023-11-23 21:42:09,924 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382750
2023-11-23 21:42:10,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2551660.0, ans=0.1
2023-11-23 21:42:14,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2551660.0, ans=0.2
2023-11-23 21:42:23,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2551726.6666666665, ans=0.125
2023-11-23 21:42:25,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0
2023-11-23 21:42:28,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2551726.6666666665, ans=0.125
2023-11-23 21:42:36,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5
2023-11-23 21:42:54,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2551860.0, ans=0.125
2023-11-23 21:43:00,960 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10050, loss[loss=0.04631, simple_loss=0.06191, pruned_loss=0.006171, audio_tagging_loss=0.009179, over 14278.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09132, pruned_loss=0.01368, audio_tagging_loss=0.008953, over 3052532.56 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0
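[Editor's note] The scaling.py:213 entries each print a ScheduledFloat: a scalar hyperparameter (a dropout_p, skip rate, balancer prob, bypass scale_min, and so on) whose current value ans is a function of the global batch_count. A minimal piecewise-linear sketch of such a schedule follows; the breakpoints are invented for illustration, and only the interpolation mechanism is the point.

    class ScheduledFloat:
        """Sketch: a float hyperparameter interpolated on batch_count.

        (x, y) pairs are schedule breakpoints; outside their range the value
        is clamped to the first/last y. Breakpoints below are illustrative.
        """
        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            (x0, y0), *rest = self.points
            if batch_count <= x0:
                return y0
            for x1, y1 in rest:
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
                x0, y0 = x1, y1
            return y0

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches:
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(2550060.0) == 0.1  # long-run value, as in the log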
2023-11-23 21:43:06,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0
2023-11-23 21:43:08,062 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.368e+01 8.922e+01 9.527e+01 1.170e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-23 21:43:11,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382800
2023-11-23 21:44:00,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2552193.3333333335, ans=0.125
2023-11-23 21:44:03,500 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10100, loss[loss=0.05857, simple_loss=0.07335, pruned_loss=0.01298, audio_tagging_loss=0.00891, over 14544.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09075, pruned_loss=0.01346, audio_tagging_loss=0.009058, over 3057112.47 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0
2023-11-23 21:44:14,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382850
2023-11-23 21:44:16,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2552326.6666666665, ans=0.0
2023-11-23 21:44:26,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2552326.6666666665, ans=0.2
2023-11-23 21:44:29,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=22.5
2023-11-23 21:44:30,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0
2023-11-23 21:44:33,884 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 21:44:53,141 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 21:44:53,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2552526.6666666665, ans=0.125
2023-11-23 21:44:58,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2552526.6666666665, ans=0.0
2023-11-23 21:44:59,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2552526.6666666665, ans=0.0
2023-11-23 21:45:06,038 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10150, loss[loss=0.06321, simple_loss=0.08791, pruned_loss=0.01202, audio_tagging_loss=0.007232, over 15088.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.0911, pruned_loss=0.01345, audio_tagging_loss=0.009152, over 3062933.18 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 16.0
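[Editor's note] The recurring WARNING entries are a length filter doing its job: these AudioSet clips carry a dummy transcript, and a one-second cut has 100 feature frames, which the subsampling frontend reduces to 23, fewer than the 24 BPE tokens, so the cut cannot be aligned against its text and is excluded. A sketch of such a filter over a lhotse CutSet follows (the subsampling formula ((n - 7) // 2 + 1) // 2 maps 100 to 23, matching the logged numbers; the helper names are mine, not the actual train_asr.py code):

    import sentencepiece as spm
    from lhotse import CutSet

    sp = spm.SentencePieceProcessor()
    sp.load("data/lang_bpe_500/bpe.model")

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv frontend with overall subsampling factor 4: 100 -> 23, as logged.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(cut) -> bool:
        """Drop cuts whose encoder output is shorter than the token sequence."""
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        T = frames_after_subsampling(cut.num_frames)
        return T >= len(tokens)  # 23 < 24 for the dummy cuts above -> excluded

    # cuts = CutSet.from_file(...)   # then: cuts = cuts.filter(keep_cut)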
2023-11-23 21:45:06,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2552593.3333333335, ans=0.05
2023-11-23 21:45:14,975 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.658e+01 9.213e+01 1.003e+02 1.366e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-23 21:45:17,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382900
2023-11-23 21:45:21,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2552660.0, ans=0.1
2023-11-23 21:45:34,770 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 21:46:00,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2552860.0, ans=0.1
2023-11-23 21:46:08,452 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10200, loss[loss=0.0547, simple_loss=0.07082, pruned_loss=0.008999, audio_tagging_loss=0.01029, over 15519.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09102, pruned_loss=0.01341, audio_tagging_loss=0.009217, over 3062496.81 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 16.0
2023-11-23 21:46:11,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=22.5
2023-11-23 21:46:15,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0
2023-11-23 21:46:15,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2552926.6666666665, ans=0.125
2023-11-23 21:46:19,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 382950
2023-11-23 21:46:28,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2552993.3333333335, ans=0.0
2023-11-23 21:46:30,781 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-23 21:46:31,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2552993.3333333335, ans=0.1
2023-11-23 21:46:43,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0
2023-11-23 21:47:04,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0
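[Editor's note] The scaling.py:1022 Whitening entries are diagnostics from modules that push a layer's output covariance toward "white": the metric is 1.0 for a perfectly isotropic covariance and grows as variance concentrates in a few directions, and it is logged against a per-module limit (e.g. metric=18.03 vs. limit=22.5 earlier). The sketch below shows one plausible such metric; it is my reconstruction, not the verbatim scaling.py code.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Ratio >= 1 measuring covariance whiteness per channel group.

        For a covariance C over d channels, returns d * trace(C @ C) / trace(C)**2,
        which equals 1.0 iff C is a multiple of the identity ("white") and
        grows as the eigenvalues spread out. A plausible reconstruction only.
        """
        num_frames, num_channels = x.reshape(-1, x.shape[-1]).shape
        d = num_channels // num_groups
        xg = x.reshape(-1, num_groups, d).transpose(0, 1)        # (groups, frames, d)
        cov = torch.matmul(xg.transpose(1, 2), xg) / num_frames  # (groups, d, d)
        num = d * (cov * cov).sum(dim=(1, 2))                    # d * trace(C @ C)
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2       # trace(C) ** 2
        return (num / den).mean().item()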
2023-11-23 21:47:05,219 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-23 21:47:09,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=15.0
2023-11-23 21:47:10,201 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10250, loss[loss=0.06915, simple_loss=0.09788, pruned_loss=0.01237, audio_tagging_loss=0.007838, over 15535.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.091, pruned_loss=0.01341, audio_tagging_loss=0.009275, over 3061834.97 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0
2023-11-23 21:47:14,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2553260.0, ans=0.125
2023-11-23 21:47:14,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0
2023-11-23 21:47:18,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.706e+01 9.282e+01 1.010e+02 1.139e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-23 21:47:21,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383000
2023-11-23 21:47:39,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2553393.3333333335, ans=0.125
2023-11-23 21:47:47,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2553460.0, ans=0.0
2023-11-23 21:47:50,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2553460.0, ans=0.2
2023-11-23 21:48:12,184 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10300, loss[loss=0.05372, simple_loss=0.06297, pruned_loss=0.01043, audio_tagging_loss=0.01181, over 13929.00 frames. ], tot_loss[loss=0.06831, simple_loss=0.09104, pruned_loss=0.01351, audio_tagging_loss=0.009274, over 3055065.06 frames. ], batch size: 55, lr: 2.11e-03, grad_scale: 16.0
2023-11-23 21:48:12,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2553593.3333333335, ans=0.0
2023-11-23 21:48:22,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2553593.3333333335, ans=0.5
2023-11-23 21:48:23,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383050
2023-11-23 21:48:28,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.31 vs. limit=10.0
2023-11-23 21:48:32,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2553660.0, ans=0.0
2023-11-23 21:48:36,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2023-11-23 21:49:13,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2553926.6666666665, ans=0.1
2023-11-23 21:49:13,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs.
limit=6.0 2023-11-23 21:49:14,281 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10350, loss[loss=0.08766, simple_loss=0.1171, pruned_loss=0.01806, audio_tagging_loss=0.01104, over 16353.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09202, pruned_loss=0.01356, audio_tagging_loss=0.009277, over 3051787.31 frames. ], batch size: 61, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:49:23,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.433e+01 8.904e+01 9.438e+01 1.477e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-23 21:49:25,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383100 2023-11-23 21:49:32,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2553993.3333333335, ans=0.0 2023-11-23 21:49:34,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2553993.3333333335, ans=0.125 2023-11-23 21:49:36,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2023-11-23 21:49:38,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2554060.0, ans=0.125 2023-11-23 21:49:50,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2554060.0, ans=0.2 2023-11-23 21:50:16,703 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10400, loss[loss=0.08132, simple_loss=0.1152, pruned_loss=0.01754, audio_tagging_loss=0.00616, over 15684.00 frames. ], tot_loss[loss=0.06836, simple_loss=0.09109, pruned_loss=0.01343, audio_tagging_loss=0.009387, over 3045483.86 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:50:28,807 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383150 2023-11-23 21:50:36,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2554326.6666666665, ans=0.0 2023-11-23 21:50:43,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2554393.3333333335, ans=0.125 2023-11-23 21:50:52,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2554393.3333333335, ans=0.125 2023-11-23 21:50:59,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.48 vs. limit=10.0 2023-11-23 21:51:04,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2554460.0, ans=0.125 2023-11-23 21:51:10,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2554526.6666666665, ans=0.125 2023-11-23 21:51:11,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2554526.6666666665, ans=0.0 2023-11-23 21:51:11,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2554526.6666666665, ans=0.0 2023-11-23 21:51:19,715 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10450, loss[loss=0.05105, simple_loss=0.06792, pruned_loss=0.008533, audio_tagging_loss=0.00856, over 14823.00 frames. 
], tot_loss[loss=0.06826, simple_loss=0.09103, pruned_loss=0.01346, audio_tagging_loss=0.009286, over 3041784.49 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:51:28,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.248e+01 8.875e+01 9.733e+01 1.236e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-23 21:51:31,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383200 2023-11-23 21:51:34,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2554660.0, ans=0.125 2023-11-23 21:51:51,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2554726.6666666665, ans=0.1 2023-11-23 21:51:54,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2554726.6666666665, ans=0.2 2023-11-23 21:51:56,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2554793.3333333335, ans=0.125 2023-11-23 21:52:04,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-11-23 21:52:05,617 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:52:07,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2554793.3333333335, ans=0.125 2023-11-23 21:52:09,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2023-11-23 21:52:21,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2554926.6666666665, ans=0.0 2023-11-23 21:52:22,443 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10500, loss[loss=0.0599, simple_loss=0.08611, pruned_loss=0.00934, audio_tagging_loss=0.007508, over 16398.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09113, pruned_loss=0.01345, audio_tagging_loss=0.009114, over 3044086.55 frames. 
], batch size: 61, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:52:23,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2554926.6666666665, ans=0.015 2023-11-23 21:52:25,164 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:52:32,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2554926.6666666665, ans=0.1 2023-11-23 21:52:32,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2554926.6666666665, ans=0.09899494936611666 2023-11-23 21:52:33,847 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383250 2023-11-23 21:52:46,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2555060.0, ans=0.04949747468305833 2023-11-23 21:53:00,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2555126.6666666665, ans=0.125 2023-11-23 21:53:02,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2555126.6666666665, ans=10.0 2023-11-23 21:53:08,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2555126.6666666665, ans=0.125 2023-11-23 21:53:11,221 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:53:16,998 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:53:24,427 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10550, loss[loss=0.05755, simple_loss=0.08013, pruned_loss=0.009645, audio_tagging_loss=0.007842, over 14684.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09063, pruned_loss=0.01332, audio_tagging_loss=0.009024, over 3053089.37 frames. 
], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:53:31,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2555260.0, ans=0.125 2023-11-23 21:53:34,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.236e+01 8.886e+01 9.649e+01 1.811e+02, threshold=1.777e+02, percent-clipped=1.0 2023-11-23 21:53:36,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383300 2023-11-23 21:53:39,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2555326.6666666665, ans=0.125 2023-11-23 21:53:47,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2555326.6666666665, ans=0.1 2023-11-23 21:53:54,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2555393.3333333335, ans=0.1 2023-11-23 21:53:58,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2555393.3333333335, ans=0.0 2023-11-23 21:54:03,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2555460.0, ans=0.0 2023-11-23 21:54:23,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2555526.6666666665, ans=0.2 2023-11-23 21:54:26,946 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10600, loss[loss=0.06904, simple_loss=0.09152, pruned_loss=0.01418, audio_tagging_loss=0.009095, over 15018.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09138, pruned_loss=0.01349, audio_tagging_loss=0.008929, over 3049964.24 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:54:30,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2555593.3333333335, ans=0.125 2023-11-23 21:54:35,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2555593.3333333335, ans=0.125 2023-11-23 21:54:38,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383350 2023-11-23 21:54:45,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.01 vs. limit=15.0 2023-11-23 21:55:12,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2555793.3333333335, ans=0.125 2023-11-23 21:55:16,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2555860.0, ans=0.125 2023-11-23 21:55:29,647 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10650, loss[loss=0.07229, simple_loss=0.1013, pruned_loss=0.01385, audio_tagging_loss=0.007795, over 15007.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09174, pruned_loss=0.01343, audio_tagging_loss=0.008954, over 3046123.55 frames. 
], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:55:31,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2555926.6666666665, ans=0.125 2023-11-23 21:55:38,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.595e+01 9.053e+01 9.824e+01 1.350e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-23 21:55:40,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383400 2023-11-23 21:55:43,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2555993.3333333335, ans=0.0 2023-11-23 21:55:45,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2555993.3333333335, ans=0.05 2023-11-23 21:55:50,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2555993.3333333335, ans=0.0 2023-11-23 21:56:10,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2556126.6666666665, ans=0.1 2023-11-23 21:56:31,438 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10700, loss[loss=0.04161, simple_loss=0.05931, pruned_loss=0.003578, audio_tagging_loss=0.008374, over 14196.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09062, pruned_loss=0.01314, audio_tagging_loss=0.008944, over 3045228.30 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:56:37,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2556260.0, ans=0.125 2023-11-23 21:56:39,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=12.0 2023-11-23 21:56:43,581 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383450 2023-11-23 21:56:47,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2556326.6666666665, ans=0.0 2023-11-23 21:57:13,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-23 21:57:27,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2556526.6666666665, ans=0.2 2023-11-23 21:57:34,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2023-11-23 21:57:35,199 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10750, loss[loss=0.07252, simple_loss=0.09758, pruned_loss=0.01361, audio_tagging_loss=0.01012, over 14236.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09067, pruned_loss=0.01327, audio_tagging_loss=0.008973, over 3050327.49 frames. 
], batch size: 55, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:57:45,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.476e+01 9.127e+01 1.007e+02 1.402e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-23 21:57:46,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383500 2023-11-23 21:58:12,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.44 vs. limit=10.0 2023-11-23 21:58:13,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2556793.3333333335, ans=0.04949747468305833 2023-11-23 21:58:26,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2556860.0, ans=0.1 2023-11-23 21:58:29,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-11-23 21:58:35,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2556860.0, ans=0.1 2023-11-23 21:58:37,449 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10800, loss[loss=0.05799, simple_loss=0.08043, pruned_loss=0.009042, audio_tagging_loss=0.008735, over 15251.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09079, pruned_loss=0.01327, audio_tagging_loss=0.008963, over 3052875.91 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 21:58:47,295 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 21:58:48,243 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383550 2023-11-23 21:58:58,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2556993.3333333335, ans=0.2 2023-11-23 21:59:12,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2023-11-23 21:59:30,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2557193.3333333335, ans=0.0 2023-11-23 21:59:38,837 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10850, loss[loss=0.05238, simple_loss=0.06803, pruned_loss=0.00831, audio_tagging_loss=0.01005, over 15548.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09065, pruned_loss=0.01337, audio_tagging_loss=0.009005, over 3049753.08 frames. 
], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 21:59:49,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 8.543e+01 9.235e+01 1.001e+02 1.289e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-23 21:59:50,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383600 2023-11-23 21:59:53,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2557326.6666666665, ans=0.125 2023-11-23 22:00:14,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2557393.3333333335, ans=0.0 2023-11-23 22:00:29,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2557526.6666666665, ans=0.125 2023-11-23 22:00:36,668 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 22:00:41,466 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10900, loss[loss=0.07826, simple_loss=0.1125, pruned_loss=0.01446, audio_tagging_loss=0.007561, over 15943.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09046, pruned_loss=0.01333, audio_tagging_loss=0.009057, over 3056130.22 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:00:47,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2023-11-23 22:00:53,301 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383650 2023-11-23 22:01:01,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2557660.0, ans=0.2 2023-11-23 22:01:04,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.62 vs. limit=10.0 2023-11-23 22:01:44,018 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 10950, loss[loss=0.06983, simple_loss=0.09992, pruned_loss=0.01152, audio_tagging_loss=0.00835, over 15484.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09008, pruned_loss=0.01318, audio_tagging_loss=0.009172, over 3049858.53 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:01:51,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2557926.6666666665, ans=0.125 2023-11-23 22:01:54,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.562e+01 9.166e+01 9.897e+01 1.361e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-23 22:01:54,802 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383700 2023-11-23 22:01:57,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2557993.3333333335, ans=0.1 2023-11-23 22:02:05,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. 
limit=22.5 2023-11-23 22:02:28,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2558126.6666666665, ans=0.0 2023-11-23 22:02:44,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2558260.0, ans=0.125 2023-11-23 22:02:45,432 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11000, loss[loss=0.07015, simple_loss=0.09822, pruned_loss=0.01404, audio_tagging_loss=0.007001, over 15025.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09076, pruned_loss=0.01322, audio_tagging_loss=0.009103, over 3045217.48 frames. ], batch size: 55, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:02:50,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2558260.0, ans=0.125 2023-11-23 22:02:53,631 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 22:02:56,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383750 2023-11-23 22:02:59,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2558326.6666666665, ans=0.125 2023-11-23 22:03:00,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2558326.6666666665, ans=0.125 2023-11-23 22:03:06,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-11-23 22:03:10,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2558393.3333333335, ans=0.125 2023-11-23 22:03:18,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2558393.3333333335, ans=0.0 2023-11-23 22:03:28,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2558460.0, ans=0.0 2023-11-23 22:03:40,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2558526.6666666665, ans=0.0 2023-11-23 22:03:47,003 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11050, loss[loss=0.09292, simple_loss=0.1242, pruned_loss=0.02161, audio_tagging_loss=0.009196, over 13856.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09101, pruned_loss=0.01327, audio_tagging_loss=0.009246, over 3050257.36 frames. 
], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:03:48,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2558593.3333333335, ans=0.125 2023-11-23 22:03:58,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.515e+01 9.099e+01 9.905e+01 1.187e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-23 22:03:58,857 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383800 2023-11-23 22:04:02,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-23 22:04:16,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2558726.6666666665, ans=0.125 2023-11-23 22:04:16,732 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:04:24,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2558793.3333333335, ans=6.0 2023-11-23 22:04:35,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2558860.0, ans=0.1 2023-11-23 22:04:41,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2558860.0, ans=0.2 2023-11-23 22:04:50,541 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11100, loss[loss=0.09542, simple_loss=0.1359, pruned_loss=0.02211, audio_tagging_loss=0.005332, over 15353.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09106, pruned_loss=0.01336, audio_tagging_loss=0.009324, over 3044507.37 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:04:57,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2558926.6666666665, ans=0.09899494936611666 2023-11-23 22:04:59,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0 2023-11-23 22:05:01,194 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383850 2023-11-23 22:05:05,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2558993.3333333335, ans=0.1 2023-11-23 22:05:05,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0 2023-11-23 22:05:07,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2558993.3333333335, ans=0.125 2023-11-23 22:05:13,376 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:05:31,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2559126.6666666665, ans=0.05 2023-11-23 22:05:34,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. 
limit=15.0 2023-11-23 22:05:51,949 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11150, loss[loss=0.05479, simple_loss=0.06933, pruned_loss=0.008292, audio_tagging_loss=0.01183, over 15150.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09074, pruned_loss=0.01335, audio_tagging_loss=0.009378, over 3047326.70 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:06:01,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2559260.0, ans=0.04949747468305833 2023-11-23 22:06:02,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.354e+01 8.926e+01 9.504e+01 1.198e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-23 22:06:02,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383900 2023-11-23 22:06:05,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2559326.6666666665, ans=0.1 2023-11-23 22:06:20,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2559393.3333333335, ans=0.0 2023-11-23 22:06:34,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=12.0 2023-11-23 22:06:37,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2559460.0, ans=0.125 2023-11-23 22:06:44,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2559526.6666666665, ans=0.0 2023-11-23 22:06:49,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2559526.6666666665, ans=0.2 2023-11-23 22:06:53,533 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11200, loss[loss=0.07585, simple_loss=0.09973, pruned_loss=0.01279, audio_tagging_loss=0.0132, over 14671.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09006, pruned_loss=0.01327, audio_tagging_loss=0.009417, over 3040109.51 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 22:06:56,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2559593.3333333335, ans=0.125 2023-11-23 22:07:04,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 383950 2023-11-23 22:07:06,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-23 22:07:09,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. 
limit=15.0 2023-11-23 22:07:13,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2559660.0, ans=0.0 2023-11-23 22:07:14,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2559660.0, ans=0.125 2023-11-23 22:07:16,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2559660.0, ans=0.125 2023-11-23 22:07:16,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2559660.0, ans=0.2 2023-11-23 22:07:24,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0 2023-11-23 22:07:33,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=22.5 2023-11-23 22:07:42,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2559860.0, ans=0.125 2023-11-23 22:07:47,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2559860.0, ans=0.0 2023-11-23 22:07:51,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2559860.0, ans=0.125 2023-11-23 22:07:55,765 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11250, loss[loss=0.07397, simple_loss=0.09676, pruned_loss=0.01493, audio_tagging_loss=0.01065, over 15447.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09079, pruned_loss=0.01352, audio_tagging_loss=0.00941, over 3037748.35 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 22:08:06,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2559926.6666666665, ans=0.125 2023-11-23 22:08:07,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.428e+01 9.033e+01 9.633e+01 1.168e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-23 22:08:07,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384000 2023-11-23 22:08:29,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2560060.0, ans=0.0 2023-11-23 22:08:38,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2560126.6666666665, ans=10.0 2023-11-23 22:08:39,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2560126.6666666665, ans=0.1 2023-11-23 22:08:41,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2560126.6666666665, ans=0.035 2023-11-23 22:09:02,409 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11300, loss[loss=0.06167, simple_loss=0.08273, pruned_loss=0.01117, audio_tagging_loss=0.009133, over 15270.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09056, pruned_loss=0.01353, audio_tagging_loss=0.009389, over 3034459.50 frames. 
], batch size: 61, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 22:09:11,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2560260.0, ans=0.2 2023-11-23 22:09:12,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2023-11-23 22:09:13,322 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384050 2023-11-23 22:09:23,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2560326.6666666665, ans=0.0 2023-11-23 22:09:36,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2560393.3333333335, ans=0.1 2023-11-23 22:10:04,326 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11350, loss[loss=0.06323, simple_loss=0.07933, pruned_loss=0.01203, audio_tagging_loss=0.01154, over 14823.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09118, pruned_loss=0.01369, audio_tagging_loss=0.009259, over 3035465.48 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 22:10:16,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 8.206e+01 9.003e+01 9.537e+01 1.148e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-23 22:10:16,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384100 2023-11-23 22:10:41,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2560793.3333333335, ans=0.1 2023-11-23 22:10:42,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2560793.3333333335, ans=0.125 2023-11-23 22:11:00,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2560860.0, ans=0.125 2023-11-23 22:11:07,775 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11400, loss[loss=0.09115, simple_loss=0.1227, pruned_loss=0.0223, audio_tagging_loss=0.007479, over 15654.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09224, pruned_loss=0.0139, audio_tagging_loss=0.009109, over 3038697.37 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:11:18,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384150 2023-11-23 22:11:19,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2560993.3333333335, ans=0.1 2023-11-23 22:11:31,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2561060.0, ans=10.0 2023-11-23 22:11:32,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-23 22:11:33,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2561060.0, ans=0.125 2023-11-23 22:12:10,348 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11450, loss[loss=0.07162, simple_loss=0.09593, pruned_loss=0.01621, audio_tagging_loss=0.00745, over 14683.00 frames. ], tot_loss[loss=0.06901, simple_loss=0.09215, pruned_loss=0.01387, audio_tagging_loss=0.009071, over 3029725.82 frames. 
], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:12:16,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2561260.0, ans=0.125 2023-11-23 22:12:20,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2561260.0, ans=0.0 2023-11-23 22:12:21,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384200 2023-11-23 22:12:22,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.398e+01 8.984e+01 9.548e+01 1.185e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-23 22:12:26,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2561326.6666666665, ans=0.1 2023-11-23 22:12:31,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2561326.6666666665, ans=0.2 2023-11-23 22:12:36,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2023-11-23 22:13:09,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2561526.6666666665, ans=0.125 2023-11-23 22:13:11,990 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11500, loss[loss=0.09654, simple_loss=0.1353, pruned_loss=0.02183, audio_tagging_loss=0.007077, over 15545.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09199, pruned_loss=0.01385, audio_tagging_loss=0.009066, over 3035336.12 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:13:14,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2561593.3333333335, ans=0.0 2023-11-23 22:13:19,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2561593.3333333335, ans=0.0 2023-11-23 22:13:20,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-23 22:13:23,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384250 2023-11-23 22:14:04,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2561860.0, ans=0.125 2023-11-23 22:14:14,451 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11550, loss[loss=0.05019, simple_loss=0.06635, pruned_loss=0.008442, audio_tagging_loss=0.008571, over 13687.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.09232, pruned_loss=0.01389, audio_tagging_loss=0.009014, over 3038624.56 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2023-11-23 22:14:25,759 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384300 2023-11-23 22:14:26,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.220e+01 8.896e+01 9.591e+01 1.198e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-23 22:14:37,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2561993.3333333335, ans=0.125 2023-11-23 22:14:50,979 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 22:15:02,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2562193.3333333335, ans=0.125 2023-11-23 22:15:04,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2562193.3333333335, ans=0.125 2023-11-23 22:15:11,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2562193.3333333335, ans=0.125 2023-11-23 22:15:15,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=12.0 2023-11-23 22:15:16,196 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11600, loss[loss=0.06107, simple_loss=0.07976, pruned_loss=0.01173, audio_tagging_loss=0.00946, over 16670.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09203, pruned_loss=0.01393, audio_tagging_loss=0.009074, over 3035365.93 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 22:15:22,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2562260.0, ans=0.07 2023-11-23 22:15:27,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384350 2023-11-23 22:15:36,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=22.5 2023-11-23 22:15:37,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2562326.6666666665, ans=0.125 2023-11-23 22:15:43,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.55 vs. limit=15.0 2023-11-23 22:16:06,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2562526.6666666665, ans=0.1 2023-11-23 22:16:10,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2562526.6666666665, ans=0.0 2023-11-23 22:16:12,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-11-23 22:16:18,784 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11650, loss[loss=0.0661, simple_loss=0.08055, pruned_loss=0.0146, audio_tagging_loss=0.01123, over 16180.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09256, pruned_loss=0.01381, audio_tagging_loss=0.00908, over 3041564.32 frames. 
], batch size: 60, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 22:16:30,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384400 2023-11-23 22:16:31,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.387e+01 8.924e+01 9.715e+01 1.157e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-23 22:16:35,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2562660.0, ans=0.125 2023-11-23 22:16:49,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-11-23 22:16:54,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2562726.6666666665, ans=0.125 2023-11-23 22:16:59,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2562793.3333333335, ans=0.1 2023-11-23 22:17:22,174 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11700, loss[loss=0.0649, simple_loss=0.08457, pruned_loss=0.01281, audio_tagging_loss=0.009797, over 14395.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09198, pruned_loss=0.01368, audio_tagging_loss=0.009075, over 3044154.47 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 32.0 2023-11-23 22:17:33,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384450 2023-11-23 22:17:39,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2562993.3333333335, ans=0.125 2023-11-23 22:17:54,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2563060.0, ans=0.125 2023-11-23 22:18:04,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2563126.6666666665, ans=0.2 2023-11-23 22:18:24,594 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11750, loss[loss=0.07248, simple_loss=0.1001, pruned_loss=0.01627, audio_tagging_loss=0.006151, over 17021.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09182, pruned_loss=0.01364, audio_tagging_loss=0.009123, over 3048637.78 frames. ], batch size: 64, lr: 2.10e-03, grad_scale: 32.0 2023-11-23 22:18:25,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2563260.0, ans=22.5 2023-11-23 22:18:27,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-23 22:18:34,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2563260.0, ans=0.125 2023-11-23 22:18:34,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2563260.0, ans=0.125 2023-11-23 22:18:35,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384500 2023-11-23 22:18:36,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.506e+01 8.915e+01 9.648e+01 1.548e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-23 22:18:37,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. 
limit=15.0 2023-11-23 22:18:42,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=22.5 2023-11-23 22:19:01,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2563460.0, ans=0.125 2023-11-23 22:19:09,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-23 22:19:12,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2023-11-23 22:19:20,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2563526.6666666665, ans=0.125 2023-11-23 22:19:26,092 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11800, loss[loss=0.04911, simple_loss=0.05744, pruned_loss=0.007567, audio_tagging_loss=0.01282, over 14539.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.0918, pruned_loss=0.01365, audio_tagging_loss=0.009138, over 3041612.03 frames. ], batch size: 56, lr: 2.10e-03, grad_scale: 32.0 2023-11-23 22:19:37,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384550 2023-11-23 22:19:46,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2563660.0, ans=0.0 2023-11-23 22:20:11,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2563793.3333333335, ans=0.125 2023-11-23 22:20:16,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2563860.0, ans=0.125 2023-11-23 22:20:28,403 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11850, loss[loss=0.08003, simple_loss=0.1122, pruned_loss=0.01504, audio_tagging_loss=0.008888, over 15400.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09202, pruned_loss=0.0137, audio_tagging_loss=0.009211, over 3041449.50 frames. ], batch size: 58, lr: 2.10e-03, grad_scale: 32.0 2023-11-23 22:20:39,736 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384600 2023-11-23 22:20:41,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.384e+01 8.994e+01 9.780e+01 1.224e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-23 22:20:43,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.33 vs. limit=10.0 2023-11-23 22:20:45,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.82 vs. 
limit=22.5 2023-11-23 22:20:51,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2563993.3333333335, ans=0.1 2023-11-23 22:20:55,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2564060.0, ans=0.125 2023-11-23 22:21:15,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2564126.6666666665, ans=0.2 2023-11-23 22:21:16,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2564126.6666666665, ans=0.2 2023-11-23 22:21:30,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2564260.0, ans=0.2 2023-11-23 22:21:31,105 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11900, loss[loss=0.07957, simple_loss=0.09964, pruned_loss=0.02131, audio_tagging_loss=0.00844, over 14879.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09164, pruned_loss=0.01354, audio_tagging_loss=0.00925, over 3043740.44 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 16.0 2023-11-23 22:21:37,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0 2023-11-23 22:21:38,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2564260.0, ans=0.09899494936611666 2023-11-23 22:21:41,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384650 2023-11-23 22:21:42,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-23 22:21:47,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2564326.6666666665, ans=0.1 2023-11-23 22:22:23,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2564526.6666666665, ans=0.125 2023-11-23 22:22:23,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2564526.6666666665, ans=0.0 2023-11-23 22:22:27,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2564526.6666666665, ans=0.2 2023-11-23 22:22:32,788 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 11950, loss[loss=0.05241, simple_loss=0.07034, pruned_loss=0.006405, audio_tagging_loss=0.01084, over 15456.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09158, pruned_loss=0.01355, audio_tagging_loss=0.009225, over 3039482.15 frames. 
], batch size: 59, lr: 2.10e-03, grad_scale: 16.0 2023-11-23 22:22:36,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2564593.3333333335, ans=0.2 2023-11-23 22:22:43,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384700 2023-11-23 22:22:46,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.381e+01 9.044e+01 9.786e+01 1.119e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-23 22:23:12,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2564793.3333333335, ans=0.0 2023-11-23 22:23:22,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2564860.0, ans=0.0 2023-11-23 22:23:31,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5 2023-11-23 22:23:32,160 INFO [train_asr.py:1221] (1/4) Epoch 32, batch 12000, loss[loss=0.06733, simple_loss=0.08157, pruned_loss=0.01531, audio_tagging_loss=0.01124, over 14595.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.092, pruned_loss=0.01359, audio_tagging_loss=0.009262, over 3045704.87 frames. ], batch size: 57, lr: 2.10e-03, grad_scale: 32.0 2023-11-23 22:23:32,161 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 22:24:14,275 INFO [train_asr.py:1253] (1/4) Epoch 32, validation: loss=0.05848, simple_loss=0.0511, pruned_loss=0.005239, audio_tagging_loss=0.02769, over 4681554.00 frames. 2023-11-23 22:24:14,275 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 22:24:20,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-23 22:24:24,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384750 2023-11-23 22:24:31,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2564993.3333333335, ans=0.05 2023-11-23 22:24:35,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2023-11-23 22:24:38,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-23 22:25:14,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2565086.6666666665, ans=0.1 2023-11-23 22:25:15,127 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 0, loss[loss=0.07483, simple_loss=0.08404, pruned_loss=0.01156, audio_tagging_loss=0.02125, over 16700.00 frames. ], tot_loss[loss=0.07483, simple_loss=0.08404, pruned_loss=0.01156, audio_tagging_loss=0.02125, over 16700.00 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:25:15,128 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 22:25:50,700 INFO [train_asr.py:1253] (1/4) Epoch 33, validation: loss=0.05781, simple_loss=0.05104, pruned_loss=0.005203, audio_tagging_loss=0.02709, over 4681554.00 frames. 
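Note on reading the loss lines above: each per-batch loss decomposes into its logged components. Assuming the total is formed as 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (weights inferred here only from the fact that they reproduce the logged numbers, not taken from the trainer's code), the epoch 32, batch 12000 entry checks out to rounding precision. A minimal sketch, with illustrative names:

# A minimal consistency check, not the trainer's actual code: assuming the
# reported loss is SIMPLE_SCALE * simple_loss + pruned_loss
# + TAGGING_SCALE * audio_tagging_loss, the weights below reproduce the
# totals logged above to within rounding.

SIMPLE_SCALE = 0.5    # assumed weight for simple_loss (fits the logged totals)
TAGGING_SCALE = 1.0   # assumed weight for audio_tagging_loss

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss):
    # Recombine the logged components into the reported total.
    return (SIMPLE_SCALE * simple_loss
            + pruned_loss
            + TAGGING_SCALE * audio_tagging_loss)

# Epoch 32, batch 12000, per-batch figures from the log above:
assert abs(combined_loss(0.08157, 0.01531, 0.01124) - 0.06733) < 5e-5
# Running average (tot_loss) at the same step:
assert abs(combined_loss(0.09200, 0.01359, 0.009262) - 0.06885) < 5e-5

The same relation holds for the other loss[...] / tot_loss[...] pairs in this section, e.g. epoch 33, batch 100: 0.5 * 0.1004 + 0.01388 + 0.01389 = 0.07797.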
2023-11-23 22:25:50,701 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 22:25:56,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2565086.6666666665, ans=0.125 2023-11-23 22:25:58,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2565086.6666666665, ans=0.125 2023-11-23 22:26:13,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2565153.3333333335, ans=0.95 2023-11-23 22:26:14,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-11-23 22:26:19,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2565220.0, ans=0.2 2023-11-23 22:26:26,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2565220.0, ans=10.0 2023-11-23 22:26:34,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384800 2023-11-23 22:26:35,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2565286.6666666665, ans=0.1 2023-11-23 22:26:38,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.944e+01 9.729e+01 1.034e+02 1.354e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-23 22:26:39,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2565353.3333333335, ans=0.0 2023-11-23 22:26:44,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2565353.3333333335, ans=0.125 2023-11-23 22:26:46,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2565353.3333333335, ans=0.0 2023-11-23 22:26:52,490 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 50, loss[loss=0.08276, simple_loss=0.1044, pruned_loss=0.01567, audio_tagging_loss=0.01486, over 14790.00 frames. ], tot_loss[loss=0.07607, simple_loss=0.09084, pruned_loss=0.01334, audio_tagging_loss=0.01732, over 692116.40 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:27:00,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2565420.0, ans=0.0 2023-11-23 22:27:00,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2565420.0, ans=0.1 2023-11-23 22:27:27,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. 
limit=15.0 2023-11-23 22:27:36,568 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384850 2023-11-23 22:27:49,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2565686.6666666665, ans=0.125 2023-11-23 22:27:52,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2565686.6666666665, ans=0.0 2023-11-23 22:27:52,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2565686.6666666665, ans=0.1 2023-11-23 22:27:55,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2565753.3333333335, ans=0.0 2023-11-23 22:27:56,909 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 100, loss[loss=0.07797, simple_loss=0.1004, pruned_loss=0.01388, audio_tagging_loss=0.01389, over 15627.00 frames. ], tot_loss[loss=0.07628, simple_loss=0.09263, pruned_loss=0.01361, audio_tagging_loss=0.01636, over 1211730.31 frames. ], batch size: 58, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:28:19,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2565886.6666666665, ans=0.1 2023-11-23 22:28:32,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2565953.3333333335, ans=0.125 2023-11-23 22:28:34,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2023-11-23 22:28:38,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384900 2023-11-23 22:28:43,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.971e+01 9.700e+01 1.041e+02 1.375e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-23 22:28:54,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2566020.0, ans=6.0 2023-11-23 22:28:57,722 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 150, loss[loss=0.08589, simple_loss=0.1201, pruned_loss=0.01737, audio_tagging_loss=0.008468, over 15406.00 frames. ], tot_loss[loss=0.07609, simple_loss=0.09498, pruned_loss=0.01395, audio_tagging_loss=0.01465, over 1614483.05 frames. 
], batch size: 59, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:28:59,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2566086.6666666665, ans=0.1 2023-11-23 22:29:15,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2566153.3333333335, ans=0.125 2023-11-23 22:29:27,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2566220.0, ans=0.125 2023-11-23 22:29:29,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2566220.0, ans=0.2 2023-11-23 22:29:40,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2566286.6666666665, ans=0.2 2023-11-23 22:29:41,534 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 384950 2023-11-23 22:29:42,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-23 22:29:54,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2566353.3333333335, ans=0.0 2023-11-23 22:29:59,068 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 200, loss[loss=0.06271, simple_loss=0.088, pruned_loss=0.0103, audio_tagging_loss=0.008411, over 14561.00 frames. ], tot_loss[loss=0.07359, simple_loss=0.09391, pruned_loss=0.01374, audio_tagging_loss=0.01289, over 1935930.23 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:30:26,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2566553.3333333335, ans=0.2 2023-11-23 22:30:27,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2566553.3333333335, ans=0.2 2023-11-23 22:30:34,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2566553.3333333335, ans=0.125 2023-11-23 22:30:39,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2566620.0, ans=0.2 2023-11-23 22:30:40,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2566620.0, ans=0.125 2023-11-23 22:30:40,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2566620.0, ans=0.1 2023-11-23 22:30:42,352 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385000 2023-11-23 22:30:46,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.668e+01 9.124e+01 1.011e+02 1.346e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-23 22:31:01,362 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 250, loss[loss=0.05796, simple_loss=0.06548, pruned_loss=0.01145, audio_tagging_loss=0.01377, over 15294.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.09413, pruned_loss=0.01384, audio_tagging_loss=0.01168, over 2185127.69 frames. 
], batch size: 61, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:31:01,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2566753.3333333335, ans=0.1 2023-11-23 22:31:08,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2566753.3333333335, ans=0.95 2023-11-23 22:31:11,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2566753.3333333335, ans=0.125 2023-11-23 22:31:11,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2566753.3333333335, ans=0.125 2023-11-23 22:31:21,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2566820.0, ans=0.0 2023-11-23 22:31:27,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2566886.6666666665, ans=0.125 2023-11-23 22:31:35,103 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:31:42,736 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:31:44,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385050 2023-11-23 22:32:03,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-23 22:32:04,231 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 300, loss[loss=0.06841, simple_loss=0.09456, pruned_loss=0.01507, audio_tagging_loss=0.006063, over 15481.00 frames. ], tot_loss[loss=0.07246, simple_loss=0.09519, pruned_loss=0.01413, audio_tagging_loss=0.01073, over 2378665.81 frames. 
], batch size: 56, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:32:04,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=2567086.6666666665, ans=15.0 2023-11-23 22:32:10,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2567086.6666666665, ans=0.0 2023-11-23 22:32:11,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2567086.6666666665, ans=0.1 2023-11-23 22:32:19,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2567153.3333333335, ans=0.125 2023-11-23 22:32:31,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2567220.0, ans=0.125 2023-11-23 22:32:47,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385100 2023-11-23 22:32:51,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.686e+01 8.804e+01 9.384e+01 1.010e+02 1.261e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-23 22:33:00,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2567353.3333333335, ans=0.1 2023-11-23 22:33:05,399 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 350, loss[loss=0.06013, simple_loss=0.0789, pruned_loss=0.01106, audio_tagging_loss=0.009618, over 15538.00 frames. ], tot_loss[loss=0.07067, simple_loss=0.09329, pruned_loss=0.01372, audio_tagging_loss=0.0103, over 2524418.42 frames. ], batch size: 58, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:33:29,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2567486.6666666665, ans=0.125 2023-11-23 22:33:41,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2567553.3333333335, ans=0.125 2023-11-23 22:33:48,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2023-11-23 22:33:49,632 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385150 2023-11-23 22:33:53,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2567620.0, ans=0.0 2023-11-23 22:33:54,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2567686.6666666665, ans=0.125 2023-11-23 22:34:07,551 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 400, loss[loss=0.06381, simple_loss=0.08427, pruned_loss=0.01063, audio_tagging_loss=0.01104, over 16058.00 frames. ], tot_loss[loss=0.0707, simple_loss=0.09376, pruned_loss=0.01392, audio_tagging_loss=0.009899, over 2644983.86 frames. 
], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:34:11,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2567753.3333333335, ans=15.0 2023-11-23 22:34:34,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2567886.6666666665, ans=0.125 2023-11-23 22:34:35,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2567886.6666666665, ans=0.04949747468305833 2023-11-23 22:34:48,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-11-23 22:34:51,239 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385200 2023-11-23 22:34:51,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2567953.3333333335, ans=0.125 2023-11-23 22:34:51,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2023-11-23 22:34:54,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2567953.3333333335, ans=0.125 2023-11-23 22:34:54,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.325e+01 8.873e+01 9.611e+01 1.385e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-23 22:34:59,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2568020.0, ans=0.125 2023-11-23 22:35:02,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2568020.0, ans=0.2 2023-11-23 22:35:11,083 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 450, loss[loss=0.06723, simple_loss=0.09527, pruned_loss=0.01072, audio_tagging_loss=0.008877, over 16056.00 frames. ], tot_loss[loss=0.07059, simple_loss=0.09393, pruned_loss=0.01399, audio_tagging_loss=0.009635, over 2731745.53 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:35:35,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2568220.0, ans=0.0 2023-11-23 22:35:35,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2568220.0, ans=0.125 2023-11-23 22:35:41,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2568220.0, ans=0.125 2023-11-23 22:35:54,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385250 2023-11-23 22:36:12,788 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 500, loss[loss=0.06622, simple_loss=0.08569, pruned_loss=0.01501, audio_tagging_loss=0.00837, over 14622.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09317, pruned_loss=0.01395, audio_tagging_loss=0.009458, over 2792996.19 frames. 
], batch size: 55, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:36:15,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2568420.0, ans=0.125 2023-11-23 22:36:37,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-23 22:36:37,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2568553.3333333335, ans=0.125 2023-11-23 22:36:51,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2568620.0, ans=0.0 2023-11-23 22:36:57,251 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385300 2023-11-23 22:37:00,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.409e+01 9.142e+01 9.799e+01 1.482e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-23 22:37:15,573 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 550, loss[loss=0.04193, simple_loss=0.0519, pruned_loss=0.006074, audio_tagging_loss=0.009908, over 13836.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.09205, pruned_loss=0.01375, audio_tagging_loss=0.009433, over 2846365.95 frames. ], batch size: 53, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:37:34,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2568820.0, ans=0.125 2023-11-23 22:37:58,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385350 2023-11-23 22:37:59,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2568953.3333333335, ans=0.0 2023-11-23 22:38:01,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2568953.3333333335, ans=0.125 2023-11-23 22:38:07,010 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:38:17,970 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 600, loss[loss=0.0679, simple_loss=0.09577, pruned_loss=0.0123, audio_tagging_loss=0.007717, over 16039.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09228, pruned_loss=0.01387, audio_tagging_loss=0.009347, over 2888790.04 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:38:33,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2569153.3333333335, ans=0.125 2023-11-23 22:39:01,699 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385400 2023-11-23 22:39:06,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.472e+01 9.172e+01 9.611e+01 1.258e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-23 22:39:11,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2569353.3333333335, ans=0.125 2023-11-23 22:39:20,474 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 650, loss[loss=0.06467, simple_loss=0.0891, pruned_loss=0.01232, audio_tagging_loss=0.007806, over 15425.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.09242, pruned_loss=0.01377, audio_tagging_loss=0.009367, over 2915082.19 frames. 
], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:39:25,747 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:39:29,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2023-11-23 22:39:59,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5 2023-11-23 22:40:01,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2023-11-23 22:40:04,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385450 2023-11-23 22:40:04,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2569620.0, ans=0.125 2023-11-23 22:40:06,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2569620.0, ans=0.1 2023-11-23 22:40:06,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2569620.0, ans=0.125 2023-11-23 22:40:12,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2569686.6666666665, ans=0.0 2023-11-23 22:40:17,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2569686.6666666665, ans=0.0 2023-11-23 22:40:22,208 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 700, loss[loss=0.06295, simple_loss=0.0843, pruned_loss=0.01145, audio_tagging_loss=0.009357, over 15510.00 frames. ], tot_loss[loss=0.06937, simple_loss=0.09279, pruned_loss=0.01363, audio_tagging_loss=0.009347, over 2952428.24 frames. ], batch size: 58, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:40:25,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2569753.3333333335, ans=0.1 2023-11-23 22:40:45,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2569820.0, ans=0.1 2023-11-23 22:40:45,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.56 vs. limit=15.0 2023-11-23 22:40:54,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2023-11-23 22:41:06,032 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385500 2023-11-23 22:41:10,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-23 22:41:11,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.479e+01 9.169e+01 9.632e+01 1.119e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-23 22:41:25,250 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 750, loss[loss=0.08393, simple_loss=0.1041, pruned_loss=0.02113, audio_tagging_loss=0.01074, over 15756.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09292, pruned_loss=0.01365, audio_tagging_loss=0.00929, over 2970993.98 frames. 
], batch size: 59, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:41:26,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2570086.6666666665, ans=0.0 2023-11-23 22:41:32,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2570086.6666666665, ans=0.2 2023-11-23 22:41:36,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2570153.3333333335, ans=0.125 2023-11-23 22:41:57,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2570220.0, ans=0.0 2023-11-23 22:42:08,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-23 22:42:09,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385550 2023-11-23 22:42:21,534 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:42:27,203 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 800, loss[loss=0.05903, simple_loss=0.07192, pruned_loss=0.01226, audio_tagging_loss=0.01081, over 14841.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09188, pruned_loss=0.01345, audio_tagging_loss=0.009285, over 2984725.65 frames. ], batch size: 54, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:42:41,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2570486.6666666665, ans=0.125 2023-11-23 22:43:08,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2570620.0, ans=0.0 2023-11-23 22:43:11,292 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385600 2023-11-23 22:43:11,505 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:43:14,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2023-11-23 22:43:16,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.735e+01 9.399e+01 1.001e+02 1.314e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-23 22:43:30,059 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 850, loss[loss=0.05987, simple_loss=0.08711, pruned_loss=0.00835, audio_tagging_loss=0.007969, over 14656.00 frames. ], tot_loss[loss=0.07019, simple_loss=0.09418, pruned_loss=0.01391, audio_tagging_loss=0.009185, over 3008325.15 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:43:41,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2570753.3333333335, ans=0.025 2023-11-23 22:43:48,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2570820.0, ans=0.0 2023-11-23 22:43:51,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2570820.0, ans=0.0 2023-11-23 22:44:01,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. 
limit=22.5 2023-11-23 22:44:07,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-23 22:44:14,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385650 2023-11-23 22:44:33,791 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 900, loss[loss=0.05428, simple_loss=0.06949, pruned_loss=0.011, audio_tagging_loss=0.008536, over 14116.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.09222, pruned_loss=0.01355, audio_tagging_loss=0.00929, over 3011769.59 frames. ], batch size: 55, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:44:34,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2571086.6666666665, ans=0.125 2023-11-23 22:44:35,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2571086.6666666665, ans=0.0 2023-11-23 22:44:42,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2571086.6666666665, ans=0.07 2023-11-23 22:44:54,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2571153.3333333335, ans=0.125 2023-11-23 22:45:17,168 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385700 2023-11-23 22:45:22,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.319e+01 9.000e+01 1.000e+02 1.242e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-23 22:45:34,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2571420.0, ans=0.125 2023-11-23 22:45:34,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-23 22:45:35,454 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 950, loss[loss=0.07061, simple_loss=0.1014, pruned_loss=0.01055, audio_tagging_loss=0.009342, over 15009.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09259, pruned_loss=0.01373, audio_tagging_loss=0.009193, over 3017134.54 frames. ], batch size: 55, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:46:19,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385750 2023-11-23 22:46:21,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2571620.0, ans=0.0 2023-11-23 22:46:37,418 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1000, loss[loss=0.06437, simple_loss=0.08474, pruned_loss=0.01474, audio_tagging_loss=0.007259, over 14827.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.0924, pruned_loss=0.01365, audio_tagging_loss=0.008989, over 3030958.50 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:46:42,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2571753.3333333335, ans=0.0 2023-11-23 22:46:48,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2571820.0, ans=0.07 2023-11-23 22:46:58,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. 
limit=15.0 2023-11-23 22:47:04,034 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 22:47:07,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2571886.6666666665, ans=0.125 2023-11-23 22:47:09,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2571886.6666666665, ans=0.09899494936611666 2023-11-23 22:47:10,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2571886.6666666665, ans=0.2 2023-11-23 22:47:20,806 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385800 2023-11-23 22:47:25,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2571953.3333333335, ans=0.125 2023-11-23 22:47:27,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.934e+01 8.356e+01 8.922e+01 9.562e+01 1.199e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 22:47:40,644 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1050, loss[loss=0.07889, simple_loss=0.1158, pruned_loss=0.01396, audio_tagging_loss=0.00704, over 15434.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09185, pruned_loss=0.01362, audio_tagging_loss=0.00896, over 3030306.84 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:48:23,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2572286.6666666665, ans=0.2 2023-11-23 22:48:24,479 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385850 2023-11-23 22:48:38,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2572353.3333333335, ans=0.0 2023-11-23 22:48:42,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2572420.0, ans=0.2 2023-11-23 22:48:43,141 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1100, loss[loss=0.06291, simple_loss=0.08624, pruned_loss=0.01118, audio_tagging_loss=0.008606, over 15363.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.09139, pruned_loss=0.01349, audio_tagging_loss=0.008974, over 3031463.48 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:48:45,568 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 22:49:00,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2572486.6666666665, ans=0.0 2023-11-23 22:49:27,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385900 2023-11-23 22:49:33,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.297e+01 8.837e+01 9.387e+01 1.175e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-23 22:49:35,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2572686.6666666665, ans=0.0 2023-11-23 22:49:45,373 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1150, loss[loss=0.08036, simple_loss=0.1033, pruned_loss=0.01907, audio_tagging_loss=0.009666, over 13954.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09186, pruned_loss=0.01366, audio_tagging_loss=0.008914, over 3026911.74 frames. ], batch size: 54, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:49:53,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2572753.3333333335, ans=0.125 2023-11-23 22:50:19,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2572886.6666666665, ans=0.0 2023-11-23 22:50:29,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 385950 2023-11-23 22:50:29,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2572953.3333333335, ans=0.125 2023-11-23 22:50:42,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2573020.0, ans=0.0 2023-11-23 22:50:43,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2573020.0, ans=0.1 2023-11-23 22:50:47,799 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1200, loss[loss=0.07379, simple_loss=0.1078, pruned_loss=0.0128, audio_tagging_loss=0.007103, over 15975.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09263, pruned_loss=0.0138, audio_tagging_loss=0.008808, over 3036235.60 frames. 
], batch size: 57, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:50:51,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2573086.6666666665, ans=0.1 2023-11-23 22:51:03,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2573153.3333333335, ans=0.1 2023-11-23 22:51:16,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2573220.0, ans=0.125 2023-11-23 22:51:20,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2573220.0, ans=0.125 2023-11-23 22:51:31,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386000 2023-11-23 22:51:38,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.994e+01 8.408e+01 9.187e+01 9.835e+01 1.432e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-23 22:51:49,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2573420.0, ans=0.1 2023-11-23 22:51:50,524 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1250, loss[loss=0.0725, simple_loss=0.09143, pruned_loss=0.01741, audio_tagging_loss=0.009373, over 15458.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09182, pruned_loss=0.01367, audio_tagging_loss=0.008872, over 3036302.57 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:52:18,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2023-11-23 22:52:24,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2573553.3333333335, ans=0.0 2023-11-23 22:52:30,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2573620.0, ans=0.125 2023-11-23 22:52:34,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2573620.0, ans=0.0 2023-11-23 22:52:35,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386050 2023-11-23 22:52:40,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2573686.6666666665, ans=0.2 2023-11-23 22:52:52,989 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1300, loss[loss=0.07815, simple_loss=0.108, pruned_loss=0.01431, audio_tagging_loss=0.00985, over 14317.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.0919, pruned_loss=0.01377, audio_tagging_loss=0.008868, over 3031664.38 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:53:17,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2573820.0, ans=0.125 2023-11-23 22:53:20,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2573886.6666666665, ans=0.0 2023-11-23 22:53:24,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.78 vs. 
limit=22.5 2023-11-23 22:53:27,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2573886.6666666665, ans=0.125 2023-11-23 22:53:32,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0 2023-11-23 22:53:36,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386100 2023-11-23 22:53:42,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.267e+01 8.921e+01 9.505e+01 1.791e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-23 22:53:55,057 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1350, loss[loss=0.06673, simple_loss=0.08941, pruned_loss=0.01182, audio_tagging_loss=0.0102, over 14160.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09121, pruned_loss=0.01363, audio_tagging_loss=0.008851, over 3031819.12 frames. ], batch size: 53, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:54:20,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2574220.0, ans=0.1 2023-11-23 22:54:31,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2574286.6666666665, ans=0.125 2023-11-23 22:54:38,560 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386150 2023-11-23 22:54:39,675 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 22:54:43,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2574353.3333333335, ans=0.125 2023-11-23 22:54:43,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2574353.3333333335, ans=0.0 2023-11-23 22:54:57,758 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1400, loss[loss=0.06061, simple_loss=0.07344, pruned_loss=0.01081, audio_tagging_loss=0.01308, over 15282.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09118, pruned_loss=0.01369, audio_tagging_loss=0.00892, over 3041040.07 frames. 
], batch size: 60, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:55:12,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2574486.6666666665, ans=0.0 2023-11-23 22:55:38,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2574620.0, ans=0.1 2023-11-23 22:55:41,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386200 2023-11-23 22:55:45,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2574620.0, ans=0.125 2023-11-23 22:55:47,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2574686.6666666665, ans=0.0 2023-11-23 22:55:48,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.327e+01 9.200e+01 9.811e+01 1.262e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-23 22:55:54,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2574686.6666666665, ans=0.0 2023-11-23 22:55:55,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2023-11-23 22:55:58,972 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1450, loss[loss=0.07548, simple_loss=0.1006, pruned_loss=0.01763, audio_tagging_loss=0.007559, over 16764.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09105, pruned_loss=0.01369, audio_tagging_loss=0.009032, over 3044960.67 frames. ], batch size: 63, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:56:07,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2574753.3333333335, ans=0.0 2023-11-23 22:56:15,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2574820.0, ans=0.2 2023-11-23 22:56:15,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2023-11-23 22:56:23,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2023-11-23 22:56:27,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2574886.6666666665, ans=0.2 2023-11-23 22:56:42,092 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386250 2023-11-23 22:57:00,661 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1500, loss[loss=0.05558, simple_loss=0.07137, pruned_loss=0.006968, audio_tagging_loss=0.01292, over 14071.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.0909, pruned_loss=0.01354, audio_tagging_loss=0.009131, over 3038873.51 frames. ], batch size: 54, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:57:05,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2575086.6666666665, ans=0.2 2023-11-23 22:57:21,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2575153.3333333335, ans=0.2 2023-11-23 22:57:24,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=22.5 2023-11-23 22:57:26,567 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:57:38,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2575286.6666666665, ans=0.04949747468305833 2023-11-23 22:57:40,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2575286.6666666665, ans=0.07 2023-11-23 22:57:44,167 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386300 2023-11-23 22:57:47,365 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 22:57:51,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.322e+01 9.192e+01 9.802e+01 2.224e+02, threshold=1.838e+02, percent-clipped=1.0 2023-11-23 22:58:04,258 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1550, loss[loss=0.06551, simple_loss=0.08484, pruned_loss=0.01228, audio_tagging_loss=0.01081, over 14074.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09115, pruned_loss=0.01362, audio_tagging_loss=0.009157, over 3031933.36 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 22:58:29,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2575553.3333333335, ans=0.2 2023-11-23 22:58:42,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2575620.0, ans=0.125 2023-11-23 22:58:46,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-23 22:58:47,304 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386350 2023-11-23 22:58:54,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2575686.6666666665, ans=0.1 2023-11-23 22:59:05,869 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1600, loss[loss=0.07201, simple_loss=0.08742, pruned_loss=0.0167, audio_tagging_loss=0.0116, over 14529.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09157, pruned_loss=0.01369, audio_tagging_loss=0.009237, over 3034293.96 frames. ], batch size: 55, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 22:59:12,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2575753.3333333335, ans=0.0 2023-11-23 22:59:27,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2023-11-23 22:59:44,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2575953.3333333335, ans=0.05 2023-11-23 22:59:45,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.40 vs. 
limit=15.0 2023-11-23 22:59:49,994 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386400 2023-11-23 22:59:54,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2575953.3333333335, ans=0.125 2023-11-23 22:59:57,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.368e+01 9.062e+01 9.713e+01 1.291e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-23 23:00:06,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2576020.0, ans=0.125 2023-11-23 23:00:08,291 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1650, loss[loss=0.05235, simple_loss=0.06534, pruned_loss=0.008217, audio_tagging_loss=0.01146, over 15628.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09111, pruned_loss=0.01362, audio_tagging_loss=0.00926, over 3038371.41 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:00:37,441 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:00:39,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-23 23:00:43,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2576220.0, ans=0.5 2023-11-23 23:00:46,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2576286.6666666665, ans=0.2 2023-11-23 23:00:52,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386450 2023-11-23 23:00:52,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2576286.6666666665, ans=0.0 2023-11-23 23:00:59,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2576353.3333333335, ans=0.0 2023-11-23 23:01:02,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2576353.3333333335, ans=0.125 2023-11-23 23:01:09,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2576353.3333333335, ans=0.0 2023-11-23 23:01:12,026 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1700, loss[loss=0.04092, simple_loss=0.05301, pruned_loss=0.004188, audio_tagging_loss=0.01023, over 14790.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09082, pruned_loss=0.0135, audio_tagging_loss=0.009334, over 3029734.48 frames. ], batch size: 58, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:01:26,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2576486.6666666665, ans=0.2 2023-11-23 23:01:42,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. 
limit=15.0 2023-11-23 23:01:55,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386500 2023-11-23 23:01:55,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2576620.0, ans=0.125 2023-11-23 23:02:00,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2576686.6666666665, ans=0.1 2023-11-23 23:02:03,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.350e+01 8.996e+01 9.717e+01 1.219e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-23 23:02:14,542 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1750, loss[loss=0.06676, simple_loss=0.09528, pruned_loss=0.01001, audio_tagging_loss=0.009116, over 15211.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09069, pruned_loss=0.01344, audio_tagging_loss=0.009206, over 3036544.39 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:02:19,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2576753.3333333335, ans=0.125 2023-11-23 23:02:40,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2576886.6666666665, ans=0.125 2023-11-23 23:02:46,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-23 23:02:52,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2576953.3333333335, ans=0.125 2023-11-23 23:02:57,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2023-11-23 23:02:58,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386550 2023-11-23 23:03:04,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2577020.0, ans=0.125 2023-11-23 23:03:04,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2577020.0, ans=0.1 2023-11-23 23:03:07,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2577020.0, ans=0.0 2023-11-23 23:03:11,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2023-11-23 23:03:15,777 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1800, loss[loss=0.04885, simple_loss=0.05974, pruned_loss=0.008775, audio_tagging_loss=0.0102, over 14725.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09158, pruned_loss=0.01348, audio_tagging_loss=0.00905, over 3033579.82 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:03:21,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.89 vs. 
limit=15.0 2023-11-23 23:03:24,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2577086.6666666665, ans=0.125 2023-11-23 23:03:36,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2577153.3333333335, ans=0.125 2023-11-23 23:03:38,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2577153.3333333335, ans=0.125 2023-11-23 23:03:47,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2577220.0, ans=0.0 2023-11-23 23:03:50,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-23 23:03:59,256 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386600 2023-11-23 23:04:01,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2023-11-23 23:04:06,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2023-11-23 23:04:08,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.514e+01 9.080e+01 9.768e+01 1.273e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-23 23:04:08,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2577353.3333333335, ans=0.04949747468305833 2023-11-23 23:04:10,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2577353.3333333335, ans=0.2 2023-11-23 23:04:18,919 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1850, loss[loss=0.0629, simple_loss=0.085, pruned_loss=0.01182, audio_tagging_loss=0.008581, over 15484.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09221, pruned_loss=0.0136, audio_tagging_loss=0.009029, over 3041882.09 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 23:04:24,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2577420.0, ans=0.04949747468305833 2023-11-23 23:04:24,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.38 vs. 
limit=10.0 2023-11-23 23:04:31,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2577486.6666666665, ans=0.125 2023-11-23 23:04:45,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2577553.3333333335, ans=0.125 2023-11-23 23:04:59,400 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:05:01,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2577620.0, ans=0.125 2023-11-23 23:05:02,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386650 2023-11-23 23:05:20,243 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1900, loss[loss=0.07783, simple_loss=0.1103, pruned_loss=0.01441, audio_tagging_loss=0.008277, over 15057.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.09307, pruned_loss=0.0137, audio_tagging_loss=0.008972, over 3049010.03 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 23:05:22,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2577753.3333333335, ans=0.0 2023-11-23 23:05:53,660 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:06:03,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-23 23:06:04,479 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386700 2023-11-23 23:06:05,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2577953.3333333335, ans=0.1 2023-11-23 23:06:07,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2577953.3333333335, ans=0.125 2023-11-23 23:06:13,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.280e+01 8.966e+01 9.619e+01 1.252e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-23 23:06:18,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2578020.0, ans=0.125 2023-11-23 23:06:23,280 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 1950, loss[loss=0.08022, simple_loss=0.1058, pruned_loss=0.01646, audio_tagging_loss=0.01086, over 15762.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09181, pruned_loss=0.01353, audio_tagging_loss=0.008967, over 3050606.68 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 23:06:37,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2578153.3333333335, ans=0.125 2023-11-23 23:06:55,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2578220.0, ans=0.2 2023-11-23 23:06:59,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. limit=10.0 2023-11-23 23:07:00,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.34 vs. 
limit=15.0 2023-11-23 23:07:04,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2578286.6666666665, ans=0.125 2023-11-23 23:07:07,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386750 2023-11-23 23:07:24,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2578353.3333333335, ans=0.125 2023-11-23 23:07:26,422 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2000, loss[loss=0.07818, simple_loss=0.1067, pruned_loss=0.01623, audio_tagging_loss=0.008613, over 15247.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09149, pruned_loss=0.01343, audio_tagging_loss=0.008985, over 3049668.84 frames. ], batch size: 55, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:07:29,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2023-11-23 23:07:43,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2578486.6666666665, ans=0.125 2023-11-23 23:07:52,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2578553.3333333335, ans=0.125 2023-11-23 23:07:57,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-23 23:08:03,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.55 vs. limit=22.5 2023-11-23 23:08:09,842 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386800 2023-11-23 23:08:17,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2578686.6666666665, ans=0.1 2023-11-23 23:08:18,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2578686.6666666665, ans=0.125 2023-11-23 23:08:19,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.226e+01 8.902e+01 9.752e+01 1.387e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-23 23:08:19,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2578686.6666666665, ans=0.125 2023-11-23 23:08:20,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2578686.6666666665, ans=0.0 2023-11-23 23:08:23,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2023-11-23 23:08:28,799 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2050, loss[loss=0.07234, simple_loss=0.1058, pruned_loss=0.01476, audio_tagging_loss=0.004665, over 15758.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09135, pruned_loss=0.01333, audio_tagging_loss=0.009011, over 3042373.97 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:08:39,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.78 vs. 
limit=22.5 2023-11-23 23:08:40,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2578820.0, ans=0.125 2023-11-23 23:08:57,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2578886.6666666665, ans=0.2 2023-11-23 23:08:57,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2578886.6666666665, ans=0.2 2023-11-23 23:09:04,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-23 23:09:05,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-23 23:09:12,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386850 2023-11-23 23:09:27,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2579020.0, ans=0.0 2023-11-23 23:09:30,992 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2100, loss[loss=0.05447, simple_loss=0.07398, pruned_loss=0.008366, audio_tagging_loss=0.009118, over 15381.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09022, pruned_loss=0.01323, audio_tagging_loss=0.009077, over 3041612.16 frames. ], batch size: 58, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:09:31,296 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:09:33,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2579086.6666666665, ans=0.125 2023-11-23 23:09:51,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2023-11-23 23:10:14,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386900 2023-11-23 23:10:23,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.184e+01 8.426e+01 9.085e+01 9.880e+01 1.258e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-23 23:10:29,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2579353.3333333335, ans=0.04949747468305833 2023-11-23 23:10:33,276 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2150, loss[loss=0.06307, simple_loss=0.08423, pruned_loss=0.01147, audio_tagging_loss=0.009486, over 14763.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09075, pruned_loss=0.01333, audio_tagging_loss=0.009087, over 3040261.11 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:10:41,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2579420.0, ans=0.125 2023-11-23 23:11:02,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2579553.3333333335, ans=0.07 2023-11-23 23:11:09,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2579620.0, ans=0.95 2023-11-23 23:11:10,834 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 23:11:13,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2579620.0, ans=0.1 2023-11-23 23:11:17,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 386950 2023-11-23 23:11:28,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-23 23:11:36,347 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2200, loss[loss=0.07245, simple_loss=0.09531, pruned_loss=0.01512, audio_tagging_loss=0.009677, over 14097.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09061, pruned_loss=0.01333, audio_tagging_loss=0.009111, over 3047068.53 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 23:11:58,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2579820.0, ans=0.125 2023-11-23 23:11:58,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2579820.0, ans=0.0 2023-11-23 23:12:20,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387000 2023-11-23 23:12:29,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=2580020.0, ans=0.1 2023-11-23 23:12:29,889 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.536e+01 9.005e+01 9.625e+01 2.688e+02, threshold=1.801e+02, percent-clipped=1.0 2023-11-23 23:12:38,243 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2250, loss[loss=0.06759, simple_loss=0.08997, pruned_loss=0.01113, audio_tagging_loss=0.01147, over 15669.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.0909, pruned_loss=0.01333, audio_tagging_loss=0.009231, over 3046331.32 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 23:12:58,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2580153.3333333335, ans=0.2 2023-11-23 23:13:04,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2580220.0, ans=0.125 2023-11-23 23:13:18,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2580286.6666666665, ans=0.04949747468305833 2023-11-23 23:13:22,547 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387050 2023-11-23 23:13:41,177 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2300, loss[loss=0.0795, simple_loss=0.1058, pruned_loss=0.01707, audio_tagging_loss=0.009506, over 16199.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09086, pruned_loss=0.01335, audio_tagging_loss=0.009218, over 3046058.44 frames. 
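
The WARNING entries in this log all drop 1-second AudioSet cuts for the same reason: 100 input frames shrink to 23 after subsampling, which is fewer than the 24 BPE tokens of the placeholder transcript, and the transducer cannot align fewer frames than tokens. A sketch of the implied filter, assuming the ((n - 7) // 2 + 1) // 2 frontend arithmetic for subsampling_factor=4; the function names are illustrative, not icefall's API:

    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2   # 100 -> 23, as logged

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # The transducer needs at least one frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert keep_cut(100, 24) is False   # the excluded 1-second cuts above
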
], batch size: 60, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 23:13:42,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2580420.0, ans=0.5 2023-11-23 23:13:47,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2580420.0, ans=0.07 2023-11-23 23:14:04,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2580553.3333333335, ans=0.2 2023-11-23 23:14:24,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387100 2023-11-23 23:14:34,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 8.489e+01 9.011e+01 9.554e+01 1.239e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-23 23:14:36,315 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 23:14:40,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2580686.6666666665, ans=0.1 2023-11-23 23:14:43,644 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2350, loss[loss=0.07746, simple_loss=0.1026, pruned_loss=0.01478, audio_tagging_loss=0.01138, over 15611.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09134, pruned_loss=0.01343, audio_tagging_loss=0.009206, over 3047327.34 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 16.0 2023-11-23 23:14:47,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2580753.3333333335, ans=0.1 2023-11-23 23:15:01,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2580820.0, ans=0.125 2023-11-23 23:15:12,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-11-23 23:15:27,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387150 2023-11-23 23:15:28,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2580953.3333333335, ans=0.125 2023-11-23 23:15:30,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2580953.3333333335, ans=0.09899494936611666 2023-11-23 23:15:37,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0 2023-11-23 23:15:44,719 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2400, loss[loss=0.07311, simple_loss=0.09857, pruned_loss=0.0162, audio_tagging_loss=0.00763, over 15618.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.0907, pruned_loss=0.01337, audio_tagging_loss=0.009367, over 3036487.30 frames. 
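
In the optim.py Clipping_scale entries, the reported threshold tracks the quartiles: it equals clipping_scale times the median of recently observed gradient norms (2.0 * 9.011e+01 ≈ 1.802e+02 in the entry above), and percent-clipped reports how often norms exceeded it. A rough reconstruction, not ScaledAdam's exact bookkeeping:

    import torch

    # Keep a window of recent gradient norms, clip at
    # clipping_scale * median, and report the statistics the log prints.
    def clip_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        quartiles = torch.quantile(
            recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]            # 2.0 * median
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return quartiles, threshold.item(), percent_clipped.item()
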
], batch size: 57, lr: 2.07e-03, grad_scale: 32.0 2023-11-23 23:15:47,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2581086.6666666665, ans=0.0 2023-11-23 23:16:28,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387200 2023-11-23 23:16:31,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5 2023-11-23 23:16:39,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.662e+01 8.668e+01 9.147e+01 9.893e+01 1.218e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-23 23:16:45,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2581420.0, ans=0.125 2023-11-23 23:16:46,717 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2450, loss[loss=0.0551, simple_loss=0.06963, pruned_loss=0.009428, audio_tagging_loss=0.01086, over 16002.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09114, pruned_loss=0.01345, audio_tagging_loss=0.00938, over 3033362.64 frames. ], batch size: 64, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:16:52,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2581420.0, ans=0.0 2023-11-23 23:17:05,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2581486.6666666665, ans=0.0 2023-11-23 23:17:29,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2581620.0, ans=0.1 2023-11-23 23:17:29,866 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387250 2023-11-23 23:17:33,142 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:17:33,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2581620.0, ans=0.125 2023-11-23 23:17:41,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2581686.6666666665, ans=0.125 2023-11-23 23:17:50,319 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2500, loss[loss=0.05954, simple_loss=0.08447, pruned_loss=0.0106, audio_tagging_loss=0.006708, over 14858.00 frames. ], tot_loss[loss=0.06868, simple_loss=0.09158, pruned_loss=0.01351, audio_tagging_loss=0.009383, over 3036846.88 frames. ], batch size: 53, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:17:57,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2581753.3333333335, ans=0.0 2023-11-23 23:17:57,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.37 vs. 
limit=15.0 2023-11-23 23:18:15,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2581886.6666666665, ans=0.0 2023-11-23 23:18:33,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387300 2023-11-23 23:18:42,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2582020.0, ans=0.0 2023-11-23 23:18:44,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.436e+01 9.159e+01 9.960e+01 1.411e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-23 23:18:51,367 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2550, loss[loss=0.0586, simple_loss=0.08187, pruned_loss=0.008092, audio_tagging_loss=0.009574, over 14527.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09148, pruned_loss=0.01344, audio_tagging_loss=0.009277, over 3041222.26 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:19:02,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2582153.3333333335, ans=0.125 2023-11-23 23:19:04,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2582153.3333333335, ans=0.1 2023-11-23 23:19:15,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2582220.0, ans=0.125 2023-11-23 23:19:17,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-23 23:19:25,902 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:19:29,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2582286.6666666665, ans=0.125 2023-11-23 23:19:35,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387350 2023-11-23 23:19:42,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=12.0 2023-11-23 23:19:43,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2582353.3333333335, ans=0.1 2023-11-23 23:19:52,916 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2600, loss[loss=0.06114, simple_loss=0.08758, pruned_loss=0.01092, audio_tagging_loss=0.006438, over 15584.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09077, pruned_loss=0.01324, audio_tagging_loss=0.009133, over 3044596.68 frames. 
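
The scaling.py:213 entries record the value (ans) that a ScheduledFloat hyper-parameter takes at the current batch_count; each named quantity (dropout probabilities, skip rates, balancer probabilities, and so on) follows a schedule over the number of batches seen. A stripped-down piecewise-linear stand-in, with invented breakpoints for illustration:

    # Linear in batch_count between breakpoints, flat outside them; the
    # breakpoints below are invented, not the recipe's actual values.
    class PiecewiseLinear:
        def __init__(self, *points):            # (batch_count, value) pairs
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    # At batch_count ~2.57e6, far past the last breakpoint, the schedule
    # has long since settled at its final value:
    assert dropout_p(2574220.0) == 0.1
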
], batch size: 59, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:19:56,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2582420.0, ans=0.09899494936611666 2023-11-23 23:20:11,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2582486.6666666665, ans=0.0 2023-11-23 23:20:14,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2582486.6666666665, ans=0.125 2023-11-23 23:20:24,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2582553.3333333335, ans=0.0 2023-11-23 23:20:36,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387400 2023-11-23 23:20:44,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2582686.6666666665, ans=0.125 2023-11-23 23:20:48,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.392e+01 8.863e+01 9.833e+01 1.286e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-23 23:20:49,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-23 23:20:56,692 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2650, loss[loss=0.08971, simple_loss=0.1346, pruned_loss=0.01656, audio_tagging_loss=0.005833, over 15078.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09118, pruned_loss=0.01322, audio_tagging_loss=0.009021, over 3042090.46 frames. ], batch size: 53, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:20:56,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2582753.3333333335, ans=10.0 2023-11-23 23:21:08,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=22.5 2023-11-23 23:21:12,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2023-11-23 23:21:18,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=15.0 2023-11-23 23:21:40,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387450 2023-11-23 23:21:55,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2023-11-23 23:21:58,873 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2700, loss[loss=0.06721, simple_loss=0.09945, pruned_loss=0.01015, audio_tagging_loss=0.007336, over 16746.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09084, pruned_loss=0.0132, audio_tagging_loss=0.008962, over 3051206.94 frames. 
], batch size: 61, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:22:39,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2583286.6666666665, ans=0.0 2023-11-23 23:22:40,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2583286.6666666665, ans=0.09899494936611666 2023-11-23 23:22:42,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387500 2023-11-23 23:22:53,262 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:22:54,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.709e+01 9.550e+01 1.033e+02 1.468e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-23 23:22:58,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2583353.3333333335, ans=0.2 2023-11-23 23:23:00,136 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2750, loss[loss=0.07423, simple_loss=0.09187, pruned_loss=0.01842, audio_tagging_loss=0.009878, over 15200.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09071, pruned_loss=0.01331, audio_tagging_loss=0.009018, over 3054133.11 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:23:27,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2583553.3333333335, ans=0.125 2023-11-23 23:23:28,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2583553.3333333335, ans=0.0 2023-11-23 23:23:44,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387550 2023-11-23 23:23:54,124 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 23:24:02,855 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2800, loss[loss=0.06705, simple_loss=0.0851, pruned_loss=0.0157, audio_tagging_loss=0.008798, over 14917.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09051, pruned_loss=0.01333, audio_tagging_loss=0.009016, over 3048125.65 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:24:22,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2583820.0, ans=0.125 2023-11-23 23:24:45,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2583953.3333333335, ans=0.125 2023-11-23 23:24:46,371 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387600 2023-11-23 23:24:59,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.278e+01 8.804e+01 9.483e+01 1.291e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-23 23:25:06,131 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2850, loss[loss=0.06672, simple_loss=0.08521, pruned_loss=0.01475, audio_tagging_loss=0.009363, over 14913.00 frames. 
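
The scaling.py:1022 entries compare a whitening metric against a limit. The metric measures how far the per-group covariance of a module's activations is from isotropic: 1.0 for perfectly white features, larger as channels correlate, with a corrective gradient applied when the limit is exceeded. A sketch of such a metric, adapted from the idea rather than copied from scaling.py:

    import torch

    # 1.0 when the per-group covariance is isotropic, growing as channels
    # become correlated; invariant to the overall scale of x.
    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups          # channels per group
        xg = x.reshape(num_frames, num_groups, cpg).permute(1, 0, 2)
        cov = torch.matmul(xg.transpose(1, 2), xg).mean(dim=0)  # (cpg, cpg)
        mean_diag = cov.diagonal().mean()
        return ((cov ** 2).sum() / (mean_diag ** 2 * cpg)).item()

    # Nearly white activations score close to the ideal 1.0:
    x = torch.randn(1000, 192)
    assert whitening_metric(x, num_groups=1) < 1.5
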
], tot_loss[loss=0.06745, simple_loss=0.09003, pruned_loss=0.01341, audio_tagging_loss=0.009026, over 3044816.49 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:25:08,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2584086.6666666665, ans=0.025 2023-11-23 23:25:17,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2584153.3333333335, ans=0.1 2023-11-23 23:25:39,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2584220.0, ans=0.125 2023-11-23 23:25:47,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2584286.6666666665, ans=0.09899494936611666 2023-11-23 23:25:49,603 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387650 2023-11-23 23:25:56,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2584353.3333333335, ans=0.125 2023-11-23 23:26:07,797 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2900, loss[loss=0.06399, simple_loss=0.08007, pruned_loss=0.01387, audio_tagging_loss=0.01008, over 14966.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09006, pruned_loss=0.01334, audio_tagging_loss=0.008983, over 3040581.78 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:26:19,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2584486.6666666665, ans=0.0 2023-11-23 23:26:43,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2584553.3333333335, ans=0.0 2023-11-23 23:26:44,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2584620.0, ans=0.125 2023-11-23 23:26:45,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2023-11-23 23:26:51,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387700 2023-11-23 23:27:05,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.293e+01 9.102e+01 9.710e+01 1.234e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-23 23:27:10,864 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 2950, loss[loss=0.06603, simple_loss=0.09454, pruned_loss=0.01281, audio_tagging_loss=0.005954, over 14817.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09024, pruned_loss=0.0133, audio_tagging_loss=0.008967, over 3043842.96 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:27:32,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2584820.0, ans=0.0 2023-11-23 23:27:36,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.47 vs. 
limit=22.5 2023-11-23 23:27:53,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2584953.3333333335, ans=0.125 2023-11-23 23:27:54,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387750 2023-11-23 23:27:57,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2023-11-23 23:27:58,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2584953.3333333335, ans=0.125 2023-11-23 23:27:59,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2585020.0, ans=0.125 2023-11-23 23:28:12,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2585086.6666666665, ans=0.125 2023-11-23 23:28:13,245 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3000, loss[loss=0.06791, simple_loss=0.09862, pruned_loss=0.0119, audio_tagging_loss=0.006708, over 15692.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09028, pruned_loss=0.01335, audio_tagging_loss=0.009066, over 3040431.94 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:28:13,246 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-23 23:28:53,109 INFO [train_asr.py:1253] (1/4) Epoch 33, validation: loss=0.05846, simple_loss=0.05103, pruned_loss=0.005194, audio_tagging_loss=0.02775, over 4681554.00 frames. 2023-11-23 23:28:53,110 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-23 23:28:55,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2585086.6666666665, ans=0.0 2023-11-23 23:29:10,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2585153.3333333335, ans=0.125 2023-11-23 23:29:13,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2585153.3333333335, ans=0.0 2023-11-23 23:29:21,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-23 23:29:37,494 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387800 2023-11-23 23:29:51,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.566e+01 9.149e+01 9.920e+01 1.258e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-23 23:29:52,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2585353.3333333335, ans=0.125 2023-11-23 23:29:54,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2585353.3333333335, ans=0.125 2023-11-23 23:29:56,964 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3050, loss[loss=0.07206, simple_loss=0.0957, pruned_loss=0.01625, audio_tagging_loss=0.007961, over 15647.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09138, pruned_loss=0.01359, audio_tagging_loss=0.009124, over 3039409.32 frames. 
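
The roughly 40-second pause above (23:28:13 to 23:28:53) is the periodic validation pass: it fires at batch 3000 of the epoch, matching valid_interval=3000 from the config, and the loop also reports the peak GPU memory seen so far. A sketch of the trigger, assuming the within-epoch batch index is what is tested; the helper is illustrative:

    def should_validate(batch_idx: int, valid_interval: int = 3000) -> bool:
        # Fires every valid_interval batches within the epoch.
        return batch_idx % valid_interval == 0

    assert should_validate(3000)
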
], batch size: 57, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:30:10,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2585486.6666666665, ans=0.2 2023-11-23 23:30:17,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2585486.6666666665, ans=0.5 2023-11-23 23:30:23,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2585553.3333333335, ans=0.0 2023-11-23 23:30:24,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=22.5 2023-11-23 23:30:27,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2585553.3333333335, ans=0.1 2023-11-23 23:30:30,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2585553.3333333335, ans=0.125 2023-11-23 23:30:35,038 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 23:30:42,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387850 2023-11-23 23:30:51,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2585686.6666666665, ans=0.1 2023-11-23 23:30:51,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2585686.6666666665, ans=0.125 2023-11-23 23:31:01,216 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3100, loss[loss=0.06163, simple_loss=0.08458, pruned_loss=0.008296, audio_tagging_loss=0.01104, over 13940.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09157, pruned_loss=0.01357, audio_tagging_loss=0.009187, over 3042206.21 frames. ], batch size: 53, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:31:02,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2585753.3333333335, ans=0.2 2023-11-23 23:31:07,584 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:31:10,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2585753.3333333335, ans=0.1 2023-11-23 23:31:45,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387900 2023-11-23 23:31:52,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. 
limit=6.0 2023-11-23 23:31:59,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.717e+01 9.122e+01 9.882e+01 2.384e+02, threshold=1.824e+02, percent-clipped=1.0 2023-11-23 23:32:03,856 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3150, loss[loss=0.0775, simple_loss=0.1049, pruned_loss=0.01745, audio_tagging_loss=0.007586, over 15664.00 frames. ], tot_loss[loss=0.06831, simple_loss=0.09118, pruned_loss=0.01351, audio_tagging_loss=0.00921, over 3042866.33 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 8.0 2023-11-23 23:32:09,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2586086.6666666665, ans=0.125 2023-11-23 23:32:25,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2586153.3333333335, ans=0.125 2023-11-23 23:32:44,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2586286.6666666665, ans=0.0 2023-11-23 23:32:47,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 387950 2023-11-23 23:32:58,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2586353.3333333335, ans=0.0 2023-11-23 23:32:58,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5 2023-11-23 23:33:06,479 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3200, loss[loss=0.07177, simple_loss=0.09587, pruned_loss=0.01659, audio_tagging_loss=0.007248, over 14695.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.0921, pruned_loss=0.01367, audio_tagging_loss=0.009249, over 3044294.64 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:33:09,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2586420.0, ans=0.1 2023-11-23 23:33:12,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2586420.0, ans=0.1 2023-11-23 23:33:12,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-23 23:33:15,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2586420.0, ans=0.04949747468305833 2023-11-23 23:33:17,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2586420.0, ans=0.125 2023-11-23 23:33:25,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2586486.6666666665, ans=0.0 2023-11-23 23:33:37,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2586553.3333333335, ans=0.125 2023-11-23 23:33:50,075 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388000 2023-11-23 23:34:06,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.288e+01 8.870e+01 9.620e+01 1.279e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-23 23:34:11,750 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3250, loss[loss=0.06896, simple_loss=0.09044, pruned_loss=0.01333, audio_tagging_loss=0.01041, over 15192.00 frames. 
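
With use_fp16=True, the grad_scale field in the Epoch lines is the running fp16 loss scale: it has fallen 32.0 -> 16.0 -> 8.0 over this stretch of the epoch and recovers to 16.0 between batches 3150 and 3200 above, the usual halve-on-overflow, grow-when-clean dynamics. The standard torch.cuda.amp pattern, with illustrative constructor values; icefall wraps this inside its own training loop:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                       growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=2000)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)      # skipped internally on inf/nan grads
        scaler.update()             # backoff on overflow, grow when clean
        return scaler.get_scale()   # what grad_scale here appears to report
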
], tot_loss[loss=0.06876, simple_loss=0.09181, pruned_loss=0.01351, audio_tagging_loss=0.009343, over 3047989.69 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:34:38,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2586886.6666666665, ans=0.2 2023-11-23 23:34:56,650 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388050 2023-11-23 23:35:15,185 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3300, loss[loss=0.06017, simple_loss=0.08343, pruned_loss=0.009375, audio_tagging_loss=0.009077, over 14968.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09205, pruned_loss=0.01361, audio_tagging_loss=0.00939, over 3044237.15 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:35:23,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=22.5 2023-11-23 23:35:33,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2587153.3333333335, ans=0.0 2023-11-23 23:35:39,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=22.5 2023-11-23 23:35:58,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388100 2023-11-23 23:36:12,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.941e+01 8.544e+01 9.048e+01 9.852e+01 1.338e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-23 23:36:15,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2587353.3333333335, ans=0.0 2023-11-23 23:36:18,177 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3350, loss[loss=0.06318, simple_loss=0.09054, pruned_loss=0.01007, audio_tagging_loss=0.007838, over 16476.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09229, pruned_loss=0.01369, audio_tagging_loss=0.009283, over 3051139.58 frames. ], batch size: 62, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:36:22,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2023-11-23 23:36:26,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2587420.0, ans=0.125 2023-11-23 23:36:38,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2587486.6666666665, ans=0.0 2023-11-23 23:36:39,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2587486.6666666665, ans=0.125 2023-11-23 23:36:45,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2587553.3333333335, ans=0.95 2023-11-23 23:36:51,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2587553.3333333335, ans=0.0 2023-11-23 23:36:58,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2587620.0, ans=0.0 2023-11-23 23:36:59,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.15 vs. 
limit=15.0 2023-11-23 23:37:02,107 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388150 2023-11-23 23:37:04,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2587620.0, ans=0.0 2023-11-23 23:37:06,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2587620.0, ans=0.0 2023-11-23 23:37:20,745 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3400, loss[loss=0.07647, simple_loss=0.1049, pruned_loss=0.01647, audio_tagging_loss=0.007549, over 15125.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.09236, pruned_loss=0.01369, audio_tagging_loss=0.009103, over 3049319.39 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:37:25,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2587753.3333333335, ans=0.2 2023-11-23 23:37:27,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2587753.3333333335, ans=0.125 2023-11-23 23:37:28,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2587753.3333333335, ans=10.0 2023-11-23 23:37:33,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2587820.0, ans=0.125 2023-11-23 23:37:38,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2587820.0, ans=0.2 2023-11-23 23:37:39,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2587820.0, ans=0.125 2023-11-23 23:37:50,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2587886.6666666665, ans=0.1 2023-11-23 23:37:56,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2587953.3333333335, ans=0.1 2023-11-23 23:38:04,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388200 2023-11-23 23:38:17,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.375e+01 8.962e+01 9.583e+01 1.424e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-23 23:38:22,584 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3450, loss[loss=0.04767, simple_loss=0.05803, pruned_loss=0.009573, audio_tagging_loss=0.009079, over 15668.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09238, pruned_loss=0.01364, audio_tagging_loss=0.008973, over 3047732.31 frames. ], batch size: 59, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:38:27,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2588086.6666666665, ans=0.125 2023-11-23 23:38:32,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.02 vs. 
limit=15.0 2023-11-23 23:38:37,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2588153.3333333335, ans=0.125 2023-11-23 23:38:49,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2588220.0, ans=0.125 2023-11-23 23:39:07,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388250 2023-11-23 23:39:07,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2588286.6666666665, ans=0.125 2023-11-23 23:39:08,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2588286.6666666665, ans=0.0 2023-11-23 23:39:15,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2588353.3333333335, ans=0.1 2023-11-23 23:39:26,123 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3500, loss[loss=0.04804, simple_loss=0.0636, pruned_loss=0.009042, audio_tagging_loss=0.007201, over 14416.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.0916, pruned_loss=0.0135, audio_tagging_loss=0.009001, over 3045989.79 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:39:29,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2588420.0, ans=0.125 2023-11-23 23:39:30,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2588420.0, ans=0.0 2023-11-23 23:39:31,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2023-11-23 23:39:35,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2588420.0, ans=0.125 2023-11-23 23:39:39,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2588486.6666666665, ans=0.0 2023-11-23 23:39:43,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2588486.6666666665, ans=0.125 2023-11-23 23:39:47,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2588486.6666666665, ans=0.2 2023-11-23 23:39:57,872 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-23 23:40:09,081 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388300 2023-11-23 23:40:21,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2588686.6666666665, ans=0.0 2023-11-23 23:40:22,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2588686.6666666665, ans=0.125 2023-11-23 23:40:22,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2588686.6666666665, ans=0.125 2023-11-23 23:40:23,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.850e+01 8.607e+01 9.289e+01 1.010e+02 1.260e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-23 23:40:28,513 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3550, loss[loss=0.06405, simple_loss=0.08557, pruned_loss=0.01225, audio_tagging_loss=0.009018, over 15834.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09145, pruned_loss=0.01348, audio_tagging_loss=0.009037, over 3046989.43 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:40:37,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2588753.3333333335, ans=0.125 2023-11-23 23:40:41,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2588820.0, ans=0.1 2023-11-23 23:40:53,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2588886.6666666665, ans=0.0 2023-11-23 23:40:59,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2588886.6666666665, ans=0.125 2023-11-23 23:41:03,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2588886.6666666665, ans=0.0 2023-11-23 23:41:12,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388350 2023-11-23 23:41:30,136 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3600, loss[loss=0.08614, simple_loss=0.1171, pruned_loss=0.01909, audio_tagging_loss=0.008495, over 14982.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09081, pruned_loss=0.01336, audio_tagging_loss=0.009081, over 3042355.55 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:41:35,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2589086.6666666665, ans=0.125 2023-11-23 23:41:45,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2023-11-23 23:41:47,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2589153.3333333335, ans=0.0 2023-11-23 23:41:48,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2023-11-23 23:41:52,095 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:42:00,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.31 vs. 
limit=10.0 2023-11-23 23:42:07,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2023-11-23 23:42:09,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2589286.6666666665, ans=0.125 2023-11-23 23:42:14,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388400 2023-11-23 23:42:23,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2589353.3333333335, ans=0.0 2023-11-23 23:42:28,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.733e+01 8.372e+01 8.860e+01 9.569e+01 1.357e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-23 23:42:32,286 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3650, loss[loss=0.06205, simple_loss=0.08419, pruned_loss=0.01189, audio_tagging_loss=0.008065, over 15680.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09038, pruned_loss=0.0131, audio_tagging_loss=0.008971, over 3041515.46 frames. ], batch size: 59, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:42:51,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2589486.6666666665, ans=0.125 2023-11-23 23:43:06,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2589553.3333333335, ans=0.125 2023-11-23 23:43:16,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388450 2023-11-23 23:43:35,933 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3700, loss[loss=0.06199, simple_loss=0.08272, pruned_loss=0.01276, audio_tagging_loss=0.007868, over 15847.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09051, pruned_loss=0.01314, audio_tagging_loss=0.00891, over 3042978.64 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:43:36,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2589753.3333333335, ans=0.0 2023-11-23 23:43:36,354 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:43:43,153 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:44:08,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2589886.6666666665, ans=0.125 2023-11-23 23:44:19,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388500 2023-11-23 23:44:28,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2023-11-23 23:44:33,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.486e+01 9.089e+01 9.768e+01 1.177e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-23 23:44:36,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2590086.6666666665, ans=0.125 2023-11-23 23:44:37,521 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3750, loss[loss=0.04955, simple_loss=0.05996, pruned_loss=0.007853, audio_tagging_loss=0.01172, over 15324.00 frames. 
], tot_loss[loss=0.06796, simple_loss=0.09116, pruned_loss=0.01344, audio_tagging_loss=0.008941, over 3046956.53 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:44:40,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2590086.6666666665, ans=0.125 2023-11-23 23:44:46,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2590086.6666666665, ans=0.2 2023-11-23 23:44:47,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2590086.6666666665, ans=0.0 2023-11-23 23:45:21,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-23 23:45:21,619 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 23:45:21,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388550 2023-11-23 23:45:27,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2590353.3333333335, ans=0.125 2023-11-23 23:45:39,696 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3800, loss[loss=0.05106, simple_loss=0.06465, pruned_loss=0.007734, audio_tagging_loss=0.011, over 14695.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09027, pruned_loss=0.01317, audio_tagging_loss=0.009064, over 3051442.33 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:46:23,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388600 2023-11-23 23:46:28,056 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-23 23:46:33,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2590686.6666666665, ans=0.1 2023-11-23 23:46:33,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2590686.6666666665, ans=10.0 2023-11-23 23:46:39,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.689e+01 8.380e+01 8.887e+01 9.611e+01 1.323e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-23 23:46:44,003 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3850, loss[loss=0.0693, simple_loss=0.08888, pruned_loss=0.01574, audio_tagging_loss=0.009117, over 15416.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09071, pruned_loss=0.01327, audio_tagging_loss=0.009166, over 3052960.66 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:46:44,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. 
limit=15.0 2023-11-23 23:46:46,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2590753.3333333335, ans=0.125 2023-11-23 23:46:53,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-23 23:47:08,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=22.5 2023-11-23 23:47:22,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2590953.3333333335, ans=0.125 2023-11-23 23:47:28,142 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388650 2023-11-23 23:47:35,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2591020.0, ans=0.0 2023-11-23 23:47:36,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2591020.0, ans=0.125 2023-11-23 23:47:38,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2591020.0, ans=0.0 2023-11-23 23:47:40,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2591020.0, ans=0.1 2023-11-23 23:47:47,372 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3900, loss[loss=0.07225, simple_loss=0.1033, pruned_loss=0.01493, audio_tagging_loss=0.005663, over 15340.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09146, pruned_loss=0.01351, audio_tagging_loss=0.009102, over 3045461.18 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:47:48,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2591086.6666666665, ans=0.125 2023-11-23 23:48:00,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2591153.3333333335, ans=0.1 2023-11-23 23:48:11,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2591220.0, ans=0.05 2023-11-23 23:48:27,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2591286.6666666665, ans=0.125 2023-11-23 23:48:32,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388700 2023-11-23 23:48:46,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.330e+01 8.934e+01 9.660e+01 1.290e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-23 23:48:49,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2591420.0, ans=0.2 2023-11-23 23:48:50,236 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 3950, loss[loss=0.07506, simple_loss=0.09613, pruned_loss=0.01699, audio_tagging_loss=0.01001, over 14957.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09146, pruned_loss=0.01358, audio_tagging_loss=0.009211, over 3040066.34 frames. 
], batch size: 54, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:49:16,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2591553.3333333335, ans=0.1 2023-11-23 23:49:33,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2591620.0, ans=0.125 2023-11-23 23:49:34,239 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388750 2023-11-23 23:49:53,528 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4000, loss[loss=0.06334, simple_loss=0.07713, pruned_loss=0.01189, audio_tagging_loss=0.01289, over 15285.00 frames. ], tot_loss[loss=0.06938, simple_loss=0.0926, pruned_loss=0.01375, audio_tagging_loss=0.009337, over 3046497.26 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:49:58,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2023-11-23 23:50:06,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2023-11-23 23:50:21,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-23 23:50:37,101 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388800 2023-11-23 23:50:41,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2591953.3333333335, ans=0.125 2023-11-23 23:50:45,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.89 vs. limit=22.5 2023-11-23 23:50:54,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.337e+01 9.138e+01 9.888e+01 1.432e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-23 23:50:56,665 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4050, loss[loss=0.09295, simple_loss=0.118, pruned_loss=0.02371, audio_tagging_loss=0.01023, over 14809.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09241, pruned_loss=0.01373, audio_tagging_loss=0.009287, over 3048078.80 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:51:00,283 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 23:51:40,565 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388850 2023-11-23 23:51:47,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0 2023-11-23 23:51:48,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0 2023-11-23 23:51:58,737 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4100, loss[loss=0.0788, simple_loss=0.1098, pruned_loss=0.0149, audio_tagging_loss=0.00899, over 15015.00 frames. 
], tot_loss[loss=0.06924, simple_loss=0.09259, pruned_loss=0.0137, audio_tagging_loss=0.009243, over 3052568.07 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:52:21,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2592486.6666666665, ans=0.125 2023-11-23 23:52:43,310 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388900 2023-11-23 23:52:46,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2592620.0, ans=0.1 2023-11-23 23:52:59,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.379e+01 9.260e+01 9.829e+01 1.268e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-23 23:53:02,086 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4150, loss[loss=0.08128, simple_loss=0.1166, pruned_loss=0.01689, audio_tagging_loss=0.006095, over 15260.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09243, pruned_loss=0.01366, audio_tagging_loss=0.00906, over 3046675.37 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:53:02,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2592753.3333333335, ans=0.0 2023-11-23 23:53:10,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2592753.3333333335, ans=0.5 2023-11-23 23:53:20,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2592820.0, ans=0.125 2023-11-23 23:53:32,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2592886.6666666665, ans=0.125 2023-11-23 23:53:37,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2592953.3333333335, ans=0.125 2023-11-23 23:53:45,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 388950 2023-11-23 23:53:48,194 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-23 23:53:52,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2593020.0, ans=0.2 2023-11-23 23:54:04,222 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4200, loss[loss=0.06438, simple_loss=0.08794, pruned_loss=0.01196, audio_tagging_loss=0.008446, over 14948.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09267, pruned_loss=0.01354, audio_tagging_loss=0.008981, over 3044598.38 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:54:04,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2593086.6666666665, ans=0.125 2023-11-23 23:54:11,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.41 vs. 
limit=15.0 2023-11-23 23:54:12,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2593086.6666666665, ans=0.1 2023-11-23 23:54:47,976 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389000 2023-11-23 23:54:50,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2593286.6666666665, ans=0.2 2023-11-23 23:54:52,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2593286.6666666665, ans=0.125 2023-11-23 23:55:04,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.440e+01 9.074e+01 9.920e+01 1.262e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-23 23:55:06,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2593420.0, ans=0.1 2023-11-23 23:55:07,004 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4250, loss[loss=0.05144, simple_loss=0.07238, pruned_loss=0.005622, audio_tagging_loss=0.009623, over 14988.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.09312, pruned_loss=0.01367, audio_tagging_loss=0.008977, over 3045704.01 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:55:13,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2593420.0, ans=0.07 2023-11-23 23:55:21,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2593486.6666666665, ans=0.1 2023-11-23 23:55:49,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2593620.0, ans=0.125 2023-11-23 23:55:51,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389050 2023-11-23 23:55:53,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0 2023-11-23 23:55:54,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.96 vs. limit=22.5 2023-11-23 23:56:05,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2593686.6666666665, ans=0.2 2023-11-23 23:56:10,038 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4300, loss[loss=0.09785, simple_loss=0.14, pruned_loss=0.0209, audio_tagging_loss=0.006968, over 15461.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09334, pruned_loss=0.01381, audio_tagging_loss=0.008916, over 3052843.68 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:56:10,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.85 vs. 
limit=15.0 2023-11-23 23:56:28,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2593820.0, ans=0.125 2023-11-23 23:56:43,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2593886.6666666665, ans=0.2 2023-11-23 23:56:53,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389100 2023-11-23 23:57:09,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2594020.0, ans=0.125 2023-11-23 23:57:10,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.627e+01 9.104e+01 9.923e+01 1.149e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-23 23:57:12,439 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4350, loss[loss=0.08169, simple_loss=0.112, pruned_loss=0.01771, audio_tagging_loss=0.007988, over 15610.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.09299, pruned_loss=0.01364, audio_tagging_loss=0.0089, over 3041667.85 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-23 23:57:12,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2594086.6666666665, ans=0.2 2023-11-23 23:57:20,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2594086.6666666665, ans=0.0 2023-11-23 23:57:23,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2594153.3333333335, ans=0.125 2023-11-23 23:57:49,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0 2023-11-23 23:57:56,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389150 2023-11-23 23:58:06,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2594353.3333333335, ans=0.1 2023-11-23 23:58:15,014 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4400, loss[loss=0.07055, simple_loss=0.09487, pruned_loss=0.01278, audio_tagging_loss=0.01033, over 14600.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09312, pruned_loss=0.01368, audio_tagging_loss=0.00889, over 3044343.32 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 32.0 2023-11-23 23:58:15,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2594420.0, ans=0.0 2023-11-23 23:58:21,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2023-11-23 23:58:22,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=15.0 2023-11-23 23:58:40,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2594553.3333333335, ans=0.1 2023-11-23 23:58:42,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2594553.3333333335, ans=0.125 2023-11-23 23:58:58,453 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389200 2023-11-23 23:59:01,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2594620.0, ans=0.125 2023-11-23 23:59:02,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2594620.0, ans=0.1 2023-11-23 23:59:12,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2594686.6666666665, ans=0.0 2023-11-23 23:59:14,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.522e+01 9.303e+01 9.983e+01 1.286e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-23 23:59:16,882 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4450, loss[loss=0.05916, simple_loss=0.08005, pruned_loss=0.01084, audio_tagging_loss=0.008294, over 16387.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.09364, pruned_loss=0.01372, audio_tagging_loss=0.00892, over 3045591.33 frames. ], batch size: 63, lr: 2.06e-03, grad_scale: 32.0 2023-11-23 23:59:21,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2594753.3333333335, ans=0.125 2023-11-23 23:59:47,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2594886.6666666665, ans=0.0 2023-11-23 23:59:50,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2594886.6666666665, ans=0.0 2023-11-23 23:59:55,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2594953.3333333335, ans=0.125 2023-11-24 00:00:00,983 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389250 2023-11-24 00:00:20,329 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4500, loss[loss=0.0587, simple_loss=0.0762, pruned_loss=0.01039, audio_tagging_loss=0.01022, over 13847.00 frames. ], tot_loss[loss=0.06882, simple_loss=0.0929, pruned_loss=0.01349, audio_tagging_loss=0.008883, over 3052529.73 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:00:44,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2595220.0, ans=0.125 2023-11-24 00:00:57,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2023-11-24 00:01:04,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389300 2023-11-24 00:01:05,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. 
limit=15.0 2023-11-24 00:01:19,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.069e+01 8.788e+01 9.879e+01 1.178e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-24 00:01:22,009 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4550, loss[loss=0.08738, simple_loss=0.1243, pruned_loss=0.0189, audio_tagging_loss=0.006344, over 16280.00 frames. ], tot_loss[loss=0.06939, simple_loss=0.09375, pruned_loss=0.01374, audio_tagging_loss=0.008767, over 3048004.00 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:01:51,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2595553.3333333335, ans=0.125 2023-11-24 00:01:54,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2595553.3333333335, ans=0.5 2023-11-24 00:02:05,550 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389350 2023-11-24 00:02:10,195 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 00:02:23,777 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4600, loss[loss=0.08446, simple_loss=0.1184, pruned_loss=0.01486, audio_tagging_loss=0.01042, over 16208.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09252, pruned_loss=0.01361, audio_tagging_loss=0.008929, over 3050091.74 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:02:24,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2595753.3333333335, ans=0.125 2023-11-24 00:02:38,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2595820.0, ans=0.125 2023-11-24 00:02:40,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2595820.0, ans=0.0 2023-11-24 00:02:55,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2023-11-24 00:02:56,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2595886.6666666665, ans=0.125 2023-11-24 00:03:00,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2595953.3333333335, ans=0.1 2023-11-24 00:03:06,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389400 2023-11-24 00:03:10,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2595953.3333333335, ans=0.0 2023-11-24 00:03:11,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2595953.3333333335, ans=0.125 2023-11-24 00:03:14,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.58 vs. 
limit=15.0 2023-11-24 00:03:16,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2596020.0, ans=0.1 2023-11-24 00:03:24,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.634e+01 9.201e+01 1.007e+02 1.274e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-24 00:03:26,979 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4650, loss[loss=0.05008, simple_loss=0.06182, pruned_loss=0.007069, audio_tagging_loss=0.0121, over 13918.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09245, pruned_loss=0.01365, audio_tagging_loss=0.00904, over 3045310.21 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:03:33,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2596086.6666666665, ans=0.125 2023-11-24 00:03:43,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2596153.3333333335, ans=0.125 2023-11-24 00:03:51,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-24 00:04:10,447 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389450 2023-11-24 00:04:24,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2596353.3333333335, ans=0.0 2023-11-24 00:04:28,859 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4700, loss[loss=0.07054, simple_loss=0.09304, pruned_loss=0.01529, audio_tagging_loss=0.008726, over 14909.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09183, pruned_loss=0.01347, audio_tagging_loss=0.009128, over 3042223.10 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:05:03,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2596553.3333333335, ans=0.0 2023-11-24 00:05:12,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389500 2023-11-24 00:05:13,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0 2023-11-24 00:05:14,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-11-24 00:05:28,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.415e+01 9.088e+01 1.001e+02 1.234e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-24 00:05:30,842 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4750, loss[loss=0.06417, simple_loss=0.08316, pruned_loss=0.01428, audio_tagging_loss=0.008308, over 15535.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09147, pruned_loss=0.01351, audio_tagging_loss=0.009234, over 3040852.49 frames. 
], batch size: 58, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:06:08,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2596953.3333333335, ans=0.125 2023-11-24 00:06:09,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2596953.3333333335, ans=0.125 2023-11-24 00:06:14,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389550 2023-11-24 00:06:35,068 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4800, loss[loss=0.08465, simple_loss=0.111, pruned_loss=0.02011, audio_tagging_loss=0.009052, over 16593.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09092, pruned_loss=0.01348, audio_tagging_loss=0.009301, over 3033389.96 frames. ], batch size: 59, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:06:42,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2597086.6666666665, ans=0.0 2023-11-24 00:06:42,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2597086.6666666665, ans=0.125 2023-11-24 00:06:53,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2597153.3333333335, ans=0.125 2023-11-24 00:06:56,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2597153.3333333335, ans=0.125 2023-11-24 00:06:58,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2597220.0, ans=0.125 2023-11-24 00:07:19,031 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389600 2023-11-24 00:07:33,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2597353.3333333335, ans=0.1 2023-11-24 00:07:37,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.337e+01 8.951e+01 9.611e+01 1.135e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-24 00:07:37,750 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4850, loss[loss=0.07599, simple_loss=0.09147, pruned_loss=0.01836, audio_tagging_loss=0.01189, over 14428.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.09165, pruned_loss=0.01367, audio_tagging_loss=0.009498, over 3037469.51 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:07:52,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2597486.6666666665, ans=0.125 2023-11-24 00:07:54,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2597486.6666666665, ans=0.1 2023-11-24 00:07:57,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=12.0 2023-11-24 00:08:01,243 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:08:13,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2597553.3333333335, ans=0.0 2023-11-24 00:08:13,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.39 vs. 
limit=12.0 2023-11-24 00:08:21,540 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389650 2023-11-24 00:08:30,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2597686.6666666665, ans=15.0 2023-11-24 00:08:36,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2597686.6666666665, ans=0.0 2023-11-24 00:08:39,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.26 vs. limit=22.5 2023-11-24 00:08:39,292 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4900, loss[loss=0.0476, simple_loss=0.06629, pruned_loss=0.007448, audio_tagging_loss=0.007006, over 16377.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.0922, pruned_loss=0.01368, audio_tagging_loss=0.009433, over 3039236.70 frames. ], batch size: 63, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:08:54,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2597820.0, ans=0.035 2023-11-24 00:09:08,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2597886.6666666665, ans=0.2 2023-11-24 00:09:19,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2597953.3333333335, ans=0.125 2023-11-24 00:09:23,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389700 2023-11-24 00:09:43,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.493e+01 8.944e+01 9.853e+01 1.226e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-24 00:09:43,080 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 4950, loss[loss=0.08608, simple_loss=0.1138, pruned_loss=0.02115, audio_tagging_loss=0.008007, over 15448.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.09283, pruned_loss=0.01375, audio_tagging_loss=0.009182, over 3042349.61 frames. ], batch size: 60, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:09:46,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2598086.6666666665, ans=0.0 2023-11-24 00:10:22,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2598286.6666666665, ans=0.1 2023-11-24 00:10:26,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389750 2023-11-24 00:10:37,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2598353.3333333335, ans=0.125 2023-11-24 00:10:45,884 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5000, loss[loss=0.05807, simple_loss=0.07562, pruned_loss=0.01032, audio_tagging_loss=0.009949, over 15933.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09229, pruned_loss=0.01353, audio_tagging_loss=0.008954, over 3041799.24 frames. 
], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:11:00,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2598486.6666666665, ans=0.0 2023-11-24 00:11:24,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2598620.0, ans=0.125 2023-11-24 00:11:29,696 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389800 2023-11-24 00:11:38,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2598686.6666666665, ans=0.125 2023-11-24 00:11:47,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.099e+01 8.775e+01 9.573e+01 1.290e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-24 00:11:47,944 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5050, loss[loss=0.06602, simple_loss=0.07395, pruned_loss=0.01731, audio_tagging_loss=0.01174, over 14315.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09195, pruned_loss=0.01345, audio_tagging_loss=0.008914, over 3042046.24 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:12:02,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2598820.0, ans=0.125 2023-11-24 00:12:31,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2598953.3333333335, ans=0.125 2023-11-24 00:12:32,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389850 2023-11-24 00:12:50,811 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5100, loss[loss=0.07014, simple_loss=0.08783, pruned_loss=0.0158, audio_tagging_loss=0.01043, over 13716.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09134, pruned_loss=0.01347, audio_tagging_loss=0.009003, over 3042905.60 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:12:54,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.45 vs. 
limit=12.0 2023-11-24 00:12:58,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2599086.6666666665, ans=0.125 2023-11-24 00:13:07,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2599153.3333333335, ans=0.125 2023-11-24 00:13:10,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2599153.3333333335, ans=0.125 2023-11-24 00:13:34,801 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389900 2023-11-24 00:13:35,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2599286.6666666665, ans=0.0 2023-11-24 00:13:44,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2599353.3333333335, ans=0.125 2023-11-24 00:13:44,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2599353.3333333335, ans=0.1 2023-11-24 00:13:54,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.055e+01 8.310e+01 9.019e+01 9.639e+01 1.666e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-24 00:13:54,350 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5150, loss[loss=0.07515, simple_loss=0.1095, pruned_loss=0.01334, audio_tagging_loss=0.007059, over 15172.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09121, pruned_loss=0.01342, audio_tagging_loss=0.008982, over 3044649.08 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:14:15,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2599486.6666666665, ans=0.015 2023-11-24 00:14:16,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-24 00:14:19,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2599553.3333333335, ans=0.125 2023-11-24 00:14:23,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2599553.3333333335, ans=0.0 2023-11-24 00:14:38,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 389950 2023-11-24 00:14:45,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2599686.6666666665, ans=0.0 2023-11-24 00:14:48,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2599686.6666666665, ans=0.1 2023-11-24 00:14:48,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2599686.6666666665, ans=0.125 2023-11-24 00:14:56,519 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5200, loss[loss=0.07346, simple_loss=0.09442, pruned_loss=0.01925, audio_tagging_loss=0.007005, over 14971.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09125, pruned_loss=0.01346, audio_tagging_loss=0.008974, over 3039446.35 frames. 
], batch size: 55, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:15:00,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2599753.3333333335, ans=0.95 2023-11-24 00:15:24,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2023-11-24 00:15:41,671 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390000 2023-11-24 00:15:52,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-24 00:16:00,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.701e+01 8.431e+01 8.968e+01 9.666e+01 1.343e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-24 00:16:00,385 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5250, loss[loss=0.08746, simple_loss=0.1199, pruned_loss=0.02066, audio_tagging_loss=0.006872, over 15903.00 frames. ], tot_loss[loss=0.06876, simple_loss=0.09242, pruned_loss=0.01373, audio_tagging_loss=0.00882, over 3041459.03 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:16:30,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2600220.0, ans=0.125 2023-11-24 00:16:34,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-24 00:16:43,590 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390050 2023-11-24 00:16:45,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2023-11-24 00:16:51,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-11-24 00:17:03,028 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5300, loss[loss=0.079, simple_loss=0.1074, pruned_loss=0.01562, audio_tagging_loss=0.009691, over 15554.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.09243, pruned_loss=0.01372, audio_tagging_loss=0.008896, over 3045929.37 frames. 
], batch size: 58, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:17:09,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2600420.0, ans=0.125 2023-11-24 00:17:09,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2600420.0, ans=0.125 2023-11-24 00:17:26,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2600553.3333333335, ans=0.125 2023-11-24 00:17:37,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2600553.3333333335, ans=0.125 2023-11-24 00:17:41,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2600620.0, ans=0.09899494936611666 2023-11-24 00:17:46,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390100 2023-11-24 00:17:47,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.74 vs. limit=10.0 2023-11-24 00:17:49,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2023-11-24 00:17:52,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2600686.6666666665, ans=0.125 2023-11-24 00:18:02,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2600686.6666666665, ans=0.125 2023-11-24 00:18:04,804 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5350, loss[loss=0.04359, simple_loss=0.04916, pruned_loss=0.004908, audio_tagging_loss=0.0141, over 15701.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09182, pruned_loss=0.0135, audio_tagging_loss=0.008935, over 3043840.26 frames. ], batch size: 63, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:18:05,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.377e+01 8.739e+01 9.630e+01 1.201e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-24 00:18:27,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2600820.0, ans=0.1 2023-11-24 00:18:30,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2600886.6666666665, ans=0.1 2023-11-24 00:18:46,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2600953.3333333335, ans=0.1 2023-11-24 00:18:48,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390150 2023-11-24 00:18:51,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2600953.3333333335, ans=0.125 2023-11-24 00:19:06,218 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5400, loss[loss=0.0772, simple_loss=0.1042, pruned_loss=0.01538, audio_tagging_loss=0.009703, over 15058.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09199, pruned_loss=0.01354, audio_tagging_loss=0.008984, over 3045402.40 frames. 
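], batch size: 55, lr: 2.06e-03, grad_scale: 16.0

A note on reading the loss[...] / tot_loss[...] records: the logged totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for the batch 5400 record just above, 0.5 * 0.09199 + 0.01354 + 0.008984 = 0.06852). The weights are inferred from the printed numbers, not read out of train_asr.py; a minimal sketch:

# Sketch only: the 0.5 / 1.0 weights are inferred from the logged records,
# not read out of the recipe code.
def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    """Reproduce the `loss` field from the other three logged fields."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Check against the batch 5400 tot_loss record above:
# combine_losses(0.09199, 0.01354, 0.008984) -> 0.06852 (as logged)
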
2023-11-24 00:19:06,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2601086.6666666665, ans=0.125 2023-11-24 00:19:08,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2601086.6666666665, ans=0.125 2023-11-24 00:19:20,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2601153.3333333335, ans=0.125 2023-11-24 00:19:22,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.87 vs. limit=10.0 2023-11-24 00:19:26,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2601153.3333333335, ans=0.125 2023-11-24 00:19:34,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2601220.0, ans=0.04949747468305833 2023-11-24 00:19:50,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390200 2023-11-24 00:20:07,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=12.0 2023-11-24 00:20:09,849 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5450, loss[loss=0.07357, simple_loss=0.1001, pruned_loss=0.01413, audio_tagging_loss=0.009413, over 15927.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.09225, pruned_loss=0.01366, audio_tagging_loss=0.009041, over 3041958.62 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:20:10,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.413e+01 8.965e+01 9.765e+01 1.201e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-24 00:20:13,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2601420.0, ans=0.125 2023-11-24 00:20:36,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2601553.3333333335, ans=0.0 2023-11-24 00:20:51,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2601620.0, ans=0.95 2023-11-24 00:20:51,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2601620.0, ans=0.2 2023-11-24 00:20:52,461 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390250 2023-11-24 00:21:11,504 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5500, loss[loss=0.0944, simple_loss=0.1243, pruned_loss=0.02436, audio_tagging_loss=0.007886, over 15449.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09282, pruned_loss=0.01367, audio_tagging_loss=0.008975, over 3050182.95 frames.
], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:21:42,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2601886.6666666665, ans=0.125 2023-11-24 00:21:55,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390300 2023-11-24 00:22:11,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2602020.0, ans=0.125 2023-11-24 00:22:13,306 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5550, loss[loss=0.08773, simple_loss=0.1236, pruned_loss=0.01869, audio_tagging_loss=0.007219, over 15821.00 frames. ], tot_loss[loss=0.06975, simple_loss=0.09396, pruned_loss=0.01381, audio_tagging_loss=0.008969, over 3056512.99 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:22:14,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.280e+01 9.193e+01 9.915e+01 1.422e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-24 00:22:14,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2602086.6666666665, ans=0.0 2023-11-24 00:22:26,566 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:22:40,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2602220.0, ans=0.2 2023-11-24 00:22:45,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2602220.0, ans=0.05 2023-11-24 00:22:57,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390350 2023-11-24 00:23:01,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2023-11-24 00:23:05,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2602353.3333333335, ans=10.0 2023-11-24 00:23:15,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2602420.0, ans=0.125 2023-11-24 00:23:15,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2602420.0, ans=0.0 2023-11-24 00:23:16,741 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5600, loss[loss=0.06684, simple_loss=0.09445, pruned_loss=0.01148, audio_tagging_loss=0.008132, over 14239.00 frames. ], tot_loss[loss=0.06977, simple_loss=0.09372, pruned_loss=0.01381, audio_tagging_loss=0.009094, over 3056540.84 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:23:21,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2602420.0, ans=0.1 2023-11-24 00:23:34,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.52 vs. 
limit=15.0 2023-11-24 00:23:44,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2602553.3333333335, ans=0.1 2023-11-24 00:23:59,844 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390400 2023-11-24 00:24:00,937 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 00:24:05,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2602686.6666666665, ans=0.125 2023-11-24 00:24:15,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2602686.6666666665, ans=0.95 2023-11-24 00:24:18,478 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5650, loss[loss=0.06096, simple_loss=0.07638, pruned_loss=0.01308, audio_tagging_loss=0.009687, over 15930.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09322, pruned_loss=0.01396, audio_tagging_loss=0.009209, over 3055515.41 frames. ], batch size: 62, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:24:19,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.461e+01 8.950e+01 9.725e+01 1.309e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-24 00:24:29,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2602820.0, ans=0.1 2023-11-24 00:25:02,288 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390450 2023-11-24 00:25:18,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2603020.0, ans=0.0 2023-11-24 00:25:20,365 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5700, loss[loss=0.06165, simple_loss=0.08363, pruned_loss=0.01216, audio_tagging_loss=0.007668, over 15946.00 frames. ], tot_loss[loss=0.06949, simple_loss=0.09255, pruned_loss=0.01393, audio_tagging_loss=0.009281, over 3058345.95 frames. 
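], batch size: 59, lr: 2.06e-03, grad_scale: 32.0

The Exclude cut WARNING above documents the feasibility check for the transducer loss: this 1-second AudioSet clip has 100 feature frames, the encoder frontend subsamples that to 23 frames, and the placeholder transcript tokenizes to 24 BPE tokens, so there would be fewer encoder frames than output tokens to align. A hedged sketch of such a filter; the subsampling formula below is an assumption chosen to reproduce the logged 100 -> 23:

# Sketch of the cut filter behind the WARNING above. The frame arithmetic is
# an assumption chosen to reproduce the logged numbers (100 frames -> 23);
# the real check lives in train_asr.py.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Hypothetical two-stage stride-2 convolutional subsampling:
    frames_after = ((num_frames - 7) // 2 + 1 - 3) // 2 + 1  # 100 -> 23
    # The transducer needs at least one encoder frame per output token.
    return frames_after >= num_tokens

# keep_cut(100, 24) -> False: 23 frames cannot cover 24 tokens, so the cut
# is excluded from training, as logged.
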
2023-11-24 00:25:21,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2603086.6666666665, ans=0.0 2023-11-24 00:25:27,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2603086.6666666665, ans=0.125 2023-11-24 00:25:33,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2603153.3333333335, ans=0.125 2023-11-24 00:26:02,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2603286.6666666665, ans=0.2 2023-11-24 00:26:04,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390500 2023-11-24 00:26:06,656 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:26:19,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2603353.3333333335, ans=0.0 2023-11-24 00:26:23,059 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5750, loss[loss=0.07521, simple_loss=0.1164, pruned_loss=0.01278, audio_tagging_loss=0.004225, over 15856.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.0919, pruned_loss=0.01367, audio_tagging_loss=0.00919, over 3055564.22 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 32.0 2023-11-24 00:26:24,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 8.481e+01 9.036e+01 9.796e+01 1.280e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 00:26:25,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=15.0 2023-11-24 00:26:47,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. limit=10.0 2023-11-24 00:27:06,387 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390550 2023-11-24 00:27:11,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2603686.6666666665, ans=0.125 2023-11-24 00:27:13,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2603686.6666666665, ans=0.125 2023-11-24 00:27:25,128 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5800, loss[loss=0.04351, simple_loss=0.05162, pruned_loss=0.008328, audio_tagging_loss=0.009374, over 14299.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09086, pruned_loss=0.01357, audio_tagging_loss=0.009097, over 3053203.13 frames.
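], batch size: 56, lr: 2.06e-03, grad_scale: 8.0

The grad_scale field is the dynamic loss-scaling factor used for mixed-precision training; it falls from 32.0 (batch 5750 above) to 8.0 here, the halve-on-overflow / grow-after-stable-steps pattern of a standard AMP scaler. An illustrative loop with PyTorch's stock GradScaler; the recipe's own scaler wiring may differ, and model, optimizer and loader are placeholders:

import torch
from torch.cuda.amp import GradScaler, autocast

# Illustrative mixed-precision step; model/optimizer/loader are placeholders,
# and the recipe's own scaler wiring may differ.
def train_with_amp(model, optimizer, loader):
    scaler = GradScaler(init_scale=16.0, growth_interval=2000)
    for batch in loader:
        optimizer.zero_grad()
        with autocast():
            loss = model(batch)
        scaler.scale(loss).backward()  # gradients carry the current scale
        scaler.step(optimizer)         # skipped if scaled grads contain inf/nan
        scaler.update()                # halves the scale after a bad step,
                                       # doubles it after growth_interval good ones
        # scaler.get_scale() is what the records above print as grad_scale
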
2023-11-24 00:27:39,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2603820.0, ans=0.04949747468305833 2023-11-24 00:27:56,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2603886.6666666665, ans=0.0 2023-11-24 00:28:08,523 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390600 2023-11-24 00:28:23,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2604020.0, ans=0.125 2023-11-24 00:28:24,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2604020.0, ans=0.1 2023-11-24 00:28:26,602 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5850, loss[loss=0.07997, simple_loss=0.1165, pruned_loss=0.01446, audio_tagging_loss=0.007275, over 15180.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09259, pruned_loss=0.01368, audio_tagging_loss=0.008937, over 3047492.76 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 8.0 2023-11-24 00:28:27,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2604086.6666666665, ans=0.125 2023-11-24 00:28:29,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2604086.6666666665, ans=0.125 2023-11-24 00:28:30,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.606e+01 8.379e+01 8.883e+01 9.652e+01 2.888e+02, threshold=1.777e+02, percent-clipped=1.0 2023-11-24 00:29:04,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2604286.6666666665, ans=0.0 2023-11-24 00:29:04,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2604286.6666666665, ans=0.0 2023-11-24 00:29:10,314 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390650 2023-11-24 00:29:10,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2604286.6666666665, ans=0.0 2023-11-24 00:29:15,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2604353.3333333335, ans=0.1 2023-11-24 00:29:17,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2604353.3333333335, ans=0.125 2023-11-24 00:29:29,329 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5900, loss[loss=0.08308, simple_loss=0.1182, pruned_loss=0.01619, audio_tagging_loss=0.007798, over 15241.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09238, pruned_loss=0.01374, audio_tagging_loss=0.008984, over 3046097.40 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 8.0 2023-11-24 00:29:40,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2604420.0, ans=0.125 2023-11-24 00:29:47,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.75 vs.
limit=22.5 2023-11-24 00:30:00,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2604553.3333333335, ans=0.0 2023-11-24 00:30:13,471 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390700 2023-11-24 00:30:18,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2604686.6666666665, ans=0.0 2023-11-24 00:30:32,173 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 5950, loss[loss=0.06773, simple_loss=0.09684, pruned_loss=0.01237, audio_tagging_loss=0.006949, over 16013.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09217, pruned_loss=0.01369, audio_tagging_loss=0.008973, over 3039280.20 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 8.0 2023-11-24 00:30:35,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.451e+01 9.116e+01 9.981e+01 1.115e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-24 00:30:59,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2604886.6666666665, ans=0.125 2023-11-24 00:31:00,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2604886.6666666665, ans=0.125 2023-11-24 00:31:15,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390750 2023-11-24 00:31:27,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2605020.0, ans=0.05 2023-11-24 00:31:33,655 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6000, loss[loss=0.0734, simple_loss=0.09753, pruned_loss=0.01605, audio_tagging_loss=0.008578, over 15415.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09139, pruned_loss=0.01346, audio_tagging_loss=0.009029, over 3037824.99 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:31:33,655 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 00:32:09,892 INFO [train_asr.py:1253] (1/4) Epoch 33, validation: loss=0.05769, simple_loss=0.05098, pruned_loss=0.005124, audio_tagging_loss=0.02707, over 4681554.00 frames. 2023-11-24 00:32:09,893 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 00:32:10,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2605086.6666666665, ans=0.1 2023-11-24 00:32:15,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2605086.6666666665, ans=0.125 2023-11-24 00:32:38,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2605220.0, ans=0.2 2023-11-24 00:32:38,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2605220.0, ans=0.1 2023-11-24 00:32:52,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390800 2023-11-24 00:32:55,875 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
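Number of tokens: 24

The batch 6000 records above show the periodic validation pass: training pauses, a validation loss is computed over the full dev set, and the peak CUDA memory for the process is reported. A rough sketch of that block; compute_validation_loss is a hypothetical helper and the 3000-batch interval is an assumption consistent with validation landing on batch 6000:

import logging
import torch

# Rough sketch of the validation block; `compute_validation_loss` is a
# hypothetical helper and the 3000-batch interval is an assumption.
def maybe_validate(model, valid_loader, batch_idx, valid_interval=3000):
    if batch_idx == 0 or batch_idx % valid_interval != 0:
        return
    logging.info("Computing validation loss")
    model.eval()
    with torch.no_grad():
        valid_loss = compute_validation_loss(model, valid_loader)  # hypothetical
    model.train()
    logging.info(f"validation: loss={valid_loss:.4}")
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    logging.info(f"Maximum memory allocated so far is {mb}MB")
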
2023-11-24 00:33:06,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2605353.3333333335, ans=0.0 2023-11-24 00:33:11,945 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6050, loss[loss=0.0706, simple_loss=0.1002, pruned_loss=0.01465, audio_tagging_loss=0.005858, over 15448.00 frames. ], tot_loss[loss=0.068, simple_loss=0.091, pruned_loss=0.01348, audio_tagging_loss=0.009027, over 3035922.23 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:33:15,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.432e+01 8.885e+01 9.962e+01 1.302e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-24 00:33:18,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2605420.0, ans=10.0 2023-11-24 00:33:19,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2605420.0, ans=0.125 2023-11-24 00:33:22,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2605486.6666666665, ans=0.0 2023-11-24 00:33:37,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2605553.3333333335, ans=0.125 2023-11-24 00:33:49,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2605620.0, ans=0.0 2023-11-24 00:33:54,996 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390850 2023-11-24 00:34:04,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2605686.6666666665, ans=0.0 2023-11-24 00:34:09,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2605686.6666666665, ans=0.125 2023-11-24 00:34:12,709 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6100, loss[loss=0.06858, simple_loss=0.09104, pruned_loss=0.01449, audio_tagging_loss=0.008566, over 14543.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.08978, pruned_loss=0.01314, audio_tagging_loss=0.009074, over 3040422.47 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:34:16,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2605753.3333333335, ans=0.0 2023-11-24 00:34:27,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2023-11-24 00:34:49,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2605953.3333333335, ans=0.0 2023-11-24 00:34:56,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390900 2023-11-24 00:35:14,632 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6150, loss[loss=0.07437, simple_loss=0.09759, pruned_loss=0.01576, audio_tagging_loss=0.00981, over 15240.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.08946, pruned_loss=0.01305, audio_tagging_loss=0.009085, over 3043902.31 frames.
], batch size: 56, lr: 2.06e-03, grad_scale: 16.0 2023-11-24 00:35:19,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.208e+01 8.886e+01 9.597e+01 1.503e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-24 00:35:20,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2606086.6666666665, ans=0.125 2023-11-24 00:35:25,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2023-11-24 00:35:50,631 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:35:57,630 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 390950 2023-11-24 00:36:09,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2606353.3333333335, ans=0.125 2023-11-24 00:36:09,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0 2023-11-24 00:36:17,589 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6200, loss[loss=0.06877, simple_loss=0.08765, pruned_loss=0.01409, audio_tagging_loss=0.01086, over 14340.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0893, pruned_loss=0.01316, audio_tagging_loss=0.009179, over 3038493.71 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:36:44,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2606553.3333333335, ans=0.125 2023-11-24 00:37:02,078 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391000 2023-11-24 00:37:19,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2606753.3333333335, ans=0.125 2023-11-24 00:37:20,037 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6250, loss[loss=0.07413, simple_loss=0.1043, pruned_loss=0.0154, audio_tagging_loss=0.006575, over 15311.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.08934, pruned_loss=0.01316, audio_tagging_loss=0.009226, over 3042962.53 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:37:23,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.315e+01 8.981e+01 9.556e+01 1.931e+02, threshold=1.796e+02, percent-clipped=1.0 2023-11-24 00:37:47,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2606886.6666666665, ans=0.09899494936611666 2023-11-24 00:37:53,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2606886.6666666665, ans=0.125 2023-11-24 00:38:04,287 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391050 2023-11-24 00:38:21,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2607086.6666666665, ans=0.125 2023-11-24 00:38:22,586 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6300, loss[loss=0.05928, simple_loss=0.07483, pruned_loss=0.009719, audio_tagging_loss=0.01215, over 16623.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.08968, pruned_loss=0.01331, audio_tagging_loss=0.009381, over 3043093.28 frames. 
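], batch size: 66, lr: 2.05e-03, grad_scale: 16.0

The optim.py:476 records print five order statistics (roughly min / 25% / median / 75% / max) over a window of recent gradient norms, and in every record the threshold equals Clipping_scale times the logged median: 2.0 * 8.981e+01 = 1.796e+02 in the batch 6250 record above, where percent-clipped=1.0 also lines up with the 1.931e+02 maximum exceeding that threshold. A hedged reconstruction of the rule; the real optimizer in optim.py is more elaborate:

import torch

# Hedged reconstruction of the logged clipping rule; this only mirrors what
# the records show, not the actual optimizer implementation.
class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window  # how many recent norms to keep (assumed)
        self.norms: list[float] = []

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median  # the logged `threshold`
        if norm > threshold:  # such steps feed the percent-clipped statistic
            for g in grads:
                g.mul_(threshold / norm)
        return norm
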
2023-11-24 00:38:28,290 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:38:30,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2607086.6666666665, ans=0.0 2023-11-24 00:38:46,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2023-11-24 00:38:56,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-11-24 00:39:05,787 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391100 2023-11-24 00:39:21,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2607353.3333333335, ans=0.125 2023-11-24 00:39:25,696 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6350, loss[loss=0.05786, simple_loss=0.07555, pruned_loss=0.008892, audio_tagging_loss=0.01119, over 15400.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09085, pruned_loss=0.01329, audio_tagging_loss=0.00936, over 3048867.36 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:39:29,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.017e+01 8.323e+01 9.041e+01 9.620e+01 1.215e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-24 00:39:44,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2023-11-24 00:39:45,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2607486.6666666665, ans=0.2 2023-11-24 00:39:46,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2607486.6666666665, ans=0.125 2023-11-24 00:39:49,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2607553.3333333335, ans=0.125 2023-11-24 00:40:06,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2607620.0, ans=0.0 2023-11-24 00:40:09,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391150 2023-11-24 00:40:27,284 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6400, loss[loss=0.0686, simple_loss=0.08383, pruned_loss=0.0165, audio_tagging_loss=0.01019, over 15054.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09143, pruned_loss=0.01341, audio_tagging_loss=0.009313, over 3046260.10 frames.
], batch size: 57, lr: 2.05e-03, grad_scale: 32.0 2023-11-24 00:40:31,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2607753.3333333335, ans=0.5 2023-11-24 00:40:33,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2607753.3333333335, ans=0.125 2023-11-24 00:40:43,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2607820.0, ans=0.5 2023-11-24 00:40:47,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2607820.0, ans=0.2 2023-11-24 00:40:57,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2607886.6666666665, ans=0.125 2023-11-24 00:41:11,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391200 2023-11-24 00:41:29,310 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6450, loss[loss=0.07958, simple_loss=0.11, pruned_loss=0.0175, audio_tagging_loss=0.007092, over 15008.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09147, pruned_loss=0.01346, audio_tagging_loss=0.009288, over 3041656.99 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:41:32,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2608086.6666666665, ans=0.0 2023-11-24 00:41:33,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.31 vs. limit=22.5 2023-11-24 00:41:34,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 8.349e+01 8.985e+01 9.691e+01 1.713e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-24 00:42:14,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391250 2023-11-24 00:42:33,538 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6500, loss[loss=0.07404, simple_loss=0.1093, pruned_loss=0.0112, audio_tagging_loss=0.008191, over 14691.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.0908, pruned_loss=0.01329, audio_tagging_loss=0.00923, over 3040863.41 frames. 
], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:42:52,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2608486.6666666665, ans=0.5 2023-11-24 00:42:55,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2608486.6666666665, ans=0.0 2023-11-24 00:43:05,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2608553.3333333335, ans=0.05 2023-11-24 00:43:17,646 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391300 2023-11-24 00:43:21,374 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:43:24,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2608686.6666666665, ans=0.0 2023-11-24 00:43:24,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2608686.6666666665, ans=0.125 2023-11-24 00:43:26,043 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:43:34,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-24 00:43:35,731 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6550, loss[loss=0.07704, simple_loss=0.0988, pruned_loss=0.01904, audio_tagging_loss=0.008597, over 15343.00 frames. ], tot_loss[loss=0.0687, simple_loss=0.09222, pruned_loss=0.01348, audio_tagging_loss=0.009114, over 3045710.99 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:43:40,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.079e+01 8.397e+01 9.072e+01 9.736e+01 1.207e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 00:43:48,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2608820.0, ans=0.125 2023-11-24 00:43:57,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2608820.0, ans=0.125 2023-11-24 00:44:01,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2608886.6666666665, ans=0.125 2023-11-24 00:44:17,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2608953.3333333335, ans=0.125 2023-11-24 00:44:17,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0 2023-11-24 00:44:19,805 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391350 2023-11-24 00:44:22,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.60 vs. limit=12.0 2023-11-24 00:44:37,628 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6600, loss[loss=0.07271, simple_loss=0.09279, pruned_loss=0.01695, audio_tagging_loss=0.009368, over 14548.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09267, pruned_loss=0.0137, audio_tagging_loss=0.008922, over 3036483.26 frames. 
], batch size: 55, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:44:41,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2609086.6666666665, ans=0.0 2023-11-24 00:44:59,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2609153.3333333335, ans=0.125 2023-11-24 00:45:11,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2609220.0, ans=0.125 2023-11-24 00:45:21,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391400 2023-11-24 00:45:22,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-11-24 00:45:33,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=22.5 2023-11-24 00:45:38,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2609353.3333333335, ans=0.0 2023-11-24 00:45:41,428 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6650, loss[loss=0.04732, simple_loss=0.06585, pruned_loss=0.006524, audio_tagging_loss=0.007868, over 14895.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.0918, pruned_loss=0.01355, audio_tagging_loss=0.008838, over 3040837.55 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:45:46,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.351e+01 8.932e+01 9.624e+01 1.119e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-24 00:45:48,948 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 00:46:20,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2609620.0, ans=0.125 2023-11-24 00:46:22,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-24 00:46:25,039 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391450 2023-11-24 00:46:25,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2609620.0, ans=0.125 2023-11-24 00:46:30,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2609686.6666666665, ans=0.125 2023-11-24 00:46:39,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2609686.6666666665, ans=0.0 2023-11-24 00:46:42,708 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6700, loss[loss=0.0604, simple_loss=0.077, pruned_loss=0.01017, audio_tagging_loss=0.01173, over 15783.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09243, pruned_loss=0.01354, audio_tagging_loss=0.008743, over 3043584.19 frames. 
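], batch size: 64, lr: 2.05e-03, grad_scale: 16.0

Most of the scaling.py:213 traffic above is ScheduledFloat printing the current value (ans=...) of a module hyperparameter, such as a dropout_p, a skip rate, or a balancer prob, at the current batch_count. These are schedules over the batch count; by batch_count around 2.6e6 most of them have long since flattened to their final value, which is why the same ans repeats for a given name. A minimal sketch of the idea; the real class in scaling.py carries more machinery, and the breakpoints below are invented:

# Minimal ScheduledFloat-style schedule: (batch_count, value) breakpoints with
# linear interpolation between them; the breakpoints below are invented.
class PiecewiseLinearSchedule:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # e.g. (0.0, 0.3), (20000.0, 0.1)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]  # flat tail, as at batch_count ~2.6e6 above

# PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1)).value_at(2609820.0) -> 0.1
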
2023-11-24 00:46:56,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2609820.0, ans=0.2 2023-11-24 00:47:01,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2609820.0, ans=0.125 2023-11-24 00:47:12,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2609886.6666666665, ans=0.05 2023-11-24 00:47:14,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2609886.6666666665, ans=0.035 2023-11-24 00:47:26,654 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391500 2023-11-24 00:47:27,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2609953.3333333335, ans=0.125 2023-11-24 00:47:37,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2610020.0, ans=0.0 2023-11-24 00:47:45,022 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6750, loss[loss=0.06607, simple_loss=0.08927, pruned_loss=0.01159, audio_tagging_loss=0.009843, over 15720.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09195, pruned_loss=0.01346, audio_tagging_loss=0.008847, over 3046670.26 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:47:49,704 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.433e+01 8.920e+01 9.710e+01 1.340e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-24 00:47:57,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2610153.3333333335, ans=0.125 2023-11-24 00:48:05,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5 2023-11-24 00:48:15,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2610220.0, ans=0.125 2023-11-24 00:48:20,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2610220.0, ans=0.125 2023-11-24 00:48:28,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391550 2023-11-24 00:48:34,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2610353.3333333335, ans=0.1 2023-11-24 00:48:46,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2610420.0, ans=0.0 2023-11-24 00:48:48,395 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6800, loss[loss=0.05785, simple_loss=0.07896, pruned_loss=0.009897, audio_tagging_loss=0.008468, over 15202.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.0918, pruned_loss=0.01347, audio_tagging_loss=0.0088, over 3045857.84 frames.
], batch size: 59, lr: 2.05e-03, grad_scale: 32.0 2023-11-24 00:49:14,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2610553.3333333335, ans=0.1 2023-11-24 00:49:22,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2610553.3333333335, ans=0.0 2023-11-24 00:49:29,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2610620.0, ans=0.125 2023-11-24 00:49:31,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391600 2023-11-24 00:49:44,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2023-11-24 00:49:50,164 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6850, loss[loss=0.07513, simple_loss=0.1069, pruned_loss=0.01591, audio_tagging_loss=0.00576, over 15756.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09199, pruned_loss=0.01343, audio_tagging_loss=0.008776, over 3047142.75 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:49:56,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.434e+01 9.033e+01 9.858e+01 1.368e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 00:50:10,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2610820.0, ans=0.0 2023-11-24 00:50:28,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2610953.3333333335, ans=0.0 2023-11-24 00:50:30,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2610953.3333333335, ans=0.125 2023-11-24 00:50:34,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391650 2023-11-24 00:50:43,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2611020.0, ans=0.1 2023-11-24 00:50:52,592 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6900, loss[loss=0.05873, simple_loss=0.07712, pruned_loss=0.009585, audio_tagging_loss=0.01058, over 14923.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09109, pruned_loss=0.01316, audio_tagging_loss=0.008808, over 3045346.49 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:50:57,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2611086.6666666665, ans=0.125 2023-11-24 00:51:04,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2611153.3333333335, ans=0.125 2023-11-24 00:51:08,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. 
limit=6.0 2023-11-24 00:51:28,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2611220.0, ans=0.0 2023-11-24 00:51:29,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2611286.6666666665, ans=0.125 2023-11-24 00:51:30,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2611286.6666666665, ans=0.125 2023-11-24 00:51:32,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-11-24 00:51:36,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391700 2023-11-24 00:51:41,368 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 00:51:47,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2611353.3333333335, ans=0.125 2023-11-24 00:51:54,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2023-11-24 00:51:55,495 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 6950, loss[loss=0.07838, simple_loss=0.09828, pruned_loss=0.01788, audio_tagging_loss=0.01137, over 15274.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09025, pruned_loss=0.01309, audio_tagging_loss=0.008936, over 3042052.32 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 00:52:03,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.055e+01 8.470e+01 9.036e+01 9.958e+01 1.197e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 00:52:31,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-24 00:52:38,497 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391750 2023-11-24 00:52:38,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2611620.0, ans=0.125 2023-11-24 00:52:51,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2611686.6666666665, ans=0.125 2023-11-24 00:52:57,554 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7000, loss[loss=0.06364, simple_loss=0.08184, pruned_loss=0.01237, audio_tagging_loss=0.01035, over 15959.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09109, pruned_loss=0.01319, audio_tagging_loss=0.008965, over 3038361.28 frames. 
], batch size: 60, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 00:53:02,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2611753.3333333335, ans=0.125 2023-11-24 00:53:03,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2611753.3333333335, ans=0.125 2023-11-24 00:53:06,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2611753.3333333335, ans=0.1 2023-11-24 00:53:06,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=2611753.3333333335, ans=0.2 2023-11-24 00:53:18,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2611820.0, ans=0.0 2023-11-24 00:53:23,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2611886.6666666665, ans=0.125 2023-11-24 00:53:41,581 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391800 2023-11-24 00:53:59,717 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7050, loss[loss=0.05813, simple_loss=0.07679, pruned_loss=0.008805, audio_tagging_loss=0.01093, over 14912.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09083, pruned_loss=0.01314, audio_tagging_loss=0.009129, over 3035763.89 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 00:54:07,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.402e+01 9.074e+01 1.006e+02 1.351e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-24 00:54:16,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2612153.3333333335, ans=0.125 2023-11-24 00:54:22,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2612153.3333333335, ans=0.1 2023-11-24 00:54:26,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2612220.0, ans=0.0 2023-11-24 00:54:29,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2612220.0, ans=0.125 2023-11-24 00:54:43,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391850 2023-11-24 00:54:57,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2612353.3333333335, ans=0.07 2023-11-24 00:55:02,808 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7100, loss[loss=0.06188, simple_loss=0.08347, pruned_loss=0.01208, audio_tagging_loss=0.008064, over 14266.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09157, pruned_loss=0.01333, audio_tagging_loss=0.009105, over 3043971.41 frames. ], batch size: 53, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 00:55:05,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2612420.0, ans=0.0 2023-11-24 00:55:12,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-24 00:55:17,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. 
limit=22.5 2023-11-24 00:55:20,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2612486.6666666665, ans=0.0 2023-11-24 00:55:40,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2612620.0, ans=0.125 2023-11-24 00:55:43,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2612620.0, ans=0.0 2023-11-24 00:55:45,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391900 2023-11-24 00:55:48,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2612620.0, ans=0.125 2023-11-24 00:56:03,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-24 00:56:05,201 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7150, loss[loss=0.0563, simple_loss=0.07438, pruned_loss=0.009999, audio_tagging_loss=0.009112, over 15634.00 frames. ], tot_loss[loss=0.06876, simple_loss=0.09236, pruned_loss=0.01345, audio_tagging_loss=0.009138, over 3039578.13 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 00:56:11,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2612753.3333333335, ans=0.09899494936611666 2023-11-24 00:56:12,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.358e+01 9.047e+01 9.923e+01 1.303e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-24 00:56:49,174 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 391950 2023-11-24 00:57:05,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2613086.6666666665, ans=10.0 2023-11-24 00:57:06,669 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7200, loss[loss=0.05973, simple_loss=0.07993, pruned_loss=0.01034, audio_tagging_loss=0.009425, over 15198.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.09234, pruned_loss=0.01335, audio_tagging_loss=0.009139, over 3043413.46 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:57:09,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2613086.6666666665, ans=0.0 2023-11-24 00:57:18,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=8.0 2023-11-24 00:57:36,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2023-11-24 00:57:50,663 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392000 2023-11-24 00:57:52,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2613286.6666666665, ans=0.125 2023-11-24 00:58:12,144 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7250, loss[loss=0.06523, simple_loss=0.08786, pruned_loss=0.01444, audio_tagging_loss=0.006868, over 14624.00 frames. ], tot_loss[loss=0.06937, simple_loss=0.09319, pruned_loss=0.01356, audio_tagging_loss=0.009219, over 3043564.89 frames. 
], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:58:20,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.844e+01 9.320e+01 1.006e+02 1.273e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-24 00:58:32,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2613486.6666666665, ans=0.0 2023-11-24 00:58:39,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2613553.3333333335, ans=0.125 2023-11-24 00:58:47,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2023-11-24 00:58:55,270 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392050 2023-11-24 00:58:55,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2023-11-24 00:59:15,261 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7300, loss[loss=0.0594, simple_loss=0.07192, pruned_loss=0.01233, audio_tagging_loss=0.01111, over 14578.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09281, pruned_loss=0.01346, audio_tagging_loss=0.00908, over 3046475.51 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 00:59:35,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2613820.0, ans=0.0 2023-11-24 00:59:38,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2613886.6666666665, ans=0.0 2023-11-24 00:59:40,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2613886.6666666665, ans=0.125 2023-11-24 00:59:41,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2613886.6666666665, ans=0.2 2023-11-24 00:59:58,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392100 2023-11-24 01:00:05,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2614020.0, ans=0.2 2023-11-24 01:00:10,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2614020.0, ans=0.125 2023-11-24 01:00:14,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2614020.0, ans=0.2 2023-11-24 01:00:16,220 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7350, loss[loss=0.08035, simple_loss=0.1174, pruned_loss=0.01407, audio_tagging_loss=0.007579, over 16289.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09207, pruned_loss=0.01336, audio_tagging_loss=0.008883, over 3048222.80 frames. 
2023-11-24 01:00:23,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.518e+01 9.030e+01 9.706e+01 1.263e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-24 01:00:38,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2614153.3333333335, ans=0.1
2023-11-24 01:00:52,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0
2023-11-24 01:01:00,287 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392150
2023-11-24 01:01:05,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2614353.3333333335, ans=0.125
2023-11-24 01:01:07,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2614353.3333333335, ans=0.0
2023-11-24 01:01:07,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2614353.3333333335, ans=0.0
2023-11-24 01:01:18,084 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7400, loss[loss=0.08754, simple_loss=0.1224, pruned_loss=0.01734, audio_tagging_loss=0.008981, over 15985.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09252, pruned_loss=0.01326, audio_tagging_loss=0.008746, over 3054043.28 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:01:44,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2614553.3333333335, ans=0.1
2023-11-24 01:02:01,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2614620.0, ans=0.125
2023-11-24 01:02:02,061 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392200
2023-11-24 01:02:03,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2614620.0, ans=0.0
2023-11-24 01:02:21,662 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7450, loss[loss=0.07214, simple_loss=0.1111, pruned_loss=0.01025, audio_tagging_loss=0.006355, over 14646.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09293, pruned_loss=0.01324, audio_tagging_loss=0.008713, over 3056744.90 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:02:27,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2614753.3333333335, ans=0.1
2023-11-24 01:02:27,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2614753.3333333335, ans=0.1
2023-11-24 01:02:28,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.459e+01 9.049e+01 9.753e+01 1.412e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-24 01:02:39,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2614820.0, ans=0.0
2023-11-24 01:02:52,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2614886.6666666665, ans=0.04949747468305833
2023-11-24 01:03:05,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392250
2023-11-24 01:03:21,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2615020.0, ans=0.125
2023-11-24 01:03:23,816 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7500, loss[loss=0.04935, simple_loss=0.06738, pruned_loss=0.008384, audio_tagging_loss=0.00728, over 14760.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09243, pruned_loss=0.01319, audio_tagging_loss=0.008784, over 3055270.36 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:03:31,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5
2023-11-24 01:03:33,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2615086.6666666665, ans=0.0
2023-11-24 01:03:34,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2615153.3333333335, ans=0.1
2023-11-24 01:03:39,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2615153.3333333335, ans=15.0
2023-11-24 01:04:07,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392300
2023-11-24 01:04:09,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2615286.6666666665, ans=0.125
2023-11-24 01:04:20,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2615353.3333333335, ans=15.0
2023-11-24 01:04:25,646 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7550, loss[loss=0.05651, simple_loss=0.07842, pruned_loss=0.008329, audio_tagging_loss=0.008965, over 15373.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09196, pruned_loss=0.01322, audio_tagging_loss=0.008761, over 3056151.10 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:04:29,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2615420.0, ans=0.07
2023-11-24 01:04:33,431 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.868e+01 8.602e+01 9.110e+01 9.644e+01 1.333e+02, threshold=1.822e+02, percent-clipped=0.0
2023-11-24 01:04:50,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2615553.3333333335, ans=0.125
2023-11-24 01:05:01,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2615553.3333333335, ans=0.0
2023-11-24 01:05:09,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392350
2023-11-24 01:05:25,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2615686.6666666665, ans=0.125
2023-11-24 01:05:25,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2615686.6666666665, ans=0.0
2023-11-24 01:05:29,043 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7600, loss[loss=0.08294, simple_loss=0.1054, pruned_loss=0.022, audio_tagging_loss=0.008241, over 15146.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.0906, pruned_loss=0.01326, audio_tagging_loss=0.00885, over 3048709.19 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 32.0
2023-11-24 01:05:39,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2615753.3333333335, ans=0.0
2023-11-24 01:05:41,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2615820.0, ans=0.125
2023-11-24 01:06:02,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2615886.6666666665, ans=0.125
2023-11-24 01:06:12,670 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392400
2023-11-24 01:06:30,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2616086.6666666665, ans=0.0
2023-11-24 01:06:31,604 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7650, loss[loss=0.07999, simple_loss=0.1093, pruned_loss=0.01493, audio_tagging_loss=0.01041, over 14288.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09099, pruned_loss=0.01332, audio_tagging_loss=0.008793, over 3051100.94 frames. ], batch size: 53, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:06:39,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0
2023-11-24 01:06:39,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.392e+01 8.875e+01 9.417e+01 1.557e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-24 01:06:42,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0
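The "grad_scale" field in the batch lines is the mixed-precision loss scale: it doubles after a long run of overflow-free steps (16.0 to 32.0 at batch 7600 above) and is halved again when a step produces inf/nan gradients (back to 16.0 by batch 7650). A minimal sketch of that behaviour using PyTorch's standard AMP utilities; the surrounding model/criterion calls are illustrative, not the training script's actual code:

```python
import torch

# GradScaler grows the scale by 2x after enough consecutive good steps
# and backs off by 0.5x on overflow -- matching the logged 8/16/32 pattern.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the update on inf/nan
    scaler.update()                # grows or shrinks the scale
    return loss.detach()
```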
2023-11-24 01:07:07,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2616286.6666666665, ans=0.0
2023-11-24 01:07:14,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2616286.6666666665, ans=0.1
2023-11-24 01:07:15,523 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392450
2023-11-24 01:07:21,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2616353.3333333335, ans=0.125
2023-11-24 01:07:28,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2616353.3333333335, ans=0.0
2023-11-24 01:07:33,365 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7700, loss[loss=0.05467, simple_loss=0.07914, pruned_loss=0.01049, audio_tagging_loss=0.004604, over 15144.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09129, pruned_loss=0.01334, audio_tagging_loss=0.008732, over 3045523.89 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:07:39,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2616420.0, ans=0.035
2023-11-24 01:07:40,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2616420.0, ans=0.0
2023-11-24 01:07:58,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5
2023-11-24 01:08:10,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2616620.0, ans=0.5
2023-11-24 01:08:17,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392500
2023-11-24 01:08:24,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2616686.6666666665, ans=0.125
2023-11-24 01:08:33,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2616686.6666666665, ans=0.2
2023-11-24 01:08:36,708 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7750, loss[loss=0.06686, simple_loss=0.08515, pruned_loss=0.01232, audio_tagging_loss=0.01197, over 15648.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09137, pruned_loss=0.01341, audio_tagging_loss=0.008756, over 3046501.27 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:08:45,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.416e+01 8.954e+01 9.884e+01 1.252e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-24 01:08:54,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0
2023-11-24 01:09:08,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2616886.6666666665, ans=0.0
2023-11-24 01:09:14,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0
2023-11-24 01:09:19,631 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392550
2023-11-24 01:09:34,998 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 01:09:38,490 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7800, loss[loss=0.07549, simple_loss=0.09323, pruned_loss=0.02058, audio_tagging_loss=0.008297, over 14857.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09237, pruned_loss=0.01367, audio_tagging_loss=0.008705, over 3046000.67 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:09:42,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2617086.6666666665, ans=0.1
2023-11-24 01:09:46,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2617086.6666666665, ans=0.2
2023-11-24 01:10:05,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2617220.0, ans=0.125
2023-11-24 01:10:08,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2617220.0, ans=0.125
2023-11-24 01:10:18,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0
2023-11-24 01:10:21,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=12.0
2023-11-24 01:10:22,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392600
2023-11-24 01:10:26,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2617286.6666666665, ans=0.125
2023-11-24 01:10:29,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.08 vs. limit=10.0
2023-11-24 01:10:36,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2617353.3333333335, ans=0.0
2023-11-24 01:10:41,565 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7850, loss[loss=0.1024, simple_loss=0.1344, pruned_loss=0.03002, audio_tagging_loss=0.005186, over 15480.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.0924, pruned_loss=0.01371, audio_tagging_loss=0.008822, over 3049987.99 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:10:45,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. limit=6.0
2023-11-24 01:10:49,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.631e+01 9.206e+01 9.904e+01 1.275e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-24 01:11:07,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0
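The Whitening lines report a per-module statistic against a limit; the metric tracks how far the activations of each channel group are from having a white (isotropic) covariance. One plausible way to get a number with this flavour, assuming the metric is the unevenness of the covariance eigenvalues (1.0 when fully whitened, larger when energy concentrates in few directions); this is an assumption about the statistic, not the actual scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels split into groups as in the log
    num_frames, num_channels = x.shape
    xg = x.reshape(num_frames, num_groups, num_channels // num_groups)
    metrics = []
    for g in range(num_groups):
        feats = xg[:, g, :]
        cov = feats.T @ feats / num_frames          # per-group covariance
        eigs = torch.linalg.eigvalsh(cov)
        # ratio of mean squared eigenvalue to squared mean eigenvalue;
        # equals 1.0 when all eigenvalues are equal (fully whitened)
        metrics.append((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))
    return float(torch.stack(metrics).mean())
```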
2023-11-24 01:11:25,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392650
2023-11-24 01:11:25,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2617620.0, ans=0.125
2023-11-24 01:11:43,256 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7900, loss[loss=0.0716, simple_loss=0.1034, pruned_loss=0.01396, audio_tagging_loss=0.005945, over 15411.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09266, pruned_loss=0.01381, audio_tagging_loss=0.009014, over 3052376.30 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:11:43,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2617753.3333333335, ans=0.125
2023-11-24 01:11:52,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0
2023-11-24 01:11:53,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5
2023-11-24 01:12:26,908 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392700
2023-11-24 01:12:45,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2618086.6666666665, ans=0.125
2023-11-24 01:12:46,551 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 7950, loss[loss=0.06426, simple_loss=0.08545, pruned_loss=0.01183, audio_tagging_loss=0.009708, over 14897.00 frames. ], tot_loss[loss=0.06942, simple_loss=0.09284, pruned_loss=0.01384, audio_tagging_loss=0.009166, over 3046734.56 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:12:54,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.531e+01 8.932e+01 9.571e+01 1.197e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-24 01:12:56,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2618086.6666666665, ans=0.0
2023-11-24 01:12:59,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.40 vs. limit=22.5
2023-11-24 01:13:03,666 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 01:13:05,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2618153.3333333335, ans=0.125
2023-11-24 01:13:23,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2618286.6666666665, ans=0.125
2023-11-24 01:13:30,106 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392750
2023-11-24 01:13:48,831 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8000, loss[loss=0.0677, simple_loss=0.08676, pruned_loss=0.01245, audio_tagging_loss=0.01187, over 16149.00 frames. ], tot_loss[loss=0.06944, simple_loss=0.09273, pruned_loss=0.0138, audio_tagging_loss=0.009271, over 3045539.32 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 16.0
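The WARNING above shows why the AudioSet placeholder cuts are dropped: 100 input frames shrink to 23 after the roughly 4x subsampling front-end, fewer than the 24 BPE tokens of the dummy transcript, so no transducer alignment can exist. A hedged sketch of such a filter, with an assumed front-end shrinkage formula chosen so that 100 maps to 23 (illustrative names, not the actual train_asr.py code):

```python
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # assumed convolutional front-end shrinkage: (T - 7) // 4 maps 100 -> 23
    frames_after = (num_frames - 7) // subsampling_factor
    # a cut is only trainable if there is at least one frame per output token
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded, as in the warning
```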
2023-11-24 01:13:58,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2618420.0, ans=0.125
2023-11-24 01:14:23,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2618553.3333333335, ans=0.125
2023-11-24 01:14:25,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2618620.0, ans=0.125
2023-11-24 01:14:27,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2618620.0, ans=0.125
2023-11-24 01:14:32,142 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392800
2023-11-24 01:14:47,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5
2023-11-24 01:14:49,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2618753.3333333335, ans=0.2
2023-11-24 01:14:50,741 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8050, loss[loss=0.05011, simple_loss=0.05334, pruned_loss=0.009764, audio_tagging_loss=0.01368, over 14515.00 frames. ], tot_loss[loss=0.06923, simple_loss=0.09213, pruned_loss=0.01372, audio_tagging_loss=0.009443, over 3039183.15 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:15:00,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0
2023-11-24 01:15:01,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.420e+01 8.885e+01 9.707e+01 1.213e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-24 01:15:04,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2618820.0, ans=0.2
2023-11-24 01:15:06,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.86 vs. limit=22.5
2023-11-24 01:15:18,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2618886.6666666665, ans=0.125
2023-11-24 01:15:20,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2618886.6666666665, ans=0.0
2023-11-24 01:15:34,919 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392850
2023-11-24 01:15:50,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2619020.0, ans=0.125
2023-11-24 01:15:54,257 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8100, loss[loss=0.06992, simple_loss=0.09119, pruned_loss=0.01592, audio_tagging_loss=0.008404, over 15275.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.09271, pruned_loss=0.01385, audio_tagging_loss=0.009288, over 3045006.84 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:16:01,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2619086.6666666665, ans=0.05
2023-11-24 01:16:17,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2619220.0, ans=0.0
2023-11-24 01:16:22,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2619220.0, ans=0.125
2023-11-24 01:16:38,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392900
2023-11-24 01:16:56,156 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8150, loss[loss=0.07073, simple_loss=0.08876, pruned_loss=0.01466, audio_tagging_loss=0.01168, over 14670.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09206, pruned_loss=0.01368, audio_tagging_loss=0.009198, over 3043689.02 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:16:56,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2619420.0, ans=0.125
2023-11-24 01:17:01,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2619420.0, ans=0.125
2023-11-24 01:17:06,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.428e+01 9.309e+01 1.006e+02 1.505e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-24 01:17:32,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2619553.3333333335, ans=0.125
2023-11-24 01:17:35,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2619620.0, ans=0.125
2023-11-24 01:17:40,898 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 392950
2023-11-24 01:17:58,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2619753.3333333335, ans=0.0
2023-11-24 01:17:59,133 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8200, loss[loss=0.05964, simple_loss=0.07531, pruned_loss=0.0123, audio_tagging_loss=0.009692, over 14865.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09167, pruned_loss=0.01351, audio_tagging_loss=0.009114, over 3050626.12 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:18:01,536 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 01:18:06,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2619753.3333333335, ans=0.2
2023-11-24 01:18:19,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2619820.0, ans=0.09899494936611666
2023-11-24 01:18:43,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393000
2023-11-24 01:18:46,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=12.0
2023-11-24 01:18:54,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2620020.0, ans=0.0
2023-11-24 01:18:56,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0
2023-11-24 01:19:03,397 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8250, loss[loss=0.0589, simple_loss=0.07998, pruned_loss=0.009887, audio_tagging_loss=0.00902, over 15120.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09108, pruned_loss=0.01339, audio_tagging_loss=0.009031, over 3045131.23 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:19:13,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.535e+01 8.227e+01 9.050e+01 1.002e+02 1.257e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-24 01:19:19,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2620153.3333333335, ans=0.2
2023-11-24 01:19:26,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2620220.0, ans=0.125
2023-11-24 01:19:28,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2620220.0, ans=0.125
2023-11-24 01:19:35,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2620220.0, ans=0.2
2023-11-24 01:19:47,618 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393050
2023-11-24 01:19:57,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0
2023-11-24 01:20:03,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2620353.3333333335, ans=0.0
2023-11-24 01:20:05,501 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8300, loss[loss=0.05699, simple_loss=0.07471, pruned_loss=0.01057, audio_tagging_loss=0.009056, over 16545.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.0907, pruned_loss=0.0134, audio_tagging_loss=0.009017, over 3052331.37 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:20:10,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2620420.0, ans=0.125
2023-11-24 01:20:14,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=15.0
2023-11-24 01:20:32,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2620553.3333333335, ans=0.125
2023-11-24 01:20:49,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393100
2023-11-24 01:20:51,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2620620.0, ans=0.125
2023-11-24 01:21:06,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0
2023-11-24 01:21:07,462 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8350, loss[loss=0.0583, simple_loss=0.07701, pruned_loss=0.01064, audio_tagging_loss=0.009159, over 16148.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09115, pruned_loss=0.01336, audio_tagging_loss=0.008989, over 3054236.01 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 8.0
2023-11-24 01:21:19,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.568e+01 9.324e+01 1.020e+02 1.487e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-24 01:21:39,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2620886.6666666665, ans=0.2
2023-11-24 01:21:51,031 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393150
2023-11-24 01:22:09,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2621020.0, ans=0.025
2023-11-24 01:22:11,142 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8400, loss[loss=0.07001, simple_loss=0.09634, pruned_loss=0.01333, audio_tagging_loss=0.008504, over 15535.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09013, pruned_loss=0.0131, audio_tagging_loss=0.009002, over 3054204.53 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:22:11,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2621086.6666666665, ans=0.125
2023-11-24 01:22:14,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2621086.6666666665, ans=0.125
2023-11-24 01:22:16,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=15.0
2023-11-24 01:22:35,236 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 01:22:54,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393200
2023-11-24 01:23:12,777 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8450, loss[loss=0.05928, simple_loss=0.07893, pruned_loss=0.01068, audio_tagging_loss=0.009134, over 16111.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09024, pruned_loss=0.01315, audio_tagging_loss=0.009015, over 3049871.00 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:23:21,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2621420.0, ans=0.0
2023-11-24 01:23:23,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.428e+01 8.989e+01 9.532e+01 1.176e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-24 01:23:32,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5
2023-11-24 01:23:51,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2621620.0, ans=0.125
2023-11-24 01:23:56,093 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393250
2023-11-24 01:24:03,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2621686.6666666665, ans=0.07
2023-11-24 01:24:13,899 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8500, loss[loss=0.07185, simple_loss=0.1009, pruned_loss=0.01428, audio_tagging_loss=0.007125, over 15430.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09035, pruned_loss=0.01328, audio_tagging_loss=0.009041, over 3049005.17 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 8.0
2023-11-24 01:24:31,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2621820.0, ans=0.125
2023-11-24 01:24:32,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2621820.0, ans=15.0
2023-11-24 01:24:33,475 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-24 01:24:35,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0
2023-11-24 01:24:57,398 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393300
2023-11-24 01:25:04,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=15.0
2023-11-24 01:25:15,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.38 vs. limit=10.0
2023-11-24 01:25:17,338 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8550, loss[loss=0.06727, simple_loss=0.09227, pruned_loss=0.01156, audio_tagging_loss=0.009573, over 14399.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09104, pruned_loss=0.0135, audio_tagging_loss=0.009019, over 3052671.10 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 8.0
2023-11-24 01:25:18,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2622086.6666666665, ans=0.05
2023-11-24 01:25:19,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2622086.6666666665, ans=0.125
2023-11-24 01:25:25,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2622086.6666666665, ans=0.125
2023-11-24 01:25:29,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.811e+01 9.309e+01 9.853e+01 1.247e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-24 01:25:58,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2622286.6666666665, ans=0.125
2023-11-24 01:25:59,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393350
2023-11-24 01:26:08,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2622353.3333333335, ans=0.2
2023-11-24 01:26:10,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5
2023-11-24 01:26:18,349 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8600, loss[loss=0.09032, simple_loss=0.1324, pruned_loss=0.01811, audio_tagging_loss=0.006024, over 16066.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09002, pruned_loss=0.01319, audio_tagging_loss=0.009159, over 3049195.22 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 8.0
2023-11-24 01:26:37,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0
2023-11-24 01:26:44,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2622553.3333333335, ans=0.2
2023-11-24 01:26:50,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2622553.3333333335, ans=0.0
2023-11-24 01:26:51,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2622553.3333333335, ans=0.125
2023-11-24 01:26:57,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=15.0
2023-11-24 01:26:58,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2622620.0, ans=0.0
2023-11-24 01:27:01,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393400
2023-11-24 01:27:08,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2622686.6666666665, ans=0.125
2023-11-24 01:27:19,487 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8650, loss[loss=0.05551, simple_loss=0.07059, pruned_loss=0.0103, audio_tagging_loss=0.009913, over 14883.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.0899, pruned_loss=0.01318, audio_tagging_loss=0.009263, over 3053760.25 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 8.0
2023-11-24 01:27:20,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2622753.3333333335, ans=0.0
2023-11-24 01:27:23,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2622753.3333333335, ans=0.0
2023-11-24 01:27:25,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2622753.3333333335, ans=0.1
2023-11-24 01:27:32,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.465e+01 9.032e+01 9.917e+01 1.291e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-24 01:27:44,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2622886.6666666665, ans=0.125
2023-11-24 01:27:45,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0
2023-11-24 01:27:54,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2622886.6666666665, ans=22.5
2023-11-24 01:28:03,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393450
2023-11-24 01:28:08,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2623020.0, ans=0.125
2023-11-24 01:28:22,590 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8700, loss[loss=0.08285, simple_loss=0.1062, pruned_loss=0.01903, audio_tagging_loss=0.01072, over 14531.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09096, pruned_loss=0.0134, audio_tagging_loss=0.00932, over 3053495.72 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 8.0
2023-11-24 01:28:51,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2623220.0, ans=0.07
2023-11-24 01:29:05,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393500
2023-11-24 01:29:23,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2623420.0, ans=0.1
2023-11-24 01:29:24,766 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8750, loss[loss=0.05187, simple_loss=0.06181, pruned_loss=0.009241, audio_tagging_loss=0.01173, over 15515.00 frames. ], tot_loss[loss=0.06836, simple_loss=0.09088, pruned_loss=0.01348, audio_tagging_loss=0.009445, over 3054420.97 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 8.0
2023-11-24 01:29:33,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2623420.0, ans=0.0
2023-11-24 01:29:36,659 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.417e+01 9.143e+01 1.010e+02 1.352e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-24 01:29:38,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2623486.6666666665, ans=0.0
2023-11-24 01:29:48,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2623553.3333333335, ans=0.125
2023-11-24 01:29:54,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2623553.3333333335, ans=0.125
2023-11-24 01:29:57,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2623553.3333333335, ans=0.2
2023-11-24 01:30:07,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393550
2023-11-24 01:30:07,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2623620.0, ans=0.125
2023-11-24 01:30:08,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0
2023-11-24 01:30:12,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0
2023-11-24 01:30:14,252 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 01:30:25,614 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8800, loss[loss=0.08374, simple_loss=0.1173, pruned_loss=0.01722, audio_tagging_loss=0.007883, over 14330.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09173, pruned_loss=0.01359, audio_tagging_loss=0.009408, over 3055301.53 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:30:27,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5
2023-11-24 01:30:42,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2623820.0, ans=0.035
2023-11-24 01:30:46,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0
2023-11-24 01:30:47,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2623820.0, ans=0.07
2023-11-24 01:30:51,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2623886.6666666665, ans=0.09899494936611666
2023-11-24 01:30:56,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2623886.6666666665, ans=0.0
2023-11-24 01:31:08,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393600
2023-11-24 01:31:10,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2623953.3333333335, ans=0.2
2023-11-24 01:31:15,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2624020.0, ans=0.125
2023-11-24 01:31:27,893 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8850, loss[loss=0.08727, simple_loss=0.1142, pruned_loss=0.02242, audio_tagging_loss=0.007727, over 15418.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.09187, pruned_loss=0.0136, audio_tagging_loss=0.009374, over 3044552.65 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:31:39,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.328e+01 9.022e+01 9.725e+01 1.238e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-24 01:31:40,859 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 01:32:00,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2624220.0, ans=0.0
2023-11-24 01:32:09,898 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393650
2023-11-24 01:32:14,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2624286.6666666665, ans=0.125
2023-11-24 01:32:19,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2624353.3333333335, ans=0.1
2023-11-24 01:32:26,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0
2023-11-24 01:32:28,679 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8900, loss[loss=0.08256, simple_loss=0.1278, pruned_loss=0.01333, audio_tagging_loss=0.00535, over 16215.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.0926, pruned_loss=0.01372, audio_tagging_loss=0.009228, over 3046336.43 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:32:39,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2624420.0, ans=0.125
2023-11-24 01:33:05,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2624620.0, ans=0.125
2023-11-24 01:33:12,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393700
2023-11-24 01:33:30,357 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 8950, loss[loss=0.08118, simple_loss=0.1127, pruned_loss=0.01881, audio_tagging_loss=0.006033, over 15877.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.09242, pruned_loss=0.01383, audio_tagging_loss=0.009064, over 3040358.78 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:33:31,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2624753.3333333335, ans=0.09899494936611666
2023-11-24 01:33:34,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs. limit=10.0
2023-11-24 01:33:35,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0
2023-11-24 01:33:42,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.467e+01 9.115e+01 9.992e+01 1.363e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-24 01:34:13,842 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393750
2023-11-24 01:34:20,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2625020.0, ans=0.125
2023-11-24 01:34:32,132 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9000, loss[loss=0.06629, simple_loss=0.09151, pruned_loss=0.011, audio_tagging_loss=0.009538, over 15259.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.09264, pruned_loss=0.01385, audio_tagging_loss=0.008981, over 3035607.32 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:34:32,133 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 01:35:10,891 INFO [train_asr.py:1253] (1/4) Epoch 33, validation: loss=0.05892, simple_loss=0.05094, pruned_loss=0.005119, audio_tagging_loss=0.02833, over 4681554.00 frames.
2023-11-24 01:35:10,892 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 01:35:26,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2625153.3333333335, ans=0.125
2023-11-24 01:35:28,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2625153.3333333335, ans=0.125
2023-11-24 01:35:34,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0
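Both the per-batch "loss[... over N frames.]" field, the running "tot_loss[... over ~3M frames.]" field, and the validation line above read as frame-weighted averages: each component is accumulated as value times frames and normalised by the total frame count of the window (recent batches for tot_loss, the whole validation set for the validation line). A hedged sketch of that bookkeeping (illustrative, not the actual icefall metrics-tracking code):

```python
from collections import deque

class LossTrackerSketch:
    def __init__(self, max_batches: int = 200):
        # keep (frame-weighted sums, frame count) for a window of recent batches
        self.window = deque(maxlen=max_batches)

    def update(self, losses: dict, frames: float):
        self.window.append(({k: v * frames for k, v in losses.items()}, frames))

    def tot_loss(self) -> dict:
        total_frames = sum(f for _, f in self.window)
        keys = self.window[0][0].keys()
        return {k: sum(s[k] for s, _ in self.window) / total_frames for k in keys}

tracker = LossTrackerSketch()
tracker.update({"loss": 0.06629, "audio_tagging_loss": 0.009538}, 15259)
print(tracker.tot_loss())  # frame-weighted averages over the retained window
```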
2023-11-24 01:35:44,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2625220.0, ans=0.125
2023-11-24 01:35:53,950 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393800
2023-11-24 01:35:55,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0
2023-11-24 01:36:12,544 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9050, loss[loss=0.05816, simple_loss=0.07769, pruned_loss=0.009486, audio_tagging_loss=0.009831, over 15329.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.09262, pruned_loss=0.01369, audio_tagging_loss=0.008876, over 3034409.94 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:36:15,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2625420.0, ans=0.07
2023-11-24 01:36:24,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2625486.6666666665, ans=0.2
2023-11-24 01:36:25,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.819e+01 9.377e+01 1.005e+02 1.265e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-24 01:36:37,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=22.5
2023-11-24 01:36:47,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2625553.3333333335, ans=0.0
2023-11-24 01:36:48,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=12.0
2023-11-24 01:36:52,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2625620.0, ans=10.0
2023-11-24 01:36:55,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393850
2023-11-24 01:37:05,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2625686.6666666665, ans=0.125
2023-11-24 01:37:14,622 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9100, loss[loss=0.06042, simple_loss=0.08338, pruned_loss=0.01078, audio_tagging_loss=0.007946, over 15309.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.09221, pruned_loss=0.01353, audio_tagging_loss=0.008898, over 3039973.15 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:37:14,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2625753.3333333335, ans=0.125
2023-11-24 01:37:19,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2625753.3333333335, ans=0.125
2023-11-24 01:37:36,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0
2023-11-24 01:37:57,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393900
2023-11-24 01:37:57,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2625953.3333333335, ans=0.125
2023-11-24 01:38:11,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0
2023-11-24 01:38:13,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0
2023-11-24 01:38:15,236 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9150, loss[loss=0.06843, simple_loss=0.08992, pruned_loss=0.0137, audio_tagging_loss=0.009774, over 15004.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09093, pruned_loss=0.01337, audio_tagging_loss=0.008931, over 3041613.62 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:38:25,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=12.0
2023-11-24 01:38:27,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 7.991e+01 8.782e+01 9.725e+01 1.593e+02, threshold=1.756e+02, percent-clipped=0.0
2023-11-24 01:38:58,282 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 393950
2023-11-24 01:39:01,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2626286.6666666665, ans=0.125
2023-11-24 01:39:04,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2626353.3333333335, ans=0.1
2023-11-24 01:39:05,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2626353.3333333335, ans=0.0
2023-11-24 01:39:05,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2626353.3333333335, ans=0.125
2023-11-24 01:39:05,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0
2023-11-24 01:39:08,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0
2023-11-24 01:39:16,550 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9200, loss[loss=0.05713, simple_loss=0.07318, pruned_loss=0.0109, audio_tagging_loss=0.009638, over 14330.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09135, pruned_loss=0.01349, audio_tagging_loss=0.008794, over 3038131.76 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 32.0
2023-11-24 01:39:34,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2626486.6666666665, ans=0.2
2023-11-24 01:39:50,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2626553.3333333335, ans=0.125
2023-11-24 01:39:56,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2626620.0, ans=0.0
2023-11-24 01:39:58,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394000
2023-11-24 01:40:10,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2626686.6666666665, ans=0.0
2023-11-24 01:40:18,491 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9250, loss[loss=0.06117, simple_loss=0.08031, pruned_loss=0.01317, audio_tagging_loss=0.007848, over 14547.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09129, pruned_loss=0.01354, audio_tagging_loss=0.008856, over 3052005.58 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:40:18,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2626753.3333333335, ans=0.1
2023-11-24 01:40:31,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.331e+01 9.082e+01 9.914e+01 1.295e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-24 01:40:33,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2626820.0, ans=0.125
2023-11-24 01:40:49,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2626886.6666666665, ans=0.0
2023-11-24 01:40:52,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2626886.6666666665, ans=0.0
2023-11-24 01:41:00,553 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394050
2023-11-24 01:41:02,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2626953.3333333335, ans=0.1
2023-11-24 01:41:19,474 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9300, loss[loss=0.05876, simple_loss=0.08149, pruned_loss=0.01112, audio_tagging_loss=0.006895, over 15532.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09062, pruned_loss=0.01331, audio_tagging_loss=0.008944, over 3051207.43 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0
2023-11-24 01:41:27,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2627086.6666666665, ans=0.125
2023-11-24 01:41:28,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5
2023-11-24 01:41:35,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=12.0
limit=12.0 2023-11-24 01:41:43,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2627220.0, ans=0.125 2023-11-24 01:41:48,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2627220.0, ans=0.0 2023-11-24 01:41:59,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2627286.6666666665, ans=0.2 2023-11-24 01:42:03,203 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394100 2023-11-24 01:42:21,240 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9350, loss[loss=0.08908, simple_loss=0.118, pruned_loss=0.02238, audio_tagging_loss=0.007696, over 15581.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09082, pruned_loss=0.01345, audio_tagging_loss=0.009016, over 3048895.89 frames. ], batch size: 53, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 01:42:23,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2627420.0, ans=0.125 2023-11-24 01:42:25,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2627420.0, ans=0.0 2023-11-24 01:42:36,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.369e+01 8.431e+01 9.003e+01 9.569e+01 1.110e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-24 01:42:37,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5 2023-11-24 01:42:42,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2023-11-24 01:43:04,322 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394150 2023-11-24 01:43:06,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2627620.0, ans=0.0 2023-11-24 01:43:21,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2627753.3333333335, ans=0.2 2023-11-24 01:43:22,668 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9400, loss[loss=0.08478, simple_loss=0.1151, pruned_loss=0.02198, audio_tagging_loss=0.005271, over 15466.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.0923, pruned_loss=0.0136, audio_tagging_loss=0.008915, over 3056411.02 frames. 
], batch size: 55, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 01:43:33,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2627753.3333333335, ans=0.1 2023-11-24 01:43:39,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2627820.0, ans=0.125 2023-11-24 01:43:50,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2627886.6666666665, ans=0.0 2023-11-24 01:43:56,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2627886.6666666665, ans=0.125 2023-11-24 01:44:03,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2627953.3333333335, ans=0.125 2023-11-24 01:44:06,273 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394200 2023-11-24 01:44:15,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2628020.0, ans=0.125 2023-11-24 01:44:25,386 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9450, loss[loss=0.09701, simple_loss=0.14, pruned_loss=0.02169, audio_tagging_loss=0.0053, over 14195.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09161, pruned_loss=0.01357, audio_tagging_loss=0.009029, over 3054539.52 frames. ], batch size: 52, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 01:44:25,407 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 01:44:39,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.363e+01 8.998e+01 9.818e+01 1.304e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-24 01:45:01,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2628220.0, ans=0.2 2023-11-24 01:45:09,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394250 2023-11-24 01:45:09,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2628286.6666666665, ans=0.0 2023-11-24 01:45:14,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2628353.3333333335, ans=0.2 2023-11-24 01:45:16,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2628353.3333333335, ans=0.125 2023-11-24 01:45:26,616 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9500, loss[loss=0.07277, simple_loss=0.08548, pruned_loss=0.01782, audio_tagging_loss=0.0122, over 15220.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09201, pruned_loss=0.0137, audio_tagging_loss=0.009107, over 3057138.12 frames. 
], batch size: 57, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 01:45:31,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2628420.0, ans=0.125 2023-11-24 01:45:34,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2628420.0, ans=0.125 2023-11-24 01:45:45,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2628486.6666666665, ans=0.125 2023-11-24 01:46:01,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2628553.3333333335, ans=0.125 2023-11-24 01:46:02,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2023-11-24 01:46:09,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394300 2023-11-24 01:46:27,670 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9550, loss[loss=0.08996, simple_loss=0.1166, pruned_loss=0.02298, audio_tagging_loss=0.00868, over 14857.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09149, pruned_loss=0.01368, audio_tagging_loss=0.009163, over 3052582.83 frames. ], batch size: 53, lr: 2.05e-03, grad_scale: 8.0 2023-11-24 01:46:30,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2628753.3333333335, ans=0.0 2023-11-24 01:46:42,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.519e+01 9.047e+01 9.704e+01 1.211e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-24 01:46:50,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2628820.0, ans=0.125 2023-11-24 01:46:58,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2628886.6666666665, ans=0.125 2023-11-24 01:47:10,256 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394350 2023-11-24 01:47:18,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2629020.0, ans=0.125 2023-11-24 01:47:29,289 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9600, loss[loss=0.06818, simple_loss=0.0839, pruned_loss=0.01276, audio_tagging_loss=0.01347, over 15229.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09112, pruned_loss=0.01352, audio_tagging_loss=0.00927, over 3057194.94 frames. 
], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:47:29,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2629086.6666666665, ans=0.125 2023-11-24 01:47:30,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2629086.6666666665, ans=0.025 2023-11-24 01:47:41,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2629153.3333333335, ans=0.125 2023-11-24 01:47:43,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2629153.3333333335, ans=0.2 2023-11-24 01:47:49,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=22.5 2023-11-24 01:48:01,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-24 01:48:08,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2629286.6666666665, ans=0.125 2023-11-24 01:48:13,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394400 2023-11-24 01:48:31,237 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9650, loss[loss=0.06474, simple_loss=0.09256, pruned_loss=0.01197, audio_tagging_loss=0.006495, over 14772.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.09096, pruned_loss=0.01343, audio_tagging_loss=0.009216, over 3045139.96 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:48:36,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2629420.0, ans=0.2 2023-11-24 01:48:36,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2629420.0, ans=0.125 2023-11-24 01:48:45,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.217e+01 8.903e+01 9.505e+01 1.138e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-24 01:48:52,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2629486.6666666665, ans=0.125 2023-11-24 01:49:00,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2629553.3333333335, ans=0.2 2023-11-24 01:49:02,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2629553.3333333335, ans=0.125 2023-11-24 01:49:14,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394450 2023-11-24 01:49:17,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2629620.0, ans=0.2 2023-11-24 01:49:30,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2629686.6666666665, ans=0.125 2023-11-24 01:49:32,170 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9700, loss[loss=0.07917, simple_loss=0.1116, pruned_loss=0.0182, audio_tagging_loss=0.005155, over 14767.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09138, pruned_loss=0.01362, audio_tagging_loss=0.008987, over 3044621.27 frames. 
], batch size: 57, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:49:58,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2629886.6666666665, ans=0.1 2023-11-24 01:49:59,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-24 01:50:15,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394500 2023-11-24 01:50:34,838 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9750, loss[loss=0.07198, simple_loss=0.104, pruned_loss=0.009963, audio_tagging_loss=0.01005, over 15447.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09182, pruned_loss=0.01347, audio_tagging_loss=0.008807, over 3043720.70 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:50:37,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2630086.6666666665, ans=0.125 2023-11-24 01:50:49,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.620e+01 9.194e+01 9.972e+01 1.344e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-24 01:50:50,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2630153.3333333335, ans=0.125 2023-11-24 01:51:05,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2630220.0, ans=0.125 2023-11-24 01:51:06,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-11-24 01:51:17,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394550 2023-11-24 01:51:36,064 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9800, loss[loss=0.05961, simple_loss=0.0811, pruned_loss=0.008343, audio_tagging_loss=0.01072, over 15642.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09161, pruned_loss=0.01338, audio_tagging_loss=0.008882, over 3043035.79 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:51:38,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2630420.0, ans=0.125 2023-11-24 01:51:57,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2630486.6666666665, ans=0.1 2023-11-24 01:52:00,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2630553.3333333335, ans=0.1 2023-11-24 01:52:04,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2630553.3333333335, ans=0.125 2023-11-24 01:52:19,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394600 2023-11-24 01:52:33,007 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 01:52:37,795 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9850, loss[loss=0.089, simple_loss=0.1211, pruned_loss=0.02256, audio_tagging_loss=0.005904, over 15876.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.0918, pruned_loss=0.01337, audio_tagging_loss=0.008748, over 3045160.26 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:52:52,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2630820.0, ans=0.2 2023-11-24 01:52:53,141 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.642e+01 9.395e+01 1.002e+02 1.404e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-24 01:52:57,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-24 01:53:01,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2630820.0, ans=0.95 2023-11-24 01:53:03,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2630886.6666666665, ans=0.2 2023-11-24 01:53:21,232 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394650 2023-11-24 01:53:23,680 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 01:53:28,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2631020.0, ans=0.025 2023-11-24 01:53:28,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2631020.0, ans=0.2 2023-11-24 01:53:40,116 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9900, loss[loss=0.07814, simple_loss=0.1054, pruned_loss=0.01769, audio_tagging_loss=0.007777, over 15498.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09234, pruned_loss=0.01348, audio_tagging_loss=0.008783, over 3043949.49 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:53:44,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2023-11-24 01:53:51,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2631153.3333333335, ans=0.0 2023-11-24 01:54:03,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0 2023-11-24 01:54:07,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2631220.0, ans=0.125 2023-11-24 01:54:12,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-24 01:54:19,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. 
limit=15.0 2023-11-24 01:54:23,708 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394700 2023-11-24 01:54:32,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2631353.3333333335, ans=0.0 2023-11-24 01:54:39,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2631353.3333333335, ans=0.125 2023-11-24 01:54:42,056 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 9950, loss[loss=0.0685, simple_loss=0.09151, pruned_loss=0.01534, audio_tagging_loss=0.007411, over 16110.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.09232, pruned_loss=0.0136, audio_tagging_loss=0.008766, over 3040585.10 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 16.0 2023-11-24 01:54:49,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-24 01:54:56,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.637e+01 9.131e+01 9.854e+01 1.265e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 01:55:01,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2631486.6666666665, ans=0.07 2023-11-24 01:55:03,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2631486.6666666665, ans=0.125 2023-11-24 01:55:19,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2631620.0, ans=0.125 2023-11-24 01:55:24,959 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394750 2023-11-24 01:55:40,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2631686.6666666665, ans=0.1 2023-11-24 01:55:42,556 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10000, loss[loss=0.06203, simple_loss=0.08275, pruned_loss=0.009308, audio_tagging_loss=0.01134, over 14464.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09202, pruned_loss=0.01363, audio_tagging_loss=0.008721, over 3045089.29 frames. 
], batch size: 55, lr: 2.05e-03, grad_scale: 32.0 2023-11-24 01:56:15,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2631886.6666666665, ans=0.125 2023-11-24 01:56:16,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2631886.6666666665, ans=0.125 2023-11-24 01:56:17,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2631886.6666666665, ans=0.1 2023-11-24 01:56:25,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394800 2023-11-24 01:56:36,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2632020.0, ans=0.0 2023-11-24 01:56:38,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2632020.0, ans=0.0 2023-11-24 01:56:43,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2632086.6666666665, ans=0.1 2023-11-24 01:56:45,104 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10050, loss[loss=0.06112, simple_loss=0.08091, pruned_loss=0.01406, audio_tagging_loss=0.0066, over 14110.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09131, pruned_loss=0.01358, audio_tagging_loss=0.008783, over 3041952.23 frames. ], batch size: 55, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 01:56:56,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2632153.3333333335, ans=0.2 2023-11-24 01:56:57,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2632153.3333333335, ans=0.125 2023-11-24 01:56:59,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.389e+01 9.133e+01 9.609e+01 1.226e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 01:57:13,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2632220.0, ans=0.125 2023-11-24 01:57:24,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2023-11-24 01:57:27,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394850 2023-11-24 01:57:34,941 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 01:57:46,263 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10100, loss[loss=0.05318, simple_loss=0.07237, pruned_loss=0.008072, audio_tagging_loss=0.008927, over 16867.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09008, pruned_loss=0.01345, audio_tagging_loss=0.008917, over 3043108.34 frames. ], batch size: 66, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 01:57:55,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.06 vs. 
limit=15.0 2023-11-24 01:58:13,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2632553.3333333335, ans=0.125 2023-11-24 01:58:29,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394900 2023-11-24 01:58:36,498 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 01:58:47,634 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10150, loss[loss=0.06772, simple_loss=0.09409, pruned_loss=0.0101, audio_tagging_loss=0.01057, over 14061.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09028, pruned_loss=0.01338, audio_tagging_loss=0.00904, over 3048600.57 frames. ], batch size: 52, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 01:58:52,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2023-11-24 01:59:02,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.517e+01 9.184e+01 1.003e+02 1.404e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-24 01:59:10,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.61 vs. limit=15.0 2023-11-24 01:59:17,267 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 01:59:20,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-24 01:59:27,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2632953.3333333335, ans=0.125 2023-11-24 01:59:29,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2632953.3333333335, ans=0.125 2023-11-24 01:59:30,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 394950 2023-11-24 01:59:42,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-24 01:59:44,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2633020.0, ans=0.125 2023-11-24 01:59:49,426 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10200, loss[loss=0.06963, simple_loss=0.0836, pruned_loss=0.01509, audio_tagging_loss=0.01275, over 15189.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09, pruned_loss=0.01319, audio_tagging_loss=0.009104, over 3051261.80 frames. 
], batch size: 57, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 01:59:51,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2633086.6666666665, ans=0.125 2023-11-24 01:59:54,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=12.0 2023-11-24 01:59:55,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2633086.6666666665, ans=0.1 2023-11-24 02:00:03,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=12.0 2023-11-24 02:00:14,006 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 02:00:22,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2633220.0, ans=0.125 2023-11-24 02:00:24,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2633220.0, ans=0.125 2023-11-24 02:00:32,828 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395000 2023-11-24 02:00:52,213 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10250, loss[loss=0.06096, simple_loss=0.09026, pruned_loss=0.007693, audio_tagging_loss=0.008131, over 14913.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09042, pruned_loss=0.01319, audio_tagging_loss=0.009185, over 3055365.54 frames. ], batch size: 57, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:00:54,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2023-11-24 02:00:57,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=12.0 2023-11-24 02:01:07,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.620e+01 9.148e+01 9.659e+01 1.254e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 02:01:08,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=22.5 2023-11-24 02:01:23,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2023-11-24 02:01:36,535 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395050 2023-11-24 02:01:39,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2633620.0, ans=0.125 2023-11-24 02:01:54,953 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10300, loss[loss=0.06972, simple_loss=0.08921, pruned_loss=0.01489, audio_tagging_loss=0.01023, over 15255.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09038, pruned_loss=0.01317, audio_tagging_loss=0.009232, over 3055218.76 frames. 
], batch size: 56, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:02:02,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2633753.3333333335, ans=0.2 2023-11-24 02:02:24,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2633886.6666666665, ans=0.125 2023-11-24 02:02:25,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2633886.6666666665, ans=0.125 2023-11-24 02:02:32,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2633953.3333333335, ans=0.1 2023-11-24 02:02:38,292 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395100 2023-11-24 02:02:44,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2634020.0, ans=0.0 2023-11-24 02:02:56,668 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10350, loss[loss=0.06983, simple_loss=0.09133, pruned_loss=0.0129, audio_tagging_loss=0.01126, over 14813.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09079, pruned_loss=0.01327, audio_tagging_loss=0.009276, over 3054989.80 frames. ], batch size: 57, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:03:12,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.449e+01 8.937e+01 9.719e+01 1.290e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-24 02:03:39,987 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395150 2023-11-24 02:03:43,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-11-24 02:03:45,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2634353.3333333335, ans=0.0 2023-11-24 02:03:49,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2634353.3333333335, ans=0.125 2023-11-24 02:03:58,867 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10400, loss[loss=0.07861, simple_loss=0.1146, pruned_loss=0.01428, audio_tagging_loss=0.007021, over 14668.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09183, pruned_loss=0.01347, audio_tagging_loss=0.009258, over 3055914.36 frames. ], batch size: 53, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:04:39,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2634620.0, ans=0.125 2023-11-24 02:04:41,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395200 2023-11-24 02:04:42,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2634620.0, ans=0.125 2023-11-24 02:04:59,974 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10450, loss[loss=0.0641, simple_loss=0.08949, pruned_loss=0.01281, audio_tagging_loss=0.006548, over 14106.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.0912, pruned_loss=0.01339, audio_tagging_loss=0.009284, over 3052988.46 frames. 
], batch size: 54, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:05:14,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.359e+01 8.851e+01 9.617e+01 1.233e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-24 02:05:34,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2634886.6666666665, ans=0.0 2023-11-24 02:05:35,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2023-11-24 02:05:43,210 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395250 2023-11-24 02:05:46,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2634953.3333333335, ans=0.125 2023-11-24 02:05:55,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2635020.0, ans=0.125 2023-11-24 02:06:01,537 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10500, loss[loss=0.06359, simple_loss=0.08819, pruned_loss=0.01095, audio_tagging_loss=0.008537, over 15248.00 frames. ], tot_loss[loss=0.06882, simple_loss=0.09229, pruned_loss=0.01356, audio_tagging_loss=0.009117, over 3056478.36 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:06:10,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2635086.6666666665, ans=0.125 2023-11-24 02:06:11,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2023-11-24 02:06:25,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2635220.0, ans=0.0 2023-11-24 02:06:43,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2635286.6666666665, ans=0.0 2023-11-24 02:06:44,659 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395300 2023-11-24 02:06:48,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2635286.6666666665, ans=0.04949747468305833 2023-11-24 02:07:04,401 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10550, loss[loss=0.06674, simple_loss=0.09338, pruned_loss=0.008685, audio_tagging_loss=0.01137, over 14896.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09139, pruned_loss=0.01337, audio_tagging_loss=0.009049, over 3051442.50 frames. 
], batch size: 57, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:07:06,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2635420.0, ans=0.0 2023-11-24 02:07:07,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2635420.0, ans=0.2 2023-11-24 02:07:13,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2635420.0, ans=0.0 2023-11-24 02:07:16,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2635486.6666666665, ans=0.1 2023-11-24 02:07:19,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.732e+01 9.283e+01 1.036e+02 1.632e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-24 02:07:47,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2023-11-24 02:07:48,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395350 2023-11-24 02:08:03,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2635686.6666666665, ans=0.1 2023-11-24 02:08:05,710 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10600, loss[loss=0.07736, simple_loss=0.1101, pruned_loss=0.01408, audio_tagging_loss=0.008208, over 15554.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09149, pruned_loss=0.01342, audio_tagging_loss=0.00898, over 3042835.13 frames. ], batch size: 57, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:08:19,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=12.0 2023-11-24 02:08:47,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2635953.3333333335, ans=0.125 2023-11-24 02:08:48,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2023-11-24 02:08:48,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395400 2023-11-24 02:09:07,228 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10650, loss[loss=0.04191, simple_loss=0.05166, pruned_loss=0.006567, audio_tagging_loss=0.009506, over 14569.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09213, pruned_loss=0.01347, audio_tagging_loss=0.008898, over 3048097.49 frames. 
], batch size: 56, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:09:11,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2636086.6666666665, ans=0.125 2023-11-24 02:09:21,652 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 02:09:23,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.571e+01 8.426e+01 9.214e+01 9.689e+01 1.168e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-24 02:09:51,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395450 2023-11-24 02:09:52,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2636286.6666666665, ans=0.1 2023-11-24 02:09:53,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2636286.6666666665, ans=0.0 2023-11-24 02:10:10,653 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10700, loss[loss=0.05129, simple_loss=0.06694, pruned_loss=0.009759, audio_tagging_loss=0.008061, over 14945.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09257, pruned_loss=0.01358, audio_tagging_loss=0.008806, over 3047208.56 frames. ], batch size: 55, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:10:13,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2636420.0, ans=0.0 2023-11-24 02:10:14,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5 2023-11-24 02:10:20,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2636420.0, ans=0.1 2023-11-24 02:10:26,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2636486.6666666665, ans=0.125 2023-11-24 02:10:32,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2636486.6666666665, ans=0.0 2023-11-24 02:10:40,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2636553.3333333335, ans=0.1 2023-11-24 02:10:40,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2636553.3333333335, ans=0.2 2023-11-24 02:10:41,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2636553.3333333335, ans=0.125 2023-11-24 02:10:53,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395500 2023-11-24 02:11:11,227 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10750, loss[loss=0.06045, simple_loss=0.08323, pruned_loss=0.01038, audio_tagging_loss=0.008456, over 15296.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09256, pruned_loss=0.01366, audio_tagging_loss=0.008803, over 3042707.33 frames. 
], batch size: 58, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:11:26,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 8.296e+01 8.958e+01 9.992e+01 1.267e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-24 02:11:36,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2636886.6666666665, ans=0.125 2023-11-24 02:11:55,168 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395550 2023-11-24 02:12:12,788 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10800, loss[loss=0.06512, simple_loss=0.08989, pruned_loss=0.0136, audio_tagging_loss=0.006567, over 16168.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09255, pruned_loss=0.01358, audio_tagging_loss=0.008776, over 3036107.05 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:12:14,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2637086.6666666665, ans=0.125 2023-11-24 02:12:33,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2637153.3333333335, ans=0.0 2023-11-24 02:12:56,731 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395600 2023-11-24 02:13:06,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2637353.3333333335, ans=0.0 2023-11-24 02:13:16,931 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10850, loss[loss=0.05782, simple_loss=0.06992, pruned_loss=0.01147, audio_tagging_loss=0.0114, over 15198.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09164, pruned_loss=0.01352, audio_tagging_loss=0.008878, over 3037320.57 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:13:22,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2637420.0, ans=0.125 2023-11-24 02:13:33,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.225e+01 8.902e+01 9.397e+01 1.304e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-24 02:13:33,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2637486.6666666665, ans=0.0 2023-11-24 02:13:34,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=22.5 2023-11-24 02:13:46,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2637553.3333333335, ans=0.125 2023-11-24 02:13:52,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=12.0 2023-11-24 02:13:52,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2637620.0, ans=0.035 2023-11-24 02:14:00,258 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395650 2023-11-24 02:14:15,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2637686.6666666665, ans=0.125 2023-11-24 02:14:16,147 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 02:14:18,576 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10900, loss[loss=0.07632, simple_loss=0.1022, pruned_loss=0.01596, audio_tagging_loss=0.009266, over 15784.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09257, pruned_loss=0.01365, audio_tagging_loss=0.008814, over 3037918.00 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:14:30,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2637820.0, ans=0.1 2023-11-24 02:14:31,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2637820.0, ans=0.0 2023-11-24 02:14:33,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2023-11-24 02:14:34,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2637820.0, ans=0.1 2023-11-24 02:14:59,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2637953.3333333335, ans=0.125 2023-11-24 02:15:01,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395700 2023-11-24 02:15:04,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2637953.3333333335, ans=0.0 2023-11-24 02:15:05,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2637953.3333333335, ans=0.0 2023-11-24 02:15:19,291 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 10950, loss[loss=0.05785, simple_loss=0.07495, pruned_loss=0.00992, audio_tagging_loss=0.01046, over 15291.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.09266, pruned_loss=0.01377, audio_tagging_loss=0.008895, over 3032903.42 frames. 
], batch size: 60, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:15:36,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.249e+01 9.167e+01 9.838e+01 1.256e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-24 02:15:48,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2638220.0, ans=0.1 2023-11-24 02:16:02,760 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395750 2023-11-24 02:16:07,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2638353.3333333335, ans=0.2 2023-11-24 02:16:08,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2638353.3333333335, ans=0.95 2023-11-24 02:16:14,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2638353.3333333335, ans=0.2 2023-11-24 02:16:18,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2638353.3333333335, ans=0.0 2023-11-24 02:16:22,051 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11000, loss[loss=0.0707, simple_loss=0.1024, pruned_loss=0.01211, audio_tagging_loss=0.007394, over 14508.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09248, pruned_loss=0.01371, audio_tagging_loss=0.008982, over 3036113.54 frames. ], batch size: 53, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:16:33,494 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 02:16:36,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2638486.6666666665, ans=0.125 2023-11-24 02:17:02,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2638620.0, ans=0.0 2023-11-24 02:17:04,755 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395800 2023-11-24 02:17:06,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-11-24 02:17:24,172 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11050, loss[loss=0.07383, simple_loss=0.09289, pruned_loss=0.01786, audio_tagging_loss=0.009528, over 15171.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09231, pruned_loss=0.01382, audio_tagging_loss=0.00905, over 3040108.14 frames. ], batch size: 56, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:17:27,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. 
limit=15.0 2023-11-24 02:17:40,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.311e+01 8.882e+01 9.606e+01 1.257e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-24 02:17:40,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2638820.0, ans=0.125 2023-11-24 02:17:55,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2023-11-24 02:17:57,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-11-24 02:18:07,191 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395850 2023-11-24 02:18:13,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2639020.0, ans=0.125 2023-11-24 02:18:15,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2639020.0, ans=0.1 2023-11-24 02:18:25,485 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11100, loss[loss=0.06522, simple_loss=0.0781, pruned_loss=0.01646, audio_tagging_loss=0.009713, over 13873.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09215, pruned_loss=0.01377, audio_tagging_loss=0.009087, over 3039406.38 frames. ], batch size: 54, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:18:54,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2639220.0, ans=0.1 2023-11-24 02:19:01,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5 2023-11-24 02:19:01,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2639286.6666666665, ans=0.0 2023-11-24 02:19:03,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2023-11-24 02:19:08,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395900 2023-11-24 02:19:11,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2639286.6666666665, ans=0.125 2023-11-24 02:19:27,077 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11150, loss[loss=0.05775, simple_loss=0.07181, pruned_loss=0.01296, audio_tagging_loss=0.008892, over 13991.00 frames. ], tot_loss[loss=0.06911, simple_loss=0.09279, pruned_loss=0.01364, audio_tagging_loss=0.009077, over 3042913.54 frames. 
], batch size: 53, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:19:39,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2639486.6666666665, ans=0.025 2023-11-24 02:19:43,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2639486.6666666665, ans=0.0 2023-11-24 02:19:45,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.565e+01 8.458e+01 9.119e+01 9.750e+01 1.333e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-24 02:19:52,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2639553.3333333335, ans=0.1 2023-11-24 02:19:56,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2639553.3333333335, ans=0.125 2023-11-24 02:20:10,023 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 395950 2023-11-24 02:20:18,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2639686.6666666665, ans=0.0 2023-11-24 02:20:27,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2639686.6666666665, ans=0.125 2023-11-24 02:20:29,242 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11200, loss[loss=0.06632, simple_loss=0.0842, pruned_loss=0.01384, audio_tagging_loss=0.01038, over 15992.00 frames. ], tot_loss[loss=0.06878, simple_loss=0.09218, pruned_loss=0.01352, audio_tagging_loss=0.009161, over 3041773.08 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:20:34,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2639753.3333333335, ans=0.125 2023-11-24 02:20:53,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2639886.6666666665, ans=0.125 2023-11-24 02:20:55,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2639886.6666666665, ans=0.1 2023-11-24 02:21:12,775 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396000 2023-11-24 02:21:23,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2640020.0, ans=0.125 2023-11-24 02:21:34,632 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11250, loss[loss=0.04965, simple_loss=0.05794, pruned_loss=0.009193, audio_tagging_loss=0.01148, over 15040.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09175, pruned_loss=0.01349, audio_tagging_loss=0.009247, over 3045370.19 frames. 
], batch size: 61, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:21:34,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2640086.6666666665, ans=0.125 2023-11-24 02:21:43,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2640086.6666666665, ans=0.2 2023-11-24 02:21:53,431 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.556e+01 8.444e+01 8.848e+01 9.459e+01 1.220e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-24 02:22:17,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2640286.6666666665, ans=0.125 2023-11-24 02:22:18,381 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396050 2023-11-24 02:22:22,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2640286.6666666665, ans=0.125 2023-11-24 02:22:36,415 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11300, loss[loss=0.06192, simple_loss=0.08147, pruned_loss=0.01202, audio_tagging_loss=0.00917, over 13713.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09065, pruned_loss=0.01343, audio_tagging_loss=0.00919, over 3037104.78 frames. ], batch size: 52, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:22:41,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2023-11-24 02:22:51,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2640486.6666666665, ans=0.1 2023-11-24 02:22:54,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-11-24 02:23:19,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396100 2023-11-24 02:23:38,686 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11350, loss[loss=0.07866, simple_loss=0.1109, pruned_loss=0.01581, audio_tagging_loss=0.007372, over 15335.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09148, pruned_loss=0.01361, audio_tagging_loss=0.009001, over 3045415.25 frames. 
], batch size: 56, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:23:42,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2640753.3333333335, ans=0.0 2023-11-24 02:23:47,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2640753.3333333335, ans=0.0 2023-11-24 02:23:57,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.374e+01 8.369e+01 8.906e+01 9.576e+01 1.384e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-24 02:24:03,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2640886.6666666665, ans=0.125 2023-11-24 02:24:05,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2640886.6666666665, ans=0.5 2023-11-24 02:24:06,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2640886.6666666665, ans=0.2 2023-11-24 02:24:22,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396150 2023-11-24 02:24:31,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2641020.0, ans=0.1 2023-11-24 02:24:41,097 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11400, loss[loss=0.05628, simple_loss=0.08159, pruned_loss=0.009931, audio_tagging_loss=0.00555, over 16198.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09235, pruned_loss=0.01379, audio_tagging_loss=0.008845, over 3041494.71 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:25:17,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2023-11-24 02:25:23,948 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396200 2023-11-24 02:25:29,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2641353.3333333335, ans=0.125 2023-11-24 02:25:42,735 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11450, loss[loss=0.06786, simple_loss=0.09506, pruned_loss=0.01254, audio_tagging_loss=0.007782, over 16022.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.0922, pruned_loss=0.0136, audio_tagging_loss=0.008807, over 3040483.58 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:25:53,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2641420.0, ans=0.0 2023-11-24 02:25:55,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2641486.6666666665, ans=0.125 2023-11-24 02:26:02,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.524e+01 9.029e+01 9.681e+01 1.161e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-24 02:26:08,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. 
limit=15.0 2023-11-24 02:26:22,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2641620.0, ans=0.04949747468305833 2023-11-24 02:26:26,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396250 2023-11-24 02:26:39,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2023-11-24 02:26:45,791 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11500, loss[loss=0.07388, simple_loss=0.1054, pruned_loss=0.01473, audio_tagging_loss=0.006442, over 14892.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09155, pruned_loss=0.0135, audio_tagging_loss=0.008853, over 3038538.49 frames. ], batch size: 53, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:26:47,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0 2023-11-24 02:27:10,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2641886.6666666665, ans=0.0 2023-11-24 02:27:30,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396300 2023-11-24 02:27:37,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2642020.0, ans=0.1 2023-11-24 02:27:37,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2642020.0, ans=0.125 2023-11-24 02:27:37,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2642020.0, ans=0.125 2023-11-24 02:27:38,445 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 02:27:38,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2642020.0, ans=0.0 2023-11-24 02:27:45,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2642020.0, ans=0.125 2023-11-24 02:27:47,701 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11550, loss[loss=0.06915, simple_loss=0.08657, pruned_loss=0.01542, audio_tagging_loss=0.01044, over 14965.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09155, pruned_loss=0.01355, audio_tagging_loss=0.008918, over 3042265.05 frames. ], batch size: 57, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:27:51,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2642086.6666666665, ans=0.125 2023-11-24 02:28:02,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2023-11-24 02:28:06,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.401e+01 9.028e+01 9.629e+01 1.407e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-24 02:28:26,661 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 02:28:31,378 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396350 2023-11-24 02:28:49,961 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11600, loss[loss=0.06871, simple_loss=0.09154, pruned_loss=0.01631, audio_tagging_loss=0.006629, over 14080.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09097, pruned_loss=0.01363, audio_tagging_loss=0.009063, over 3039281.95 frames. ], batch size: 54, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:29:31,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2642620.0, ans=0.035 2023-11-24 02:29:33,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0 2023-11-24 02:29:34,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396400 2023-11-24 02:29:35,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2642620.0, ans=0.0 2023-11-24 02:29:53,914 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11650, loss[loss=0.05, simple_loss=0.06883, pruned_loss=0.007055, audio_tagging_loss=0.008527, over 14559.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09082, pruned_loss=0.01341, audio_tagging_loss=0.008951, over 3042666.02 frames. ], batch size: 55, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:30:04,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2642820.0, ans=0.125 2023-11-24 02:30:12,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.278e+01 8.940e+01 9.530e+01 1.250e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-24 02:30:37,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396450 2023-11-24 02:30:40,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2642953.3333333335, ans=0.0 2023-11-24 02:30:51,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2023-11-24 02:30:55,482 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11700, loss[loss=0.05645, simple_loss=0.06481, pruned_loss=0.014, audio_tagging_loss=0.01004, over 15300.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09014, pruned_loss=0.01326, audio_tagging_loss=0.009003, over 3034121.87 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:30:55,836 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 02:30:56,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2643086.6666666665, ans=0.0 2023-11-24 02:30:57,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2643086.6666666665, ans=0.0 2023-11-24 02:31:09,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. 
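The WARNING above shows the exclusion rule at work: this 100-frame (one-second) cut keeps only 23 frames after the roughly 4× convolutional subsampling, while its dummy transcript encodes to 24 BPE tokens, and the pruned-transducer loss needs at least as many encoder frames as output tokens. A hedged sketch of the guard; the exact frontend geometry (why 100 frames become 23 rather than 25) is an assumption fitted to the logged numbers:

```python
def frames_after_subsampling(num_frames, factor=4):
    # Assumed frontend geometry: the conv frontend trims a couple of
    # subsampled frames relative to plain division (100 -> 23 in the log).
    return (num_frames - 8) // factor

def keep_cut(num_frames, num_tokens):
    """A transducer cannot align more tokens than it has encoder frames."""
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(num_frames=100, num_tokens=24)  # the cut above is dropped
```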
limit=6.0 2023-11-24 02:31:12,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2643153.3333333335, ans=10.0 2023-11-24 02:31:24,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2643220.0, ans=0.0 2023-11-24 02:31:39,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396500 2023-11-24 02:31:50,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0 2023-11-24 02:31:56,461 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11750, loss[loss=0.06264, simple_loss=0.08473, pruned_loss=0.01166, audio_tagging_loss=0.008614, over 14727.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09038, pruned_loss=0.01333, audio_tagging_loss=0.009005, over 3031452.28 frames. ], batch size: 55, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:31:58,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2643420.0, ans=0.1 2023-11-24 02:31:58,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2643420.0, ans=0.0 2023-11-24 02:32:02,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2643420.0, ans=0.0 2023-11-24 02:32:03,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.47 vs. limit=15.0 2023-11-24 02:32:09,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2643486.6666666665, ans=0.0 2023-11-24 02:32:14,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2643486.6666666665, ans=0.125 2023-11-24 02:32:17,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.547e+01 9.165e+01 9.829e+01 1.276e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-24 02:32:40,277 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396550 2023-11-24 02:32:40,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2643620.0, ans=0.2 2023-11-24 02:32:41,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2643620.0, ans=0.125 2023-11-24 02:32:58,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2643686.6666666665, ans=0.125 2023-11-24 02:33:00,488 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11800, loss[loss=0.06886, simple_loss=0.08979, pruned_loss=0.01492, audio_tagging_loss=0.009046, over 15677.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.0914, pruned_loss=0.01351, audio_tagging_loss=0.00902, over 3033270.57 frames. 
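Each `ScheduledFloat: name=..., batch_count=..., ans=...` record reports a scalar hyperparameter (a dropout probability, a balancer or skip rate, a whitening limit) whose value is a pure function of the number of batches seen; by `batch_count` ≈ 2.64e6 every schedule on this page has settled at its final value (e.g. `ans=0.1` for the `dropout_p` entries, `ans=0.125` for balancer probabilities). A minimal sketch of a piecewise-linear schedule with that interface; the breakpoint API is an assumption, only the batch-count-in/float-out behaviour is visible in the log:

```python
from bisect import bisect_right

class ScheduledFloat:
    """A float that interpolates linearly between (batch_count, value) points."""

    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a dropout annealed from 0.3 to a floor of 0.1 early in training;
# at batch_count=2639020.0 it has long since settled at 0.1, as logged.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(2639020.0))  # -> 0.1
```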
], batch size: 63, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:33:43,354 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396600 2023-11-24 02:33:43,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2643953.3333333335, ans=0.125 2023-11-24 02:33:59,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2023-11-24 02:34:01,639 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11850, loss[loss=0.06804, simple_loss=0.0989, pruned_loss=0.01125, audio_tagging_loss=0.007339, over 15743.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09154, pruned_loss=0.01348, audio_tagging_loss=0.009067, over 3038151.73 frames. ], batch size: 57, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:34:20,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.557e+01 9.128e+01 9.993e+01 1.182e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 02:34:20,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2644153.3333333335, ans=0.125 2023-11-24 02:34:26,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2644220.0, ans=0.1 2023-11-24 02:34:39,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2644286.6666666665, ans=0.125 2023-11-24 02:34:45,203 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396650 2023-11-24 02:34:53,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2644353.3333333335, ans=0.0 2023-11-24 02:35:02,951 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11900, loss[loss=0.06563, simple_loss=0.08926, pruned_loss=0.01394, audio_tagging_loss=0.007064, over 15831.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09165, pruned_loss=0.0135, audio_tagging_loss=0.009157, over 3042744.16 frames. 
], batch size: 59, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:35:05,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2644420.0, ans=0.0 2023-11-24 02:35:11,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2644420.0, ans=0.125 2023-11-24 02:35:22,086 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 02:35:26,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2644486.6666666665, ans=0.125 2023-11-24 02:35:30,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2644553.3333333335, ans=0.0 2023-11-24 02:35:34,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2644553.3333333335, ans=0.0 2023-11-24 02:35:37,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2644553.3333333335, ans=0.125 2023-11-24 02:35:42,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2644620.0, ans=0.125 2023-11-24 02:35:46,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396700 2023-11-24 02:36:05,990 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 11950, loss[loss=0.07409, simple_loss=0.08451, pruned_loss=0.01719, audio_tagging_loss=0.01465, over 15206.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.0912, pruned_loss=0.0133, audio_tagging_loss=0.009193, over 3045484.03 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 16.0 2023-11-24 02:36:15,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2644753.3333333335, ans=0.2 2023-11-24 02:36:16,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2644753.3333333335, ans=0.125 2023-11-24 02:36:23,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-11-24 02:36:25,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.86 vs. 
limit=15.0 2023-11-24 02:36:25,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.463e+01 8.152e+01 8.882e+01 9.494e+01 1.160e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-24 02:36:46,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2644953.3333333335, ans=0.125 2023-11-24 02:36:47,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396750 2023-11-24 02:36:47,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2644953.3333333335, ans=0.0 2023-11-24 02:36:47,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2644953.3333333335, ans=0.125 2023-11-24 02:36:54,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2645020.0, ans=0.125 2023-11-24 02:37:06,006 INFO [train_asr.py:1221] (1/4) Epoch 33, batch 12000, loss[loss=0.08455, simple_loss=0.1128, pruned_loss=0.01908, audio_tagging_loss=0.009078, over 15851.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09153, pruned_loss=0.01354, audio_tagging_loss=0.009304, over 3045419.98 frames. ], batch size: 56, lr: 2.04e-03, grad_scale: 32.0 2023-11-24 02:37:06,007 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 02:37:31,577 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9751, 3.2596, 2.9654, 3.1101, 3.4261, 2.7703, 3.4508, 2.5840], device='cuda:1') 2023-11-24 02:37:37,970 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8181, 5.8868, 5.9178, 5.9100], device='cuda:1') 2023-11-24 02:37:46,444 INFO [train_asr.py:1253] (1/4) Epoch 33, validation: loss=0.05829, simple_loss=0.05098, pruned_loss=0.005164, audio_tagging_loss=0.02763, over 4681554.00 frames. 2023-11-24 02:37:46,445 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 02:37:53,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-24 02:37:55,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2645086.6666666665, ans=0.1 2023-11-24 02:37:57,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2645153.3333333335, ans=0.035 2023-11-24 02:37:59,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2645153.3333333335, ans=0.2 2023-11-24 02:38:07,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2645220.0, ans=0.1 2023-11-24 02:38:48,854 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 0, loss[loss=0.0812, simple_loss=0.09603, pruned_loss=0.01497, audio_tagging_loss=0.01822, over 16026.00 frames. ], tot_loss[loss=0.0812, simple_loss=0.09603, pruned_loss=0.01497, audio_tagging_loss=0.01822, over 16026.00 frames. 
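At the epoch boundary here, training pauses, runs the full validation set (a fixed 4,681,554 frames, which is why the validation records on either side of the boundary are directly comparable: loss 0.05829 at the end of epoch 33 vs. 0.058 at the start of epoch 34), and reports peak GPU memory. A sketch of that bookkeeping; the model/loader interface is assumed, while the memory call is the standard PyTorch one:

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader):
    """Frame-weighted average loss over the whole validation set."""
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    for batch in valid_loader:
        loss, num_frames = model(batch)  # assumed: model returns summed loss
        total_loss += loss.item()
        total_frames += num_frames
    model.train()
    return total_loss / total_frames, total_frames

# Peak-memory report, as in "Maximum memory allocated so far is 25607MB":
mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
print(f"Maximum memory allocated so far is {mb}MB")
```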
], batch size: 58, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:38:48,855 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 02:39:24,529 INFO [train_asr.py:1253] (1/4) Epoch 34, validation: loss=0.058, simple_loss=0.05102, pruned_loss=0.005202, audio_tagging_loss=0.02729, over 4681554.00 frames. 2023-11-24 02:39:24,530 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 02:39:29,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2645253.3333333335, ans=0.2 2023-11-24 02:39:32,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2645253.3333333335, ans=0.125 2023-11-24 02:39:35,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2645253.3333333335, ans=0.0 2023-11-24 02:39:37,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396800 2023-11-24 02:39:58,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2645386.6666666665, ans=0.125 2023-11-24 02:40:16,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 9.137e+01 9.827e+01 1.054e+02 1.431e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-24 02:40:27,156 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 50, loss[loss=0.08738, simple_loss=0.1159, pruned_loss=0.015, audio_tagging_loss=0.01445, over 16656.00 frames. ], tot_loss[loss=0.07618, simple_loss=0.09144, pruned_loss=0.01327, audio_tagging_loss=0.01718, over 693145.01 frames. ], batch size: 61, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:40:30,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2645586.6666666665, ans=0.0 2023-11-24 02:40:39,319 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396850 2023-11-24 02:40:53,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2645720.0, ans=0.125 2023-11-24 02:41:00,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2645720.0, ans=0.125 2023-11-24 02:41:16,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2645853.3333333335, ans=0.0 2023-11-24 02:41:23,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2645853.3333333335, ans=0.0 2023-11-24 02:41:29,483 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 100, loss[loss=0.06188, simple_loss=0.06362, pruned_loss=0.01027, audio_tagging_loss=0.0198, over 15180.00 frames. ], tot_loss[loss=0.07384, simple_loss=0.08803, pruned_loss=0.01306, audio_tagging_loss=0.01676, over 1210938.95 frames. 
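Note the learning rate across the boundary: every epoch-33 record shows `lr: 2.04e-03`, and from `Epoch 34, batch 0` onward it is `2.01e-03`, so by this stage the schedule is driven by the epoch counter alone. icefall's Zipformer recipes describe an Eden-style scheduler; the sketch below reproduces that shape, where the formula, the constants (0.045, 7500, 3.5), and the one-epoch offset between the scheduler's counter and the printed epoch are all assumptions chosen because they reproduce both logged values, not something read out of this run's `optim.py`:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Eden-style decay, smooth in both batch count and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# At ~396k batches the batch factor has long saturated, so only the
# epoch term moves the lr across the boundary seen above:
print(f"{eden_lr(0.045, 396_000, 32):.2e}")  # ~2.04e-03 (printed epoch 33)
print(f"{eden_lr(0.045, 396_800, 33):.2e}")  # ~2.01e-03 (printed epoch 34)
```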
], batch size: 58, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:41:29,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2645920.0, ans=0.125 2023-11-24 02:41:32,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2645920.0, ans=0.125 2023-11-24 02:41:41,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396900 2023-11-24 02:42:08,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0 2023-11-24 02:42:09,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2646120.0, ans=0.125 2023-11-24 02:42:11,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2646120.0, ans=0.1 2023-11-24 02:42:20,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 8.964e+01 9.568e+01 1.027e+02 1.310e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-24 02:42:31,563 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 150, loss[loss=0.09093, simple_loss=0.1146, pruned_loss=0.0226, audio_tagging_loss=0.01103, over 14296.00 frames. ], tot_loss[loss=0.07303, simple_loss=0.09006, pruned_loss=0.01321, audio_tagging_loss=0.01479, over 1616075.26 frames. ], batch size: 52, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:42:34,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2646253.3333333335, ans=0.125 2023-11-24 02:42:44,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 396950 2023-11-24 02:42:49,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2646320.0, ans=0.125 2023-11-24 02:42:50,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2646320.0, ans=0.125 2023-11-24 02:43:03,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-24 02:43:13,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2646453.3333333335, ans=0.125 2023-11-24 02:43:29,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2646520.0, ans=0.0 2023-11-24 02:43:34,085 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 200, loss[loss=0.06707, simple_loss=0.08661, pruned_loss=0.01289, audio_tagging_loss=0.01087, over 14748.00 frames. ], tot_loss[loss=0.07297, simple_loss=0.09243, pruned_loss=0.01374, audio_tagging_loss=0.01302, over 1939066.73 frames. 
], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:43:35,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2646586.6666666665, ans=0.125 2023-11-24 02:43:39,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2646586.6666666665, ans=0.125 2023-11-24 02:43:45,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2646653.3333333335, ans=0.125 2023-11-24 02:43:46,030 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397000 2023-11-24 02:43:49,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2023-11-24 02:43:52,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2646653.3333333335, ans=0.0 2023-11-24 02:43:57,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2646720.0, ans=0.125 2023-11-24 02:43:58,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-24 02:44:13,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2646786.6666666665, ans=0.0 2023-11-24 02:44:19,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2646786.6666666665, ans=0.125 2023-11-24 02:44:22,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2646853.3333333335, ans=0.125 2023-11-24 02:44:24,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.517e+01 9.117e+01 1.003e+02 1.656e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-24 02:44:29,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0 2023-11-24 02:44:32,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2646853.3333333335, ans=0.125 2023-11-24 02:44:35,762 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 250, loss[loss=0.06721, simple_loss=0.08085, pruned_loss=0.01546, audio_tagging_loss=0.01132, over 15194.00 frames. ], tot_loss[loss=0.07164, simple_loss=0.09234, pruned_loss=0.01364, audio_tagging_loss=0.01183, over 2186610.96 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:44:38,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.19 vs. 
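The `Whitening: name=..., metric=... vs. limit=...` records track how anisotropic each module's activations are: a metric of ~1.0 means the per-group covariance is essentially isotropic ("white"), larger values mean variance is concentrating in a few directions, and the limit is the scheduled ceiling above which a corrective penalty would engage (every metric on this page is still below its limit). A sketch of one metric with that "equals 1 when white" property; the exact formula is an assumption, not a transcription of `scaling.py`:

```python
import torch

def whitening_metric(x, num_groups=1):
    """~1.0 for isotropic features; grows as the covariance spectrum skews.

    x: (num_frames, num_channels), channels split evenly into num_groups.
    """
    n, c = x.shape
    cg = c // num_groups
    x = x.reshape(n, num_groups, cg).permute(1, 0, 2)    # (groups, frames, cg)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                      # (groups, cg, cg)
    # mean squared eigenvalue over squared mean eigenvalue, without an
    # eigendecomposition (symmetric cov: Frobenius norm^2 = sum of eig^2):
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)  # trace / cg
    mean_sq_eig = (cov ** 2).sum(dim=(1, 2)) / cg        # sum eig^2 / cg
    return (mean_sq_eig / mean_eig ** 2).mean().item()

torch.manual_seed(0)
print(whitening_metric(torch.randn(10000, 256)))  # ~1.0 for white noise
```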
limit=15.0 2023-11-24 02:44:43,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2646920.0, ans=0.0 2023-11-24 02:44:47,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397050 2023-11-24 02:45:00,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2647053.3333333335, ans=0.125 2023-11-24 02:45:07,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2647053.3333333335, ans=0.125 2023-11-24 02:45:14,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2647120.0, ans=0.0 2023-11-24 02:45:15,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2023-11-24 02:45:37,018 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 300, loss[loss=0.06971, simple_loss=0.1002, pruned_loss=0.01079, audio_tagging_loss=0.008803, over 16217.00 frames. ], tot_loss[loss=0.07109, simple_loss=0.09308, pruned_loss=0.01363, audio_tagging_loss=0.01093, over 2385824.26 frames. ], batch size: 61, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 02:45:50,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397100 2023-11-24 02:46:07,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2647386.6666666665, ans=0.125 2023-11-24 02:46:14,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2647453.3333333335, ans=0.125 2023-11-24 02:46:30,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.278e+01 8.816e+01 9.876e+01 1.220e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-24 02:46:40,767 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 350, loss[loss=0.07203, simple_loss=0.09829, pruned_loss=0.01447, audio_tagging_loss=0.008415, over 15395.00 frames. ], tot_loss[loss=0.07044, simple_loss=0.0928, pruned_loss=0.01368, audio_tagging_loss=0.01036, over 2531976.05 frames. ], batch size: 58, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 02:46:48,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2023-11-24 02:46:52,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397150 2023-11-24 02:46:55,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2647653.3333333335, ans=0.0 2023-11-24 02:47:02,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=15.0 2023-11-24 02:47:36,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2647853.3333333335, ans=0.1 2023-11-24 02:47:42,074 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 400, loss[loss=0.06406, simple_loss=0.08471, pruned_loss=0.01274, audio_tagging_loss=0.008968, over 15256.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09281, pruned_loss=0.01382, audio_tagging_loss=0.01004, over 2643488.21 frames. 
], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:47:54,095 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397200 2023-11-24 02:47:56,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2647986.6666666665, ans=0.0 2023-11-24 02:47:59,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2647986.6666666665, ans=0.2 2023-11-24 02:48:12,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2648053.3333333335, ans=0.1 2023-11-24 02:48:15,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2648053.3333333335, ans=0.05 2023-11-24 02:48:16,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2023-11-24 02:48:35,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.562e+01 9.066e+01 9.786e+01 1.375e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-24 02:48:44,723 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 450, loss[loss=0.0547, simple_loss=0.06715, pruned_loss=0.007537, audio_tagging_loss=0.01359, over 15223.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.09216, pruned_loss=0.01363, audio_tagging_loss=0.009717, over 2729046.64 frames. ], batch size: 58, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:48:48,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2648253.3333333335, ans=0.125 2023-11-24 02:48:51,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2648253.3333333335, ans=0.125 2023-11-24 02:48:53,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2648253.3333333335, ans=0.0 2023-11-24 02:48:57,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397250 2023-11-24 02:49:01,170 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 02:49:03,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2648320.0, ans=10.0 2023-11-24 02:49:05,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2648320.0, ans=0.2 2023-11-24 02:49:18,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2648386.6666666665, ans=0.125 2023-11-24 02:49:19,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. 
limit=12.0 2023-11-24 02:49:29,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2648453.3333333335, ans=10.0 2023-11-24 02:49:42,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2648520.0, ans=0.125 2023-11-24 02:49:46,797 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 500, loss[loss=0.06606, simple_loss=0.08774, pruned_loss=0.01531, audio_tagging_loss=0.006882, over 15260.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.0927, pruned_loss=0.01353, audio_tagging_loss=0.009383, over 2797195.42 frames. ], batch size: 58, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:49:59,238 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397300 2023-11-24 02:49:59,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2648653.3333333335, ans=0.05 2023-11-24 02:50:08,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2648653.3333333335, ans=0.0 2023-11-24 02:50:09,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2023-11-24 02:50:38,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.347e+01 9.082e+01 9.817e+01 1.239e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-24 02:50:44,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2648853.3333333335, ans=0.1 2023-11-24 02:50:48,949 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 550, loss[loss=0.07, simple_loss=0.09815, pruned_loss=0.009292, audio_tagging_loss=0.01163, over 15584.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09177, pruned_loss=0.01339, audio_tagging_loss=0.009286, over 2860012.98 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:50:59,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2648986.6666666665, ans=0.1 2023-11-24 02:51:00,999 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397350 2023-11-24 02:51:02,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2648986.6666666665, ans=0.1 2023-11-24 02:51:09,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. 
limit=22.5 2023-11-24 02:51:16,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2649053.3333333335, ans=0.5 2023-11-24 02:51:18,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2649053.3333333335, ans=0.1 2023-11-24 02:51:21,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2649053.3333333335, ans=0.125 2023-11-24 02:51:24,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2649120.0, ans=0.125 2023-11-24 02:51:49,951 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 600, loss[loss=0.05985, simple_loss=0.08989, pruned_loss=0.0073, audio_tagging_loss=0.0076, over 14360.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09129, pruned_loss=0.01328, audio_tagging_loss=0.009283, over 2904952.37 frames. ], batch size: 55, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:51:54,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2649253.3333333335, ans=0.125 2023-11-24 02:51:54,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2649253.3333333335, ans=0.0 2023-11-24 02:52:03,033 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397400 2023-11-24 02:52:23,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2649386.6666666665, ans=0.0 2023-11-24 02:52:27,635 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 02:52:42,870 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.375e+01 9.177e+01 9.802e+01 2.402e+02, threshold=1.835e+02, percent-clipped=1.0 2023-11-24 02:52:52,975 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 650, loss[loss=0.04696, simple_loss=0.05602, pruned_loss=0.007943, audio_tagging_loss=0.01101, over 14744.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09081, pruned_loss=0.01308, audio_tagging_loss=0.009251, over 2936016.33 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 02:53:04,700 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397450 2023-11-24 02:53:05,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2023-11-24 02:53:41,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-11-24 02:53:54,371 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 700, loss[loss=0.08285, simple_loss=0.1095, pruned_loss=0.02015, audio_tagging_loss=0.007954, over 15661.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.0915, pruned_loss=0.01308, audio_tagging_loss=0.009175, over 2962736.20 frames. 
], batch size: 58, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 02:54:06,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397500 2023-11-24 02:54:20,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2650053.3333333335, ans=0.125 2023-11-24 02:54:21,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-24 02:54:22,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2023-11-24 02:54:47,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.698e+01 9.188e+01 9.937e+01 1.605e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-24 02:54:50,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2023-11-24 02:54:55,816 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 750, loss[loss=0.0973, simple_loss=0.1382, pruned_loss=0.0207, audio_tagging_loss=0.007472, over 16477.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09154, pruned_loss=0.01311, audio_tagging_loss=0.009171, over 2983851.09 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 02:55:09,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397550 2023-11-24 02:55:16,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2023-11-24 02:55:31,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2650453.3333333335, ans=0.2 2023-11-24 02:55:32,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2650453.3333333335, ans=0.125 2023-11-24 02:55:58,167 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 800, loss[loss=0.06517, simple_loss=0.07881, pruned_loss=0.01235, audio_tagging_loss=0.01341, over 15054.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09131, pruned_loss=0.01326, audio_tagging_loss=0.009235, over 2999267.06 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:55:58,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2650586.6666666665, ans=0.125 2023-11-24 02:56:05,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2650586.6666666665, ans=0.0 2023-11-24 02:56:10,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397600 2023-11-24 02:56:15,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5 2023-11-24 02:56:51,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.448e+01 9.134e+01 9.872e+01 1.389e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 02:56:59,542 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 850, loss[loss=0.08177, simple_loss=0.1029, pruned_loss=0.0203, audio_tagging_loss=0.01004, over 15846.00 frames. ], tot_loss[loss=0.06879, simple_loss=0.09202, pruned_loss=0.01351, audio_tagging_loss=0.009274, over 3014806.60 frames. 
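`grad_scale` toggles between 16.0 and 32.0 across these records, the signature of dynamic loss scaling under fp16: the scale doubles after a run of overflow-free steps and is cut back when a step overflows. A minimal sketch with the stock PyTorch scaler; the growth settings shown are assumptions about this run, not values read from it:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # matches the grad_scale values seen in the log
    growth_factor=2.0,     # double after enough clean steps ...
    backoff_factor=0.5,    # ... halve on overflow
    growth_interval=2000,  # assumed here; also torch's default
)

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)  # assumed: model returns a scalar loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # silently skips the step if gradients overflowed
    scaler.update()            # grows or backs off the scale
    return scaler.get_scale()  # the "grad_scale" figure in these records
```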
], batch size: 58, lr: 2.01e-03, grad_scale: 32.0 2023-11-24 02:57:12,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397650 2023-11-24 02:57:19,374 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 02:57:32,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2023-11-24 02:57:40,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-24 02:57:54,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2651186.6666666665, ans=0.05 2023-11-24 02:58:02,059 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 900, loss[loss=0.0495, simple_loss=0.07088, pruned_loss=0.004715, audio_tagging_loss=0.00934, over 15433.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09129, pruned_loss=0.0133, audio_tagging_loss=0.009376, over 3022398.46 frames. ], batch size: 58, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 02:58:02,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-24 02:58:08,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2651253.3333333335, ans=10.0 2023-11-24 02:58:10,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2651253.3333333335, ans=0.125 2023-11-24 02:58:14,486 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397700 2023-11-24 02:58:14,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2651320.0, ans=0.125 2023-11-24 02:58:26,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2651386.6666666665, ans=0.0 2023-11-24 02:58:27,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2651386.6666666665, ans=0.5 2023-11-24 02:58:42,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2023-11-24 02:58:47,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2651453.3333333335, ans=0.125 2023-11-24 02:58:50,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2651520.0, ans=0.0 2023-11-24 02:58:56,280 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.559e+01 9.218e+01 1.010e+02 1.214e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 02:59:04,533 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 950, loss[loss=0.06874, simple_loss=0.08958, pruned_loss=0.01443, audio_tagging_loss=0.009517, over 16419.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09182, pruned_loss=0.0133, audio_tagging_loss=0.00923, over 3027920.70 frames. 
], batch size: 62, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 02:59:17,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397750 2023-11-24 02:59:33,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2651720.0, ans=0.025 2023-11-24 02:59:47,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2651786.6666666665, ans=0.125 2023-11-24 02:59:57,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2651853.3333333335, ans=0.2 2023-11-24 03:00:06,444 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1000, loss[loss=0.09823, simple_loss=0.1417, pruned_loss=0.02086, audio_tagging_loss=0.006504, over 16399.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09232, pruned_loss=0.01324, audio_tagging_loss=0.009071, over 3039302.55 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 03:00:13,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2651920.0, ans=0.0 2023-11-24 03:00:18,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397800 2023-11-24 03:00:19,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2651986.6666666665, ans=0.125 2023-11-24 03:00:22,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-24 03:00:24,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2651986.6666666665, ans=0.125 2023-11-24 03:00:29,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.56 vs. limit=10.0 2023-11-24 03:00:31,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2652053.3333333335, ans=0.0 2023-11-24 03:00:32,251 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:00:46,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2652120.0, ans=0.125 2023-11-24 03:01:01,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.415e+01 8.930e+01 9.689e+01 1.161e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-24 03:01:09,022 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1050, loss[loss=0.05913, simple_loss=0.087, pruned_loss=0.008158, audio_tagging_loss=0.00747, over 14871.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09173, pruned_loss=0.01328, audio_tagging_loss=0.00904, over 3044526.59 frames. 
], batch size: 56, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 03:01:15,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2652253.3333333335, ans=0.125 2023-11-24 03:01:21,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397850 2023-11-24 03:01:36,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-11-24 03:01:58,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5 2023-11-24 03:02:11,737 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1100, loss[loss=0.08496, simple_loss=0.1142, pruned_loss=0.02027, audio_tagging_loss=0.007607, over 15473.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09156, pruned_loss=0.01331, audio_tagging_loss=0.00897, over 3043764.23 frames. ], batch size: 58, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 03:02:13,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2652586.6666666665, ans=0.0 2023-11-24 03:02:15,975 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:02:24,322 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397900 2023-11-24 03:02:28,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2652653.3333333335, ans=0.2 2023-11-24 03:02:29,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2652653.3333333335, ans=0.125 2023-11-24 03:02:31,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2652653.3333333335, ans=0.0 2023-11-24 03:02:34,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.28 vs. limit=10.0 2023-11-24 03:02:42,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2652720.0, ans=0.04949747468305833 2023-11-24 03:03:04,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2652853.3333333335, ans=0.0 2023-11-24 03:03:06,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.539e+01 9.041e+01 9.654e+01 1.220e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-24 03:03:13,514 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1150, loss[loss=0.06116, simple_loss=0.08745, pruned_loss=0.008468, audio_tagging_loss=0.008966, over 16593.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09168, pruned_loss=0.01343, audio_tagging_loss=0.008978, over 3039402.08 frames. 
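The token dumps in these exclusion warnings ('▁D', 'ummy', '▁', 'text', ...) are SentencePiece BPE output: '▁' marks a word boundary, and the dummy transcript encodes to 24 pieces, one more than the 23 subsampled frames, which is exactly the condition that gets the cut dropped. The count can be reproduced with the sentencepiece API (the model path below is a placeholder, not the model file used in this run):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="path/to/bpe.model")  # placeholder
pieces = sp.encode(
    "Dummy text added as a place holder. Please ignore this if possible.",
    out_type=str,
)
print(pieces)       # ['▁D', 'ummy', '▁', 'text', ...] with this run's model
print(len(pieces))  # 24, per the warning above
```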
2023-11-24 03:03:25,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 397950
2023-11-24 03:03:29,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2652986.6666666665, ans=0.2
2023-11-24 03:04:12,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2653186.6666666665, ans=0.0
2023-11-24 03:04:14,288 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1200, loss[loss=0.05793, simple_loss=0.07322, pruned_loss=0.01018, audio_tagging_loss=0.01114, over 15474.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.0917, pruned_loss=0.01332, audio_tagging_loss=0.008894, over 3044461.58 frames. ], batch size: 59, lr: 2.01e-03, grad_scale: 16.0
2023-11-24 03:04:15,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2653253.3333333335, ans=0.125
2023-11-24 03:04:18,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2653253.3333333335, ans=0.125
2023-11-24 03:04:26,339 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398000
2023-11-24 03:04:51,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=12.0
2023-11-24 03:04:52,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2653453.3333333335, ans=0.0
2023-11-24 03:05:00,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2653453.3333333335, ans=0.04949747468305833
2023-11-24 03:05:08,787 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 03:05:09,794 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.593e+01 9.162e+01 9.902e+01 1.327e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-24 03:05:16,309 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1250, loss[loss=0.0595, simple_loss=0.07393, pruned_loss=0.01037, audio_tagging_loss=0.01217, over 14817.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09115, pruned_loss=0.01329, audio_tagging_loss=0.008884, over 3045251.42 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 16.0
2023-11-24 03:05:17,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2653586.6666666665, ans=0.2
2023-11-24 03:05:17,949 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-24 03:05:21,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2653586.6666666665, ans=0.125
2023-11-24 03:05:29,459 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398050
2023-11-24 03:05:41,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0
2023-11-24 03:05:50,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5
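The scaling.py entries track two of the Zipformer recipe's activation regularizers. ScheduledFloat lines print the current value (ans=...) of a hyperparameter that is interpolated in batch_count, which is how dropout and skip rates decay as training progresses; Whitening lines compare a measured whitening metric of a module's activations against a limit, with a corrective penalty applied only when the metric exceeds the limit (metric=18.54 vs. limit=22.5 in the entry above is still within bounds). A rough sketch of the schedule idea, illustrative only and not icefall's actual class in scaling.py:

    # Rough sketch of a piecewise-linear schedule keyed on batch count,
    # in the spirit of the ScheduledFloat values logged above.
    class PiecewiseLinear:
        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # Hypothetical schedule: past its last breakpoint the value is constant,
    # which is why ans=0.125 repeats for these balancer probs late in training.
    schedule = PiecewiseLinear((0.0, 0.3), (20000.0, 0.125))
    print(schedule(2651986.0))  # -> 0.125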
2023-11-24 03:05:51,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0
2023-11-24 03:06:10,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2653853.3333333335, ans=0.125
2023-11-24 03:06:12,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2653853.3333333335, ans=0.0
2023-11-24 03:06:12,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2653853.3333333335, ans=0.0
2023-11-24 03:06:13,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2653853.3333333335, ans=0.125
2023-11-24 03:06:18,463 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1300, loss[loss=0.06881, simple_loss=0.08982, pruned_loss=0.01324, audio_tagging_loss=0.01066, over 14447.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09125, pruned_loss=0.01342, audio_tagging_loss=0.00887, over 3045713.91 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 16.0
2023-11-24 03:06:27,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2653920.0, ans=0.125
2023-11-24 03:06:30,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398100
2023-11-24 03:06:42,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2654053.3333333335, ans=0.1
2023-11-24 03:06:46,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5
2023-11-24 03:06:53,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0
2023-11-24 03:07:07,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2654186.6666666665, ans=0.125
2023-11-24 03:07:13,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.205e+01 8.973e+01 9.705e+01 1.641e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-24 03:07:19,487 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1350, loss[loss=0.0695, simple_loss=0.09516, pruned_loss=0.01325, audio_tagging_loss=0.008669, over 14894.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09123, pruned_loss=0.01355, audio_tagging_loss=0.00878, over 3043304.63 frames. ], batch size: 54, lr: 2.01e-03, grad_scale: 16.0
2023-11-24 03:07:30,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2654320.0, ans=0.0
2023-11-24 03:07:31,296 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398150
2023-11-24 03:07:38,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2654320.0, ans=0.125
2023-11-24 03:07:39,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0
2023-11-24 03:08:02,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs.
limit=15.0 2023-11-24 03:08:03,979 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:08:06,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2654453.3333333335, ans=0.0 2023-11-24 03:08:19,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2654586.6666666665, ans=0.125 2023-11-24 03:08:20,644 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1400, loss[loss=0.06972, simple_loss=0.09854, pruned_loss=0.01248, audio_tagging_loss=0.007966, over 14557.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09232, pruned_loss=0.0138, audio_tagging_loss=0.00885, over 3044731.01 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 03:08:22,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2654586.6666666665, ans=0.1 2023-11-24 03:08:34,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398200 2023-11-24 03:08:48,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=22.5 2023-11-24 03:09:05,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=2654786.6666666665, ans=0.02 2023-11-24 03:09:17,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.778e+01 8.476e+01 9.016e+01 9.845e+01 1.286e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-24 03:09:19,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-24 03:09:21,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2654853.3333333335, ans=0.0 2023-11-24 03:09:24,000 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1450, loss[loss=0.0851, simple_loss=0.1115, pruned_loss=0.01947, audio_tagging_loss=0.009854, over 14651.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.0914, pruned_loss=0.01355, audio_tagging_loss=0.008912, over 3040592.91 frames. ], batch size: 54, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 03:09:35,965 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398250 2023-11-24 03:09:49,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. 
limit=15.0 2023-11-24 03:10:11,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2655120.0, ans=0.1 2023-11-24 03:10:15,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2655186.6666666665, ans=0.0 2023-11-24 03:10:19,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2655186.6666666665, ans=0.125 2023-11-24 03:10:25,522 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1500, loss[loss=0.09616, simple_loss=0.1335, pruned_loss=0.02101, audio_tagging_loss=0.00839, over 15744.00 frames. ], tot_loss[loss=0.06882, simple_loss=0.09213, pruned_loss=0.01377, audio_tagging_loss=0.008989, over 3036252.86 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 03:10:37,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398300 2023-11-24 03:10:58,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2023-11-24 03:11:21,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.576e+01 9.222e+01 1.014e+02 1.240e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 03:11:23,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2655520.0, ans=0.125 2023-11-24 03:11:27,360 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1550, loss[loss=0.0749, simple_loss=0.09993, pruned_loss=0.01718, audio_tagging_loss=0.007757, over 15198.00 frames. ], tot_loss[loss=0.0694, simple_loss=0.09293, pruned_loss=0.01382, audio_tagging_loss=0.009115, over 3044474.87 frames. ], batch size: 55, lr: 2.01e-03, grad_scale: 16.0 2023-11-24 03:11:41,111 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398350 2023-11-24 03:11:53,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2023-11-24 03:11:56,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-24 03:12:03,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2655720.0, ans=0.0 2023-11-24 03:12:05,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2655786.6666666665, ans=0.125 2023-11-24 03:12:19,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5 2023-11-24 03:12:31,481 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1600, loss[loss=0.07523, simple_loss=0.1002, pruned_loss=0.01756, audio_tagging_loss=0.00755, over 15862.00 frames. ], tot_loss[loss=0.06976, simple_loss=0.09329, pruned_loss=0.01397, audio_tagging_loss=0.009143, over 3046890.62 frames. 
], batch size: 60, lr: 2.00e-03, grad_scale: 32.0
2023-11-24 03:12:43,991 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398400
2023-11-24 03:12:56,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2656053.3333333335, ans=0.0
2023-11-24 03:12:57,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2656053.3333333335, ans=0.1
2023-11-24 03:13:15,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2656120.0, ans=0.0
2023-11-24 03:13:17,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2656120.0, ans=0.05
2023-11-24 03:13:28,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.243e+01 8.406e+01 9.091e+01 9.636e+01 1.326e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-24 03:13:30,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2656186.6666666665, ans=0.0
2023-11-24 03:13:33,938 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1650, loss[loss=0.07431, simple_loss=0.1011, pruned_loss=0.01624, audio_tagging_loss=0.007545, over 14797.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.0927, pruned_loss=0.01375, audio_tagging_loss=0.009195, over 3042950.46 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 32.0
2023-11-24 03:13:45,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398450
2023-11-24 03:13:46,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2656320.0, ans=0.125
2023-11-24 03:14:01,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2023-11-24 03:14:04,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2656386.6666666665, ans=0.2
2023-11-24 03:14:17,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2656453.3333333335, ans=0.035
2023-11-24 03:14:31,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2656520.0, ans=0.0
2023-11-24 03:14:34,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=15.0
2023-11-24 03:14:35,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2656586.6666666665, ans=0.2
2023-11-24 03:14:36,422 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1700, loss[loss=0.05203, simple_loss=0.0612, pruned_loss=0.009393, audio_tagging_loss=0.01203, over 14945.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09233, pruned_loss=0.01359, audio_tagging_loss=0.009318, over 3041581.08 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0
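The grad_scale field in these batch summaries is the mixed-precision loss-scaling factor for this fp16 run; it doubled from 16.0 to 32.0 at batch 1600 above and is back at 16.0 by batch 1950 further down, which is the usual dynamic-scaling behavior: grow the scale after a stretch of overflow-free steps, halve it when gradients overflow. A minimal PyTorch sketch of the same mechanism (the growth/backoff parameters here are illustrative, not necessarily what this run used):

    # Minimal dynamic loss-scaling step with torch.cuda.amp.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                       growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=2000)

    def train_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skips the step if gradients overflowed
        scaler.update()            # doubles or halves the scale as needed
        return scaler.get_scale()  # the value the log prints as grad_scale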
2023-11-24 03:14:38,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2656586.6666666665, ans=0.1
2023-11-24 03:14:39,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2656586.6666666665, ans=0.0
2023-11-24 03:14:47,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2656586.6666666665, ans=0.1
2023-11-24 03:14:49,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398500
2023-11-24 03:15:07,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0
2023-11-24 03:15:07,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2656720.0, ans=15.0
2023-11-24 03:15:32,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.348e+01 8.790e+01 9.631e+01 1.141e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-24 03:15:35,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2656853.3333333335, ans=0.0
2023-11-24 03:15:39,520 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1750, loss[loss=0.04364, simple_loss=0.05527, pruned_loss=0.007717, audio_tagging_loss=0.008287, over 15890.00 frames. ], tot_loss[loss=0.0686, simple_loss=0.09173, pruned_loss=0.01353, audio_tagging_loss=0.009207, over 3044274.90 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0
2023-11-24 03:15:44,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2656920.0, ans=0.1
2023-11-24 03:15:51,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398550
2023-11-24 03:16:14,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2657120.0, ans=0.125
2023-11-24 03:16:22,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2657120.0, ans=0.1
2023-11-24 03:16:41,184 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1800, loss[loss=0.05933, simple_loss=0.08047, pruned_loss=0.01, audio_tagging_loss=0.009092, over 17111.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09177, pruned_loss=0.01343, audio_tagging_loss=0.009141, over 3044642.27 frames. ], batch size: 64, lr: 2.00e-03, grad_scale: 32.0
2023-11-24 03:16:48,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2657253.3333333335, ans=0.2
2023-11-24 03:16:53,755 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398600
2023-11-24 03:17:05,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2657386.6666666665, ans=0.2
2023-11-24 03:17:09,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs.
limit=6.0 2023-11-24 03:17:13,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2657386.6666666665, ans=0.1 2023-11-24 03:17:13,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2657386.6666666665, ans=0.2 2023-11-24 03:17:21,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2657453.3333333335, ans=0.125 2023-11-24 03:17:28,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-24 03:17:37,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.560e+01 9.161e+01 9.912e+01 1.351e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 03:17:43,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2657586.6666666665, ans=0.2 2023-11-24 03:17:43,953 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1850, loss[loss=0.05955, simple_loss=0.07689, pruned_loss=0.01403, audio_tagging_loss=0.007068, over 14860.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.0912, pruned_loss=0.01331, audio_tagging_loss=0.00912, over 3046990.95 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:17:47,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2657586.6666666665, ans=0.0 2023-11-24 03:17:47,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2657586.6666666665, ans=0.125 2023-11-24 03:17:56,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398650 2023-11-24 03:18:09,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=12.0 2023-11-24 03:18:12,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2657720.0, ans=0.0 2023-11-24 03:18:12,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=12.0 2023-11-24 03:18:23,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2657786.6666666665, ans=0.95 2023-11-24 03:18:24,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2657786.6666666665, ans=0.125 2023-11-24 03:18:34,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-24 03:18:39,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2023-11-24 03:18:46,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=12.0 2023-11-24 03:18:47,045 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1900, loss[loss=0.0759, simple_loss=0.1097, pruned_loss=0.0146, audio_tagging_loss=0.006457, over 15899.00 frames. 
], tot_loss[loss=0.06784, simple_loss=0.09108, pruned_loss=0.01327, audio_tagging_loss=0.009034, over 3049448.25 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:18:59,036 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398700 2023-11-24 03:19:05,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2657986.6666666665, ans=0.0 2023-11-24 03:19:43,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.306e+01 8.939e+01 9.602e+01 1.298e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-24 03:19:47,771 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 1950, loss[loss=0.0676, simple_loss=0.0904, pruned_loss=0.01469, audio_tagging_loss=0.00771, over 15726.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09122, pruned_loss=0.01346, audio_tagging_loss=0.008962, over 3047031.33 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:20:00,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398750 2023-11-24 03:20:07,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2658320.0, ans=0.125 2023-11-24 03:20:16,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2658386.6666666665, ans=0.125 2023-11-24 03:20:49,786 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2000, loss[loss=0.04711, simple_loss=0.05796, pruned_loss=0.006903, audio_tagging_loss=0.01123, over 14461.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09089, pruned_loss=0.01339, audio_tagging_loss=0.008958, over 3047734.10 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:21:02,287 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398800 2023-11-24 03:21:05,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0 2023-11-24 03:21:22,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2658720.0, ans=0.125 2023-11-24 03:21:23,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2658720.0, ans=0.2 2023-11-24 03:21:26,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2658786.6666666665, ans=0.125 2023-11-24 03:21:31,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2658786.6666666665, ans=0.0 2023-11-24 03:21:33,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2023-11-24 03:21:46,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.321e+01 9.002e+01 9.770e+01 1.545e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-24 03:21:51,416 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2050, loss[loss=0.07686, simple_loss=0.1084, pruned_loss=0.01451, audio_tagging_loss=0.008134, over 15572.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09061, pruned_loss=0.0133, audio_tagging_loss=0.009, over 3040025.91 frames. 
], batch size: 58, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:21:51,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2658920.0, ans=0.0 2023-11-24 03:21:56,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2658920.0, ans=0.0 2023-11-24 03:22:02,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2658920.0, ans=0.125 2023-11-24 03:22:03,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2658986.6666666665, ans=0.0 2023-11-24 03:22:04,415 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398850 2023-11-24 03:22:05,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2658986.6666666665, ans=0.95 2023-11-24 03:22:22,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2659053.3333333335, ans=0.125 2023-11-24 03:22:26,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2659053.3333333335, ans=0.125 2023-11-24 03:22:42,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2659186.6666666665, ans=0.0 2023-11-24 03:22:54,379 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2100, loss[loss=0.07289, simple_loss=0.08894, pruned_loss=0.01859, audio_tagging_loss=0.009836, over 15732.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09101, pruned_loss=0.01334, audio_tagging_loss=0.009028, over 3047955.27 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:23:06,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398900 2023-11-24 03:23:09,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2659320.0, ans=0.0 2023-11-24 03:23:32,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2659453.3333333335, ans=0.125 2023-11-24 03:23:37,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2659453.3333333335, ans=0.0 2023-11-24 03:23:52,486 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.464e+01 8.942e+01 9.567e+01 1.124e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-24 03:23:56,153 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2150, loss[loss=0.0586, simple_loss=0.08146, pruned_loss=0.008676, audio_tagging_loss=0.009195, over 15131.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09179, pruned_loss=0.01342, audio_tagging_loss=0.008906, over 3046906.57 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:24:09,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 398950 2023-11-24 03:24:19,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2659653.3333333335, ans=0.1 2023-11-24 03:24:34,098 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:24:50,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2659853.3333333335, ans=0.02 2023-11-24 03:24:59,225 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2200, loss[loss=0.07627, simple_loss=0.1013, pruned_loss=0.01516, audio_tagging_loss=0.01044, over 14112.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09237, pruned_loss=0.01354, audio_tagging_loss=0.008845, over 3042681.92 frames. ], batch size: 54, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:25:09,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2659920.0, ans=0.125 2023-11-24 03:25:10,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2659920.0, ans=0.125 2023-11-24 03:25:12,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399000 2023-11-24 03:25:41,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2660120.0, ans=0.125 2023-11-24 03:25:51,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0 2023-11-24 03:25:58,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.513e+01 9.169e+01 1.008e+02 1.249e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-24 03:25:59,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2660186.6666666665, ans=0.0 2023-11-24 03:26:01,961 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2250, loss[loss=0.05646, simple_loss=0.08568, pruned_loss=0.007142, audio_tagging_loss=0.006475, over 15763.00 frames. ], tot_loss[loss=0.06868, simple_loss=0.0922, pruned_loss=0.01366, audio_tagging_loss=0.008922, over 3044813.83 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:26:12,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2023-11-24 03:26:13,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399050 2023-11-24 03:26:26,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2660386.6666666665, ans=0.125 2023-11-24 03:27:02,885 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2300, loss[loss=0.06603, simple_loss=0.08281, pruned_loss=0.01486, audio_tagging_loss=0.009762, over 16236.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.0915, pruned_loss=0.01354, audio_tagging_loss=0.009074, over 3049170.99 frames. ], batch size: 64, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:27:05,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2660586.6666666665, ans=0.125 2023-11-24 03:27:11,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. 
limit=6.0 2023-11-24 03:27:14,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399100 2023-11-24 03:27:42,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2660786.6666666665, ans=0.0 2023-11-24 03:27:48,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2660786.6666666665, ans=0.0 2023-11-24 03:27:57,447 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:28:00,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.488e+01 8.965e+01 9.835e+01 1.215e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-24 03:28:03,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2023-11-24 03:28:05,074 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2350, loss[loss=0.07811, simple_loss=0.1066, pruned_loss=0.01714, audio_tagging_loss=0.007641, over 16421.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.09198, pruned_loss=0.0136, audio_tagging_loss=0.009068, over 3054083.11 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:28:12,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2660920.0, ans=0.1 2023-11-24 03:28:18,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399150 2023-11-24 03:28:21,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2660986.6666666665, ans=0.1 2023-11-24 03:28:56,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2661186.6666666665, ans=0.125 2023-11-24 03:29:08,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2661253.3333333335, ans=0.0 2023-11-24 03:29:08,996 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2400, loss[loss=0.08729, simple_loss=0.1243, pruned_loss=0.01846, audio_tagging_loss=0.006685, over 17206.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09347, pruned_loss=0.01377, audio_tagging_loss=0.009085, over 3050721.26 frames. ], batch size: 63, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:29:09,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2661253.3333333335, ans=0.125 2023-11-24 03:29:16,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2661253.3333333335, ans=0.0 2023-11-24 03:29:21,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399200 2023-11-24 03:29:26,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2661320.0, ans=0.0 2023-11-24 03:29:36,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.57 vs. 
limit=15.0 2023-11-24 03:29:40,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2661386.6666666665, ans=0.0 2023-11-24 03:30:08,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.386e+01 9.132e+01 1.001e+02 1.473e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 03:30:10,823 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2450, loss[loss=0.09095, simple_loss=0.1257, pruned_loss=0.01923, audio_tagging_loss=0.008853, over 14976.00 frames. ], tot_loss[loss=0.06993, simple_loss=0.09387, pruned_loss=0.01391, audio_tagging_loss=0.009081, over 3046824.72 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:30:12,456 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 03:30:15,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2661586.6666666665, ans=0.035 2023-11-24 03:30:22,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399250 2023-11-24 03:30:40,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2023-11-24 03:30:45,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2661720.0, ans=0.0 2023-11-24 03:31:11,902 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2500, loss[loss=0.06175, simple_loss=0.07077, pruned_loss=0.01318, audio_tagging_loss=0.01319, over 15852.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09274, pruned_loss=0.01382, audio_tagging_loss=0.009173, over 3055900.26 frames. ], batch size: 62, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:31:12,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2661920.0, ans=0.125 2023-11-24 03:31:14,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. limit=6.0 2023-11-24 03:31:26,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399300 2023-11-24 03:31:33,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2661986.6666666665, ans=0.0 2023-11-24 03:32:13,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.419e+01 8.936e+01 9.796e+01 1.161e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-24 03:32:16,384 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2550, loss[loss=0.07986, simple_loss=0.1003, pruned_loss=0.02184, audio_tagging_loss=0.007865, over 15010.00 frames. ], tot_loss[loss=0.06925, simple_loss=0.09267, pruned_loss=0.01382, audio_tagging_loss=0.009087, over 3050805.05 frames. 
], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:32:16,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2662253.3333333335, ans=0.125 2023-11-24 03:32:28,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399350 2023-11-24 03:32:29,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2662320.0, ans=0.0 2023-11-24 03:32:43,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2662386.6666666665, ans=0.0 2023-11-24 03:32:47,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2662386.6666666665, ans=0.0 2023-11-24 03:32:54,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2662453.3333333335, ans=0.0 2023-11-24 03:32:57,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2662453.3333333335, ans=0.2 2023-11-24 03:33:12,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0 2023-11-24 03:33:18,426 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2600, loss[loss=0.05794, simple_loss=0.0682, pruned_loss=0.01116, audio_tagging_loss=0.01268, over 14323.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09163, pruned_loss=0.01345, audio_tagging_loss=0.008987, over 3045419.44 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:33:30,212 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399400 2023-11-24 03:33:59,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0 2023-11-24 03:34:17,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.667e+01 9.247e+01 9.854e+01 1.236e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-24 03:34:19,849 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2650, loss[loss=0.06109, simple_loss=0.08706, pruned_loss=0.009664, audio_tagging_loss=0.0079, over 15902.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09085, pruned_loss=0.01323, audio_tagging_loss=0.008913, over 3049952.80 frames. 
], batch size: 60, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:34:23,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2662920.0, ans=0.125 2023-11-24 03:34:32,891 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399450 2023-11-24 03:34:45,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2663053.3333333335, ans=0.125 2023-11-24 03:34:46,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2663053.3333333335, ans=0.0 2023-11-24 03:34:49,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2663053.3333333335, ans=0.125 2023-11-24 03:34:49,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2663053.3333333335, ans=0.1 2023-11-24 03:35:09,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2663186.6666666665, ans=0.04949747468305833 2023-11-24 03:35:23,157 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2700, loss[loss=0.04832, simple_loss=0.06348, pruned_loss=0.007245, audio_tagging_loss=0.00934, over 14656.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09015, pruned_loss=0.01319, audio_tagging_loss=0.008968, over 3053993.75 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:35:35,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399500 2023-11-24 03:35:41,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2663320.0, ans=0.125 2023-11-24 03:36:14,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2663520.0, ans=0.125 2023-11-24 03:36:17,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2663520.0, ans=0.1 2023-11-24 03:36:23,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.295e+01 8.758e+01 9.490e+01 1.330e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-24 03:36:25,798 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2750, loss[loss=0.07752, simple_loss=0.1103, pruned_loss=0.01404, audio_tagging_loss=0.008323, over 15411.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.08981, pruned_loss=0.01314, audio_tagging_loss=0.008988, over 3055118.50 frames. 
], batch size: 54, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:36:29,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2663586.6666666665, ans=0.125 2023-11-24 03:36:38,034 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399550 2023-11-24 03:36:58,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2663720.0, ans=0.125 2023-11-24 03:36:59,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2663720.0, ans=0.125 2023-11-24 03:37:03,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2663786.6666666665, ans=0.125 2023-11-24 03:37:18,097 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:37:27,421 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2800, loss[loss=0.06353, simple_loss=0.09079, pruned_loss=0.01036, audio_tagging_loss=0.007777, over 14912.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.08964, pruned_loss=0.01303, audio_tagging_loss=0.008915, over 3058383.68 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:37:30,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-24 03:37:35,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2663920.0, ans=0.0 2023-11-24 03:37:40,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399600 2023-11-24 03:37:52,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2664053.3333333335, ans=0.0 2023-11-24 03:37:55,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=12.0 2023-11-24 03:38:02,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2664053.3333333335, ans=0.1 2023-11-24 03:38:06,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-24 03:38:08,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2664120.0, ans=0.125 2023-11-24 03:38:13,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2664120.0, ans=0.0 2023-11-24 03:38:27,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.351e+01 8.911e+01 9.699e+01 1.155e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-24 03:38:30,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. 
limit=15.0 2023-11-24 03:38:30,606 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2850, loss[loss=0.06738, simple_loss=0.08839, pruned_loss=0.01273, audio_tagging_loss=0.01045, over 13820.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.0908, pruned_loss=0.01319, audio_tagging_loss=0.008811, over 3053274.19 frames. ], batch size: 53, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:38:42,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399650 2023-11-24 03:38:52,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2023-11-24 03:38:53,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2664320.0, ans=0.0 2023-11-24 03:38:59,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2664386.6666666665, ans=0.1 2023-11-24 03:38:59,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-24 03:39:08,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5 2023-11-24 03:39:18,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2664453.3333333335, ans=0.125 2023-11-24 03:39:31,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2664586.6666666665, ans=0.125 2023-11-24 03:39:32,502 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2900, loss[loss=0.06093, simple_loss=0.08383, pruned_loss=0.01021, audio_tagging_loss=0.008804, over 15691.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09028, pruned_loss=0.01305, audio_tagging_loss=0.008875, over 3051082.27 frames. 
], batch size: 60, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:39:36,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2664586.6666666665, ans=0.0 2023-11-24 03:39:44,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2664653.3333333335, ans=0.125 2023-11-24 03:39:45,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399700 2023-11-24 03:39:45,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2664653.3333333335, ans=0.125 2023-11-24 03:39:49,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2664653.3333333335, ans=0.5 2023-11-24 03:40:06,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2664720.0, ans=0.125 2023-11-24 03:40:13,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2664786.6666666665, ans=0.125 2023-11-24 03:40:14,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2664786.6666666665, ans=0.0 2023-11-24 03:40:25,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2664853.3333333335, ans=0.125 2023-11-24 03:40:27,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.99 vs. limit=10.0 2023-11-24 03:40:31,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.311e+01 9.104e+01 9.838e+01 1.191e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 03:40:33,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2664920.0, ans=0.125 2023-11-24 03:40:34,370 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 2950, loss[loss=0.05119, simple_loss=0.06663, pruned_loss=0.008766, audio_tagging_loss=0.009111, over 14698.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09006, pruned_loss=0.01301, audio_tagging_loss=0.00894, over 3050913.65 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:40:38,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.73 vs. limit=15.0 2023-11-24 03:40:47,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399750 2023-11-24 03:41:11,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2665120.0, ans=0.1 2023-11-24 03:41:17,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2665120.0, ans=0.125 2023-11-24 03:41:32,503 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 03:41:35,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-11-24 03:41:35,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. 
limit=15.0
2023-11-24 03:41:36,935 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3000, loss[loss=0.05815, simple_loss=0.06937, pruned_loss=0.01251, audio_tagging_loss=0.01096, over 15457.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09005, pruned_loss=0.01314, audio_tagging_loss=0.008963, over 3044423.54 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0
2023-11-24 03:41:36,936 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 03:42:07,603 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2885, 3.0277, 3.3219, 2.8900, 3.7437, 3.7907, 3.3096, 3.2448], device='cuda:1')
2023-11-24 03:42:18,343 INFO [train_asr.py:1253] (1/4) Epoch 34, validation: loss=0.05766, simple_loss=0.05087, pruned_loss=0.005081, audio_tagging_loss=0.02714, over 4681554.00 frames.
2023-11-24 03:42:18,344 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 03:42:18,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2665253.3333333335, ans=0.05
2023-11-24 03:42:25,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2665253.3333333335, ans=0.0
2023-11-24 03:42:27,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2665253.3333333335, ans=0.2
2023-11-24 03:42:31,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399800
2023-11-24 03:42:32,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2665320.0, ans=0.1
2023-11-24 03:42:58,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2665453.3333333335, ans=0.0
2023-11-24 03:43:18,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.558e+01 9.257e+01 9.837e+01 1.292e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-24 03:43:20,160 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 03:43:21,064 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3050, loss[loss=0.07441, simple_loss=0.1066, pruned_loss=0.01282, audio_tagging_loss=0.008287, over 15277.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.08994, pruned_loss=0.01312, audio_tagging_loss=0.00902, over 3046428.88 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0
2023-11-24 03:43:24,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2023-11-24 03:43:34,238 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399850
2023-11-24 03:43:45,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2665720.0, ans=0.0
2023-11-24 03:43:57,310 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
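The WARNING above (identical to the earlier ones in this log) shows why these 1-second AudioSet cuts are dropped: 100 input frames shrink to 23 frames after the encoder's convolutional subsampling, while the placeholder transcript tokenizes to 24 BPE tokens, and a transducer cannot emit more tokens than it has encoder frames. The filter is presumably a length check like the following; the exact subsampling arithmetic is our assumption, chosen to match the logged 100 -> 23:

    # Sketch of the length filter behind the "Exclude cut" warnings.
    # The formula is an assumption matching the logged numbers; the actual
    # check in train_asr.py may differ in detail.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2  # 100 -> 23

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts whose token count exceeds the subsampled frame count.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert keep_cut(100, 24) is False  # the excluded cuts above: 23 < 24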
2023-11-24 03:44:14,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2665853.3333333335, ans=0.125 2023-11-24 03:44:16,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2665853.3333333335, ans=0.1 2023-11-24 03:44:21,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2665853.3333333335, ans=0.1 2023-11-24 03:44:23,994 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3100, loss[loss=0.07353, simple_loss=0.09876, pruned_loss=0.01613, audio_tagging_loss=0.008014, over 15698.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09075, pruned_loss=0.0133, audio_tagging_loss=0.009049, over 3042104.98 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:44:36,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399900 2023-11-24 03:44:41,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2665986.6666666665, ans=0.125 2023-11-24 03:44:42,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=2665986.6666666665, ans=0.1 2023-11-24 03:44:58,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2666053.3333333335, ans=0.0 2023-11-24 03:45:06,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2666120.0, ans=0.125 2023-11-24 03:45:09,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2666120.0, ans=0.125 2023-11-24 03:45:11,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-24 03:45:23,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.765e+01 9.339e+01 1.018e+02 1.384e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-24 03:45:26,231 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3150, loss[loss=0.05292, simple_loss=0.07009, pruned_loss=0.00585, audio_tagging_loss=0.01202, over 14420.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09072, pruned_loss=0.01316, audio_tagging_loss=0.009139, over 3039331.32 frames.
], batch size: 57, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:45:33,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2666253.3333333335, ans=0.0 2023-11-24 03:45:38,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 399950 2023-11-24 03:45:56,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2666386.6666666665, ans=0.125 2023-11-24 03:46:03,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2666453.3333333335, ans=0.125 2023-11-24 03:46:26,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2666520.0, ans=0.125 2023-11-24 03:46:28,852 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3200, loss[loss=0.0639, simple_loss=0.08638, pruned_loss=0.00884, audio_tagging_loss=0.01187, over 15319.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09021, pruned_loss=0.01301, audio_tagging_loss=0.009187, over 3039185.77 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:46:41,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400000 2023-11-24 03:46:50,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2666653.3333333335, ans=0.0 2023-11-24 03:47:00,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2666720.0, ans=0.1 2023-11-24 03:47:05,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2666720.0, ans=0.0 2023-11-24 03:47:07,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2666720.0, ans=0.125 2023-11-24 03:47:09,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2666786.6666666665, ans=0.125 2023-11-24 03:47:31,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.504e+01 9.216e+01 9.938e+01 1.136e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-24 03:47:34,774 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3250, loss[loss=0.0705, simple_loss=0.1011, pruned_loss=0.01269, audio_tagging_loss=0.007237, over 15767.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.08958, pruned_loss=0.01283, audio_tagging_loss=0.009367, over 3043795.84 frames. 
], batch size: 59, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:47:47,462 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400050 2023-11-24 03:47:48,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2666986.6666666665, ans=0.125 2023-11-24 03:47:51,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2666986.6666666665, ans=0.125 2023-11-24 03:48:25,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2667186.6666666665, ans=0.2 2023-11-24 03:48:28,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2667186.6666666665, ans=0.125 2023-11-24 03:48:35,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2667186.6666666665, ans=0.125 2023-11-24 03:48:37,195 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3300, loss[loss=0.06735, simple_loss=0.09482, pruned_loss=0.01305, audio_tagging_loss=0.006882, over 15850.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09006, pruned_loss=0.01289, audio_tagging_loss=0.009406, over 3048427.44 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:48:49,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400100 2023-11-24 03:48:54,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2667320.0, ans=0.1 2023-11-24 03:49:11,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2667386.6666666665, ans=0.125 2023-11-24 03:49:12,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2023-11-24 03:49:18,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2667453.3333333335, ans=0.125 2023-11-24 03:49:37,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.541e+01 9.179e+01 9.840e+01 1.429e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-24 03:49:38,552 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3350, loss[loss=0.05603, simple_loss=0.07191, pruned_loss=0.01298, audio_tagging_loss=0.007098, over 15318.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09101, pruned_loss=0.01318, audio_tagging_loss=0.009148, over 3052955.36 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:49:39,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2667586.6666666665, ans=0.1 2023-11-24 03:49:46,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2667586.6666666665, ans=0.95 2023-11-24 03:49:51,811 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400150 2023-11-24 03:49:54,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. 
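limit=15.0

Note the grad_scale field dropping from 32.0 (batch 3300) to 16.0 (batch 3350) above; it is back at 32.0 by batch 3600 further down. With use_fp16 enabled, this is the signature of a dynamic loss scaler: a backward pass that produces inf/NaN gradients causes the step to be skipped and the scale to be halved, and after enough clean steps the scale is grown again. A generic sketch of the mechanism using PyTorch's GradScaler, as an illustration rather than the exact icefall optimizer wrapper:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # silently skipped if gradients overflowed
        scaler.update()          # halves the scale on overflow, grows it later
        return loss.detach(), scaler.get_scale()  # the value logged as grad_scale
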
2023-11-24 03:50:04,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2667720.0, ans=0.125 2023-11-24 03:50:12,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=22.5 2023-11-24 03:50:19,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2667786.6666666665, ans=0.1 2023-11-24 03:50:28,899 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 03:50:30,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-24 03:50:32,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2667853.3333333335, ans=0.125 2023-11-24 03:50:40,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2667920.0, ans=0.125 2023-11-24 03:50:40,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2667920.0, ans=0.1 2023-11-24 03:50:41,766 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3400, loss[loss=0.06911, simple_loss=0.09283, pruned_loss=0.01288, audio_tagging_loss=0.009825, over 14173.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09117, pruned_loss=0.01322, audio_tagging_loss=0.00901, over 3050058.53 frames. ], batch size: 53, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:50:54,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400200 2023-11-24 03:50:54,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2667986.6666666665, ans=0.1 2023-11-24 03:50:54,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2667986.6666666665, ans=0.125 2023-11-24 03:51:09,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-24 03:51:14,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2668053.3333333335, ans=0.2 2023-11-24 03:51:14,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2668053.3333333335, ans=0.125 2023-11-24 03:51:37,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2668186.6666666665, ans=0.1 2023-11-24 03:51:43,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.468e+01 9.078e+01 9.580e+01 1.134e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-24 03:51:45,105 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3450, loss[loss=0.06137, simple_loss=0.06951, pruned_loss=0.01325, audio_tagging_loss=0.01337, over 14922.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09235, pruned_loss=0.01348, audio_tagging_loss=0.008846, over 3050536.17 frames.
], batch size: 56, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:51:54,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2668253.3333333335, ans=0.125 2023-11-24 03:51:56,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2668320.0, ans=0.125 2023-11-24 03:51:57,033 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400250 2023-11-24 03:52:04,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2668320.0, ans=0.0 2023-11-24 03:52:31,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2668453.3333333335, ans=0.125 2023-11-24 03:52:36,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2668520.0, ans=0.125 2023-11-24 03:52:42,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2668520.0, ans=0.125 2023-11-24 03:52:45,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2668586.6666666665, ans=0.125 2023-11-24 03:52:46,998 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3500, loss[loss=0.0653, simple_loss=0.0869, pruned_loss=0.01317, audio_tagging_loss=0.008677, over 15803.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09216, pruned_loss=0.01333, audio_tagging_loss=0.008778, over 3049574.37 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:52:59,011 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400300 2023-11-24 03:53:09,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2668653.3333333335, ans=0.125 2023-11-24 03:53:18,509 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:53:47,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.784e+01 9.365e+01 1.022e+02 1.307e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-24 03:53:48,873 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3550, loss[loss=0.05031, simple_loss=0.06946, pruned_loss=0.007961, audio_tagging_loss=0.007619, over 14872.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09107, pruned_loss=0.01325, audio_tagging_loss=0.008749, over 3047910.93 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:54:00,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2668920.0, ans=0.2 2023-11-24 03:54:02,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400350 2023-11-24 03:54:22,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. 
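limit=15.0

The Whitening records (scaling.py:1022) compare a per-module "metric" against a limit; only when the metric exceeds the limit does the whitening constraint act on that module. In spirit, the metric measures how far the channel covariance of a module's output is from a multiple of the identity, with 1.0 meaning perfectly white activations. A rough sketch of such a measure, as an illustration of the idea rather than the Whiten implementation in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations from one module
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov).clamp(min=1e-20)
        # Arithmetic mean over geometric mean of the eigenvalues: 1.0 when
        # the covariance is proportional to the identity, larger otherwise.
        return eigs.mean() / eigs.log().mean().exp()

Read this way, a record like "metric=8.08 vs. limit=15.0" says the module is still comfortably below the point where the penalty would engage.
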
2023-11-24 03:54:31,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2669120.0, ans=0.2 2023-11-24 03:54:31,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2669120.0, ans=0.2 2023-11-24 03:54:48,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2023-11-24 03:54:50,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2669186.6666666665, ans=0.125 2023-11-24 03:54:52,796 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3600, loss[loss=0.04114, simple_loss=0.048, pruned_loss=0.004228, audio_tagging_loss=0.01291, over 16857.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09072, pruned_loss=0.01319, audio_tagging_loss=0.008801, over 3050242.98 frames. ], batch size: 68, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 03:54:54,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5 2023-11-24 03:55:04,826 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400400 2023-11-24 03:55:33,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2669453.3333333335, ans=0.0 2023-11-24 03:55:34,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2023-11-24 03:55:45,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2669520.0, ans=0.125 2023-11-24 03:55:52,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2669520.0, ans=0.0 2023-11-24 03:55:54,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.024e+01 8.332e+01 8.867e+01 9.623e+01 1.551e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-24 03:55:54,362 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3650, loss[loss=0.06438, simple_loss=0.08555, pruned_loss=0.01398, audio_tagging_loss=0.007624, over 13734.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09123, pruned_loss=0.01346, audio_tagging_loss=0.008813, over 3049369.50 frames.
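], batch size: 55, lr: 2.00e-03, grad_scale: 16.0

The optim.py:476 records are gradient-clipping diagnostics: the five numbers are the min, 25%, median, 75% and max of recent gradient norms, and the logged threshold is consistently twice the reported median (just above, 2.0 * 8.867e+01 = 1.773e+02), matching Clipping_scale=2.0; percent-clipped reports how often recent steps exceeded the threshold. A small sketch of producing such a record; the windowing and exact bookkeeping in optim.py may differ:

    import torch

    def clip_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: 1-D tensor of gradient norms from recent steps
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]          # scale times the median
        pct_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, pct_clipped
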
2023-11-24 03:55:54,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2669586.6666666665, ans=0.0 2023-11-24 03:56:00,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2669586.6666666665, ans=10.0 2023-11-24 03:56:06,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400450 2023-11-24 03:56:27,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2669720.0, ans=0.1 2023-11-24 03:56:27,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2669720.0, ans=0.125 2023-11-24 03:56:39,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2669786.6666666665, ans=0.125 2023-11-24 03:56:42,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=22.5 2023-11-24 03:56:43,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2669853.3333333335, ans=0.05 2023-11-24 03:56:52,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2669853.3333333335, ans=0.1 2023-11-24 03:56:56,224 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3700, loss[loss=0.06429, simple_loss=0.08454, pruned_loss=0.01071, audio_tagging_loss=0.01131, over 15238.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09133, pruned_loss=0.01343, audio_tagging_loss=0.008857, over 3054273.58 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:57:10,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400500 2023-11-24 03:57:16,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2669986.6666666665, ans=0.0 2023-11-24 03:57:30,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2670053.3333333335, ans=0.1 2023-11-24 03:57:39,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2670120.0, ans=0.0 2023-11-24 03:57:56,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2670186.6666666665, ans=0.0 2023-11-24 03:58:00,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.459e+01 9.233e+01 9.964e+01 1.289e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-24 03:58:00,132 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3750, loss[loss=0.06693, simple_loss=0.09797, pruned_loss=0.01057, audio_tagging_loss=0.007377, over 15684.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.0919, pruned_loss=0.01358, audio_tagging_loss=0.008808, over 3055010.56 frames.
], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:58:00,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2670253.3333333335, ans=0.0 2023-11-24 03:58:11,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400550 2023-11-24 03:58:12,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2670320.0, ans=0.125 2023-11-24 03:58:40,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2670453.3333333335, ans=0.125 2023-11-24 03:58:41,605 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 03:59:01,175 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3800, loss[loss=0.05209, simple_loss=0.06361, pruned_loss=0.01043, audio_tagging_loss=0.009856, over 15372.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09168, pruned_loss=0.01343, audio_tagging_loss=0.008939, over 3055883.37 frames. ], batch size: 61, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 03:59:02,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2670586.6666666665, ans=0.07 2023-11-24 03:59:09,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2670586.6666666665, ans=0.125 2023-11-24 03:59:13,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400600 2023-11-24 03:59:17,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2670653.3333333335, ans=0.1 2023-11-24 03:59:22,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.42 vs. limit=10.0 2023-11-24 03:59:31,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2670720.0, ans=0.125 2023-11-24 03:59:34,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2670720.0, ans=0.04949747468305833 2023-11-24 03:59:43,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2670786.6666666665, ans=0.125 2023-11-24 04:00:03,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.605e+01 9.126e+01 9.740e+01 1.269e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-24 04:00:03,081 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3850, loss[loss=0.05177, simple_loss=0.06753, pruned_loss=0.009107, audio_tagging_loss=0.008896, over 15428.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09096, pruned_loss=0.01349, audio_tagging_loss=0.009044, over 3049603.32 frames. 
], batch size: 60, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:00:15,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2023-11-24 04:00:16,292 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400650 2023-11-24 04:00:26,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2670986.6666666665, ans=0.0 2023-11-24 04:00:28,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0 2023-11-24 04:00:31,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2671053.3333333335, ans=0.0 2023-11-24 04:01:06,125 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3900, loss[loss=0.04762, simple_loss=0.05726, pruned_loss=0.00751, audio_tagging_loss=0.01148, over 17513.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09075, pruned_loss=0.01349, audio_tagging_loss=0.009097, over 3053972.33 frames. ], batch size: 71, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:01:06,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2671253.3333333335, ans=0.0 2023-11-24 04:01:18,883 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400700 2023-11-24 04:01:29,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2671320.0, ans=0.1 2023-11-24 04:02:08,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.467e+01 9.035e+01 9.770e+01 1.225e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 04:02:08,956 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 3950, loss[loss=0.06348, simple_loss=0.09366, pruned_loss=0.008584, audio_tagging_loss=0.008069, over 15287.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09164, pruned_loss=0.01361, audio_tagging_loss=0.009133, over 3047877.05 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:02:20,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400750 2023-11-24 04:02:35,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2671720.0, ans=0.0 2023-11-24 04:03:02,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2671853.3333333335, ans=0.2 2023-11-24 04:03:09,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2671920.0, ans=0.035 2023-11-24 04:03:10,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2023-11-24 04:03:10,951 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4000, loss[loss=0.0881, simple_loss=0.1214, pruned_loss=0.01935, audio_tagging_loss=0.008027, over 15772.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.09203, pruned_loss=0.01375, audio_tagging_loss=0.009206, over 3052445.66 frames. 
], batch size: 60, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:03:11,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2671920.0, ans=0.125 2023-11-24 04:03:13,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2671920.0, ans=0.2 2023-11-24 04:03:24,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400800 2023-11-24 04:03:38,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2672053.3333333335, ans=0.125 2023-11-24 04:04:02,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=22.5 2023-11-24 04:04:09,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=8.0 2023-11-24 04:04:11,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2023-11-24 04:04:13,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.478e+01 9.170e+01 1.003e+02 1.153e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-24 04:04:13,930 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4050, loss[loss=0.04637, simple_loss=0.05262, pruned_loss=0.01031, audio_tagging_loss=0.009748, over 14958.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.0915, pruned_loss=0.01351, audio_tagging_loss=0.009238, over 3051049.96 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:04:14,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2672253.3333333335, ans=0.125 2023-11-24 04:04:16,845 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 04:04:21,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2672253.3333333335, ans=0.125 2023-11-24 04:04:23,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2672253.3333333335, ans=0.04949747468305833 2023-11-24 04:04:26,463 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400850 2023-11-24 04:04:36,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2672320.0, ans=0.1 2023-11-24 04:05:10,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2023-11-24 04:05:15,956 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4100, loss[loss=0.08158, simple_loss=0.1256, pruned_loss=0.01304, audio_tagging_loss=0.005733, over 15395.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09307, pruned_loss=0.01362, audio_tagging_loss=0.009147, over 3052742.24 frames. 
], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:05:26,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2023-11-24 04:05:28,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400900 2023-11-24 04:05:31,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2023-11-24 04:05:34,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2672653.3333333335, ans=0.125 2023-11-24 04:05:39,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-11-24 04:05:47,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2672720.0, ans=0.1 2023-11-24 04:06:03,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-24 04:06:13,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2672853.3333333335, ans=0.125 2023-11-24 04:06:18,021 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4150, loss[loss=0.07979, simple_loss=0.1108, pruned_loss=0.01462, audio_tagging_loss=0.009794, over 15909.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09273, pruned_loss=0.01352, audio_tagging_loss=0.009049, over 3043753.57 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:06:19,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.718e+01 9.320e+01 1.030e+02 1.270e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-24 04:06:31,268 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 400950 2023-11-24 04:06:47,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2673053.3333333335, ans=0.1 2023-11-24 04:06:47,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2673053.3333333335, ans=0.125 2023-11-24 04:06:49,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2673053.3333333335, ans=0.125 2023-11-24 04:06:55,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-11-24 04:06:56,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2673120.0, ans=0.2 2023-11-24 04:06:59,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2673120.0, ans=0.0 2023-11-24 04:07:02,441 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 04:07:05,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2673120.0, ans=22.5 2023-11-24 04:07:21,658 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4200, loss[loss=0.06668, simple_loss=0.08899, pruned_loss=0.01434, audio_tagging_loss=0.007845, over 15862.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09077, pruned_loss=0.01319, audio_tagging_loss=0.009082, over 3045761.31 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:07:22,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2673253.3333333335, ans=0.04949747468305833 2023-11-24 04:07:24,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2023-11-24 04:07:32,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2673253.3333333335, ans=0.125 2023-11-24 04:07:34,235 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401000 2023-11-24 04:07:49,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2673386.6666666665, ans=0.125 2023-11-24 04:07:51,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2673386.6666666665, ans=0.0 2023-11-24 04:07:53,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.71 vs. limit=10.0 2023-11-24 04:08:01,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2673453.3333333335, ans=0.125 2023-11-24 04:08:05,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2673453.3333333335, ans=0.125 2023-11-24 04:08:22,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2673520.0, ans=0.125 2023-11-24 04:08:24,418 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4250, loss[loss=0.05294, simple_loss=0.07265, pruned_loss=0.007169, audio_tagging_loss=0.009445, over 15133.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09072, pruned_loss=0.0131, audio_tagging_loss=0.009017, over 3054661.52 frames. 
], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:08:25,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.323e+01 9.029e+01 9.665e+01 1.222e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-24 04:08:29,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2673586.6666666665, ans=0.125 2023-11-24 04:08:36,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401050 2023-11-24 04:08:37,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2673653.3333333335, ans=0.125 2023-11-24 04:08:46,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2673653.3333333335, ans=0.2 2023-11-24 04:09:07,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2673786.6666666665, ans=0.0 2023-11-24 04:09:24,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0 2023-11-24 04:09:26,466 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4300, loss[loss=0.06538, simple_loss=0.09392, pruned_loss=0.01077, audio_tagging_loss=0.007651, over 15320.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09218, pruned_loss=0.01332, audio_tagging_loss=0.008883, over 3052318.55 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:09:35,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2673920.0, ans=0.0 2023-11-24 04:09:38,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401100 2023-11-24 04:09:40,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2673986.6666666665, ans=0.1 2023-11-24 04:10:05,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2674120.0, ans=0.2 2023-11-24 04:10:06,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2674120.0, ans=0.125 2023-11-24 04:10:14,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2674120.0, ans=0.125 2023-11-24 04:10:28,548 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4350, loss[loss=0.0775, simple_loss=0.1036, pruned_loss=0.01787, audio_tagging_loss=0.007854, over 15307.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09246, pruned_loss=0.01338, audio_tagging_loss=0.008839, over 3047554.80 frames. 
], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:10:30,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.375e+01 8.839e+01 9.580e+01 1.285e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-24 04:10:32,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2674253.3333333335, ans=0.125 2023-11-24 04:10:41,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401150 2023-11-24 04:10:49,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2674320.0, ans=0.07 2023-11-24 04:10:58,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2674386.6666666665, ans=0.0 2023-11-24 04:11:13,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2674453.3333333335, ans=0.125 2023-11-24 04:11:24,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2674520.0, ans=0.125 2023-11-24 04:11:31,217 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4400, loss[loss=0.06987, simple_loss=0.09327, pruned_loss=0.01439, audio_tagging_loss=0.008847, over 15450.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09263, pruned_loss=0.01334, audio_tagging_loss=0.008718, over 3054003.22 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:11:31,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2023-11-24 04:11:43,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401200 2023-11-24 04:11:46,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2023-11-24 04:11:47,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-24 04:11:50,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2674653.3333333335, ans=0.0 2023-11-24 04:12:00,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0 2023-11-24 04:12:04,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5 2023-11-24 04:12:04,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=22.5 2023-11-24 04:12:14,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2674786.6666666665, ans=0.04949747468305833 2023-11-24 04:12:14,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2674786.6666666665, ans=0.0 2023-11-24 04:12:33,401 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4450, loss[loss=0.06082, simple_loss=0.0827, pruned_loss=0.01321, audio_tagging_loss=0.006261, over 16094.00 frames. 
], tot_loss[loss=0.06815, simple_loss=0.09198, pruned_loss=0.01342, audio_tagging_loss=0.008749, over 3055558.79 frames. ], batch size: 60, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:12:34,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.725e+01 9.329e+01 1.003e+02 1.264e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-24 04:12:36,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2674920.0, ans=0.1 2023-11-24 04:12:41,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2674920.0, ans=0.0 2023-11-24 04:12:43,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2674920.0, ans=0.2 2023-11-24 04:12:45,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401250 2023-11-24 04:12:46,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2674986.6666666665, ans=0.5 2023-11-24 04:12:51,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2674986.6666666665, ans=0.0 2023-11-24 04:13:19,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2675120.0, ans=0.125 2023-11-24 04:13:28,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2675186.6666666665, ans=0.1 2023-11-24 04:13:35,765 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4500, loss[loss=0.06352, simple_loss=0.08149, pruned_loss=0.01327, audio_tagging_loss=0.009504, over 16249.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09198, pruned_loss=0.01346, audio_tagging_loss=0.008753, over 3051913.36 frames. ], batch size: 62, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:13:44,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2675253.3333333335, ans=0.125 2023-11-24 04:13:49,245 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401300 2023-11-24 04:14:01,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2023-11-24 04:14:01,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-11-24 04:14:16,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2675453.3333333335, ans=0.1 2023-11-24 04:14:34,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2023-11-24 04:14:38,761 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4550, loss[loss=0.05302, simple_loss=0.07712, pruned_loss=0.006273, audio_tagging_loss=0.008182, over 13373.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09148, pruned_loss=0.0133, audio_tagging_loss=0.008825, over 3044692.55 frames. 
], batch size: 53, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:14:39,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.786e+01 9.235e+01 9.923e+01 1.429e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-24 04:14:50,699 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401350 2023-11-24 04:14:57,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. limit=10.0 2023-11-24 04:15:07,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0 2023-11-24 04:15:10,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2675720.0, ans=0.125 2023-11-24 04:15:10,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2675720.0, ans=0.09899494936611666 2023-11-24 04:15:24,798 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 04:15:40,265 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4600, loss[loss=0.08458, simple_loss=0.1227, pruned_loss=0.01484, audio_tagging_loss=0.008408, over 15987.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09207, pruned_loss=0.01348, audio_tagging_loss=0.008782, over 3043064.52 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:15:52,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401400 2023-11-24 04:16:12,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2676053.3333333335, ans=0.2 2023-11-24 04:16:16,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2676053.3333333335, ans=0.0 2023-11-24 04:16:42,185 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4650, loss[loss=0.06619, simple_loss=0.08653, pruned_loss=0.01275, audio_tagging_loss=0.01018, over 15436.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.0913, pruned_loss=0.01328, audio_tagging_loss=0.008867, over 3049841.34 frames. 
], batch size: 57, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:16:43,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.311e+01 8.909e+01 9.483e+01 1.432e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-24 04:16:45,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2676253.3333333335, ans=0.125 2023-11-24 04:16:55,741 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401450 2023-11-24 04:17:04,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2676320.0, ans=0.125 2023-11-24 04:17:13,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2676386.6666666665, ans=0.0 2023-11-24 04:17:17,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2676386.6666666665, ans=0.0 2023-11-24 04:17:18,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2676453.3333333335, ans=0.125 2023-11-24 04:17:23,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2676453.3333333335, ans=0.0 2023-11-24 04:17:37,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2676520.0, ans=0.0 2023-11-24 04:17:45,187 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4700, loss[loss=0.08792, simple_loss=0.124, pruned_loss=0.01762, audio_tagging_loss=0.008293, over 15618.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09148, pruned_loss=0.01335, audio_tagging_loss=0.008988, over 3046206.28 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:17:57,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401500 2023-11-24 04:18:09,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2676720.0, ans=0.125 2023-11-24 04:18:29,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2676786.6666666665, ans=0.125 2023-11-24 04:18:46,870 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4750, loss[loss=0.07441, simple_loss=0.1055, pruned_loss=0.01232, audio_tagging_loss=0.009357, over 16178.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09174, pruned_loss=0.01344, audio_tagging_loss=0.009053, over 3047367.09 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:18:47,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.351e+01 9.004e+01 9.816e+01 1.181e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-24 04:18:49,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2676920.0, ans=0.2 2023-11-24 04:18:58,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401550 2023-11-24 04:19:00,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2676986.6666666665, ans=0.125 2023-11-24 04:19:16,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. 
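limit=15.0

The ScheduledFloat records that dominate this log (scaling.py:213) are values such as dropout rates, skip rates and balancer probabilities that are scheduled as functions of the training batch count; by this point (batch_count around 2.67e6) they have long since settled at their final values, which is why the same names keep logging the same ans=... numbers. A toy piecewise-linear schedule showing the idea, as an illustration rather than the ScheduledFloat class in scaling.py:

    def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
        # points: sorted (batch_count, value) breakpoints; constant outside them
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0

    # e.g. a skip rate decaying 0.5 -> 0.0 over the first 20k batches would
    # log ans=0.0 at every batch_count in this section, like the
    # ff2_skip_rate record at batch_count=2677120.0 below:
    assert scheduled_float(2677120.0, [(0.0, 0.5), (20000.0, 0.0)]) == 0.0
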
2023-11-24 04:19:23,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=22.5 2023-11-24 04:19:27,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2677120.0, ans=0.0 2023-11-24 04:19:30,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=22.5 2023-11-24 04:19:46,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2677253.3333333335, ans=0.95 2023-11-24 04:19:47,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0 2023-11-24 04:19:47,772 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4800, loss[loss=0.06151, simple_loss=0.0864, pruned_loss=0.01062, audio_tagging_loss=0.007684, over 13958.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.0912, pruned_loss=0.01349, audio_tagging_loss=0.009145, over 3047608.14 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:19:56,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2677253.3333333335, ans=0.125 2023-11-24 04:20:01,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401600 2023-11-24 04:20:18,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2023-11-24 04:20:52,270 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4850, loss[loss=0.06293, simple_loss=0.08617, pruned_loss=0.008331, audio_tagging_loss=0.01152, over 15704.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09081, pruned_loss=0.01323, audio_tagging_loss=0.009298, over 3054797.22 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:20:53,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.375e+01 9.054e+01 9.821e+01 1.202e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-24 04:21:04,076 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401650 2023-11-24 04:21:05,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0 2023-11-24 04:21:08,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2677653.3333333335, ans=0.0 2023-11-24 04:21:09,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2677653.3333333335, ans=0.0 2023-11-24 04:21:11,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2677653.3333333335, ans=0.5 2023-11-24 04:21:43,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-24 04:21:46,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.69 vs.
limit=12.0 2023-11-24 04:21:53,712 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4900, loss[loss=0.06673, simple_loss=0.08145, pruned_loss=0.01606, audio_tagging_loss=0.009941, over 14186.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09107, pruned_loss=0.01326, audio_tagging_loss=0.009284, over 3048466.53 frames. ], batch size: 54, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:22:05,638 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401700 2023-11-24 04:22:30,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2678120.0, ans=0.0 2023-11-24 04:22:34,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2678120.0, ans=0.1 2023-11-24 04:22:55,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2023-11-24 04:22:55,409 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 4950, loss[loss=0.08235, simple_loss=0.1025, pruned_loss=0.02205, audio_tagging_loss=0.009037, over 14893.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09099, pruned_loss=0.01327, audio_tagging_loss=0.009074, over 3051561.55 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:22:56,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.560e+01 8.976e+01 9.753e+01 1.389e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-24 04:23:08,534 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401750 2023-11-24 04:23:15,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2678320.0, ans=0.125 2023-11-24 04:23:30,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0 2023-11-24 04:23:41,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2678453.3333333335, ans=0.1 2023-11-24 04:23:57,864 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5000, loss[loss=0.08233, simple_loss=0.112, pruned_loss=0.01535, audio_tagging_loss=0.01098, over 14588.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09217, pruned_loss=0.01339, audio_tagging_loss=0.008913, over 3042314.63 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:24:09,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-24 04:24:10,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401800 2023-11-24 04:24:40,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2678786.6666666665, ans=0.1 2023-11-24 04:24:43,427 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 04:25:00,938 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5050, loss[loss=0.06119, simple_loss=0.08569, pruned_loss=0.009729, audio_tagging_loss=0.008616, over 15610.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.09231, pruned_loss=0.01345, audio_tagging_loss=0.00893, over 3041295.60 frames. 
], batch size: 58, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:25:02,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.608e+01 9.238e+01 9.938e+01 4.414e+02, threshold=1.848e+02, percent-clipped=1.0 2023-11-24 04:25:12,803 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401850 2023-11-24 04:25:18,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2678986.6666666665, ans=0.0 2023-11-24 04:25:47,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2679120.0, ans=0.0 2023-11-24 04:26:02,688 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5100, loss[loss=0.05478, simple_loss=0.07089, pruned_loss=0.01104, audio_tagging_loss=0.008302, over 14487.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09199, pruned_loss=0.01338, audio_tagging_loss=0.008908, over 3038909.53 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:26:11,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2679253.3333333335, ans=0.125 2023-11-24 04:26:15,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-24 04:26:16,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401900 2023-11-24 04:26:18,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-24 04:26:22,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2679320.0, ans=0.1 2023-11-24 04:26:47,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2679453.3333333335, ans=0.125 2023-11-24 04:27:01,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2679520.0, ans=0.0 2023-11-24 04:27:05,520 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5150, loss[loss=0.08181, simple_loss=0.1203, pruned_loss=0.01517, audio_tagging_loss=0.006506, over 15597.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09183, pruned_loss=0.01339, audio_tagging_loss=0.008781, over 3032933.56 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:27:07,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2679586.6666666665, ans=0.1 2023-11-24 04:27:07,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 8.413e+01 8.884e+01 9.698e+01 1.235e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-24 04:27:08,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.91 vs. 
limit=22.5 2023-11-24 04:27:18,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 401950 2023-11-24 04:27:45,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2679786.6666666665, ans=0.2 2023-11-24 04:27:58,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2679853.3333333335, ans=0.5 2023-11-24 04:28:08,605 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5200, loss[loss=0.04075, simple_loss=0.05237, pruned_loss=0.006085, audio_tagging_loss=0.00848, over 14537.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09161, pruned_loss=0.01338, audio_tagging_loss=0.008911, over 3027028.70 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:28:20,572 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402000 2023-11-24 04:28:25,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=12.0 2023-11-24 04:28:50,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2680120.0, ans=0.125 2023-11-24 04:28:50,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2680120.0, ans=0.125 2023-11-24 04:29:09,996 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5250, loss[loss=0.05999, simple_loss=0.08319, pruned_loss=0.007811, audio_tagging_loss=0.01058, over 14822.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09216, pruned_loss=0.01356, audio_tagging_loss=0.008855, over 3031104.03 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 32.0 2023-11-24 04:29:11,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2680253.3333333335, ans=0.125 2023-11-24 04:29:12,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.432e+01 8.228e+01 8.916e+01 9.618e+01 1.278e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-24 04:29:21,910 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402050 2023-11-24 04:29:30,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2680320.0, ans=0.5 2023-11-24 04:29:42,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2680386.6666666665, ans=0.1 2023-11-24 04:29:48,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2680453.3333333335, ans=0.1 2023-11-24 04:29:50,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2680453.3333333335, ans=0.2 2023-11-24 04:29:51,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2680453.3333333335, ans=0.125 2023-11-24 04:29:56,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2680453.3333333335, ans=0.125 2023-11-24 04:30:12,374 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5300, loss[loss=0.06375, simple_loss=0.08346, pruned_loss=0.01486, audio_tagging_loss=0.007155, over 14340.00 frames. 
], tot_loss[loss=0.06847, simple_loss=0.09227, pruned_loss=0.01364, audio_tagging_loss=0.008701, over 3036133.72 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:30:13,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2680586.6666666665, ans=0.125 2023-11-24 04:30:25,292 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402100 2023-11-24 04:30:33,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2680653.3333333335, ans=0.2 2023-11-24 04:30:42,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2680720.0, ans=0.07 2023-11-24 04:31:04,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2680853.3333333335, ans=0.2 2023-11-24 04:31:15,249 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5350, loss[loss=0.05992, simple_loss=0.08038, pruned_loss=0.0127, audio_tagging_loss=0.007034, over 14479.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09194, pruned_loss=0.01355, audio_tagging_loss=0.008665, over 3030784.05 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:31:18,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.724e+01 9.285e+01 9.851e+01 1.330e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-24 04:31:27,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402150 2023-11-24 04:31:37,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2680986.6666666665, ans=0.2 2023-11-24 04:31:43,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2681053.3333333335, ans=0.125 2023-11-24 04:31:53,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2681120.0, ans=0.125 2023-11-24 04:32:00,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5 2023-11-24 04:32:16,950 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5400, loss[loss=0.08025, simple_loss=0.105, pruned_loss=0.01791, audio_tagging_loss=0.009831, over 15309.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09227, pruned_loss=0.01363, audio_tagging_loss=0.008726, over 3031463.66 frames. ], batch size: 54, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:32:21,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.98 vs. 
limit=15.0 2023-11-24 04:32:28,771 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402200 2023-11-24 04:32:41,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2681386.6666666665, ans=0.0 2023-11-24 04:32:42,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=2681386.6666666665, ans=0.02 2023-11-24 04:33:13,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2681520.0, ans=0.0 2023-11-24 04:33:18,580 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5450, loss[loss=0.07875, simple_loss=0.1134, pruned_loss=0.01499, audio_tagging_loss=0.00706, over 14787.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09253, pruned_loss=0.01359, audio_tagging_loss=0.008788, over 3033144.31 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 8.0 2023-11-24 04:33:24,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.363e+01 9.005e+01 9.739e+01 1.241e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-24 04:33:31,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402250 2023-11-24 04:33:31,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2681653.3333333335, ans=0.2 2023-11-24 04:33:39,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2023-11-24 04:33:51,161 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 04:33:56,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2681786.6666666665, ans=0.125 2023-11-24 04:34:17,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2681853.3333333335, ans=0.0 2023-11-24 04:34:21,298 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5500, loss[loss=0.07864, simple_loss=0.1119, pruned_loss=0.01465, audio_tagging_loss=0.008061, over 14710.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09278, pruned_loss=0.01377, audio_tagging_loss=0.008893, over 3040360.51 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 8.0 2023-11-24 04:34:33,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402300 2023-11-24 04:34:37,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-24 04:35:14,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2682186.6666666665, ans=0.05 2023-11-24 04:35:16,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2023-11-24 04:35:22,496 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5550, loss[loss=0.07753, simple_loss=0.1049, pruned_loss=0.01481, audio_tagging_loss=0.01025, over 14848.00 frames. ], tot_loss[loss=0.06928, simple_loss=0.09277, pruned_loss=0.01381, audio_tagging_loss=0.00909, over 3030562.54 frames. 
], batch size: 56, lr: 2.00e-03, grad_scale: 8.0 2023-11-24 04:35:27,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.518e+01 9.178e+01 1.002e+02 1.571e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-24 04:35:32,748 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 04:35:34,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402350 2023-11-24 04:36:05,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=22.5 2023-11-24 04:36:22,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=22.5 2023-11-24 04:36:24,124 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5600, loss[loss=0.06119, simple_loss=0.08369, pruned_loss=0.009698, audio_tagging_loss=0.009648, over 15033.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09181, pruned_loss=0.01359, audio_tagging_loss=0.009173, over 3028065.36 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2023-11-24 04:36:37,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402400 2023-11-24 04:36:46,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2682653.3333333335, ans=0.125 2023-11-24 04:37:05,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2682786.6666666665, ans=0.125 2023-11-24 04:37:08,380 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 04:37:27,408 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5650, loss[loss=0.07252, simple_loss=0.09508, pruned_loss=0.01574, audio_tagging_loss=0.009244, over 15977.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09189, pruned_loss=0.01365, audio_tagging_loss=0.009259, over 3038720.01 frames. 
], batch size: 59, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:37:27,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2682920.0, ans=0.0 2023-11-24 04:37:33,009 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.272e+01 8.812e+01 9.562e+01 1.491e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-24 04:37:40,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402450 2023-11-24 04:37:52,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2683053.3333333335, ans=0.0 2023-11-24 04:37:54,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2683053.3333333335, ans=0.0 2023-11-24 04:38:09,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2683120.0, ans=0.1 2023-11-24 04:38:13,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2683120.0, ans=0.0 2023-11-24 04:38:21,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2683186.6666666665, ans=0.2 2023-11-24 04:38:29,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2683186.6666666665, ans=0.07 2023-11-24 04:38:31,380 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5700, loss[loss=0.05939, simple_loss=0.08593, pruned_loss=0.008511, audio_tagging_loss=0.007917, over 14809.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09215, pruned_loss=0.01363, audio_tagging_loss=0.009219, over 3044214.70 frames. ], batch size: 55, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:38:38,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2683253.3333333335, ans=0.1 2023-11-24 04:38:38,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2683253.3333333335, ans=0.125 2023-11-24 04:38:40,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=12.0 2023-11-24 04:38:41,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2683253.3333333335, ans=0.125 2023-11-24 04:38:43,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402500 2023-11-24 04:38:48,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=22.5 2023-11-24 04:38:49,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2683320.0, ans=0.125 2023-11-24 04:39:16,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.73 vs. 
limit=15.0 2023-11-24 04:39:22,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2683520.0, ans=0.125 2023-11-24 04:39:31,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2683520.0, ans=0.1 2023-11-24 04:39:33,793 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5750, loss[loss=0.05271, simple_loss=0.06775, pruned_loss=0.008408, audio_tagging_loss=0.01043, over 16039.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09176, pruned_loss=0.01351, audio_tagging_loss=0.00905, over 3047596.47 frames. ], batch size: 65, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:39:37,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2683586.6666666665, ans=0.125 2023-11-24 04:39:39,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.509e+01 8.554e+01 9.219e+01 1.000e+02 1.500e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 04:39:42,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2683586.6666666665, ans=0.125 2023-11-24 04:39:46,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2683653.3333333335, ans=0.0 2023-11-24 04:39:47,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402550 2023-11-24 04:40:01,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.74 vs. limit=12.0 2023-11-24 04:40:07,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2683720.0, ans=0.125 2023-11-24 04:40:29,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2683853.3333333335, ans=0.125 2023-11-24 04:40:32,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2683853.3333333335, ans=0.04949747468305833 2023-11-24 04:40:37,106 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5800, loss[loss=0.06524, simple_loss=0.08158, pruned_loss=0.0156, audio_tagging_loss=0.008854, over 15613.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09121, pruned_loss=0.01337, audio_tagging_loss=0.008983, over 3048114.42 frames. 
], batch size: 59, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:40:38,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2683920.0, ans=0.2 2023-11-24 04:40:49,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402600 2023-11-24 04:40:58,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2683986.6666666665, ans=0.125 2023-11-24 04:40:59,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2683986.6666666665, ans=0.0 2023-11-24 04:41:00,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2684053.3333333335, ans=0.95 2023-11-24 04:41:05,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2684053.3333333335, ans=0.1 2023-11-24 04:41:08,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2684053.3333333335, ans=0.0 2023-11-24 04:41:09,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2023-11-24 04:41:18,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2684120.0, ans=0.1 2023-11-24 04:41:22,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2684120.0, ans=0.0 2023-11-24 04:41:26,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2684186.6666666665, ans=0.2 2023-11-24 04:41:38,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0 2023-11-24 04:41:39,342 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5850, loss[loss=0.0464, simple_loss=0.05614, pruned_loss=0.008482, audio_tagging_loss=0.009851, over 14082.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09103, pruned_loss=0.01346, audio_tagging_loss=0.008982, over 3047627.58 frames. 
], batch size: 54, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:41:40,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2684253.3333333335, ans=0.125 2023-11-24 04:41:44,139 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.321e+01 9.009e+01 9.790e+01 1.176e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-24 04:41:46,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2684253.3333333335, ans=15.0 2023-11-24 04:41:51,402 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402650 2023-11-24 04:42:07,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2684386.6666666665, ans=0.125 2023-11-24 04:42:14,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2684386.6666666665, ans=0.125 2023-11-24 04:42:18,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=12.0 2023-11-24 04:42:35,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2684520.0, ans=0.1 2023-11-24 04:42:41,235 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5900, loss[loss=0.07943, simple_loss=0.1112, pruned_loss=0.01609, audio_tagging_loss=0.007731, over 14489.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09127, pruned_loss=0.01336, audio_tagging_loss=0.008913, over 3048922.08 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:42:54,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402700 2023-11-24 04:43:00,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2684653.3333333335, ans=0.1 2023-11-24 04:43:00,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2684653.3333333335, ans=0.0 2023-11-24 04:43:14,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2684720.0, ans=0.125 2023-11-24 04:43:27,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2684786.6666666665, ans=0.1 2023-11-24 04:43:44,345 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 5950, loss[loss=0.05776, simple_loss=0.07786, pruned_loss=0.009513, audio_tagging_loss=0.009314, over 16721.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09173, pruned_loss=0.01345, audio_tagging_loss=0.008863, over 3050749.59 frames. 
], batch size: 62, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:43:44,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2684920.0, ans=0.125 2023-11-24 04:43:49,675 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.328e+01 9.136e+01 9.988e+01 1.334e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 04:43:57,001 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402750 2023-11-24 04:44:04,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2684986.6666666665, ans=0.125 2023-11-24 04:44:16,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2685053.3333333335, ans=0.2 2023-11-24 04:44:18,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2685053.3333333335, ans=0.125 2023-11-24 04:44:30,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=12.0 2023-11-24 04:44:39,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2685186.6666666665, ans=0.125 2023-11-24 04:44:45,807 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6000, loss[loss=0.06094, simple_loss=0.07864, pruned_loss=0.01195, audio_tagging_loss=0.00967, over 14812.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09152, pruned_loss=0.01346, audio_tagging_loss=0.008803, over 3043862.56 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:44:45,808 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 04:45:26,688 INFO [train_asr.py:1253] (1/4) Epoch 34, validation: loss=0.05772, simple_loss=0.05082, pruned_loss=0.005023, audio_tagging_loss=0.02728, over 4681554.00 frames. 2023-11-24 04:45:26,689 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 04:45:33,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2685253.3333333335, ans=0.2 2023-11-24 04:45:40,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402800 2023-11-24 04:45:49,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2685320.0, ans=0.125 2023-11-24 04:45:52,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2023-11-24 04:46:11,976 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 04:46:24,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2685520.0, ans=0.1 2023-11-24 04:46:29,978 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6050, loss[loss=0.0648, simple_loss=0.08862, pruned_loss=0.01231, audio_tagging_loss=0.008179, over 14687.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09128, pruned_loss=0.0134, audio_tagging_loss=0.008818, over 3042151.10 frames. ], batch size: 53, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:46:35,237 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.324e+01 9.101e+01 9.713e+01 1.173e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-24 04:46:42,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402850 2023-11-24 04:46:48,480 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 04:46:56,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2685720.0, ans=0.125 2023-11-24 04:46:58,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-11-24 04:47:07,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2685786.6666666665, ans=10.0 2023-11-24 04:47:16,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.53 vs. limit=10.0 2023-11-24 04:47:27,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=2685853.3333333335, ans=12.0 2023-11-24 04:47:28,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2685853.3333333335, ans=0.125 2023-11-24 04:47:31,917 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6100, loss[loss=0.05348, simple_loss=0.07807, pruned_loss=0.005427, audio_tagging_loss=0.009018, over 15696.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09147, pruned_loss=0.01333, audio_tagging_loss=0.008774, over 3047439.99 frames. 
], batch size: 58, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:47:32,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2685920.0, ans=0.1 2023-11-24 04:47:40,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2685920.0, ans=0.0 2023-11-24 04:47:43,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402900 2023-11-24 04:47:43,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2685986.6666666665, ans=0.0 2023-11-24 04:47:59,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2686053.3333333335, ans=0.1 2023-11-24 04:48:04,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2686053.3333333335, ans=0.2 2023-11-24 04:48:15,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2686120.0, ans=0.0 2023-11-24 04:48:26,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2023-11-24 04:48:33,316 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6150, loss[loss=0.07016, simple_loss=0.09401, pruned_loss=0.01361, audio_tagging_loss=0.009544, over 14795.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09153, pruned_loss=0.01334, audio_tagging_loss=0.008808, over 3042229.19 frames. ], batch size: 53, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:48:38,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.509e+01 8.476e+01 9.051e+01 9.662e+01 1.133e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-24 04:48:46,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 402950 2023-11-24 04:49:06,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2686386.6666666665, ans=0.125 2023-11-24 04:49:18,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-24 04:49:36,248 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6200, loss[loss=0.06484, simple_loss=0.08757, pruned_loss=0.01197, audio_tagging_loss=0.009083, over 13842.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09109, pruned_loss=0.01322, audio_tagging_loss=0.00887, over 3044206.64 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:49:46,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.45 vs. 
limit=15.0 2023-11-24 04:49:49,208 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403000 2023-11-24 04:49:54,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2686653.3333333335, ans=0.07 2023-11-24 04:50:20,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2686786.6666666665, ans=0.2 2023-11-24 04:50:21,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2686786.6666666665, ans=0.1 2023-11-24 04:50:38,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2686920.0, ans=0.125 2023-11-24 04:50:39,666 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6250, loss[loss=0.06836, simple_loss=0.08926, pruned_loss=0.01582, audio_tagging_loss=0.0079, over 15077.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09023, pruned_loss=0.01312, audio_tagging_loss=0.008915, over 3037420.01 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:50:41,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2686920.0, ans=0.125 2023-11-24 04:50:45,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.512e+01 9.019e+01 9.887e+01 1.410e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-24 04:50:48,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2686920.0, ans=0.1 2023-11-24 04:50:51,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403050 2023-11-24 04:51:15,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2687053.3333333335, ans=0.0 2023-11-24 04:51:32,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=15.0 2023-11-24 04:51:34,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2687186.6666666665, ans=0.125 2023-11-24 04:51:40,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.55 vs. limit=6.0 2023-11-24 04:51:41,211 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6300, loss[loss=0.06123, simple_loss=0.08225, pruned_loss=0.0103, audio_tagging_loss=0.00981, over 14952.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09148, pruned_loss=0.01333, audio_tagging_loss=0.008945, over 3046619.31 frames. 
], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:51:47,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2687253.3333333335, ans=0.0 2023-11-24 04:51:53,704 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403100 2023-11-24 04:51:56,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2687320.0, ans=0.2 2023-11-24 04:52:02,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2687320.0, ans=0.1 2023-11-24 04:52:06,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2687386.6666666665, ans=0.1 2023-11-24 04:52:10,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2687386.6666666665, ans=0.2 2023-11-24 04:52:29,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2687520.0, ans=0.05 2023-11-24 04:52:43,754 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6350, loss[loss=0.07535, simple_loss=0.1081, pruned_loss=0.01275, audio_tagging_loss=0.008539, over 15673.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09278, pruned_loss=0.01356, audio_tagging_loss=0.008978, over 3049319.23 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:52:44,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2687586.6666666665, ans=0.125 2023-11-24 04:52:46,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2687586.6666666665, ans=0.125 2023-11-24 04:52:50,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.641e+01 8.312e+01 8.895e+01 9.882e+01 1.344e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-24 04:52:56,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403150 2023-11-24 04:53:10,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2023-11-24 04:53:46,056 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6400, loss[loss=0.07598, simple_loss=0.1053, pruned_loss=0.01661, audio_tagging_loss=0.006698, over 16616.00 frames. ], tot_loss[loss=0.06882, simple_loss=0.09236, pruned_loss=0.01354, audio_tagging_loss=0.009106, over 3049649.73 frames. 
], batch size: 60, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:53:46,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2687920.0, ans=0.2 2023-11-24 04:53:55,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2687920.0, ans=0.125 2023-11-24 04:53:55,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2687920.0, ans=0.0 2023-11-24 04:53:57,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403200 2023-11-24 04:54:17,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2688053.3333333335, ans=0.1 2023-11-24 04:54:21,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2688053.3333333335, ans=15.0 2023-11-24 04:54:47,660 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6450, loss[loss=0.06534, simple_loss=0.0863, pruned_loss=0.009074, audio_tagging_loss=0.01312, over 15364.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09196, pruned_loss=0.01341, audio_tagging_loss=0.009157, over 3042816.71 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:54:53,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.531e+01 8.209e+01 8.936e+01 9.468e+01 2.072e+02, threshold=1.787e+02, percent-clipped=1.0 2023-11-24 04:55:00,075 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403250 2023-11-24 04:55:00,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2688320.0, ans=0.035 2023-11-24 04:55:07,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 04:55:12,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2023-11-24 04:55:17,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2688386.6666666665, ans=0.125 2023-11-24 04:55:22,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2688386.6666666665, ans=0.125 2023-11-24 04:55:26,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2688453.3333333335, ans=0.0 2023-11-24 04:55:36,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2688520.0, ans=0.0 2023-11-24 04:55:36,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2688520.0, ans=0.125 2023-11-24 04:55:44,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2688520.0, ans=0.5 2023-11-24 04:55:48,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2688586.6666666665, ans=0.125 2023-11-24 04:55:49,665 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6500, loss[loss=0.05994, simple_loss=0.08213, pruned_loss=0.01122, audio_tagging_loss=0.007655, over 15208.00 frames. 
], tot_loss[loss=0.06829, simple_loss=0.09143, pruned_loss=0.01341, audio_tagging_loss=0.009163, over 3043259.94 frames. ], batch size: 55, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 04:55:52,963 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 04:56:02,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403300 2023-11-24 04:56:40,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2688853.3333333335, ans=0.125 2023-11-24 04:56:52,699 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6550, loss[loss=0.09728, simple_loss=0.1359, pruned_loss=0.02351, audio_tagging_loss=0.005834, over 16595.00 frames. ], tot_loss[loss=0.06865, simple_loss=0.09218, pruned_loss=0.01355, audio_tagging_loss=0.009015, over 3043519.91 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:56:59,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.633e+01 9.315e+01 9.907e+01 1.273e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-24 04:57:00,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2688920.0, ans=0.5 2023-11-24 04:57:05,037 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403350 2023-11-24 04:57:10,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2688986.6666666665, ans=0.125 2023-11-24 04:57:12,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-11-24 04:57:22,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2689053.3333333335, ans=0.0 2023-11-24 04:57:48,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2689186.6666666665, ans=0.125 2023-11-24 04:57:54,998 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6600, loss[loss=0.07847, simple_loss=0.1031, pruned_loss=0.01496, audio_tagging_loss=0.01198, over 16215.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09154, pruned_loss=0.0133, audio_tagging_loss=0.008899, over 3045108.97 frames. ], batch size: 61, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:57:57,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2689253.3333333335, ans=0.125 2023-11-24 04:58:01,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2689253.3333333335, ans=0.0 2023-11-24 04:58:06,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403400 2023-11-24 04:58:28,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2689386.6666666665, ans=0.015 2023-11-24 04:58:34,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2689453.3333333335, ans=0.125 2023-11-24 04:58:51,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.40 vs. 
limit=10.0 2023-11-24 04:58:57,514 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6650, loss[loss=0.09281, simple_loss=0.1333, pruned_loss=0.01944, audio_tagging_loss=0.00673, over 16113.00 frames. ], tot_loss[loss=0.06878, simple_loss=0.0928, pruned_loss=0.0136, audio_tagging_loss=0.008782, over 3042976.20 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 04:58:57,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2689586.6666666665, ans=0.0 2023-11-24 04:58:59,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2689586.6666666665, ans=0.1 2023-11-24 04:59:04,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2689586.6666666665, ans=0.1 2023-11-24 04:59:05,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.526e+01 8.351e+01 9.016e+01 9.664e+01 1.302e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-24 04:59:11,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403450 2023-11-24 04:59:18,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2689653.3333333335, ans=0.125 2023-11-24 04:59:40,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5 2023-11-24 05:00:00,591 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6700, loss[loss=0.07812, simple_loss=0.1125, pruned_loss=0.0132, audio_tagging_loss=0.008676, over 14925.00 frames. ], tot_loss[loss=0.06871, simple_loss=0.0928, pruned_loss=0.01356, audio_tagging_loss=0.008749, over 3043010.64 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:00:05,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2689920.0, ans=0.09899494936611666 2023-11-24 05:00:10,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2689920.0, ans=0.125 2023-11-24 05:00:12,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403500 2023-11-24 05:00:15,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2689986.6666666665, ans=10.0 2023-11-24 05:00:17,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0 2023-11-24 05:00:17,812 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 05:00:43,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-24 05:00:48,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2690120.0, ans=0.5 2023-11-24 05:01:01,989 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6750, loss[loss=0.07036, simple_loss=0.1004, pruned_loss=0.01351, audio_tagging_loss=0.006658, over 14438.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09276, pruned_loss=0.01349, audio_tagging_loss=0.008716, over 3035028.14 frames. 
], batch size: 55, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:01:04,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2690253.3333333335, ans=0.1 2023-11-24 05:01:09,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.241e+01 8.716e+01 9.523e+01 2.058e+02, threshold=1.743e+02, percent-clipped=1.0 2023-11-24 05:01:14,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403550 2023-11-24 05:01:15,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.23 vs. limit=10.0 2023-11-24 05:01:36,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2690386.6666666665, ans=0.0 2023-11-24 05:01:38,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.31 vs. limit=10.0 2023-11-24 05:01:58,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2690520.0, ans=0.125 2023-11-24 05:02:04,019 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6800, loss[loss=0.0688, simple_loss=0.09114, pruned_loss=0.01495, audio_tagging_loss=0.008275, over 14963.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09239, pruned_loss=0.01349, audio_tagging_loss=0.008742, over 3033397.02 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:02:14,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2690586.6666666665, ans=0.125 2023-11-24 05:02:17,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403600 2023-11-24 05:02:22,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2690653.3333333335, ans=0.0 2023-11-24 05:02:30,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2690720.0, ans=0.125 2023-11-24 05:02:39,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2690720.0, ans=0.125 2023-11-24 05:02:41,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2023-11-24 05:03:01,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=15.0 2023-11-24 05:03:07,286 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6850, loss[loss=0.07139, simple_loss=0.09479, pruned_loss=0.01604, audio_tagging_loss=0.007952, over 14641.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09255, pruned_loss=0.01368, audio_tagging_loss=0.00866, over 3041949.18 frames. 
], batch size: 56, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:03:15,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.418e+01 8.943e+01 9.846e+01 1.358e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-24 05:03:19,198 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403650 2023-11-24 05:03:19,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=15.0 2023-11-24 05:03:27,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2690986.6666666665, ans=0.125 2023-11-24 05:04:00,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0 2023-11-24 05:04:02,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2691186.6666666665, ans=0.125 2023-11-24 05:04:08,462 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6900, loss[loss=0.06071, simple_loss=0.07926, pruned_loss=0.0109, audio_tagging_loss=0.01017, over 15895.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09175, pruned_loss=0.01353, audio_tagging_loss=0.00879, over 3051032.10 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:04:20,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403700 2023-11-24 05:04:25,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2691320.0, ans=0.0 2023-11-24 05:04:31,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2691320.0, ans=0.2 2023-11-24 05:04:47,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5 2023-11-24 05:04:55,836 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 05:05:00,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2691520.0, ans=0.125 2023-11-24 05:05:10,069 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 6950, loss[loss=0.07025, simple_loss=0.09799, pruned_loss=0.01271, audio_tagging_loss=0.008551, over 15851.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09243, pruned_loss=0.01362, audio_tagging_loss=0.008716, over 3046360.17 frames. 
], batch size: 59, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:05:19,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.551e+01 9.218e+01 9.903e+01 1.445e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 05:05:21,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2691586.6666666665, ans=0.125 2023-11-24 05:05:23,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403750 2023-11-24 05:05:26,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2691653.3333333335, ans=0.035 2023-11-24 05:05:30,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2023-11-24 05:05:42,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2691720.0, ans=0.125 2023-11-24 05:05:56,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2691786.6666666665, ans=0.125 2023-11-24 05:06:13,665 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7000, loss[loss=0.06068, simple_loss=0.07751, pruned_loss=0.01263, audio_tagging_loss=0.009289, over 14529.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09069, pruned_loss=0.01334, audio_tagging_loss=0.008912, over 3048647.87 frames. ], batch size: 53, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:06:20,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2691920.0, ans=0.125 2023-11-24 05:06:23,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-24 05:06:25,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403800 2023-11-24 05:06:36,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2692053.3333333335, ans=0.125 2023-11-24 05:06:48,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2692120.0, ans=0.2 2023-11-24 05:07:15,349 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7050, loss[loss=0.06593, simple_loss=0.08665, pruned_loss=0.01282, audio_tagging_loss=0.009791, over 15190.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.0912, pruned_loss=0.01332, audio_tagging_loss=0.009042, over 3046999.51 frames. 
], batch size: 59, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:07:23,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.613e+01 8.305e+01 8.871e+01 9.954e+01 1.371e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-24 05:07:27,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403850 2023-11-24 05:07:32,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2692320.0, ans=0.0 2023-11-24 05:07:37,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2692320.0, ans=0.125 2023-11-24 05:07:59,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2692453.3333333335, ans=0.125 2023-11-24 05:08:06,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2692520.0, ans=0.125 2023-11-24 05:08:16,589 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7100, loss[loss=0.05653, simple_loss=0.07972, pruned_loss=0.007385, audio_tagging_loss=0.009286, over 14855.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09147, pruned_loss=0.01321, audio_tagging_loss=0.009056, over 3047136.64 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:08:18,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=2692586.6666666665, ans=0.1 2023-11-24 05:08:25,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2692586.6666666665, ans=0.125 2023-11-24 05:08:30,312 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403900 2023-11-24 05:08:33,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2692653.3333333335, ans=0.09899494936611666 2023-11-24 05:08:40,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2692653.3333333335, ans=0.0 2023-11-24 05:08:43,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-11-24 05:08:44,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2692720.0, ans=0.2 2023-11-24 05:08:54,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-24 05:09:20,741 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7150, loss[loss=0.05086, simple_loss=0.06346, pruned_loss=0.0114, audio_tagging_loss=0.007731, over 15292.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09076, pruned_loss=0.013, audio_tagging_loss=0.00906, over 3043708.22 frames. 
], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:09:25,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2692920.0, ans=0.125 2023-11-24 05:09:29,597 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.812e+01 9.268e+01 1.001e+02 1.464e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-24 05:09:31,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2692920.0, ans=0.1 2023-11-24 05:09:33,367 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 403950 2023-11-24 05:10:22,845 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7200, loss[loss=0.05648, simple_loss=0.07827, pruned_loss=0.008578, audio_tagging_loss=0.008769, over 15327.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09151, pruned_loss=0.01325, audio_tagging_loss=0.009105, over 3051272.90 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:10:34,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404000 2023-11-24 05:10:56,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2693386.6666666665, ans=0.2 2023-11-24 05:11:02,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2693386.6666666665, ans=0.125 2023-11-24 05:11:03,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2693453.3333333335, ans=0.2 2023-11-24 05:11:08,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-24 05:11:14,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2693453.3333333335, ans=0.1 2023-11-24 05:11:28,069 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7250, loss[loss=0.06869, simple_loss=0.08564, pruned_loss=0.01645, audio_tagging_loss=0.009419, over 13678.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09073, pruned_loss=0.01301, audio_tagging_loss=0.009162, over 3052197.54 frames. ], batch size: 53, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:11:28,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2693586.6666666665, ans=0.1 2023-11-24 05:11:34,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2693586.6666666665, ans=0.025 2023-11-24 05:11:34,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=22.5 2023-11-24 05:11:36,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.579e+01 9.139e+01 1.008e+02 1.154e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-24 05:11:39,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. 
limit=15.0 2023-11-24 05:11:40,482 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404050 2023-11-24 05:11:42,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2693653.3333333335, ans=0.0 2023-11-24 05:12:30,880 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7300, loss[loss=0.05475, simple_loss=0.07359, pruned_loss=0.0108, audio_tagging_loss=0.007154, over 14608.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09074, pruned_loss=0.01299, audio_tagging_loss=0.009032, over 3058538.21 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:12:36,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2693920.0, ans=0.125 2023-11-24 05:12:43,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404100 2023-11-24 05:12:53,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2693986.6666666665, ans=0.1 2023-11-24 05:12:55,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2694053.3333333335, ans=0.125 2023-11-24 05:13:32,966 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7350, loss[loss=0.06518, simple_loss=0.09506, pruned_loss=0.01102, audio_tagging_loss=0.006625, over 14226.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09057, pruned_loss=0.01301, audio_tagging_loss=0.008995, over 3055372.72 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:13:33,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2694253.3333333335, ans=0.125 2023-11-24 05:13:34,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2694253.3333333335, ans=0.2 2023-11-24 05:13:43,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.435e+01 9.060e+01 9.696e+01 1.244e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-24 05:13:45,019 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404150 2023-11-24 05:13:52,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2694320.0, ans=0.125 2023-11-24 05:13:55,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2694386.6666666665, ans=0.0 2023-11-24 05:14:04,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=12.0 2023-11-24 05:14:06,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2694386.6666666665, ans=0.0 2023-11-24 05:14:15,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2694453.3333333335, ans=0.0 2023-11-24 05:14:23,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2694520.0, ans=0.125 2023-11-24 05:14:32,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. 
limit=15.0 2023-11-24 05:14:34,630 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7400, loss[loss=0.0575, simple_loss=0.07679, pruned_loss=0.01002, audio_tagging_loss=0.009081, over 13680.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09026, pruned_loss=0.01309, audio_tagging_loss=0.008923, over 3051434.55 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:14:37,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2694586.6666666665, ans=0.125 2023-11-24 05:14:39,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2694586.6666666665, ans=0.125 2023-11-24 05:14:47,284 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404200 2023-11-24 05:14:52,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-24 05:15:37,376 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7450, loss[loss=0.05756, simple_loss=0.08064, pruned_loss=0.01116, audio_tagging_loss=0.006081, over 14088.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09011, pruned_loss=0.01321, audio_tagging_loss=0.008894, over 3043906.46 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:15:48,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.446e+01 9.170e+01 9.700e+01 1.341e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-24 05:15:49,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2694986.6666666665, ans=0.1 2023-11-24 05:15:50,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404250 2023-11-24 05:15:52,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2694986.6666666665, ans=0.2 2023-11-24 05:16:05,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2695053.3333333335, ans=0.125 2023-11-24 05:16:17,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2695120.0, ans=0.125 2023-11-24 05:16:34,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2695186.6666666665, ans=0.1 2023-11-24 05:16:40,528 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7500, loss[loss=0.07505, simple_loss=0.1016, pruned_loss=0.01628, audio_tagging_loss=0.007965, over 15113.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0902, pruned_loss=0.0132, audio_tagging_loss=0.008835, over 3041797.93 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:16:40,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2695253.3333333335, ans=0.125 2023-11-24 05:16:43,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.13 vs. 
limit=22.5 2023-11-24 05:16:52,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404300 2023-11-24 05:16:53,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2695320.0, ans=0.125 2023-11-24 05:17:04,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2695386.6666666665, ans=0.125 2023-11-24 05:17:07,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-24 05:17:19,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2695453.3333333335, ans=0.0 2023-11-24 05:17:24,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2695453.3333333335, ans=0.1 2023-11-24 05:17:29,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2695520.0, ans=0.125 2023-11-24 05:17:32,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2695520.0, ans=0.125 2023-11-24 05:17:36,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2023-11-24 05:17:41,544 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7550, loss[loss=0.04546, simple_loss=0.05678, pruned_loss=0.006153, audio_tagging_loss=0.01092, over 14520.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09036, pruned_loss=0.01318, audio_tagging_loss=0.008804, over 3042400.07 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:17:51,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.670e+01 9.167e+01 9.885e+01 1.310e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-24 05:17:53,245 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404350 2023-11-24 05:18:10,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2695720.0, ans=0.2 2023-11-24 05:18:14,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2023-11-24 05:18:22,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.39 vs. limit=10.0 2023-11-24 05:18:26,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2695786.6666666665, ans=0.125 2023-11-24 05:18:29,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2695853.3333333335, ans=0.0 2023-11-24 05:18:43,190 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7600, loss[loss=0.05748, simple_loss=0.07313, pruned_loss=0.01246, audio_tagging_loss=0.008446, over 14682.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09091, pruned_loss=0.0133, audio_tagging_loss=0.008775, over 3045181.21 frames. 
], batch size: 57, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:18:49,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2695920.0, ans=0.125 2023-11-24 05:18:56,260 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404400 2023-11-24 05:19:08,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2696053.3333333335, ans=0.125 2023-11-24 05:19:37,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2696186.6666666665, ans=0.125 2023-11-24 05:19:45,979 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7650, loss[loss=0.06522, simple_loss=0.09164, pruned_loss=0.01392, audio_tagging_loss=0.005478, over 14774.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09086, pruned_loss=0.01343, audio_tagging_loss=0.008746, over 3044002.45 frames. ], batch size: 55, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:19:46,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.77 vs. limit=15.0 2023-11-24 05:19:49,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-11-24 05:19:57,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.788e+01 8.265e+01 8.997e+01 9.695e+01 1.208e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-24 05:19:58,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404450 2023-11-24 05:19:59,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-24 05:20:06,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2696320.0, ans=0.125 2023-11-24 05:20:23,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.67 vs. limit=15.0 2023-11-24 05:20:37,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2696520.0, ans=0.125 2023-11-24 05:20:38,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2696520.0, ans=0.0 2023-11-24 05:20:48,145 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7700, loss[loss=0.05151, simple_loss=0.06564, pruned_loss=0.008997, audio_tagging_loss=0.009695, over 14446.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09138, pruned_loss=0.01343, audio_tagging_loss=0.008797, over 3046048.53 frames. 
], batch size: 56, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:21:00,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404500 2023-11-24 05:21:06,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2696653.3333333335, ans=0.125 2023-11-24 05:21:21,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2696720.0, ans=0.1 2023-11-24 05:21:29,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2696786.6666666665, ans=0.2 2023-11-24 05:21:49,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2696920.0, ans=0.125 2023-11-24 05:21:50,153 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7750, loss[loss=0.07933, simple_loss=0.1048, pruned_loss=0.01478, audio_tagging_loss=0.01217, over 16012.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09139, pruned_loss=0.0135, audio_tagging_loss=0.008914, over 3050277.37 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:22:01,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.444e+01 9.119e+01 9.673e+01 1.144e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-24 05:22:03,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404550 2023-11-24 05:22:31,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2697120.0, ans=0.2 2023-11-24 05:22:33,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2023-11-24 05:22:42,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2697186.6666666665, ans=0.125 2023-11-24 05:22:45,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2023-11-24 05:22:53,780 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7800, loss[loss=0.0694, simple_loss=0.09612, pruned_loss=0.01288, audio_tagging_loss=0.008464, over 14985.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09175, pruned_loss=0.01358, audio_tagging_loss=0.008883, over 3048432.03 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:23:05,775 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404600 2023-11-24 05:23:07,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2697320.0, ans=0.125 2023-11-24 05:23:21,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2697386.6666666665, ans=0.1 2023-11-24 05:23:56,278 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7850, loss[loss=0.1016, simple_loss=0.141, pruned_loss=0.0232, audio_tagging_loss=0.007876, over 15708.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09086, pruned_loss=0.01341, audio_tagging_loss=0.009023, over 3042046.09 frames. 
], batch size: 55, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:24:07,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.451e+01 9.106e+01 9.579e+01 1.135e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 05:24:08,843 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404650 2023-11-24 05:24:28,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2697720.0, ans=0.2 2023-11-24 05:24:29,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2697720.0, ans=0.125 2023-11-24 05:24:51,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2697853.3333333335, ans=0.0 2023-11-24 05:24:57,994 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7900, loss[loss=0.09506, simple_loss=0.1421, pruned_loss=0.01735, audio_tagging_loss=0.00665, over 15858.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09152, pruned_loss=0.01357, audio_tagging_loss=0.008994, over 3039854.19 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:25:11,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404700 2023-11-24 05:25:22,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2023-11-24 05:25:31,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 05:26:01,536 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 7950, loss[loss=0.07008, simple_loss=0.1002, pruned_loss=0.01114, audio_tagging_loss=0.008855, over 15462.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09127, pruned_loss=0.01368, audio_tagging_loss=0.009141, over 3040106.96 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:26:06,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2698253.3333333335, ans=0.125 2023-11-24 05:26:12,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.346e+01 8.927e+01 9.659e+01 1.276e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-24 05:26:13,530 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404750 2023-11-24 05:26:15,940 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 05:26:17,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2698320.0, ans=0.125 2023-11-24 05:26:27,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. 
limit=6.0 2023-11-24 05:26:32,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2698386.6666666665, ans=0.2 2023-11-24 05:26:36,923 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 05:26:38,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2698453.3333333335, ans=0.125 2023-11-24 05:26:49,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2023-11-24 05:26:53,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2698520.0, ans=0.0 2023-11-24 05:27:00,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2698520.0, ans=0.125 2023-11-24 05:27:03,767 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8000, loss[loss=0.08164, simple_loss=0.1083, pruned_loss=0.01919, audio_tagging_loss=0.008315, over 13928.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09133, pruned_loss=0.01372, audio_tagging_loss=0.00907, over 3034711.30 frames. ], batch size: 53, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:27:10,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2023-11-24 05:27:16,441 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404800 2023-11-24 05:27:16,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2698653.3333333335, ans=0.0 2023-11-24 05:27:33,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. limit=10.0 2023-11-24 05:27:40,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2698786.6666666665, ans=0.125 2023-11-24 05:28:06,015 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8050, loss[loss=0.05735, simple_loss=0.07461, pruned_loss=0.01189, audio_tagging_loss=0.008151, over 14142.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.09084, pruned_loss=0.01369, audio_tagging_loss=0.009147, over 3037810.48 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:28:17,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.621e+01 9.093e+01 9.694e+01 1.272e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-24 05:28:19,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404850 2023-11-24 05:28:56,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-24 05:29:02,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2699186.6666666665, ans=0.125 2023-11-24 05:29:06,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.19 vs. 
limit=22.5 2023-11-24 05:29:08,654 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8100, loss[loss=0.0613, simple_loss=0.08647, pruned_loss=0.01267, audio_tagging_loss=0.005392, over 16845.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09105, pruned_loss=0.01376, audio_tagging_loss=0.009114, over 3039406.51 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:29:21,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404900 2023-11-24 05:29:26,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2699320.0, ans=0.2 2023-11-24 05:29:42,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2699386.6666666665, ans=0.1 2023-11-24 05:29:42,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2699386.6666666665, ans=0.125 2023-11-24 05:29:50,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2699453.3333333335, ans=0.125 2023-11-24 05:29:58,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2699520.0, ans=0.125 2023-11-24 05:30:08,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2699520.0, ans=0.125 2023-11-24 05:30:10,896 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8150, loss[loss=0.06983, simple_loss=0.09525, pruned_loss=0.0131, audio_tagging_loss=0.009107, over 15540.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.0904, pruned_loss=0.01356, audio_tagging_loss=0.009094, over 3038937.31 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:30:22,676 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.549e+01 9.218e+01 9.858e+01 1.332e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 05:30:22,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 404950 2023-11-24 05:30:49,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=15.0 2023-11-24 05:31:02,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2699853.3333333335, ans=0.5 2023-11-24 05:31:12,009 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8200, loss[loss=0.05057, simple_loss=0.07413, pruned_loss=0.006787, audio_tagging_loss=0.006717, over 14187.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09098, pruned_loss=0.01356, audio_tagging_loss=0.008945, over 3042816.44 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:31:12,059 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 05:31:21,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2699920.0, ans=0.125 2023-11-24 05:31:24,920 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405000 2023-11-24 05:31:28,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2699986.6666666665, ans=0.05 2023-11-24 05:31:29,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2023-11-24 05:31:35,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2023-11-24 05:31:39,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2700053.3333333335, ans=0.125 2023-11-24 05:31:45,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2700053.3333333335, ans=0.125 2023-11-24 05:31:46,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2700053.3333333335, ans=0.1 2023-11-24 05:32:08,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2700186.6666666665, ans=0.2 2023-11-24 05:32:09,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2700186.6666666665, ans=0.0 2023-11-24 05:32:10,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2700186.6666666665, ans=0.0 2023-11-24 05:32:14,746 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8250, loss[loss=0.08757, simple_loss=0.1128, pruned_loss=0.02046, audio_tagging_loss=0.01069, over 15761.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09155, pruned_loss=0.01376, audio_tagging_loss=0.008892, over 3049514.67 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:32:15,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2700253.3333333335, ans=0.125 2023-11-24 05:32:15,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2700253.3333333335, ans=0.1 2023-11-24 05:32:27,219 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405050 2023-11-24 05:32:27,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=22.5 2023-11-24 05:32:28,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.524e+01 9.156e+01 9.753e+01 1.627e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-24 05:32:33,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2700320.0, ans=0.125 2023-11-24 05:32:33,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. 
limit=15.0 2023-11-24 05:32:57,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2700453.3333333335, ans=0.125 2023-11-24 05:33:11,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=12.0 2023-11-24 05:33:16,923 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8300, loss[loss=0.04756, simple_loss=0.05072, pruned_loss=0.006046, audio_tagging_loss=0.01615, over 16116.00 frames. ], tot_loss[loss=0.06831, simple_loss=0.09149, pruned_loss=0.01367, audio_tagging_loss=0.008896, over 3053470.79 frames. ], batch size: 63, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:33:20,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2700586.6666666665, ans=0.125 2023-11-24 05:33:23,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2023-11-24 05:33:26,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2700586.6666666665, ans=0.05 2023-11-24 05:33:28,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405100 2023-11-24 05:33:37,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2700653.3333333335, ans=0.1 2023-11-24 05:33:42,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2023-11-24 05:33:52,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2023-11-24 05:33:54,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2700786.6666666665, ans=0.125 2023-11-24 05:34:03,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2700786.6666666665, ans=0.125 2023-11-24 05:34:18,266 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8350, loss[loss=0.06469, simple_loss=0.08, pruned_loss=0.01422, audio_tagging_loss=0.01047, over 14296.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09056, pruned_loss=0.01333, audio_tagging_loss=0.008896, over 3046522.20 frames. 
], batch size: 56, lr: 1.99e-03, grad_scale: 8.0 2023-11-24 05:34:22,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2700920.0, ans=0.125 2023-11-24 05:34:26,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2700920.0, ans=0.125 2023-11-24 05:34:30,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405150 2023-11-24 05:34:31,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.407e+01 9.166e+01 9.742e+01 1.258e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-24 05:34:49,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2701053.3333333335, ans=0.125 2023-11-24 05:35:06,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2701186.6666666665, ans=0.125 2023-11-24 05:35:18,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2701253.3333333335, ans=0.125 2023-11-24 05:35:19,461 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8400, loss[loss=0.059, simple_loss=0.07307, pruned_loss=0.01084, audio_tagging_loss=0.01163, over 17255.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08909, pruned_loss=0.01297, audio_tagging_loss=0.009016, over 3045917.48 frames. ], batch size: 65, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:35:32,229 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405200 2023-11-24 05:35:40,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-24 05:35:45,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2701386.6666666665, ans=0.125 2023-11-24 05:35:53,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2701386.6666666665, ans=0.0 2023-11-24 05:36:21,623 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8450, loss[loss=0.06294, simple_loss=0.08538, pruned_loss=0.0137, audio_tagging_loss=0.006549, over 15397.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08921, pruned_loss=0.01302, audio_tagging_loss=0.009043, over 3043907.62 frames. 
], batch size: 57, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:36:24,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2701586.6666666665, ans=0.0 2023-11-24 05:36:32,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2701653.3333333335, ans=0.125 2023-11-24 05:36:33,784 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405250 2023-11-24 05:36:34,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.098e+01 8.659e+01 9.489e+01 1.219e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-24 05:36:34,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2701653.3333333335, ans=0.125 2023-11-24 05:36:36,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2701653.3333333335, ans=0.0 2023-11-24 05:37:10,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2701853.3333333335, ans=0.0 2023-11-24 05:37:15,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2701853.3333333335, ans=0.0 2023-11-24 05:37:23,299 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8500, loss[loss=0.07888, simple_loss=0.1044, pruned_loss=0.01948, audio_tagging_loss=0.007181, over 15306.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.0901, pruned_loss=0.01311, audio_tagging_loss=0.009059, over 3044206.60 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:37:25,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2701920.0, ans=0.0 2023-11-24 05:37:35,322 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405300 2023-11-24 05:37:38,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2701986.6666666665, ans=0.04949747468305833 2023-11-24 05:38:07,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2702120.0, ans=0.125 2023-11-24 05:38:09,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2702120.0, ans=0.125 2023-11-24 05:38:14,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2702186.6666666665, ans=0.125 2023-11-24 05:38:24,811 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8550, loss[loss=0.07211, simple_loss=0.1051, pruned_loss=0.01257, audio_tagging_loss=0.007004, over 14922.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09076, pruned_loss=0.01321, audio_tagging_loss=0.00909, over 3039676.95 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:38:25,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.91 vs. 
limit=10.0 2023-11-24 05:38:39,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405350 2023-11-24 05:38:40,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.730e+01 8.556e+01 9.186e+01 9.838e+01 1.237e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-24 05:39:06,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=22.5 2023-11-24 05:39:07,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2702453.3333333335, ans=0.125 2023-11-24 05:39:23,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2702520.0, ans=0.125 2023-11-24 05:39:25,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2702520.0, ans=0.5 2023-11-24 05:39:29,201 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8600, loss[loss=0.05895, simple_loss=0.07146, pruned_loss=0.01209, audio_tagging_loss=0.01113, over 14937.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.08949, pruned_loss=0.01292, audio_tagging_loss=0.009195, over 3040225.69 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:39:36,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2702586.6666666665, ans=0.125 2023-11-24 05:39:41,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405400 2023-11-24 05:39:53,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2702720.0, ans=0.0 2023-11-24 05:40:06,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-11-24 05:40:31,243 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8650, loss[loss=0.07848, simple_loss=0.1052, pruned_loss=0.01516, audio_tagging_loss=0.01073, over 15910.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08875, pruned_loss=0.0127, audio_tagging_loss=0.009263, over 3049065.91 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:40:35,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2702920.0, ans=0.2 2023-11-24 05:40:38,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2702920.0, ans=0.125 2023-11-24 05:40:39,858 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 05:40:40,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.55 vs. 
limit=22.5 2023-11-24 05:40:43,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405450 2023-11-24 05:40:44,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.340e+01 8.964e+01 9.438e+01 1.251e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-24 05:41:00,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2703053.3333333335, ans=0.125 2023-11-24 05:41:03,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2703053.3333333335, ans=0.04949747468305833 2023-11-24 05:41:26,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2703186.6666666665, ans=0.1 2023-11-24 05:41:33,406 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8700, loss[loss=0.07945, simple_loss=0.1146, pruned_loss=0.0168, audio_tagging_loss=0.00537, over 14897.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09016, pruned_loss=0.013, audio_tagging_loss=0.009245, over 3055226.95 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:41:46,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405500 2023-11-24 05:41:48,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2703320.0, ans=0.125 2023-11-24 05:41:56,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2703320.0, ans=0.1 2023-11-24 05:42:33,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2703520.0, ans=0.125 2023-11-24 05:42:37,528 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8750, loss[loss=0.06548, simple_loss=0.0807, pruned_loss=0.01784, audio_tagging_loss=0.007289, over 13369.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09113, pruned_loss=0.0132, audio_tagging_loss=0.009199, over 3049401.53 frames. ], batch size: 53, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:42:42,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2703586.6666666665, ans=0.125 2023-11-24 05:42:50,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405550 2023-11-24 05:42:51,164 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.699e+01 9.632e+01 1.055e+02 1.285e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-24 05:43:03,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2703720.0, ans=0.0 2023-11-24 05:43:06,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2703720.0, ans=0.0 2023-11-24 05:43:17,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2703786.6666666665, ans=0.035 2023-11-24 05:43:39,329 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8800, loss[loss=0.07066, simple_loss=0.09774, pruned_loss=0.01429, audio_tagging_loss=0.007498, over 15311.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09161, pruned_loss=0.01322, audio_tagging_loss=0.009269, over 3049453.01 frames. 
], batch size: 57, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:43:40,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2023-11-24 05:43:48,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2703920.0, ans=0.0 2023-11-24 05:43:51,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405600 2023-11-24 05:43:57,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2703986.6666666665, ans=0.125 2023-11-24 05:44:15,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2704053.3333333335, ans=0.125 2023-11-24 05:44:35,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2704186.6666666665, ans=0.0 2023-11-24 05:44:37,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2704186.6666666665, ans=0.125 2023-11-24 05:44:40,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2704253.3333333335, ans=0.025 2023-11-24 05:44:41,159 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8850, loss[loss=0.05941, simple_loss=0.07876, pruned_loss=0.01011, audio_tagging_loss=0.009919, over 15633.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09196, pruned_loss=0.01322, audio_tagging_loss=0.009272, over 3050753.55 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:44:53,063 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 05:44:54,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405650 2023-11-24 05:44:55,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.541e+01 8.982e+01 9.482e+01 1.517e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-24 05:45:43,999 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8900, loss[loss=0.06564, simple_loss=0.08234, pruned_loss=0.01424, audio_tagging_loss=0.01023, over 14935.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09245, pruned_loss=0.01335, audio_tagging_loss=0.009164, over 3046412.69 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:45:51,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2704586.6666666665, ans=15.0 2023-11-24 05:45:56,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405700 2023-11-24 05:46:05,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2704653.3333333335, ans=0.09899494936611666 2023-11-24 05:46:18,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.97 vs. 
limit=15.0 2023-11-24 05:46:43,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2704853.3333333335, ans=0.125 2023-11-24 05:46:45,641 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 8950, loss[loss=0.08839, simple_loss=0.1219, pruned_loss=0.01861, audio_tagging_loss=0.008812, over 14680.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09287, pruned_loss=0.01338, audio_tagging_loss=0.009021, over 3043828.19 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:46:48,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2704920.0, ans=0.125 2023-11-24 05:46:58,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405750 2023-11-24 05:46:59,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.112e+01 8.729e+01 9.252e+01 9.908e+01 1.235e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 05:47:07,765 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 05:47:09,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2705053.3333333335, ans=0.125 2023-11-24 05:47:14,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2705053.3333333335, ans=0.025 2023-11-24 05:47:29,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2705120.0, ans=0.05 2023-11-24 05:47:47,496 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9000, loss[loss=0.08333, simple_loss=0.1178, pruned_loss=0.01642, audio_tagging_loss=0.008036, over 15580.00 frames. ], tot_loss[loss=0.0687, simple_loss=0.09283, pruned_loss=0.01346, audio_tagging_loss=0.008825, over 3042371.30 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:47:47,497 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 05:48:30,289 INFO [train_asr.py:1253] (1/4) Epoch 34, validation: loss=0.05898, simple_loss=0.05084, pruned_loss=0.005097, audio_tagging_loss=0.02846, over 4681554.00 frames. 
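A hedged reading of the recurring record types above, using only numbers already printed in this log: the per-batch `loss` values are consistent with a fixed weighting of the three component losses; the optim.py lines report five grad-norm quartiles (min, 25%, median, 75%, max) with a printed threshold equal to Clipping_scale times the median; and the WARNING records show cuts being dropped because they have fewer post-subsampling frames than BPE tokens, so the transducer cannot align them. A minimal pure-Python sketch of those three relationships follows; all helper names are illustrative assumptions, not the actual train_asr.py/optim.py code.

def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # The logged per-batch 'loss' is consistent with this weighting
    # throughout the section, e.g. batch 6900:
    # 0.5 * 0.09175 + 0.01353 + 0.00879 ~= 0.0682.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

def clipping_threshold(grad_norm_quartiles, clipping_scale=2.0):
    # The optim.py records print quartiles of recent gradient norms;
    # the printed threshold equals clipping_scale * median, e.g.
    # 2.0 * 8.943e+01 ~= 1.789e+02 in the 05:03:15 record.
    return clipping_scale * grad_norm_quartiles[2]

def frames_after_subsampling(num_frames):
    # Inferred from the WARNING records: 100 input frames map to 23
    # frames after the convolutional front-end (an assumption that
    # happens to reproduce the logged 100 -> 23).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    # Cuts are excluded when the encoder output is shorter than the
    # BPE token sequence (23 frames < 24 tokens in the records above).
    return frames_after_subsampling(num_frames) >= num_tokens

assert abs(total_loss(0.09175, 0.01353, 0.00879) - 0.0682) < 1e-4
assert abs(clipping_threshold([74.36, 84.18, 89.43, 98.46, 135.8]) - 178.9) < 0.1
assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)

On the logged numbers this also explains percent-clipped=0.0: even the maximum of the quartiles (e.g. 1.358e+02 at 05:03:15) stays below the 1.789e+02 threshold, so no batch in that window was clipped.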
2023-11-24 05:48:30,290 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 05:48:42,194 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405800 2023-11-24 05:48:43,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2705320.0, ans=0.0 2023-11-24 05:48:59,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2705386.6666666665, ans=0.125 2023-11-24 05:49:04,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2705386.6666666665, ans=0.2 2023-11-24 05:49:04,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2705386.6666666665, ans=0.125 2023-11-24 05:49:05,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2705453.3333333335, ans=0.2 2023-11-24 05:49:12,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2705453.3333333335, ans=0.125 2023-11-24 05:49:12,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2705453.3333333335, ans=0.0 2023-11-24 05:49:13,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2023-11-24 05:49:27,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2705520.0, ans=0.05 2023-11-24 05:49:32,027 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9050, loss[loss=0.06557, simple_loss=0.08575, pruned_loss=0.01246, audio_tagging_loss=0.01024, over 15746.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09275, pruned_loss=0.01344, audio_tagging_loss=0.008745, over 3056404.74 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:49:35,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2705586.6666666665, ans=0.07 2023-11-24 05:49:43,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2705653.3333333335, ans=0.125 2023-11-24 05:49:44,528 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405850 2023-11-24 05:49:46,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.606e+01 9.162e+01 9.789e+01 1.610e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 05:49:52,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2023-11-24 05:49:54,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2705653.3333333335, ans=0.0 2023-11-24 05:50:34,563 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9100, loss[loss=0.07472, simple_loss=0.1051, pruned_loss=0.01714, audio_tagging_loss=0.005036, over 15326.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09295, pruned_loss=0.01348, audio_tagging_loss=0.008676, over 3059484.49 frames. 
], batch size: 54, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:50:43,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2705920.0, ans=0.04949747468305833 2023-11-24 05:50:47,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405900 2023-11-24 05:51:00,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2706053.3333333335, ans=0.2 2023-11-24 05:51:08,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2706053.3333333335, ans=0.0 2023-11-24 05:51:13,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2706120.0, ans=0.125 2023-11-24 05:51:22,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=12.0 2023-11-24 05:51:32,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2706186.6666666665, ans=0.1 2023-11-24 05:51:37,129 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9150, loss[loss=0.05008, simple_loss=0.06371, pruned_loss=0.006919, audio_tagging_loss=0.01131, over 14311.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.09333, pruned_loss=0.01357, audio_tagging_loss=0.008643, over 3057513.81 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 05:51:46,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2706253.3333333335, ans=0.2 2023-11-24 05:51:49,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 405950 2023-11-24 05:51:51,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.866e+01 9.361e+01 1.025e+02 1.380e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-24 05:52:02,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2706386.6666666665, ans=6.0 2023-11-24 05:52:06,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2706386.6666666665, ans=0.2 2023-11-24 05:52:23,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0 2023-11-24 05:52:39,408 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9200, loss[loss=0.0899, simple_loss=0.1248, pruned_loss=0.01981, audio_tagging_loss=0.007703, over 15706.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09313, pruned_loss=0.01357, audio_tagging_loss=0.008662, over 3053855.27 frames. 
], batch size: 58, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:52:43,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2706586.6666666665, ans=0.0 2023-11-24 05:52:51,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406000 2023-11-24 05:52:52,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2706653.3333333335, ans=0.125 2023-11-24 05:53:32,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2706853.3333333335, ans=0.125 2023-11-24 05:53:41,420 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9250, loss[loss=0.0722, simple_loss=0.1022, pruned_loss=0.01549, audio_tagging_loss=0.0056, over 15270.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09259, pruned_loss=0.01352, audio_tagging_loss=0.008797, over 3061569.58 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:53:51,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2023-11-24 05:53:54,528 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406050 2023-11-24 05:53:57,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.286e+01 8.929e+01 9.652e+01 1.621e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-24 05:53:59,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2706986.6666666665, ans=0.0 2023-11-24 05:54:00,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2706986.6666666665, ans=0.125 2023-11-24 05:54:16,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2707053.3333333335, ans=0.1 2023-11-24 05:54:36,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2707186.6666666665, ans=0.1 2023-11-24 05:54:45,412 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9300, loss[loss=0.05539, simple_loss=0.07974, pruned_loss=0.006943, audio_tagging_loss=0.008575, over 15115.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09118, pruned_loss=0.01337, audio_tagging_loss=0.008865, over 3058369.60 frames. 
], batch size: 57, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:54:47,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2707253.3333333335, ans=0.1 2023-11-24 05:54:57,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406100 2023-11-24 05:55:06,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2707320.0, ans=0.1 2023-11-24 05:55:34,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2707520.0, ans=0.1 2023-11-24 05:55:46,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2707586.6666666665, ans=0.1 2023-11-24 05:55:47,295 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9350, loss[loss=0.06134, simple_loss=0.07833, pruned_loss=0.0111, audio_tagging_loss=0.01107, over 14809.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09121, pruned_loss=0.01329, audio_tagging_loss=0.008887, over 3050192.62 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:55:55,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2707586.6666666665, ans=0.0 2023-11-24 05:55:59,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406150 2023-11-24 05:56:01,326 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.542e+01 9.315e+01 1.015e+02 1.329e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-24 05:56:08,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2707653.3333333335, ans=0.0 2023-11-24 05:56:12,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2707720.0, ans=6.0 2023-11-24 05:56:13,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2707720.0, ans=0.0 2023-11-24 05:56:17,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2707720.0, ans=0.0 2023-11-24 05:56:32,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2707786.6666666665, ans=0.0 2023-11-24 05:56:48,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2707920.0, ans=0.5 2023-11-24 05:56:49,155 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9400, loss[loss=0.06909, simple_loss=0.09566, pruned_loss=0.01284, audio_tagging_loss=0.008418, over 14590.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09157, pruned_loss=0.01339, audio_tagging_loss=0.008951, over 3047089.25 frames. ], batch size: 55, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:56:51,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2707920.0, ans=0.125 2023-11-24 05:56:55,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. 
limit=15.0 2023-11-24 05:57:01,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2707986.6666666665, ans=0.125 2023-11-24 05:57:02,299 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406200 2023-11-24 05:57:09,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=22.5 2023-11-24 05:57:27,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2708120.0, ans=0.0 2023-11-24 05:57:28,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2708120.0, ans=0.125 2023-11-24 05:57:50,054 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 05:57:51,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-24 05:57:52,353 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9450, loss[loss=0.08251, simple_loss=0.1062, pruned_loss=0.01674, audio_tagging_loss=0.01266, over 16709.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09136, pruned_loss=0.01333, audio_tagging_loss=0.009016, over 3047573.30 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:58:00,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2708253.3333333335, ans=0.125 2023-11-24 05:58:05,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406250 2023-11-24 05:58:07,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.368e+01 8.774e+01 9.342e+01 1.287e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-24 05:58:14,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=2708320.0, ans=0.2 2023-11-24 05:58:19,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2708386.6666666665, ans=0.125 2023-11-24 05:58:36,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2708453.3333333335, ans=0.04949747468305833 2023-11-24 05:58:53,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2708520.0, ans=0.125 2023-11-24 05:58:55,169 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9500, loss[loss=0.06938, simple_loss=0.09193, pruned_loss=0.01427, audio_tagging_loss=0.009147, over 15419.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09124, pruned_loss=0.0133, audio_tagging_loss=0.009127, over 3045196.42 frames. 
], batch size: 57, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 05:59:05,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2708586.6666666665, ans=0.125 2023-11-24 05:59:07,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406300 2023-11-24 05:59:17,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2708653.3333333335, ans=0.2 2023-11-24 05:59:34,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2708786.6666666665, ans=0.1 2023-11-24 05:59:34,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2708786.6666666665, ans=0.2 2023-11-24 05:59:40,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=12.0 2023-11-24 05:59:51,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2708853.3333333335, ans=0.1 2023-11-24 05:59:56,983 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9550, loss[loss=0.05308, simple_loss=0.07036, pruned_loss=0.01055, audio_tagging_loss=0.007351, over 14602.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09109, pruned_loss=0.01324, audio_tagging_loss=0.009291, over 3041947.23 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 16.0 2023-11-24 06:00:02,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2708920.0, ans=0.95 2023-11-24 06:00:03,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2708920.0, ans=0.125 2023-11-24 06:00:09,168 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406350 2023-11-24 06:00:13,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.403e+01 9.161e+01 9.740e+01 1.180e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 06:00:15,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2708986.6666666665, ans=0.125 2023-11-24 06:00:27,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2709053.3333333335, ans=0.125 2023-11-24 06:00:52,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2023-11-24 06:00:59,379 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9600, loss[loss=0.06575, simple_loss=0.0857, pruned_loss=0.01555, audio_tagging_loss=0.007351, over 14579.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09088, pruned_loss=0.01304, audio_tagging_loss=0.009212, over 3034320.66 frames. 
], batch size: 55, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 06:00:59,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2709253.3333333335, ans=0.125 2023-11-24 06:01:02,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2709253.3333333335, ans=0.125 2023-11-24 06:01:08,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2709253.3333333335, ans=0.0 2023-11-24 06:01:09,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2709253.3333333335, ans=0.2 2023-11-24 06:01:13,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406400 2023-11-24 06:01:18,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2709320.0, ans=0.125 2023-11-24 06:01:31,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2709386.6666666665, ans=0.0 2023-11-24 06:01:36,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2709453.3333333335, ans=0.125 2023-11-24 06:01:48,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2709520.0, ans=0.1 2023-11-24 06:01:58,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2709520.0, ans=0.125 2023-11-24 06:02:01,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2023-11-24 06:02:03,398 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9650, loss[loss=0.05511, simple_loss=0.07738, pruned_loss=0.008149, audio_tagging_loss=0.008267, over 13228.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.0905, pruned_loss=0.01305, audio_tagging_loss=0.0092, over 3033689.05 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 32.0 2023-11-24 06:02:15,206 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406450 2023-11-24 06:02:16,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2709653.3333333335, ans=0.125 2023-11-24 06:02:18,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.449e+01 8.956e+01 9.585e+01 1.411e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-24 06:02:25,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2023-11-24 06:02:30,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2709720.0, ans=0.0 2023-11-24 06:02:36,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2709720.0, ans=0.125 2023-11-24 06:02:37,932 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:02:42,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.98 vs. 
limit=22.5 2023-11-24 06:02:46,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2709786.6666666665, ans=0.2 2023-11-24 06:03:04,828 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9700, loss[loss=0.05537, simple_loss=0.0717, pruned_loss=0.009246, audio_tagging_loss=0.01028, over 16474.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.08991, pruned_loss=0.01305, audio_tagging_loss=0.009119, over 3037566.68 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:03:14,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=12.0 2023-11-24 06:03:16,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406500 2023-11-24 06:03:23,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2709986.6666666665, ans=0.125 2023-11-24 06:03:44,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.98 vs. limit=22.5 2023-11-24 06:03:48,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2710120.0, ans=0.125 2023-11-24 06:04:06,189 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9750, loss[loss=0.0698, simple_loss=0.09955, pruned_loss=0.01518, audio_tagging_loss=0.004844, over 15103.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09102, pruned_loss=0.01323, audio_tagging_loss=0.008962, over 3034344.73 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:04:19,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406550 2023-11-24 06:04:21,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2710320.0, ans=0.125 2023-11-24 06:04:24,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.565e+01 9.211e+01 9.883e+01 1.176e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-24 06:04:37,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2710386.6666666665, ans=0.1 2023-11-24 06:04:44,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2023-11-24 06:04:46,692 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:04:52,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2710453.3333333335, ans=0.0 2023-11-24 06:05:09,508 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9800, loss[loss=0.06268, simple_loss=0.08665, pruned_loss=0.01245, audio_tagging_loss=0.006914, over 15767.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09026, pruned_loss=0.01312, audio_tagging_loss=0.008952, over 3028038.05 frames. 
], batch size: 57, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:05:10,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2710586.6666666665, ans=0.125 2023-11-24 06:05:16,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2710586.6666666665, ans=0.0 2023-11-24 06:05:22,218 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406600 2023-11-24 06:05:24,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2710653.3333333335, ans=0.125 2023-11-24 06:05:33,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2710720.0, ans=0.125 2023-11-24 06:05:37,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-24 06:05:41,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2023-11-24 06:05:46,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2710786.6666666665, ans=0.0 2023-11-24 06:06:01,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2023-11-24 06:06:04,965 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 06:06:09,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2710853.3333333335, ans=0.125 2023-11-24 06:06:12,006 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9850, loss[loss=0.0505, simple_loss=0.06779, pruned_loss=0.007378, audio_tagging_loss=0.009222, over 15281.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.0904, pruned_loss=0.01298, audio_tagging_loss=0.008903, over 3030742.41 frames. 
], batch size: 58, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:06:15,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2710920.0, ans=0.125 2023-11-24 06:06:15,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2710920.0, ans=0.125 2023-11-24 06:06:23,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406650 2023-11-24 06:06:28,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.541e+01 9.131e+01 9.880e+01 1.241e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 06:06:31,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2710986.6666666665, ans=0.125 2023-11-24 06:06:45,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2711053.3333333335, ans=0.09899494936611666 2023-11-24 06:07:01,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2711186.6666666665, ans=0.125 2023-11-24 06:07:04,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=8.0 2023-11-24 06:07:05,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2711186.6666666665, ans=0.0 2023-11-24 06:07:07,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2711186.6666666665, ans=0.125 2023-11-24 06:07:13,492 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9900, loss[loss=0.05317, simple_loss=0.06376, pruned_loss=0.008731, audio_tagging_loss=0.01256, over 14328.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09094, pruned_loss=0.01291, audio_tagging_loss=0.008795, over 3029781.47 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:07:16,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2711253.3333333335, ans=0.125 2023-11-24 06:07:22,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2711253.3333333335, ans=0.125 2023-11-24 06:07:26,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2711320.0, ans=0.0 2023-11-24 06:07:27,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406700 2023-11-24 06:08:00,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2711453.3333333335, ans=0.09899494936611666 2023-11-24 06:08:16,817 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 9950, loss[loss=0.06092, simple_loss=0.08307, pruned_loss=0.01254, audio_tagging_loss=0.006843, over 16734.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09032, pruned_loss=0.01293, audio_tagging_loss=0.008782, over 3038855.41 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:08:21,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.38 vs. 
limit=15.0 2023-11-24 06:08:29,170 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406750 2023-11-24 06:08:31,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-24 06:08:34,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.279e+01 9.180e+01 9.821e+01 1.550e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-24 06:08:35,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2711653.3333333335, ans=0.0 2023-11-24 06:08:56,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2711786.6666666665, ans=0.125 2023-11-24 06:09:06,483 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:09:17,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2711920.0, ans=0.125 2023-11-24 06:09:18,687 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10000, loss[loss=0.07976, simple_loss=0.1048, pruned_loss=0.01851, audio_tagging_loss=0.008856, over 15150.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08986, pruned_loss=0.01291, audio_tagging_loss=0.008807, over 3040055.85 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:09:30,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406800 2023-11-24 06:10:10,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2712186.6666666665, ans=0.0 2023-11-24 06:10:11,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2712186.6666666665, ans=0.125 2023-11-24 06:10:17,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2712186.6666666665, ans=0.125 2023-11-24 06:10:17,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2023-11-24 06:10:20,486 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10050, loss[loss=0.0731, simple_loss=0.1098, pruned_loss=0.01155, audio_tagging_loss=0.006664, over 14069.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.08976, pruned_loss=0.01301, audio_tagging_loss=0.008781, over 3036980.17 frames. ], batch size: 52, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:10:33,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406850 2023-11-24 06:10:35,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2712320.0, ans=0.0 2023-11-24 06:10:38,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2712320.0, ans=0.0 2023-11-24 06:10:39,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.347e+01 9.098e+01 9.653e+01 1.255e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-24 06:10:58,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.65 vs. 
limit=15.0 2023-11-24 06:11:13,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2712520.0, ans=0.125 2023-11-24 06:11:13,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2712520.0, ans=0.125 2023-11-24 06:11:22,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2712586.6666666665, ans=0.125 2023-11-24 06:11:23,346 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10100, loss[loss=0.05756, simple_loss=0.07366, pruned_loss=0.01238, audio_tagging_loss=0.008345, over 15233.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09034, pruned_loss=0.013, audio_tagging_loss=0.008864, over 3043196.30 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:11:24,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2712586.6666666665, ans=0.125 2023-11-24 06:11:31,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2712586.6666666665, ans=0.125 2023-11-24 06:11:35,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406900 2023-11-24 06:11:39,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2712653.3333333335, ans=0.125 2023-11-24 06:11:45,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2712653.3333333335, ans=0.0 2023-11-24 06:11:45,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2712653.3333333335, ans=0.025 2023-11-24 06:11:53,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2712720.0, ans=0.09899494936611666 2023-11-24 06:12:09,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=12.0 2023-11-24 06:12:12,369 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 06:12:24,233 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10150, loss[loss=0.08086, simple_loss=0.1182, pruned_loss=0.01576, audio_tagging_loss=0.005993, over 15935.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09157, pruned_loss=0.01319, audio_tagging_loss=0.008899, over 3043927.03 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:12:28,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. 
limit=15.0 2023-11-24 06:12:36,655 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 406950 2023-11-24 06:12:38,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2712986.6666666665, ans=0.125 2023-11-24 06:12:42,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 8.502e+01 9.089e+01 9.719e+01 1.256e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-24 06:12:47,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2713053.3333333335, ans=0.1 2023-11-24 06:12:53,105 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 06:12:53,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2713053.3333333335, ans=0.0 2023-11-24 06:12:59,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2713053.3333333335, ans=0.1 2023-11-24 06:13:00,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=15.0 2023-11-24 06:13:06,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2713120.0, ans=0.1 2023-11-24 06:13:17,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2713186.6666666665, ans=0.2 2023-11-24 06:13:26,442 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10200, loss[loss=0.06384, simple_loss=0.08931, pruned_loss=0.01135, audio_tagging_loss=0.007835, over 15189.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.091, pruned_loss=0.01327, audio_tagging_loss=0.009016, over 3047184.31 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:13:31,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2713253.3333333335, ans=0.125 2023-11-24 06:13:39,730 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407000 2023-11-24 06:13:51,002 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 06:13:58,495 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:14:04,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. 
limit=15.0 2023-11-24 06:14:10,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2713453.3333333335, ans=0.125 2023-11-24 06:14:10,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2713453.3333333335, ans=0.125 2023-11-24 06:14:21,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2713520.0, ans=0.125 2023-11-24 06:14:29,223 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10250, loss[loss=0.05742, simple_loss=0.07285, pruned_loss=0.01112, audio_tagging_loss=0.009876, over 14985.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.0911, pruned_loss=0.01336, audio_tagging_loss=0.009119, over 3050855.26 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:14:41,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407050 2023-11-24 06:14:47,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.707e+01 9.386e+01 1.043e+02 1.340e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-24 06:15:09,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2713786.6666666665, ans=0.0 2023-11-24 06:15:23,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2713853.3333333335, ans=0.0 2023-11-24 06:15:24,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2713853.3333333335, ans=0.5 2023-11-24 06:15:28,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2713853.3333333335, ans=0.0 2023-11-24 06:15:30,884 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10300, loss[loss=0.06556, simple_loss=0.08865, pruned_loss=0.01287, audio_tagging_loss=0.008359, over 16471.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.08963, pruned_loss=0.0131, audio_tagging_loss=0.009189, over 3054904.82 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:15:42,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407100 2023-11-24 06:15:42,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2713986.6666666665, ans=0.2 2023-11-24 06:15:50,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2713986.6666666665, ans=0.0 2023-11-24 06:16:02,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2714053.3333333335, ans=0.1 2023-11-24 06:16:02,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. 
limit=15.0 2023-11-24 06:16:22,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2714186.6666666665, ans=0.1 2023-11-24 06:16:23,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2714186.6666666665, ans=0.1 2023-11-24 06:16:26,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2714186.6666666665, ans=0.125 2023-11-24 06:16:32,181 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10350, loss[loss=0.06639, simple_loss=0.08086, pruned_loss=0.01494, audio_tagging_loss=0.01102, over 15097.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.08977, pruned_loss=0.01301, audio_tagging_loss=0.009247, over 3044537.31 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:16:32,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2714253.3333333335, ans=0.125 2023-11-24 06:16:45,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407150 2023-11-24 06:16:52,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.459e+01 9.063e+01 9.492e+01 1.258e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-24 06:17:26,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2714520.0, ans=0.125 2023-11-24 06:17:31,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=2714520.0, ans=12.0 2023-11-24 06:17:35,049 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10400, loss[loss=0.05474, simple_loss=0.07578, pruned_loss=0.00875, audio_tagging_loss=0.008099, over 15494.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.08974, pruned_loss=0.01305, audio_tagging_loss=0.009279, over 3037343.30 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:17:47,882 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407200 2023-11-24 06:17:50,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2714653.3333333335, ans=0.1 2023-11-24 06:17:56,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2714653.3333333335, ans=0.0 2023-11-24 06:17:57,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2023-11-24 06:18:15,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2714786.6666666665, ans=0.0 2023-11-24 06:18:21,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2714786.6666666665, ans=0.025 2023-11-24 06:18:38,037 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10450, loss[loss=0.05387, simple_loss=0.06656, pruned_loss=0.01083, audio_tagging_loss=0.009756, over 14154.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.08979, pruned_loss=0.01299, audio_tagging_loss=0.009248, over 3041928.43 frames. 
], batch size: 55, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:18:49,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407250 2023-11-24 06:18:53,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2714986.6666666665, ans=0.0 2023-11-24 06:18:55,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.473e+01 9.221e+01 1.007e+02 1.315e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 06:19:05,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2715053.3333333335, ans=0.035 2023-11-24 06:19:11,807 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:19:13,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=22.5 2023-11-24 06:19:28,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2715186.6666666665, ans=0.0 2023-11-24 06:19:38,973 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10500, loss[loss=0.06868, simple_loss=0.08945, pruned_loss=0.01505, audio_tagging_loss=0.0089, over 15194.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09016, pruned_loss=0.0131, audio_tagging_loss=0.009053, over 3049843.27 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:19:51,553 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407300 2023-11-24 06:19:53,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=22.5 2023-11-24 06:20:41,118 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10550, loss[loss=0.05703, simple_loss=0.07578, pruned_loss=0.009257, audio_tagging_loss=0.009885, over 15581.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09055, pruned_loss=0.01305, audio_tagging_loss=0.0089, over 3047451.73 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:20:54,300 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407350 2023-11-24 06:20:54,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2715653.3333333335, ans=0.2 2023-11-24 06:21:01,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.501e+01 9.231e+01 9.962e+01 1.175e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-24 06:21:05,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-11-24 06:21:08,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2715720.0, ans=0.125 2023-11-24 06:21:32,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=22.5 2023-11-24 06:21:43,197 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10600, loss[loss=0.07233, simple_loss=0.09585, pruned_loss=0.017, audio_tagging_loss=0.007401, over 14873.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09075, pruned_loss=0.01306, audio_tagging_loss=0.008751, over 3047708.63 frames. 
], batch size: 56, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:21:47,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=12.0 2023-11-24 06:21:47,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=12.0 2023-11-24 06:21:55,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407400 2023-11-24 06:22:02,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2715986.6666666665, ans=0.0 2023-11-24 06:22:08,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2716053.3333333335, ans=0.125 2023-11-24 06:22:08,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2716053.3333333335, ans=0.0 2023-11-24 06:22:12,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2023-11-24 06:22:24,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2716120.0, ans=0.125 2023-11-24 06:22:30,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2716120.0, ans=0.0 2023-11-24 06:22:45,104 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10650, loss[loss=0.08425, simple_loss=0.1212, pruned_loss=0.01457, audio_tagging_loss=0.009058, over 15698.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09068, pruned_loss=0.01302, audio_tagging_loss=0.008817, over 3052854.36 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:22:55,824 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:22:56,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407450 2023-11-24 06:22:57,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5 2023-11-24 06:23:02,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-11-24 06:23:04,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.300e+01 8.906e+01 9.684e+01 1.178e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-24 06:23:12,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2716386.6666666665, ans=0.125 2023-11-24 06:23:26,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=12.0 2023-11-24 06:23:36,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2716520.0, ans=0.125 2023-11-24 06:23:42,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. 
limit=10.0 2023-11-24 06:23:46,562 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10700, loss[loss=0.06461, simple_loss=0.08387, pruned_loss=0.01466, audio_tagging_loss=0.008012, over 14880.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09082, pruned_loss=0.01301, audio_tagging_loss=0.008705, over 3046782.03 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:23:50,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2716586.6666666665, ans=0.0 2023-11-24 06:24:00,609 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407500 2023-11-24 06:24:16,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-24 06:24:23,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2716786.6666666665, ans=0.0 2023-11-24 06:24:30,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2716786.6666666665, ans=0.125 2023-11-24 06:24:31,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2716786.6666666665, ans=0.025 2023-11-24 06:24:41,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2716853.3333333335, ans=0.125 2023-11-24 06:24:49,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2716920.0, ans=0.125 2023-11-24 06:24:50,036 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10750, loss[loss=0.05712, simple_loss=0.06401, pruned_loss=0.01393, audio_tagging_loss=0.01118, over 14644.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09126, pruned_loss=0.01314, audio_tagging_loss=0.008701, over 3045773.55 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:24:55,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2716920.0, ans=0.0 2023-11-24 06:25:02,081 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407550 2023-11-24 06:25:02,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2716986.6666666665, ans=0.5 2023-11-24 06:25:05,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2716986.6666666665, ans=0.125 2023-11-24 06:25:08,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2716986.6666666665, ans=0.125 2023-11-24 06:25:09,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.785e+01 8.501e+01 8.981e+01 9.807e+01 1.307e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-24 06:25:16,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2717053.3333333335, ans=0.1 2023-11-24 06:25:16,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.01 vs. 
limit=22.5 2023-11-24 06:25:20,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2717053.3333333335, ans=0.125 2023-11-24 06:25:28,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2717120.0, ans=0.025 2023-11-24 06:25:42,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2717186.6666666665, ans=0.2 2023-11-24 06:25:46,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2717186.6666666665, ans=0.2 2023-11-24 06:25:51,852 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10800, loss[loss=0.07576, simple_loss=0.09938, pruned_loss=0.01903, audio_tagging_loss=0.007035, over 15167.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09123, pruned_loss=0.01314, audio_tagging_loss=0.008765, over 3045792.44 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:26:04,062 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407600 2023-11-24 06:26:21,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2717386.6666666665, ans=0.0 2023-11-24 06:26:21,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2717386.6666666665, ans=0.05 2023-11-24 06:26:22,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=15.0 2023-11-24 06:26:31,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2717453.3333333335, ans=0.035 2023-11-24 06:26:54,162 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10850, loss[loss=0.0773, simple_loss=0.1073, pruned_loss=0.01709, audio_tagging_loss=0.006587, over 14679.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09067, pruned_loss=0.01294, audio_tagging_loss=0.008832, over 3052040.66 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:27:07,801 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407650 2023-11-24 06:27:09,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2717653.3333333335, ans=0.1 2023-11-24 06:27:13,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2717653.3333333335, ans=0.125 2023-11-24 06:27:15,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.695e+01 9.228e+01 1.009e+02 1.373e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-24 06:27:18,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-24 06:27:24,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2717720.0, ans=0.0 2023-11-24 06:27:45,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2717853.3333333335, ans=0.0 2023-11-24 06:27:46,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. 
limit=6.0 2023-11-24 06:27:54,166 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 06:27:57,726 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10900, loss[loss=0.07058, simple_loss=0.1072, pruned_loss=0.009232, audio_tagging_loss=0.007751, over 15623.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09137, pruned_loss=0.01296, audio_tagging_loss=0.008792, over 3054291.36 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:28:00,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2717920.0, ans=0.1 2023-11-24 06:28:03,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2717920.0, ans=0.0 2023-11-24 06:28:06,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2717920.0, ans=0.125 2023-11-24 06:28:10,239 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407700 2023-11-24 06:28:34,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2718120.0, ans=0.5 2023-11-24 06:28:51,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2718186.6666666665, ans=0.125 2023-11-24 06:28:59,460 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 10950, loss[loss=0.08685, simple_loss=0.1158, pruned_loss=0.0199, audio_tagging_loss=0.009057, over 15456.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09215, pruned_loss=0.01318, audio_tagging_loss=0.008832, over 3057296.28 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:29:03,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2718253.3333333335, ans=0.125 2023-11-24 06:29:11,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407750 2023-11-24 06:29:13,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=12.0 2023-11-24 06:29:15,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2718320.0, ans=0.125 2023-11-24 06:29:20,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.403e+01 8.864e+01 9.655e+01 1.476e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-24 06:29:26,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2023-11-24 06:30:00,913 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11000, loss[loss=0.05788, simple_loss=0.07706, pruned_loss=0.008882, audio_tagging_loss=0.01047, over 14362.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09195, pruned_loss=0.01324, audio_tagging_loss=0.008875, over 3050490.06 frames. 
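Each batch record above decomposes `loss` into `simple_loss`, `pruned_loss`, and `audio_tagging_loss`. The logged numbers are consistent with a weighted sum in which `simple_loss` enters at half weight: for batch 10900, 0.5 * 0.1072 + 0.009232 + 0.007751 ≈ 0.07058. A sketch of that combination follows; the weights are inferred by fitting the logged values, not read from the training code, so treat them as assumptions.

```python
# Hedged reconstruction of the combined objective implied by the log
# fields. The 0.5 weight on simple_loss is inferred by fitting the logged
# numbers; both scales below are assumptions, not confirmed config values.
import torch

def combine_losses(
    simple_loss: torch.Tensor,
    pruned_loss: torch.Tensor,
    audio_tagging_loss: torch.Tensor,
    simple_scale: float = 0.5,   # inferred from the logged records
    tagging_scale: float = 1.0,  # assumed
) -> torch.Tensor:
    """Multi-task loss in the shape suggested by the train_asr.py records."""
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Reproduce the per-batch record of batch 10900:
loss = combine_losses(
    torch.tensor(0.1072), torch.tensor(0.009232), torch.tensor(0.007751)
)
print(f"{loss.item():.5f}")  # ~0.07058, matching the logged loss
```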
], batch size: 55, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:30:04,678 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:30:07,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2718586.6666666665, ans=0.07 2023-11-24 06:30:10,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2718586.6666666665, ans=0.1 2023-11-24 06:30:11,000 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 06:30:14,006 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407800 2023-11-24 06:30:39,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2718786.6666666665, ans=0.0 2023-11-24 06:30:52,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2718853.3333333335, ans=0.05 2023-11-24 06:31:00,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2718853.3333333335, ans=0.125 2023-11-24 06:31:04,752 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11050, loss[loss=0.06891, simple_loss=0.09379, pruned_loss=0.01392, audio_tagging_loss=0.008091, over 15953.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09165, pruned_loss=0.01328, audio_tagging_loss=0.008978, over 3051630.76 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:31:17,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407850 2023-11-24 06:31:19,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2718986.6666666665, ans=0.2 2023-11-24 06:31:26,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.514e+01 8.990e+01 9.675e+01 1.640e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-24 06:31:33,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2023-11-24 06:31:44,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2719120.0, ans=0.1 2023-11-24 06:31:54,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2023-11-24 06:31:59,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2719186.6666666665, ans=0.125 2023-11-24 06:32:05,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0 2023-11-24 06:32:06,875 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11100, loss[loss=0.1005, simple_loss=0.1376, pruned_loss=0.0241, audio_tagging_loss=0.007625, over 15821.00 frames. 
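The WARNING records (train_asr.py:1462) show the filter applied to AudioSet cuts that carry only the placeholder transcript: a 100-frame cut shrinks to 23 frames after subsampling, which is fewer than its 24 BPE tokens, so a transducer alignment is impossible and the cut is dropped. A sketch of that criterion is below; the subsampling formula is an assumption picked to reproduce the logged 100 -> 23 mapping.

```python
# Hedged reconstruction of the cut filter behind the WARNING records:
# drop a cut when it has fewer frames after subsampling than BPE tokens,
# since a transducer cannot emit more non-blank symbols than frames.
# The subsampling formula is an assumption that reproduces 100 -> 23.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the WARNING
print(keep_cut(100, 24))              # False -> cut is excluded
```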
], tot_loss[loss=0.06826, simple_loss=0.09136, pruned_loss=0.01341, audio_tagging_loss=0.009173, over 3054245.04 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:32:19,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407900 2023-11-24 06:32:25,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-24 06:32:33,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0 2023-11-24 06:32:55,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2719520.0, ans=0.5 2023-11-24 06:32:58,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2719520.0, ans=0.125 2023-11-24 06:33:03,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2719520.0, ans=0.0 2023-11-24 06:33:08,617 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11150, loss[loss=0.101, simple_loss=0.1325, pruned_loss=0.02503, audio_tagging_loss=0.009681, over 15120.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09001, pruned_loss=0.0131, audio_tagging_loss=0.009358, over 3050538.88 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:33:21,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 407950 2023-11-24 06:33:29,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2719653.3333333335, ans=0.125 2023-11-24 06:33:31,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.657e+01 9.225e+01 9.905e+01 1.226e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-24 06:33:36,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2719720.0, ans=0.2 2023-11-24 06:34:03,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2719853.3333333335, ans=0.125 2023-11-24 06:34:10,874 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11200, loss[loss=0.0712, simple_loss=0.1029, pruned_loss=0.01322, audio_tagging_loss=0.006511, over 15383.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.08905, pruned_loss=0.01289, audio_tagging_loss=0.00939, over 3049411.34 frames. 
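The optim.py:476 records print five grad-norm statistics (presumably min/25%/50%/75%/max over a recent window) plus a clipping threshold. The logged pairs fit threshold = Clipping_scale x median exactly: in the record above, 2.0 x 8.990e+01 = 1.798e+02. A sketch of gradient clipping under exactly that inferred rule:

```python
# Median-based gradient clipping consistent with the optim.py records:
# threshold = clipping_scale * running median of recent grad norms, with
# percent-clipped tracking how often the threshold binds. Inferred from
# the logged numbers (2.0 * 8.990e+01 = 1.798e+02), not from the source.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms: deque = deque(maxlen=window)
        self.num_clipped = 0
        self.num_steps = 0

    def step(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * sorted(self.norms)[len(self.norms) // 2]
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)  # rescale so the norm hits the threshold
        return threshold

    def percent_clipped(self) -> float:
        return 100.0 * self.num_clipped / max(1, self.num_steps)
```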
], batch size: 57, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:34:19,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2719920.0, ans=0.125 2023-11-24 06:34:23,151 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408000 2023-11-24 06:34:44,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2720053.3333333335, ans=0.0 2023-11-24 06:34:49,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2720053.3333333335, ans=0.2 2023-11-24 06:35:11,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2720186.6666666665, ans=0.125 2023-11-24 06:35:16,694 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11250, loss[loss=0.05937, simple_loss=0.0819, pruned_loss=0.01079, audio_tagging_loss=0.007637, over 15314.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08812, pruned_loss=0.01272, audio_tagging_loss=0.009411, over 3047408.12 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:35:17,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2720253.3333333335, ans=0.125 2023-11-24 06:35:18,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2720253.3333333335, ans=0.0 2023-11-24 06:35:29,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408050 2023-11-24 06:35:39,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.539e+01 9.181e+01 9.755e+01 1.709e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-24 06:35:49,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2720386.6666666665, ans=0.2 2023-11-24 06:35:54,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-24 06:35:55,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=12.0 2023-11-24 06:36:03,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2720453.3333333335, ans=0.125 2023-11-24 06:36:14,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2720520.0, ans=0.015 2023-11-24 06:36:18,122 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11300, loss[loss=0.05848, simple_loss=0.08319, pruned_loss=0.009877, audio_tagging_loss=0.007006, over 15282.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.08942, pruned_loss=0.01298, audio_tagging_loss=0.009261, over 3041826.25 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:36:25,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2720586.6666666665, ans=10.0 2023-11-24 06:36:25,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2720586.6666666665, ans=0.125 2023-11-24 06:36:26,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.67 vs. 
limit=12.0 2023-11-24 06:36:30,475 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408100 2023-11-24 06:36:31,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2720653.3333333335, ans=0.125 2023-11-24 06:37:02,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2720786.6666666665, ans=0.2 2023-11-24 06:37:05,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2720786.6666666665, ans=0.1 2023-11-24 06:37:19,931 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11350, loss[loss=0.0557, simple_loss=0.07233, pruned_loss=0.008888, audio_tagging_loss=0.01064, over 14212.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08896, pruned_loss=0.01296, audio_tagging_loss=0.009085, over 3041392.55 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:37:24,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2720920.0, ans=0.125 2023-11-24 06:37:24,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2720920.0, ans=0.1 2023-11-24 06:37:33,094 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408150 2023-11-24 06:37:39,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2720986.6666666665, ans=0.5 2023-11-24 06:37:41,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2023-11-24 06:37:41,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2720986.6666666665, ans=0.0 2023-11-24 06:37:43,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 8.373e+01 9.046e+01 9.724e+01 1.113e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-24 06:37:49,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2023-11-24 06:38:02,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2721120.0, ans=0.035 2023-11-24 06:38:11,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2721186.6666666665, ans=0.2 2023-11-24 06:38:12,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2721186.6666666665, ans=0.125 2023-11-24 06:38:20,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2721186.6666666665, ans=0.125 2023-11-24 06:38:21,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2721253.3333333335, ans=0.125 2023-11-24 06:38:22,783 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11400, loss[loss=0.08335, simple_loss=0.1195, pruned_loss=0.01829, audio_tagging_loss=0.005339, over 14969.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.08968, pruned_loss=0.01308, audio_tagging_loss=0.008996, over 3040196.44 frames. 
], batch size: 54, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:38:27,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2721253.3333333335, ans=0.125 2023-11-24 06:38:34,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408200 2023-11-24 06:38:38,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2721320.0, ans=0.125 2023-11-24 06:38:54,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2721386.6666666665, ans=0.2 2023-11-24 06:38:57,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2721386.6666666665, ans=0.1 2023-11-24 06:39:03,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2721453.3333333335, ans=0.0 2023-11-24 06:39:10,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2721453.3333333335, ans=0.125 2023-11-24 06:39:24,328 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11450, loss[loss=0.07241, simple_loss=0.09516, pruned_loss=0.0164, audio_tagging_loss=0.008434, over 16304.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.0891, pruned_loss=0.01298, audio_tagging_loss=0.009009, over 3044564.84 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:39:33,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2721586.6666666665, ans=0.125 2023-11-24 06:39:36,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408250 2023-11-24 06:39:37,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2721653.3333333335, ans=0.04949747468305833 2023-11-24 06:39:48,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.611e+01 9.148e+01 9.766e+01 1.313e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 06:39:55,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=12.0 2023-11-24 06:39:57,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2721720.0, ans=0.05 2023-11-24 06:40:03,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2721786.6666666665, ans=0.0 2023-11-24 06:40:08,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2721786.6666666665, ans=0.1 2023-11-24 06:40:16,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2721853.3333333335, ans=0.05 2023-11-24 06:40:25,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2721920.0, ans=0.125 2023-11-24 06:40:26,813 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11500, loss[loss=0.05565, simple_loss=0.07333, pruned_loss=0.0105, audio_tagging_loss=0.008487, over 15911.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08903, pruned_loss=0.01289, audio_tagging_loss=0.008972, over 3051173.68 frames. 
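The dense stream of scaling.py:213 records shows `ScheduledFloat` resolving every regularization knob (skip rates, balancer probabilities, dropout) to a value `ans` at the current `batch_count`; at this depth of training most skip rates have decayed to 0.0. A minimal sketch of such a batch-count-indexed schedule, assuming piecewise-linear interpolation between breakpoints (the log only exposes the already-resolved values, so the breakpoints here are illustrative):

```python
# Minimal batch-count-indexed schedule in the spirit of ScheduledFloat.
# Piecewise-linear interpolation and the breakpoints are assumptions;
# the log records only expose the resolved value ("ans") per batch_count.
from bisect import bisect_right

class ScheduledFloatSketch:
    def __init__(self, *points: tuple):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# E.g. a skip rate that decays from 0.1 to 0.0 over the first 20k batches
# and stays there, matching the "ans=0.0" readings late in training:
skip_rate = ScheduledFloatSketch((0.0, 0.1), (20000.0, 0.0))
print(skip_rate.value(2721253.33))  # 0.0
```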
], batch size: 60, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:40:31,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-24 06:40:34,929 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:40:38,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2721986.6666666665, ans=0.1 2023-11-24 06:40:38,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2721986.6666666665, ans=0.125 2023-11-24 06:40:39,540 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408300 2023-11-24 06:40:43,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2721986.6666666665, ans=0.1 2023-11-24 06:40:45,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2023-11-24 06:41:19,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2722186.6666666665, ans=0.2 2023-11-24 06:41:28,741 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11550, loss[loss=0.07895, simple_loss=0.1115, pruned_loss=0.0157, audio_tagging_loss=0.007514, over 15387.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08921, pruned_loss=0.01288, audio_tagging_loss=0.008961, over 3045441.47 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 8.0 2023-11-24 06:41:31,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2722253.3333333335, ans=0.125 2023-11-24 06:41:40,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408350 2023-11-24 06:41:49,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2722320.0, ans=0.125 2023-11-24 06:41:51,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.495e+01 8.414e+01 9.031e+01 9.585e+01 1.239e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-24 06:42:07,496 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 06:42:21,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2722520.0, ans=0.0 2023-11-24 06:42:29,710 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11600, loss[loss=0.0575, simple_loss=0.07835, pruned_loss=0.009172, audio_tagging_loss=0.00915, over 14403.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09007, pruned_loss=0.01301, audio_tagging_loss=0.00886, over 3048953.41 frames. 
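The `grad_scale` field is fp16 loss scaling at work: it sits at 16.0, doubles to 32.0 at batch 10800, is halved twice to 8.0 by batch 10950 (two overflowing steps), and recovers to 16.0 by batch 11600. That is the standard dynamic behavior of `torch.cuda.amp.GradScaler`; a minimal step sketch with illustrative growth settings (not this run's values):

```python
# Minimal fp16 training step showing why the logged grad_scale moves in
# powers of two: GradScaler doubles the scale after growth_interval clean
# steps and halves it on overflow. The settings here are illustrative.
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
)

x = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).square().mean()
scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(opt)                # skips the update if gradients overflowed
scaler.update()                 # doubles or halves the scale as needed
print(scaler.get_scale())       # the value logged as grad_scale
```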
], batch size: 54, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:42:42,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408400 2023-11-24 06:43:00,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2722720.0, ans=0.125 2023-11-24 06:43:09,011 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:43:15,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. limit=10.0 2023-11-24 06:43:16,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2722786.6666666665, ans=0.125 2023-11-24 06:43:16,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2722786.6666666665, ans=0.125 2023-11-24 06:43:30,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2722920.0, ans=0.125 2023-11-24 06:43:31,833 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11650, loss[loss=0.0518, simple_loss=0.06502, pruned_loss=0.007932, audio_tagging_loss=0.01136, over 13392.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.08997, pruned_loss=0.01292, audio_tagging_loss=0.008885, over 3049056.27 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:43:45,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408450 2023-11-24 06:43:51,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2722986.6666666665, ans=0.125 2023-11-24 06:43:53,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2722986.6666666665, ans=0.125 2023-11-24 06:43:55,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.490e+01 8.921e+01 9.548e+01 1.142e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-24 06:44:01,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2723053.3333333335, ans=0.04949747468305833 2023-11-24 06:44:34,530 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11700, loss[loss=0.06148, simple_loss=0.08843, pruned_loss=0.01, audio_tagging_loss=0.007263, over 14198.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.0905, pruned_loss=0.01309, audio_tagging_loss=0.008908, over 3051933.06 frames. 
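The scaling.py:1022 records compare a per-module whitening `metric` against a `limit` (e.g. `nonlin_attention.whiten1 ... metric=7.80 vs. limit=10.0` above); presumably a penalty engages only when the metric drifts past the limit. One plausible metric, the mean squared eigenvalue of the per-group feature covariance over its squared mean eigenvalue, is sketched below: it is 1.0 for perfectly white features and grows when a few directions dominate. The exact definition in scaling.py may differ, so treat this as an illustration of the idea.

```python
# One plausible whitening metric: per group of channels, compare the mean
# squared eigenvalue of the feature covariance to its squared mean
# eigenvalue. Both moments come from traces, so no eigendecomposition
# is needed. 1.0 = perfectly white; the log checks it against a limit.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels)
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("fgi,fgj->gij", x, x) / num_frames
    d = cov.shape[-1]
    mean_eig = cov.diagonal(dim1=1, dim2=2).sum(dim=1) / d
    mean_sq_eig = (cov @ cov).diagonal(dim1=1, dim2=2).sum(dim=1) / d
    return (mean_sq_eig / mean_eig**2).mean().item()

print(whitening_metric(torch.randn(1000, 384), num_groups=1))  # ~1.4 for noise
```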
], batch size: 52, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:44:34,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2723253.3333333335, ans=10.0 2023-11-24 06:44:46,466 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408500 2023-11-24 06:44:59,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2723386.6666666665, ans=0.1 2023-11-24 06:45:10,202 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:45:10,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2723453.3333333335, ans=0.125 2023-11-24 06:45:16,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2723453.3333333335, ans=0.1 2023-11-24 06:45:16,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.32 vs. limit=6.0 2023-11-24 06:45:17,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-24 06:45:35,965 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11750, loss[loss=0.06829, simple_loss=0.09776, pruned_loss=0.01358, audio_tagging_loss=0.005826, over 14636.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09041, pruned_loss=0.01311, audio_tagging_loss=0.008825, over 3053770.15 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:45:36,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2723586.6666666665, ans=0.2 2023-11-24 06:45:36,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2723586.6666666665, ans=0.125 2023-11-24 06:45:46,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2723653.3333333335, ans=0.125 2023-11-24 06:45:48,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408550 2023-11-24 06:45:52,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2723653.3333333335, ans=0.0 2023-11-24 06:45:59,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.698e+01 9.400e+01 1.049e+02 1.274e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-24 06:46:02,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2723720.0, ans=0.125 2023-11-24 06:46:03,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2723720.0, ans=0.125 2023-11-24 06:46:07,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.39 vs. 
limit=22.5 2023-11-24 06:46:33,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2723853.3333333335, ans=0.125 2023-11-24 06:46:36,804 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11800, loss[loss=0.06131, simple_loss=0.08471, pruned_loss=0.01174, audio_tagging_loss=0.007208, over 15110.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09067, pruned_loss=0.01325, audio_tagging_loss=0.008866, over 3054322.06 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:46:51,026 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408600 2023-11-24 06:46:52,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2723986.6666666665, ans=0.125 2023-11-24 06:46:58,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2723986.6666666665, ans=0.125 2023-11-24 06:47:08,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2724053.3333333335, ans=0.0 2023-11-24 06:47:37,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-24 06:47:40,766 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11850, loss[loss=0.06399, simple_loss=0.08838, pruned_loss=0.01165, audio_tagging_loss=0.008151, over 13826.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.08971, pruned_loss=0.013, audio_tagging_loss=0.008985, over 3049002.95 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:47:52,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408650 2023-11-24 06:48:02,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2724320.0, ans=0.2 2023-11-24 06:48:03,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.550e+01 9.032e+01 9.774e+01 1.343e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-24 06:48:11,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0 2023-11-24 06:48:34,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2724520.0, ans=0.2 2023-11-24 06:48:42,004 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11900, loss[loss=0.06266, simple_loss=0.0766, pruned_loss=0.01265, audio_tagging_loss=0.01171, over 15596.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.08966, pruned_loss=0.01302, audio_tagging_loss=0.009173, over 3046650.96 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:48:54,123 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408700 2023-11-24 06:49:12,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2724720.0, ans=0.125 2023-11-24 06:49:18,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. 
limit=15.0 2023-11-24 06:49:26,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2724786.6666666665, ans=0.0 2023-11-24 06:49:36,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2724853.3333333335, ans=0.1 2023-11-24 06:49:42,885 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 11950, loss[loss=0.07284, simple_loss=0.1026, pruned_loss=0.0134, audio_tagging_loss=0.008151, over 14682.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09033, pruned_loss=0.01314, audio_tagging_loss=0.009216, over 3049678.67 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 16.0 2023-11-24 06:49:46,570 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:49:50,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.10 vs. limit=10.0 2023-11-24 06:49:51,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.15 vs. limit=10.0 2023-11-24 06:49:56,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408750 2023-11-24 06:49:56,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2724986.6666666665, ans=0.05 2023-11-24 06:49:59,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2724986.6666666665, ans=0.0 2023-11-24 06:50:07,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.293e+01 8.993e+01 9.649e+01 1.275e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-24 06:50:20,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2725120.0, ans=0.0 2023-11-24 06:50:23,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2725120.0, ans=0.0 2023-11-24 06:50:34,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2725186.6666666665, ans=0.0 2023-11-24 06:50:39,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2725186.6666666665, ans=0.125 2023-11-24 06:50:42,947 INFO [train_asr.py:1221] (1/4) Epoch 34, batch 12000, loss[loss=0.07422, simple_loss=0.1054, pruned_loss=0.01167, audio_tagging_loss=0.009859, over 15998.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09016, pruned_loss=0.01301, audio_tagging_loss=0.00928, over 3049993.39 frames. 
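Since every batch record follows one fixed shape, the training curves can be recovered from this log directly. A small parser sketch, with the regex written against the record layout visible above:

```python
# Parser sketch for the train_asr.py batch records in this log, pulling
# the smoothed tot_loss components so the curves can be re-plotted.
import re

RECORD = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
    r"tot_loss\[loss=(?P<loss>[\d.]+), simple_loss=(?P<simple>[\d.]+), "
    r"pruned_loss=(?P<pruned>[\d.]+), audio_tagging_loss=(?P<tagging>[\d.]+)"
)

def parse_log(text: str):
    for m in RECORD.finditer(text):
        yield (int(m["epoch"]), int(m["batch"]), float(m["loss"]),
               float(m["simple"]), float(m["pruned"]), float(m["tagging"]))

line = ("Epoch 34, batch 11950, loss[loss=0.07284, ...], "
        "tot_loss[loss=0.06752, simple_loss=0.09033, pruned_loss=0.01314, "
        "audio_tagging_loss=0.009216, over 3049678.67 frames. ]")
print(next(parse_log(line)))
# (34, 11950, 0.06752, 0.09033, 0.01314, 0.009216)
```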
], batch size: 59, lr: 1.98e-03, grad_scale: 32.0 2023-11-24 06:50:42,948 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 06:51:11,715 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2922, 4.2872, 4.4695, 4.4222], device='cuda:1') 2023-11-24 06:51:19,021 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1315, 2.5940, 4.9637, 2.9253], device='cuda:1') 2023-11-24 06:51:25,746 INFO [train_asr.py:1253] (1/4) Epoch 34, validation: loss=0.05837, simple_loss=0.05087, pruned_loss=0.005158, audio_tagging_loss=0.02778, over 4681554.00 frames. 2023-11-24 06:51:25,747 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 06:51:29,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5 2023-11-24 06:51:33,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2725253.3333333335, ans=0.125 2023-11-24 06:51:36,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408800 2023-11-24 06:51:37,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=2725320.0, ans=0.02 2023-11-24 06:52:25,026 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 0, loss[loss=0.06483, simple_loss=0.06493, pruned_loss=0.008385, audio_tagging_loss=0.02398, over 15186.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.06493, pruned_loss=0.008385, audio_tagging_loss=0.02398, over 15186.00 frames. ], batch size: 60, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 06:52:25,027 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 06:52:56,023 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2752, 3.0509, 3.2728, 2.9601, 3.7005, 3.7664, 3.3216, 3.2017], device='cuda:1') 2023-11-24 06:53:00,543 INFO [train_asr.py:1253] (1/4) Epoch 35, validation: loss=0.05805, simple_loss=0.05089, pruned_loss=0.005144, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-24 06:53:00,544 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 06:53:15,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2725473.3333333335, ans=0.0 2023-11-24 06:53:24,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2725473.3333333335, ans=0.2 2023-11-24 06:53:28,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2725540.0, ans=10.0 2023-11-24 06:53:35,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2023-11-24 06:53:41,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.36 vs. 
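Alongside the validation losses, the run dumps per-head attention-weight entropies (zipformer.py:1873); near-zero entropy would flag a head collapsed onto one or two positions, while the logged values (e.g. [5.13, 2.59, 4.96, 2.93]) suggest heads attending over effective contexts of very different sizes. A sketch of that diagnostic, assuming one entropy per head averaged over query positions (the tensor layout is an assumption, not confirmed by the log):

```python
# Sketch of the attention-entropy diagnostic printed during validation:
# Shannon entropy of each head's attention distribution, averaged over
# query positions. The (heads, queries, keys) layout is assumed.
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), rows sum to 1
    entropy = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
    return entropy.mean(dim=-1)  # one value per head, as in the log

attn = torch.softmax(torch.randn(4, 16, 16), dim=-1)
print(attn_weights_entropy(attn))
# Uniform attention over 16 keys would give log(16) ≈ 2.77 per head; an
# entropy of 5.13 corresponds to an effective context of e**5.13 ≈ 169 keys.
```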
limit=12.0 2023-11-24 06:53:47,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408850 2023-11-24 06:53:57,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.972e+01 1.010e+02 1.109e+02 1.515e+02, threshold=2.019e+02, percent-clipped=0.0 2023-11-24 06:54:03,115 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 50, loss[loss=0.07437, simple_loss=0.09255, pruned_loss=0.01226, audio_tagging_loss=0.01584, over 16659.00 frames. ], tot_loss[loss=0.07823, simple_loss=0.09509, pruned_loss=0.01379, audio_tagging_loss=0.0169, over 684700.68 frames. ], batch size: 62, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 06:54:28,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2725873.3333333335, ans=0.125 2023-11-24 06:54:50,007 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408900 2023-11-24 06:55:03,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2726006.6666666665, ans=0.09899494936611666 2023-11-24 06:55:04,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2726006.6666666665, ans=0.125 2023-11-24 06:55:06,608 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 100, loss[loss=0.07039, simple_loss=0.08181, pruned_loss=0.01204, audio_tagging_loss=0.01745, over 16021.00 frames. ], tot_loss[loss=0.07548, simple_loss=0.09109, pruned_loss=0.01346, audio_tagging_loss=0.01648, over 1207815.85 frames. ], batch size: 60, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 06:55:15,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.15 vs. limit=15.0 2023-11-24 06:55:26,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2726140.0, ans=0.125 2023-11-24 06:55:31,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2726206.6666666665, ans=0.1 2023-11-24 06:55:31,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2023-11-24 06:55:53,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 408950 2023-11-24 06:56:04,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 9.169e+01 9.656e+01 1.031e+02 1.256e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-24 06:56:08,794 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 150, loss[loss=0.08243, simple_loss=0.1127, pruned_loss=0.01813, audio_tagging_loss=0.00797, over 14723.00 frames. ], tot_loss[loss=0.0742, simple_loss=0.09216, pruned_loss=0.01341, audio_tagging_loss=0.01471, over 1621228.36 frames. 
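`tot_loss` is not a plain epoch-so-far average. Its frame counts ramp through 684,700 (batch 50) and 1,207,815 (batch 100) and then saturate near 3.0M for the rest of an epoch, which fits an exponentially decayed accumulator: the running sums are multiplied by roughly (1 - 1/200) each batch before the new batch is added, making the printout a ~200-batch moving average that restarts at every epoch boundary. A sketch under that inferred decay:

```python
# Sketch of the decayed accumulator the tot_loss records suggest: running
# (loss * frames) and frame totals both decay by (1 - 1/decay_batches)
# per step. decay_batches=200 is inferred from the logged frame counts
# (they saturate near 200 batches x ~15k frames ~= 3.0M), not confirmed.
class TotLoss:
    def __init__(self, decay_batches: int = 200):
        self.keep = 1.0 - 1.0 / decay_batches
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.loss_sum * self.keep + batch_loss * batch_frames
        self.frames = self.frames * self.keep + batch_frames
        return self.loss_sum / self.frames  # the value printed as tot_loss

tot = TotLoss()
for _ in range(50):
    tot.update(0.075, 15000)
print(f"{tot.frames:,.0f}")  # ~665,000 frames, the ballpark of the logged 684,700
```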
], batch size: 52, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 06:56:10,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2726406.6666666665, ans=0.1 2023-11-24 06:56:11,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2726406.6666666665, ans=0.2 2023-11-24 06:56:11,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2726406.6666666665, ans=0.125 2023-11-24 06:56:22,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2726473.3333333335, ans=0.1 2023-11-24 06:56:25,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2726473.3333333335, ans=0.1 2023-11-24 06:56:25,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2726473.3333333335, ans=0.125 2023-11-24 06:56:35,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2726540.0, ans=0.125 2023-11-24 06:56:54,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409000 2023-11-24 06:56:58,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2726673.3333333335, ans=0.0 2023-11-24 06:57:03,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2726673.3333333335, ans=0.0 2023-11-24 06:57:03,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2726673.3333333335, ans=0.125 2023-11-24 06:57:11,054 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 200, loss[loss=0.0745, simple_loss=0.1016, pruned_loss=0.0152, audio_tagging_loss=0.008495, over 16226.00 frames. ], tot_loss[loss=0.07197, simple_loss=0.09125, pruned_loss=0.01322, audio_tagging_loss=0.01312, over 1928819.47 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 06:57:13,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2726740.0, ans=0.1 2023-11-24 06:57:18,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2726740.0, ans=0.125 2023-11-24 06:57:37,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.61 vs. 
limit=15.0 2023-11-24 06:57:50,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2726940.0, ans=0.125 2023-11-24 06:57:52,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2726940.0, ans=0.125 2023-11-24 06:57:56,175 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409050 2023-11-24 06:58:07,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2727006.6666666665, ans=0.04949747468305833 2023-11-24 06:58:09,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.427e+01 9.100e+01 9.790e+01 1.266e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-24 06:58:12,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2727073.3333333335, ans=0.2 2023-11-24 06:58:13,109 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 250, loss[loss=0.06381, simple_loss=0.07916, pruned_loss=0.01237, audio_tagging_loss=0.01186, over 13871.00 frames. ], tot_loss[loss=0.06986, simple_loss=0.08975, pruned_loss=0.013, audio_tagging_loss=0.01199, over 2176658.03 frames. ], batch size: 55, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 06:58:15,759 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 06:58:23,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2727140.0, ans=0.125 2023-11-24 06:58:25,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2727140.0, ans=0.0 2023-11-24 06:58:31,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2727140.0, ans=0.0 2023-11-24 06:58:32,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2727140.0, ans=0.125 2023-11-24 06:58:59,001 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409100 2023-11-24 06:59:08,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-11-24 06:59:11,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-24 06:59:14,735 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 300, loss[loss=0.08718, simple_loss=0.1242, pruned_loss=0.01871, audio_tagging_loss=0.006358, over 15546.00 frames. ], tot_loss[loss=0.06934, simple_loss=0.09021, pruned_loss=0.0132, audio_tagging_loss=0.01103, over 2369114.58 frames. 
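The learning rate holds at 1.98e-03 through the tail of epoch 34 and steps to 1.95e-03 the moment epoch 35 starts, so the schedule must depend on the epoch count as well as the batch count. That shape matches an Eden-style schedule that decays smoothly in both counters; the sketch below uses constants fitted to reproduce the two logged values (with the epoch argument counted as completed epochs), so treat every number in it as an assumption rather than this run's configuration.

```python
# Eden-style schedule consistent with the logged 1.98e-03 -> 1.95e-03
# step at the epoch boundary: the rate decays in both the batch and the
# epoch count, so it looks flat within an epoch this late in training.
# All constants were fitted to the logged values, not read from a config.
def eden_lr(base_lr: float, batch: int, epochs_done: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epochs_done**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 408_350, 33):.2e}")  # ~1.98e-03, as during epoch 34
print(f"{eden_lr(0.045, 408_850, 34):.2e}")  # ~1.95e-03, as after the boundary
```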
], batch size: 57, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 06:59:48,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2727540.0, ans=0.025 2023-11-24 07:00:00,060 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409150 2023-11-24 07:00:02,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2727606.6666666665, ans=0.125 2023-11-24 07:00:07,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2727673.3333333335, ans=0.1 2023-11-24 07:00:13,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.645e+01 9.268e+01 9.915e+01 1.259e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-24 07:00:16,045 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 350, loss[loss=0.0765, simple_loss=0.09882, pruned_loss=0.01751, audio_tagging_loss=0.009581, over 15715.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.09098, pruned_loss=0.01318, audio_tagging_loss=0.01047, over 2523318.83 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 8.0 2023-11-24 07:00:25,762 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:01:01,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-24 07:01:02,304 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409200 2023-11-24 07:01:05,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2728006.6666666665, ans=0.125 2023-11-24 07:01:09,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2728006.6666666665, ans=0.125 2023-11-24 07:01:16,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2728006.6666666665, ans=0.0 2023-11-24 07:01:19,544 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 400, loss[loss=0.05132, simple_loss=0.06802, pruned_loss=0.007999, audio_tagging_loss=0.009308, over 15433.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09007, pruned_loss=0.01302, audio_tagging_loss=0.01009, over 2640739.27 frames. ], batch size: 60, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:01:50,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2023-11-24 07:02:05,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409250 2023-11-24 07:02:18,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.492e+01 8.964e+01 9.627e+01 1.454e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-24 07:02:21,382 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 450, loss[loss=0.05839, simple_loss=0.07941, pruned_loss=0.008479, audio_tagging_loss=0.01021, over 15559.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09061, pruned_loss=0.01318, audio_tagging_loss=0.009815, over 2736292.09 frames. 
], batch size: 58, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:03:08,238 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409300 2023-11-24 07:03:11,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2728673.3333333335, ans=0.95 2023-11-24 07:03:21,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-24 07:03:23,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2728740.0, ans=0.1 2023-11-24 07:03:24,185 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 500, loss[loss=0.08371, simple_loss=0.1075, pruned_loss=0.01757, audio_tagging_loss=0.0124, over 15952.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.0901, pruned_loss=0.01297, audio_tagging_loss=0.009584, over 2799886.94 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:03:29,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2728740.0, ans=0.125 2023-11-24 07:03:44,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2728806.6666666665, ans=0.125 2023-11-24 07:03:48,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2728873.3333333335, ans=0.2 2023-11-24 07:03:55,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2728873.3333333335, ans=0.0 2023-11-24 07:03:57,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-24 07:04:10,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409350 2023-11-24 07:04:12,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2729006.6666666665, ans=0.1 2023-11-24 07:04:24,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.423e+01 9.068e+01 9.980e+01 1.380e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 07:04:26,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2729073.3333333335, ans=0.125 2023-11-24 07:04:27,223 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 550, loss[loss=0.07655, simple_loss=0.1024, pruned_loss=0.01657, audio_tagging_loss=0.008803, over 15865.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.08997, pruned_loss=0.01302, audio_tagging_loss=0.009476, over 2860306.74 frames. ], batch size: 60, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:04:30,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729073.3333333335, ans=0.1 2023-11-24 07:04:41,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2729140.0, ans=0.125 2023-11-24 07:04:43,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.26 vs. 
limit=15.0 2023-11-24 07:05:02,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2729273.3333333335, ans=0.1 2023-11-24 07:05:12,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409400 2023-11-24 07:05:21,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-24 07:05:28,485 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 600, loss[loss=0.0888, simple_loss=0.123, pruned_loss=0.01945, audio_tagging_loss=0.007854, over 15762.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.08968, pruned_loss=0.01304, audio_tagging_loss=0.009429, over 2900696.65 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:05:28,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2729406.6666666665, ans=0.0 2023-11-24 07:05:38,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2729406.6666666665, ans=0.125 2023-11-24 07:05:53,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-24 07:06:06,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2729606.6666666665, ans=0.0 2023-11-24 07:06:07,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2729606.6666666665, ans=0.125 2023-11-24 07:06:14,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409450 2023-11-24 07:06:16,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2729673.3333333335, ans=0.125 2023-11-24 07:06:20,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2729673.3333333335, ans=10.0 2023-11-24 07:06:27,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.229e+01 8.791e+01 9.631e+01 1.307e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-24 07:06:29,977 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 650, loss[loss=0.07459, simple_loss=0.107, pruned_loss=0.01446, audio_tagging_loss=0.006628, over 16008.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09092, pruned_loss=0.01324, audio_tagging_loss=0.009277, over 2945692.84 frames. 
], batch size: 60, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:06:36,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2729740.0, ans=0.0 2023-11-24 07:06:39,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2729740.0, ans=0.125 2023-11-24 07:06:48,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2729806.6666666665, ans=0.09899494936611666 2023-11-24 07:06:58,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729873.3333333335, ans=0.1 2023-11-24 07:07:01,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2729873.3333333335, ans=0.125 2023-11-24 07:07:13,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2729940.0, ans=0.125 2023-11-24 07:07:15,801 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409500 2023-11-24 07:07:17,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2729940.0, ans=0.1 2023-11-24 07:07:32,855 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 700, loss[loss=0.06194, simple_loss=0.07904, pruned_loss=0.012, audio_tagging_loss=0.01042, over 15672.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09108, pruned_loss=0.01306, audio_tagging_loss=0.009176, over 2971944.86 frames. ], batch size: 58, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:07:40,529 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:07:41,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-11-24 07:08:01,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2730206.6666666665, ans=0.2 2023-11-24 07:08:11,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.90 vs. limit=22.5 2023-11-24 07:08:13,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2730273.3333333335, ans=0.0 2023-11-24 07:08:13,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2730273.3333333335, ans=0.0 2023-11-24 07:08:15,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2730273.3333333335, ans=0.0 2023-11-24 07:08:19,037 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409550 2023-11-24 07:08:22,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2023-11-24 07:08:24,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. 
2023-11-24 07:08:32,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.410e+01 9.143e+01 1.010e+02 1.194e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-24 07:08:34,820 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 750, loss[loss=0.07056, simple_loss=0.0949, pruned_loss=0.01403, audio_tagging_loss=0.009074, over 15084.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09134, pruned_loss=0.0131, audio_tagging_loss=0.009155, over 2990448.88 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:08:34,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2730406.6666666665, ans=0.125 2023-11-24 07:08:39,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2023-11-24 07:08:55,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2023-11-24 07:09:15,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2730606.6666666665, ans=0.2 2023-11-24 07:09:20,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409600 2023-11-24 07:09:22,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2023-11-24 07:09:36,489 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 800, loss[loss=0.07345, simple_loss=0.107, pruned_loss=0.01304, audio_tagging_loss=0.006903, over 16447.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09225, pruned_loss=0.0132, audio_tagging_loss=0.009088, over 3008397.69 frames. ], batch size: 58, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 07:09:39,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2730740.0, ans=0.125 2023-11-24 07:10:22,542 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409650 2023-11-24 07:10:35,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.567e+01 9.382e+01 1.008e+02 1.693e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-24 07:10:38,340 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 850, loss[loss=0.06155, simple_loss=0.0749, pruned_loss=0.01058, audio_tagging_loss=0.01352, over 14820.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09193, pruned_loss=0.01318, audio_tagging_loss=0.009198, over 3009584.12 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 07:10:44,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2731073.3333333335, ans=0.125 2023-11-24 07:10:49,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2731073.3333333335, ans=0.0 2023-11-24 07:11:02,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.60 vs. limit=22.5
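Each optim.py:476 line summarizes the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max) plus the active clipping threshold. With Clipping_scale=2.0 the logged thresholds track twice the median, e.g. 2.0 * 9.143e+01 = 1.829e+02 and 2.0 * 9.382e+01 = 1.876e+02 above. A sketch of that bookkeeping; GradNormClipper is a hypothetical helper and the history length is an assumption:

    import torch

    class GradNormClipper:
        # Sketch: clip to clipping_scale * median of recent gradient norms.
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.history = history
            self.norms: list = []

        def step(self, params) -> torch.Tensor:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms = (self.norms + [norm])[-self.history:]
            hist = torch.stack(self.norms)
            q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2]   # 2.0 * median
            if norm > threshold:                     # counted as "percent-clipped"
                for g in grads:
                    g.mul_(threshold / norm)
            return q   # the five "grad-norm quartiles" that get logged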
2023-11-24 07:11:24,123 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409700 2023-11-24 07:11:24,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2731273.3333333335, ans=0.125 2023-11-24 07:11:27,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2731340.0, ans=0.125 2023-11-24 07:11:40,595 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 900, loss[loss=0.06724, simple_loss=0.09254, pruned_loss=0.0136, audio_tagging_loss=0.007373, over 14927.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09265, pruned_loss=0.01349, audio_tagging_loss=0.009184, over 3017416.95 frames. ], batch size: 58, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 07:11:40,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2731406.6666666665, ans=0.0 2023-11-24 07:11:49,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2731406.6666666665, ans=0.05 2023-11-24 07:12:26,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409750 2023-11-24 07:12:30,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2731673.3333333335, ans=0.1 2023-11-24 07:12:39,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 8.786e+01 9.264e+01 9.713e+01 1.175e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-24 07:12:42,026 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 950, loss[loss=0.06604, simple_loss=0.09085, pruned_loss=0.009654, audio_tagging_loss=0.01096, over 15055.00 frames. ], tot_loss[loss=0.06904, simple_loss=0.09246, pruned_loss=0.01358, audio_tagging_loss=0.009237, over 3023744.59 frames. ], batch size: 55, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 07:12:45,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2731740.0, ans=0.0 2023-11-24 07:12:52,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0
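The ScheduledFloat lines record regularization constants (dropout_p, skip rates, balancer probabilities, whitening limits) that are functions of the global batch_count rather than fixed hyperparameters. A minimal sketch of a piecewise-linear schedule of this kind; PiecewiseLinearFloat and its breakpoints are illustrative assumptions, and by batch_count ~2.73e6 every schedule in this log has long since settled at its final value:

    import bisect

    class PiecewiseLinearFloat:
        # Sketch of a schedule defined by (batch_count, value) breakpoints.
        def __init__(self, *points):
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            x0, x1 = self.xs[i], self.xs[i + 1]
            y0, y1 = self.ys[i], self.ys[i + 1]
            # Linear interpolation between neighboring breakpoints.
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # E.g. anneal a dropout rate 0.3 -> 0.1 over the first 20k batches
    # (breakpoints assumed for illustration).
    dropout_p = PiecewiseLinearFloat((0, 0.3), (20000, 0.1))
    print(dropout_p.value(2729273.33))   # -> 0.1, like "ans=0.1" above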
2023-11-24 07:13:03,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2731806.6666666665, ans=0.035 2023-11-24 07:13:10,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2731873.3333333335, ans=0.2 2023-11-24 07:13:11,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2731873.3333333335, ans=0.1 2023-11-24 07:13:20,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2731940.0, ans=0.0 2023-11-24 07:13:28,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409800 2023-11-24 07:13:31,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2732006.6666666665, ans=0.125 2023-11-24 07:13:34,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2732006.6666666665, ans=0.125 2023-11-24 07:13:34,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2732006.6666666665, ans=0.5 2023-11-24 07:13:44,711 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1000, loss[loss=0.07655, simple_loss=0.1026, pruned_loss=0.01804, audio_tagging_loss=0.007191, over 15827.00 frames. ], tot_loss[loss=0.06862, simple_loss=0.09188, pruned_loss=0.01363, audio_tagging_loss=0.009051, over 3027786.13 frames. ], batch size: 58, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:13:52,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2732073.3333333335, ans=0.125 2023-11-24 07:14:03,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5 2023-11-24 07:14:11,500 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 07:14:31,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409850 2023-11-24 07:14:46,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.432e+01 9.037e+01 9.609e+01 1.581e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 07:14:48,079 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1050, loss[loss=0.05649, simple_loss=0.07877, pruned_loss=0.009148, audio_tagging_loss=0.007955, over 15954.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09095, pruned_loss=0.01338, audio_tagging_loss=0.009029, over 3033004.43 frames. ], batch size: 61, lr: 1.95e-03, grad_scale: 16.0
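These train_asr.py:1462 warnings drop 1-second AudioSet cuts whose placeholder transcript cannot be aligned: 100 feature frames shrink to 23 after the roughly 4x convolutional subsampling, which is fewer than the 24 BPE tokens, and a transducer alignment needs at least one encoder frame per output token. A sketch of such a filter; keep_cut and subsampled_len are hypothetical helpers, and the exact length formula is an assumption chosen to reproduce the logged 100 -> 23:

    def subsampled_len(num_frames: int) -> int:
        # Conv2d front end with overall subsampling factor 4; the edge
        # offsets are assumed so that 100 input frames -> 23 output frames.
        return (num_frames - 7) // 2 // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Need at least one subsampled frame per token to align.
        return subsampled_len(num_frames) >= num_tokens

    print(subsampled_len(100))   # 23, matching the warning above
    print(keep_cut(100, 24))     # False -> "Exclude cut ... from training"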
2023-11-24 07:14:59,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2732473.3333333335, ans=0.1 2023-11-24 07:15:13,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2732540.0, ans=0.125 2023-11-24 07:15:34,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409900 2023-11-24 07:15:34,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2732606.6666666665, ans=0.125 2023-11-24 07:15:49,675 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1100, loss[loss=0.06501, simple_loss=0.09408, pruned_loss=0.01071, audio_tagging_loss=0.00726, over 15631.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09053, pruned_loss=0.01315, audio_tagging_loss=0.009071, over 3037631.69 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:15:52,062 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 07:15:56,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2732740.0, ans=0.125 2023-11-24 07:16:05,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2732806.6666666665, ans=0.125 2023-11-24 07:16:18,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=22.5 2023-11-24 07:16:28,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2732940.0, ans=0.125 2023-11-24 07:16:35,787 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 409950 2023-11-24 07:16:48,840 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:16:49,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.554e+01 8.985e+01 9.572e+01 1.203e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-24 07:16:50,942 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1150, loss[loss=0.05404, simple_loss=0.06727, pruned_loss=0.01223, audio_tagging_loss=0.008176, over 13583.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.0907, pruned_loss=0.01315, audio_tagging_loss=0.008996, over 3042987.32 frames. ], batch size: 53, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:16:55,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0
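Note that the tot_loss frame counts hover near 3.0e6 instead of growing for the whole epoch: with roughly 15k frames per batch, that is what an exponentially decayed accumulator with decay 1 - 1/reset_interval (reset_interval=200 in the config) converges to. A sketch under that assumption; update_tot_loss is a hypothetical helper, not the tracker class used by the recipe:

    def update_tot_loss(tot: dict, cur: dict, reset_interval: int = 200) -> dict:
        # Decayed running sums over loss components and frame counts;
        # the logged per-frame averages are each sum divided by the
        # decayed "frames" entry.
        decay = 1.0 - 1.0 / reset_interval
        return {k: decay * tot.get(k, 0.0) + v for k, v in cur.items()}

    # Steady state of the frame count: frames_per_batch / (1 - decay)
    # ~= 15000 * 200 = 3.0e6, matching "over 3042987.32 frames." above.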
2023-11-24 07:17:03,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2733140.0, ans=0.2 2023-11-24 07:17:16,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2733206.6666666665, ans=0.125 2023-11-24 07:17:37,259 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410000 2023-11-24 07:17:50,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2733340.0, ans=0.1 2023-11-24 07:17:54,383 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1200, loss[loss=0.08304, simple_loss=0.1092, pruned_loss=0.01748, audio_tagging_loss=0.01094, over 15823.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09088, pruned_loss=0.01327, audio_tagging_loss=0.00896, over 3040508.59 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 07:18:00,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.22 vs. limit=22.5 2023-11-24 07:18:04,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.81 vs. limit=22.5 2023-11-24 07:18:11,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=12.0 2023-11-24 07:18:13,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2733473.3333333335, ans=0.0 2023-11-24 07:18:23,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2733540.0, ans=0.2 2023-11-24 07:18:27,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2733540.0, ans=0.07 2023-11-24 07:18:40,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410050 2023-11-24 07:18:55,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.610e+01 9.345e+01 9.890e+01 1.292e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-24 07:18:56,693 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1250, loss[loss=0.05856, simple_loss=0.07821, pruned_loss=0.009829, audio_tagging_loss=0.009626, over 15140.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09034, pruned_loss=0.01324, audio_tagging_loss=0.008961, over 3040290.04 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 32.0
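grad_scale is the mixed-precision loss scale (use_fp16=True): it is doubled after a stretch of overflow-free updates (16.0 -> 32.0 around batch 1200 above) and halved when a non-finite gradient is detected (back to 16.0 by batch 1300 below). icefall manages this inside its own optimizer wrapper, but the mechanics follow the standard PyTorch pattern; fp16_step and init_scale=16.0 below are illustrative:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    def fp16_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()   # gradients scaled by grad_scale
        scaler.step(optimizer)          # skipped if gradients overflowed
        scaler.update()                 # halves the scale on overflow,
                                        # grows it after stable intervals
        return loss.detach(), scaler.get_scale()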
2023-11-24 07:19:07,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2733806.6666666665, ans=0.125 2023-11-24 07:19:35,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2733940.0, ans=0.1 2023-11-24 07:19:38,551 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:19:42,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2733940.0, ans=0.125 2023-11-24 07:19:43,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410100 2023-11-24 07:19:47,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2734006.6666666665, ans=0.2 2023-11-24 07:19:48,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2734006.6666666665, ans=0.0 2023-11-24 07:19:58,434 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1300, loss[loss=0.04887, simple_loss=0.06227, pruned_loss=0.006696, audio_tagging_loss=0.01103, over 15450.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.0908, pruned_loss=0.01334, audio_tagging_loss=0.008826, over 3042950.43 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:20:12,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2734140.0, ans=0.125 2023-11-24 07:20:13,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2734140.0, ans=0.0 2023-11-24 07:20:30,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2734206.6666666665, ans=0.2 2023-11-24 07:20:38,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2734273.3333333335, ans=0.1 2023-11-24 07:20:44,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-24 07:20:45,094 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410150 2023-11-24 07:20:58,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2734340.0, ans=0.125 2023-11-24 07:21:01,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.312e+01 8.919e+01 9.626e+01 1.151e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-24 07:21:01,980 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1350, loss[loss=0.05441, simple_loss=0.06911, pruned_loss=0.008146, audio_tagging_loss=0.01171, over 14647.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09148, pruned_loss=0.01324, audio_tagging_loss=0.008827, over 3050021.02 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 16.0
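The learning rate, 1.95e-03 here and 1.94e-03 a couple of thousand batches later, comes from a schedule that decays in both the batch and the epoch index. An Eden-style formula with the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 approximately reproduces the logged values; eden_lr is a sketch, and the exact epoch bookkeeping after resuming from epoch 15 is not visible in the log:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden-style decay in batch count and epoch (sketch).
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, batch=410000, epoch=35))  # ~1.9e-03, near the logged 1.95e-03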
2023-11-24 07:21:04,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2734406.6666666665, ans=0.125 2023-11-24 07:21:32,822 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:21:47,022 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 07:21:48,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410200 2023-11-24 07:22:04,501 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1400, loss[loss=0.04915, simple_loss=0.06312, pruned_loss=0.008354, audio_tagging_loss=0.009233, over 14678.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09046, pruned_loss=0.0132, audio_tagging_loss=0.009016, over 3041116.76 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:22:09,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2734740.0, ans=0.0 2023-11-24 07:22:21,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:22:35,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2023-11-24 07:22:50,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410250 2023-11-24 07:23:06,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.227e+01 8.989e+01 9.587e+01 1.141e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-24 07:23:06,065 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1450, loss[loss=0.06355, simple_loss=0.09156, pruned_loss=0.009786, audio_tagging_loss=0.007988, over 14549.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09048, pruned_loss=0.01319, audio_tagging_loss=0.00906, over 3040226.44 frames. ], batch size: 55, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:23:29,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2023-11-24 07:23:41,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2735206.6666666665, ans=0.1 2023-11-24 07:23:43,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2735273.3333333335, ans=0.125 2023-11-24 07:23:46,578 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:23:52,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410300 2023-11-24 07:23:52,543 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:24:03,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.51 vs.
limit=15.0 2023-11-24 07:24:09,054 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1500, loss[loss=0.08384, simple_loss=0.1107, pruned_loss=0.01865, audio_tagging_loss=0.009839, over 15035.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09052, pruned_loss=0.0131, audio_tagging_loss=0.00908, over 3036823.83 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:24:09,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2735406.6666666665, ans=0.2 2023-11-24 07:24:19,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2735473.3333333335, ans=0.125 2023-11-24 07:24:42,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2735540.0, ans=0.2 2023-11-24 07:24:51,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2735606.6666666665, ans=0.0 2023-11-24 07:24:54,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2735606.6666666665, ans=0.125 2023-11-24 07:24:55,005 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410350 2023-11-24 07:24:56,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2023-11-24 07:25:01,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2735673.3333333335, ans=0.0 2023-11-24 07:25:10,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.532e+01 9.153e+01 9.693e+01 1.540e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-24 07:25:10,466 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1550, loss[loss=0.09263, simple_loss=0.1252, pruned_loss=0.02391, audio_tagging_loss=0.006132, over 15352.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09085, pruned_loss=0.01323, audio_tagging_loss=0.009103, over 3037877.62 frames. ], batch size: 58, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:25:26,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. 
limit=6.0 2023-11-24 07:25:35,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2735873.3333333335, ans=0.125 2023-11-24 07:25:37,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2735873.3333333335, ans=0.125 2023-11-24 07:25:50,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2735940.0, ans=0.1 2023-11-24 07:25:57,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410400 2023-11-24 07:26:00,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2736006.6666666665, ans=0.07 2023-11-24 07:26:03,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2736006.6666666665, ans=10.0 2023-11-24 07:26:05,011 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:26:07,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2736006.6666666665, ans=0.0 2023-11-24 07:26:13,370 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1600, loss[loss=0.0548, simple_loss=0.07346, pruned_loss=0.008397, audio_tagging_loss=0.009674, over 16273.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09047, pruned_loss=0.0133, audio_tagging_loss=0.009276, over 3041507.41 frames. ], batch size: 60, lr: 1.95e-03, grad_scale: 32.0 2023-11-24 07:26:34,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2023-11-24 07:26:46,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2736206.6666666665, ans=0.0 2023-11-24 07:26:47,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. limit=10.0 2023-11-24 07:26:57,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2736273.3333333335, ans=0.0 2023-11-24 07:26:59,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410450 2023-11-24 07:26:59,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2736273.3333333335, ans=0.0 2023-11-24 07:27:06,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-11-24 07:27:15,436 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1650, loss[loss=0.06289, simple_loss=0.0794, pruned_loss=0.01552, audio_tagging_loss=0.007675, over 14988.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09029, pruned_loss=0.01327, audio_tagging_loss=0.009367, over 3042734.94 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:27:17,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.572e+01 9.096e+01 9.732e+01 1.252e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-24 07:27:28,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.82 vs. 
limit=22.5 2023-11-24 07:27:29,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2736473.3333333335, ans=0.0 2023-11-24 07:27:50,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.04 vs. limit=10.0 2023-11-24 07:28:01,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410500 2023-11-24 07:28:01,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2736606.6666666665, ans=0.125 2023-11-24 07:28:05,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2736673.3333333335, ans=0.125 2023-11-24 07:28:11,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2736673.3333333335, ans=0.125 2023-11-24 07:28:16,959 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1700, loss[loss=0.06368, simple_loss=0.09014, pruned_loss=0.01175, audio_tagging_loss=0.006858, over 16404.00 frames. ], tot_loss[loss=0.06827, simple_loss=0.09146, pruned_loss=0.01324, audio_tagging_loss=0.009296, over 3048560.89 frames. ], batch size: 61, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:28:19,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2736740.0, ans=0.125 2023-11-24 07:28:28,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2736806.6666666665, ans=0.0 2023-11-24 07:28:34,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2736806.6666666665, ans=0.125 2023-11-24 07:28:50,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2736873.3333333335, ans=0.2 2023-11-24 07:28:57,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2736940.0, ans=0.125 2023-11-24 07:29:03,151 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410550 2023-11-24 07:29:15,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=22.5 2023-11-24 07:29:18,498 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1750, loss[loss=0.07202, simple_loss=0.1084, pruned_loss=0.01146, audio_tagging_loss=0.00635, over 14120.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09138, pruned_loss=0.01315, audio_tagging_loss=0.009196, over 3048032.17 frames. ], batch size: 52, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:29:19,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 8.559e+01 9.195e+01 9.925e+01 1.188e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-24 07:29:20,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-24 07:29:44,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2737206.6666666665, ans=0.125 2023-11-24 07:30:03,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.98 vs. 
limit=15.0 2023-11-24 07:30:04,461 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410600 2023-11-24 07:30:21,243 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1800, loss[loss=0.07291, simple_loss=0.105, pruned_loss=0.01104, audio_tagging_loss=0.009342, over 14897.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09068, pruned_loss=0.01303, audio_tagging_loss=0.009052, over 3045532.00 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 8.0 2023-11-24 07:30:25,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2737406.6666666665, ans=0.1 2023-11-24 07:30:47,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2737540.0, ans=0.0 2023-11-24 07:30:57,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2737606.6666666665, ans=0.05 2023-11-24 07:31:06,444 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410650 2023-11-24 07:31:10,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737673.3333333335, ans=0.1 2023-11-24 07:31:16,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2737673.3333333335, ans=0.0 2023-11-24 07:31:16,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2737673.3333333335, ans=0.0 2023-11-24 07:31:19,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2737673.3333333335, ans=0.2 2023-11-24 07:31:23,082 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1850, loss[loss=0.06272, simple_loss=0.08518, pruned_loss=0.01186, audio_tagging_loss=0.008275, over 16282.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09112, pruned_loss=0.01326, audio_tagging_loss=0.008958, over 3039129.90 frames. ], batch size: 61, lr: 1.95e-03, grad_scale: 8.0 2023-11-24 07:31:24,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2737740.0, ans=0.2 2023-11-24 07:31:25,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.283e+01 8.932e+01 9.471e+01 1.128e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-24 07:31:38,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2737806.6666666665, ans=0.1 2023-11-24 07:31:59,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2737940.0, ans=0.035 2023-11-24 07:32:09,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410700 2023-11-24 07:32:18,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2738006.6666666665, ans=0.1 2023-11-24 07:32:24,375 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1900, loss[loss=0.07826, simple_loss=0.09955, pruned_loss=0.01807, audio_tagging_loss=0.01041, over 14312.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09129, pruned_loss=0.0132, audio_tagging_loss=0.008978, over 3045210.37 frames. 
], batch size: 53, lr: 1.95e-03, grad_scale: 8.0 2023-11-24 07:32:25,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2738073.3333333335, ans=0.2 2023-11-24 07:32:32,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2738073.3333333335, ans=0.1 2023-11-24 07:32:52,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2738206.6666666665, ans=0.2 2023-11-24 07:33:03,532 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:33:03,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2738273.3333333335, ans=0.1 2023-11-24 07:33:10,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410750 2023-11-24 07:33:17,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2738340.0, ans=0.0 2023-11-24 07:33:26,600 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 1950, loss[loss=0.068, simple_loss=0.09357, pruned_loss=0.01371, audio_tagging_loss=0.007503, over 15749.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09133, pruned_loss=0.01319, audio_tagging_loss=0.00886, over 3046084.85 frames. ], batch size: 58, lr: 1.95e-03, grad_scale: 8.0 2023-11-24 07:33:28,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.505e+01 9.149e+01 9.609e+01 1.191e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 07:33:38,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2738473.3333333335, ans=0.025 2023-11-24 07:33:59,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2738540.0, ans=0.125 2023-11-24 07:34:12,065 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410800 2023-11-24 07:34:27,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2738740.0, ans=0.0 2023-11-24 07:34:28,844 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2000, loss[loss=0.07798, simple_loss=0.09492, pruned_loss=0.01907, audio_tagging_loss=0.01145, over 15279.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09112, pruned_loss=0.01316, audio_tagging_loss=0.008895, over 3039423.41 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:34:29,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2738740.0, ans=0.125 2023-11-24 07:34:31,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2738740.0, ans=0.125 2023-11-24 07:34:36,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0 2023-11-24 07:34:42,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=22.5 2023-11-24 07:35:03,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2738873.3333333335, ans=0.125 2023-11-24 07:35:06,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2738940.0, ans=0.125 2023-11-24 07:35:15,236 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410850 2023-11-24 07:35:30,565 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2050, loss[loss=0.06817, simple_loss=0.0913, pruned_loss=0.01523, audio_tagging_loss=0.007297, over 14920.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09103, pruned_loss=0.01333, audio_tagging_loss=0.008963, over 3038366.89 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:35:30,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2739073.3333333335, ans=0.125 2023-11-24 07:35:32,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.576e+01 8.370e+01 9.057e+01 9.754e+01 1.235e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-24 07:35:33,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2739073.3333333335, ans=0.125 2023-11-24 07:35:40,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2739073.3333333335, ans=0.125 2023-11-24 07:35:41,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=12.0 2023-11-24 07:36:04,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2739206.6666666665, ans=0.0 2023-11-24 07:36:05,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2739206.6666666665, ans=0.125 2023-11-24 07:36:16,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410900 2023-11-24 07:36:22,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2739340.0, ans=0.1 2023-11-24 07:36:22,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-11-24 07:36:31,809 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2100, loss[loss=0.05991, simple_loss=0.07653, pruned_loss=0.01226, audio_tagging_loss=0.009382, over 14573.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09161, pruned_loss=0.01348, audio_tagging_loss=0.008863, over 3037769.68 frames. 
], batch size: 56, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:36:43,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2739406.6666666665, ans=0.0 2023-11-24 07:36:58,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2739540.0, ans=0.1 2023-11-24 07:36:58,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2739540.0, ans=0.0 2023-11-24 07:37:02,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2739540.0, ans=0.1 2023-11-24 07:37:13,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2739606.6666666665, ans=0.125 2023-11-24 07:37:18,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 410950 2023-11-24 07:37:35,226 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2150, loss[loss=0.0853, simple_loss=0.1243, pruned_loss=0.01564, audio_tagging_loss=0.007484, over 15823.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09089, pruned_loss=0.01341, audio_tagging_loss=0.009011, over 3039141.45 frames. ], batch size: 58, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:37:37,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 8.763e+01 9.326e+01 1.012e+02 1.387e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-24 07:37:50,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2023-11-24 07:37:55,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-24 07:38:10,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2739940.0, ans=0.1 2023-11-24 07:38:11,452 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 07:38:13,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2739940.0, ans=0.1 2023-11-24 07:38:21,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411000 2023-11-24 07:38:24,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2740006.6666666665, ans=0.125 2023-11-24 07:38:37,189 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2200, loss[loss=0.0432, simple_loss=0.05859, pruned_loss=0.005261, audio_tagging_loss=0.00864, over 14148.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.0912, pruned_loss=0.01337, audio_tagging_loss=0.008981, over 3041108.42 frames. 
], batch size: 53, lr: 1.95e-03, grad_scale: 16.0 2023-11-24 07:38:51,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2740140.0, ans=0.1 2023-11-24 07:39:18,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2740273.3333333335, ans=0.0 2023-11-24 07:39:20,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2740273.3333333335, ans=0.0 2023-11-24 07:39:23,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411050 2023-11-24 07:39:24,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2740273.3333333335, ans=0.1 2023-11-24 07:39:25,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2740340.0, ans=0.07 2023-11-24 07:39:38,461 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2250, loss[loss=0.08162, simple_loss=0.1106, pruned_loss=0.01794, audio_tagging_loss=0.008388, over 13702.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09246, pruned_loss=0.0137, audio_tagging_loss=0.009008, over 3045658.70 frames. ], batch size: 53, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:39:40,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.864e+01 9.346e+01 9.896e+01 1.400e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-24 07:39:51,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2740473.3333333335, ans=0.125 2023-11-24 07:39:55,252 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:39:57,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-24 07:40:11,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2023-11-24 07:40:23,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=22.5 2023-11-24 07:40:25,251 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411100 2023-11-24 07:40:27,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2740673.3333333335, ans=0.07 2023-11-24 07:40:29,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-11-24 07:40:40,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2740673.3333333335, ans=0.125 2023-11-24 07:40:42,283 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2300, loss[loss=0.07195, simple_loss=0.1068, pruned_loss=0.01124, audio_tagging_loss=0.007333, over 15796.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09265, pruned_loss=0.0137, audio_tagging_loss=0.008959, over 3052500.08 frames. 
], batch size: 58, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:40:51,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2740740.0, ans=0.1 2023-11-24 07:40:55,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-24 07:40:59,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2023-11-24 07:41:15,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2023-11-24 07:41:26,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2740940.0, ans=0.0 2023-11-24 07:41:28,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411150 2023-11-24 07:41:37,507 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 07:41:41,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2741006.6666666665, ans=0.0 2023-11-24 07:41:44,649 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2350, loss[loss=0.07178, simple_loss=0.1049, pruned_loss=0.012, audio_tagging_loss=0.007346, over 16612.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09261, pruned_loss=0.01369, audio_tagging_loss=0.009011, over 3051021.22 frames. ], batch size: 60, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:41:46,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2741073.3333333335, ans=0.5 2023-11-24 07:41:47,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.408e+01 9.223e+01 9.902e+01 1.512e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-24 07:41:57,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2741140.0, ans=0.0 2023-11-24 07:42:02,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2023-11-24 07:42:02,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2741140.0, ans=0.125 2023-11-24 07:42:04,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=15.0 2023-11-24 07:42:29,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.85 vs. 
limit=15.0 2023-11-24 07:42:31,344 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411200 2023-11-24 07:42:37,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2741340.0, ans=10.0 2023-11-24 07:42:44,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2741340.0, ans=0.1 2023-11-24 07:42:46,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2741406.6666666665, ans=0.125 2023-11-24 07:42:46,999 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2400, loss[loss=0.06942, simple_loss=0.09666, pruned_loss=0.01338, audio_tagging_loss=0.007708, over 14977.00 frames. ], tot_loss[loss=0.06932, simple_loss=0.09298, pruned_loss=0.01372, audio_tagging_loss=0.009117, over 3052065.24 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:42:52,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741406.6666666665, ans=0.1 2023-11-24 07:42:55,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=12.0 2023-11-24 07:42:55,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-24 07:43:04,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2741473.3333333335, ans=0.125 2023-11-24 07:43:25,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2023-11-24 07:43:33,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411250 2023-11-24 07:43:49,967 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2450, loss[loss=0.07344, simple_loss=0.1087, pruned_loss=0.00967, audio_tagging_loss=0.009409, over 14761.00 frames. ], tot_loss[loss=0.069, simple_loss=0.09272, pruned_loss=0.01347, audio_tagging_loss=0.009176, over 3048508.58 frames. ], batch size: 53, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:43:52,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.523e+01 9.052e+01 9.834e+01 1.481e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-24 07:44:02,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2741806.6666666665, ans=0.2 2023-11-24 07:44:08,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2741806.6666666665, ans=0.1 2023-11-24 07:44:23,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2741873.3333333335, ans=0.0 2023-11-24 07:44:36,413 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411300 2023-11-24 07:44:53,082 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2500, loss[loss=0.06519, simple_loss=0.08665, pruned_loss=0.01367, audio_tagging_loss=0.008192, over 15234.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09189, pruned_loss=0.01334, audio_tagging_loss=0.00918, over 3045867.12 frames. 
], batch size: 59, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:45:06,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2023-11-24 07:45:33,168 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:45:39,715 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411350 2023-11-24 07:45:48,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2742340.0, ans=0.0 2023-11-24 07:45:55,249 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2550, loss[loss=0.07279, simple_loss=0.09601, pruned_loss=0.01223, audio_tagging_loss=0.01255, over 14505.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09227, pruned_loss=0.01345, audio_tagging_loss=0.009005, over 3049247.29 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:45:57,674 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.522e+01 9.145e+01 9.945e+01 1.198e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-24 07:45:59,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2742406.6666666665, ans=0.07 2023-11-24 07:46:22,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2742540.0, ans=0.5 2023-11-24 07:46:37,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2742606.6666666665, ans=0.125 2023-11-24 07:46:42,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411400 2023-11-24 07:46:54,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2742673.3333333335, ans=0.125 2023-11-24 07:46:58,260 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2600, loss[loss=0.06505, simple_loss=0.09379, pruned_loss=0.007927, audio_tagging_loss=0.01023, over 15055.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09187, pruned_loss=0.01333, audio_tagging_loss=0.008985, over 3055302.98 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:47:01,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2742740.0, ans=0.2 2023-11-24 07:47:09,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2742740.0, ans=0.025 2023-11-24 07:47:17,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.19 vs. 
limit=15.0 2023-11-24 07:47:20,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2742806.6666666665, ans=0.0 2023-11-24 07:47:29,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2742873.3333333335, ans=0.09899494936611666 2023-11-24 07:47:44,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411450 2023-11-24 07:47:54,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2743006.6666666665, ans=0.2 2023-11-24 07:47:55,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2743006.6666666665, ans=0.125 2023-11-24 07:48:01,088 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2650, loss[loss=0.07418, simple_loss=0.1029, pruned_loss=0.01378, audio_tagging_loss=0.008939, over 14890.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.0916, pruned_loss=0.01331, audio_tagging_loss=0.008838, over 3054187.96 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:48:03,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.074e+01 8.476e+01 9.136e+01 9.874e+01 1.198e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 07:48:14,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5 2023-11-24 07:48:24,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2743206.6666666665, ans=0.125 2023-11-24 07:48:47,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411500 2023-11-24 07:48:47,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2743273.3333333335, ans=0.125 2023-11-24 07:48:48,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2023-11-24 07:48:55,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2743340.0, ans=10.0 2023-11-24 07:49:03,295 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2700, loss[loss=0.06074, simple_loss=0.08382, pruned_loss=0.009968, audio_tagging_loss=0.008862, over 15929.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09156, pruned_loss=0.01324, audio_tagging_loss=0.008773, over 3054879.57 frames. ], batch size: 60, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:49:06,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.03 vs. 
limit=15.0 2023-11-24 07:49:21,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2743473.3333333335, ans=0.1 2023-11-24 07:49:37,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2743540.0, ans=0.125 2023-11-24 07:49:42,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2743606.6666666665, ans=15.0 2023-11-24 07:49:49,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411550 2023-11-24 07:49:49,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2743606.6666666665, ans=0.125 2023-11-24 07:49:55,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2743673.3333333335, ans=0.09899494936611666 2023-11-24 07:50:05,637 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2750, loss[loss=0.06401, simple_loss=0.08612, pruned_loss=0.01253, audio_tagging_loss=0.008423, over 16150.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09145, pruned_loss=0.01323, audio_tagging_loss=0.008772, over 3054147.35 frames. ], batch size: 61, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:50:09,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.410e+01 9.303e+01 9.931e+01 1.664e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-24 07:50:44,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-11-24 07:50:45,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2743940.0, ans=0.1 2023-11-24 07:50:46,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2743940.0, ans=0.0 2023-11-24 07:50:46,786 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:50:51,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411600 2023-11-24 07:50:57,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2744006.6666666665, ans=0.04949747468305833 2023-11-24 07:50:59,877 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
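The tot_loss records above print four components, and numerically they combine as loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for instance the batch 2550 record gives 0.5 * 0.09227 + 0.01345 + 0.009005 = 0.06859, matching the printed total. A minimal sketch of that reading, assuming this weighting holds (the helper name is hypothetical, not taken from train_asr.py):

    def combine_logged_loss(simple_loss, pruned_loss, audio_tagging_loss,
                            simple_scale=0.5):
        # Reconstruct the printed total loss from its printed components,
        # assuming a 0.5 weight on simple_loss and unit weights on the
        # pruned transducer loss and the audio-tagging loss.
        return simple_scale * simple_loss + pruned_loss + audio_tagging_loss

    # tot_loss components from the "Epoch 35, batch 2550" record above:
    assert abs(combine_logged_loss(0.09227, 0.01345, 0.009005) - 0.06859) < 1e-4

The same combination reproduces the per-batch loss[...] figures and the validation record later in this log to the printed precision.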
Number of tokens: 24 2023-11-24 07:51:00,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2744006.6666666665, ans=0.2 2023-11-24 07:51:03,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2744006.6666666665, ans=0.125 2023-11-24 07:51:07,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2744073.3333333335, ans=0.035 2023-11-24 07:51:08,608 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2800, loss[loss=0.08099, simple_loss=0.1064, pruned_loss=0.01862, audio_tagging_loss=0.009164, over 15909.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09194, pruned_loss=0.01337, audio_tagging_loss=0.008824, over 3046079.64 frames. ], batch size: 58, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:51:23,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2744140.0, ans=0.0 2023-11-24 07:51:31,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2744206.6666666665, ans=0.125 2023-11-24 07:51:49,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2744273.3333333335, ans=0.1 2023-11-24 07:51:54,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411650 2023-11-24 07:51:56,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=15.0 2023-11-24 07:52:10,211 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2850, loss[loss=0.07579, simple_loss=0.1068, pruned_loss=0.01537, audio_tagging_loss=0.00703, over 15684.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.09202, pruned_loss=0.01355, audio_tagging_loss=0.008822, over 3042644.90 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:52:14,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.559e+01 8.889e+01 9.637e+01 1.206e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-24 07:52:31,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2023-11-24 07:52:43,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2744540.0, ans=0.125 2023-11-24 07:52:46,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-11-24 07:52:47,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2744606.6666666665, ans=0.0 2023-11-24 07:52:53,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.76 vs. 
limit=10.0 2023-11-24 07:52:56,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411700 2023-11-24 07:53:08,116 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 07:53:12,559 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2900, loss[loss=0.04378, simple_loss=0.06485, pruned_loss=0.004756, audio_tagging_loss=0.006599, over 15485.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09131, pruned_loss=0.01338, audio_tagging_loss=0.008858, over 3042455.91 frames. ], batch size: 59, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:53:14,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2744740.0, ans=0.125 2023-11-24 07:53:37,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-11-24 07:53:47,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2744873.3333333335, ans=0.125 2023-11-24 07:53:59,245 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411750 2023-11-24 07:53:59,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2023-11-24 07:54:12,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2745006.6666666665, ans=0.125 2023-11-24 07:54:16,229 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 2950, loss[loss=0.06612, simple_loss=0.08693, pruned_loss=0.01289, audio_tagging_loss=0.009762, over 14808.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09182, pruned_loss=0.01351, audio_tagging_loss=0.008826, over 3040514.39 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 07:54:16,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2745073.3333333335, ans=0.125 2023-11-24 07:54:19,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.832e+01 9.434e+01 1.012e+02 1.234e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-24 07:54:34,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2745140.0, ans=0.0 2023-11-24 07:54:37,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.78 vs. 
limit=15.0 2023-11-24 07:54:46,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2745206.6666666665, ans=0.2 2023-11-24 07:54:49,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2745206.6666666665, ans=0.0 2023-11-24 07:55:02,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411800 2023-11-24 07:55:06,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2745340.0, ans=0.0 2023-11-24 07:55:06,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2745340.0, ans=0.125 2023-11-24 07:55:18,154 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3000, loss[loss=0.07919, simple_loss=0.1083, pruned_loss=0.01736, audio_tagging_loss=0.007695, over 14895.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09241, pruned_loss=0.01352, audio_tagging_loss=0.008907, over 3048579.35 frames. ], batch size: 55, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:55:18,155 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 07:56:00,430 INFO [train_asr.py:1253] (1/4) Epoch 35, validation: loss=0.05789, simple_loss=0.05083, pruned_loss=0.005097, audio_tagging_loss=0.02738, over 4681554.00 frames. 2023-11-24 07:56:00,431 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 07:56:27,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2745540.0, ans=0.1 2023-11-24 07:56:39,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2745606.6666666665, ans=0.0 2023-11-24 07:56:46,035 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411850 2023-11-24 07:57:02,428 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3050, loss[loss=0.06977, simple_loss=0.09649, pruned_loss=0.01203, audio_tagging_loss=0.009498, over 14956.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09244, pruned_loss=0.01358, audio_tagging_loss=0.008927, over 3045922.56 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:57:07,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.656e+01 9.299e+01 1.004e+02 2.054e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-24 07:57:13,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2023-11-24 07:57:14,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2745806.6666666665, ans=0.125 2023-11-24 07:57:21,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2745806.6666666665, ans=0.1 2023-11-24 07:57:27,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-24 07:57:37,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2023-11-24 07:57:38,854 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. 
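Each optim.py record above prints five grad-norm percentiles and a threshold, and in every instance here the threshold equals Clipping_scale times the third (median) value, e.g. 2.0 * 9.145e+01 = 1.829e+02 in the 07:45:57 record. A sketch of that relationship, assuming this is how the threshold is derived (the function name is illustrative, not icefall's API):

    def assumed_clip_threshold(grad_norm_median: float,
                               clipping_scale: float = 2.0) -> float:
        # Threshold below which gradients pass unclipped, taken here as
        # clipping_scale times the median of recently observed grad norms.
        return clipping_scale * grad_norm_median

    # From the 07:45:57 optim.py record: median 9.145e+01 -> threshold 1.829e+02.
    assert abs(assumed_clip_threshold(9.145e1) - 1.829e2) < 1e-6

The trailing percent-clipped figure then reports how often the threshold was actually hit within the reporting window, which stays at 0.0 for most records in this stretch of training.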
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 07:57:43,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2745940.0, ans=0.0 2023-11-24 07:57:44,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2745940.0, ans=0.1 2023-11-24 07:57:49,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411900 2023-11-24 07:58:01,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-24 07:58:04,296 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3100, loss[loss=0.06085, simple_loss=0.07433, pruned_loss=0.01145, audio_tagging_loss=0.01224, over 13945.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09322, pruned_loss=0.01371, audio_tagging_loss=0.009008, over 3043725.06 frames. ], batch size: 54, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:58:12,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2746073.3333333335, ans=0.0 2023-11-24 07:58:22,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2023-11-24 07:58:31,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2746206.6666666665, ans=0.125 2023-11-24 07:58:37,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2746206.6666666665, ans=0.1 2023-11-24 07:58:42,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=12.0 2023-11-24 07:58:47,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2023-11-24 07:58:50,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 411950 2023-11-24 07:58:57,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2023-11-24 07:58:57,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2746340.0, ans=0.0 2023-11-24 07:59:01,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2746340.0, ans=0.125 2023-11-24 07:59:02,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2746340.0, ans=0.0 2023-11-24 07:59:05,905 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3150, loss[loss=0.06286, simple_loss=0.08215, pruned_loss=0.01196, audio_tagging_loss=0.009822, over 15166.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09274, pruned_loss=0.01354, audio_tagging_loss=0.009043, over 3042362.49 frames. 
], batch size: 57, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 07:59:11,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.533e+01 9.313e+01 1.001e+02 1.475e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-24 07:59:28,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2023-11-24 07:59:43,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2746606.6666666665, ans=0.125 2023-11-24 07:59:51,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412000 2023-11-24 07:59:52,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2746606.6666666665, ans=0.125 2023-11-24 08:00:08,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2746673.3333333335, ans=0.125 2023-11-24 08:00:12,752 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3200, loss[loss=0.06788, simple_loss=0.0868, pruned_loss=0.01434, audio_tagging_loss=0.01014, over 15208.00 frames. ], tot_loss[loss=0.06896, simple_loss=0.09266, pruned_loss=0.01352, audio_tagging_loss=0.009107, over 3045675.77 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:00:18,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2746740.0, ans=15.0 2023-11-24 08:00:34,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2023-11-24 08:00:42,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.54 vs. limit=10.0 2023-11-24 08:00:57,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2746940.0, ans=0.125 2023-11-24 08:00:58,703 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412050 2023-11-24 08:01:06,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-24 08:01:14,288 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3250, loss[loss=0.06108, simple_loss=0.0846, pruned_loss=0.01196, audio_tagging_loss=0.00682, over 14410.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09161, pruned_loss=0.01334, audio_tagging_loss=0.009205, over 3041452.40 frames. 
], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:01:20,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.341e+01 8.971e+01 9.631e+01 1.385e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-24 08:01:32,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2747140.0, ans=0.1 2023-11-24 08:01:51,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2747273.3333333335, ans=0.125 2023-11-24 08:01:53,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2747273.3333333335, ans=0.0 2023-11-24 08:01:54,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2747273.3333333335, ans=0.5 2023-11-24 08:02:00,435 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412100 2023-11-24 08:02:06,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2747340.0, ans=0.0 2023-11-24 08:02:11,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2747340.0, ans=0.2 2023-11-24 08:02:15,748 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3300, loss[loss=0.07464, simple_loss=0.1063, pruned_loss=0.0132, audio_tagging_loss=0.008281, over 15523.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.0922, pruned_loss=0.01351, audio_tagging_loss=0.00913, over 3039231.33 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:02:16,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2747406.6666666665, ans=0.1 2023-11-24 08:02:18,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2023-11-24 08:02:33,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2747473.3333333335, ans=0.1 2023-11-24 08:02:56,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2747606.6666666665, ans=0.5 2023-11-24 08:03:01,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412150 2023-11-24 08:03:05,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2747673.3333333335, ans=0.125 2023-11-24 08:03:19,329 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3350, loss[loss=0.06749, simple_loss=0.09647, pruned_loss=0.01072, audio_tagging_loss=0.008531, over 15575.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.09276, pruned_loss=0.01364, audio_tagging_loss=0.009094, over 3045048.31 frames. ], batch size: 60, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:03:22,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.74 vs. 
limit=22.5 2023-11-24 08:03:25,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.617e+01 9.240e+01 1.018e+02 1.398e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-24 08:03:37,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2747806.6666666665, ans=0.0 2023-11-24 08:03:47,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2747873.3333333335, ans=0.0 2023-11-24 08:03:48,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2747873.3333333335, ans=0.125 2023-11-24 08:03:54,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2747940.0, ans=0.0 2023-11-24 08:04:02,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2747940.0, ans=0.09899494936611666 2023-11-24 08:04:04,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412200 2023-11-24 08:04:04,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2747940.0, ans=0.2 2023-11-24 08:04:09,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2748006.6666666665, ans=0.125 2023-11-24 08:04:10,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2748006.6666666665, ans=0.09899494936611666 2023-11-24 08:04:19,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748073.3333333335, ans=0.1 2023-11-24 08:04:20,959 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3400, loss[loss=0.04107, simple_loss=0.04772, pruned_loss=0.006701, audio_tagging_loss=0.01051, over 14046.00 frames. ], tot_loss[loss=0.0689, simple_loss=0.09267, pruned_loss=0.01362, audio_tagging_loss=0.00895, over 3055606.73 frames. ], batch size: 54, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:04:24,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2748073.3333333335, ans=0.125 2023-11-24 08:04:33,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2748140.0, ans=0.2 2023-11-24 08:04:43,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2748140.0, ans=0.125 2023-11-24 08:05:07,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412250 2023-11-24 08:05:09,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748340.0, ans=0.1 2023-11-24 08:05:15,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-24 08:05:22,625 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3450, loss[loss=0.0512, simple_loss=0.05649, pruned_loss=0.0103, audio_tagging_loss=0.01265, over 14400.00 frames. ], tot_loss[loss=0.06913, simple_loss=0.09323, pruned_loss=0.01369, audio_tagging_loss=0.008825, over 3053355.20 frames. 
], batch size: 56, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:05:28,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.390e+01 8.590e+01 9.237e+01 1.003e+02 1.214e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-24 08:05:30,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2748406.6666666665, ans=15.0 2023-11-24 08:05:37,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2748473.3333333335, ans=0.125 2023-11-24 08:05:38,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2748473.3333333335, ans=0.125 2023-11-24 08:06:06,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2748606.6666666665, ans=0.1 2023-11-24 08:06:07,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2748606.6666666665, ans=0.125 2023-11-24 08:06:08,528 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412300 2023-11-24 08:06:10,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-24 08:06:23,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2748673.3333333335, ans=0.0 2023-11-24 08:06:25,691 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3500, loss[loss=0.07866, simple_loss=0.1033, pruned_loss=0.01778, audio_tagging_loss=0.009232, over 15084.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09298, pruned_loss=0.01358, audio_tagging_loss=0.008794, over 3053888.49 frames. ], batch size: 55, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:06:46,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=22.5 2023-11-24 08:06:57,199 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 08:07:12,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412350 2023-11-24 08:07:17,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2749006.6666666665, ans=0.125 2023-11-24 08:07:21,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2023-11-24 08:07:27,850 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3550, loss[loss=0.07244, simple_loss=0.1016, pruned_loss=0.01276, audio_tagging_loss=0.008885, over 15423.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09259, pruned_loss=0.01336, audio_tagging_loss=0.008769, over 3053002.17 frames. 
], batch size: 57, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:07:33,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.276e+01 8.953e+01 9.644e+01 1.310e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-24 08:07:34,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2749073.3333333335, ans=0.0 2023-11-24 08:07:51,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2749206.6666666665, ans=0.125 2023-11-24 08:08:06,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2749273.3333333335, ans=0.125 2023-11-24 08:08:14,272 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412400 2023-11-24 08:08:30,062 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3600, loss[loss=0.04903, simple_loss=0.0667, pruned_loss=0.005706, audio_tagging_loss=0.009974, over 15036.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09207, pruned_loss=0.01325, audio_tagging_loss=0.008801, over 3051389.18 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:08:54,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2749540.0, ans=0.1 2023-11-24 08:09:16,453 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412450 2023-11-24 08:09:21,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2749673.3333333335, ans=0.2 2023-11-24 08:09:33,399 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3650, loss[loss=0.07684, simple_loss=0.09978, pruned_loss=0.01896, audio_tagging_loss=0.007987, over 15380.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.0927, pruned_loss=0.0135, audio_tagging_loss=0.008711, over 3052873.29 frames. ], batch size: 59, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:09:38,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2749740.0, ans=0.0 2023-11-24 08:09:39,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.298e+01 8.982e+01 9.715e+01 1.238e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-24 08:10:19,967 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412500 2023-11-24 08:10:23,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2750006.6666666665, ans=0.0 2023-11-24 08:10:35,356 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3700, loss[loss=0.07353, simple_loss=0.1, pruned_loss=0.0153, audio_tagging_loss=0.008207, over 14981.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09165, pruned_loss=0.01342, audio_tagging_loss=0.00875, over 3054132.57 frames. ], batch size: 55, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:11:21,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412550 2023-11-24 08:11:21,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2750273.3333333335, ans=0.2 2023-11-24 08:11:25,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.68 vs. 
limit=10.0 2023-11-24 08:11:33,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2750340.0, ans=0.125 2023-11-24 08:11:34,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2750340.0, ans=0.125 2023-11-24 08:11:37,807 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3750, loss[loss=0.07711, simple_loss=0.09786, pruned_loss=0.01868, audio_tagging_loss=0.009497, over 15603.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09185, pruned_loss=0.01352, audio_tagging_loss=0.008884, over 3054613.14 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:11:44,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.598e+01 9.101e+01 9.921e+01 1.299e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-24 08:11:53,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2750473.3333333335, ans=0.0 2023-11-24 08:11:57,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2750473.3333333335, ans=0.1 2023-11-24 08:11:58,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2750473.3333333335, ans=0.125 2023-11-24 08:11:59,948 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 08:12:07,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2023-11-24 08:12:15,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-24 08:12:20,794 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 08:12:24,311 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412600 2023-11-24 08:12:31,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2750673.3333333335, ans=0.1 2023-11-24 08:12:40,944 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3800, loss[loss=0.06257, simple_loss=0.07522, pruned_loss=0.01559, audio_tagging_loss=0.00937, over 15026.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09174, pruned_loss=0.01349, audio_tagging_loss=0.008933, over 3057563.46 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:12:57,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-11-24 08:13:07,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.60 vs. 
limit=12.0 2023-11-24 08:13:20,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=22.5 2023-11-24 08:13:27,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412650 2023-11-24 08:13:35,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=15.0 2023-11-24 08:13:43,141 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3850, loss[loss=0.06898, simple_loss=0.09982, pruned_loss=0.01205, audio_tagging_loss=0.007021, over 15711.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09143, pruned_loss=0.01325, audio_tagging_loss=0.00904, over 3048753.99 frames. ], batch size: 58, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:13:50,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.505e+01 9.405e+01 9.842e+01 1.302e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-24 08:14:00,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2751140.0, ans=0.0 2023-11-24 08:14:08,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2751206.6666666665, ans=0.125 2023-11-24 08:14:19,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2751206.6666666665, ans=15.0 2023-11-24 08:14:21,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2751273.3333333335, ans=0.125 2023-11-24 08:14:29,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412700 2023-11-24 08:14:36,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2751340.0, ans=0.2 2023-11-24 08:14:38,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2023-11-24 08:14:45,018 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3900, loss[loss=0.0972, simple_loss=0.1296, pruned_loss=0.0255, audio_tagging_loss=0.006924, over 14800.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09088, pruned_loss=0.01298, audio_tagging_loss=0.009096, over 3049162.12 frames. 
], batch size: 53, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:15:04,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2751473.3333333335, ans=0.0 2023-11-24 08:15:04,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2751473.3333333335, ans=0.0 2023-11-24 08:15:04,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2751473.3333333335, ans=0.0 2023-11-24 08:15:08,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2751473.3333333335, ans=0.125 2023-11-24 08:15:09,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2751540.0, ans=0.1 2023-11-24 08:15:20,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.07 vs. limit=10.0 2023-11-24 08:15:30,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412750 2023-11-24 08:15:37,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2023-11-24 08:15:47,555 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 3950, loss[loss=0.06593, simple_loss=0.09783, pruned_loss=0.009621, audio_tagging_loss=0.007393, over 16508.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09144, pruned_loss=0.01323, audio_tagging_loss=0.009134, over 3050303.44 frames. ], batch size: 61, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:15:55,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.460e+01 9.070e+01 9.889e+01 1.315e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 08:16:13,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2751873.3333333335, ans=0.125 2023-11-24 08:16:29,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2751940.0, ans=0.2 2023-11-24 08:16:29,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2751940.0, ans=0.0 2023-11-24 08:16:33,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412800 2023-11-24 08:16:49,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2752073.3333333335, ans=0.125 2023-11-24 08:16:50,256 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4000, loss[loss=0.06002, simple_loss=0.08218, pruned_loss=0.008168, audio_tagging_loss=0.01076, over 15142.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09126, pruned_loss=0.01326, audio_tagging_loss=0.009178, over 3044998.00 frames. 
], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:16:55,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2752073.3333333335, ans=0.0 2023-11-24 08:17:03,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2752140.0, ans=0.0 2023-11-24 08:17:10,338 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 08:17:20,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2752206.6666666665, ans=0.125 2023-11-24 08:17:31,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.24 vs. limit=22.5 2023-11-24 08:17:36,481 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412850 2023-11-24 08:17:50,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2752406.6666666665, ans=0.2 2023-11-24 08:17:51,856 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4050, loss[loss=0.07736, simple_loss=0.09857, pruned_loss=0.01695, audio_tagging_loss=0.01112, over 14753.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09074, pruned_loss=0.01312, audio_tagging_loss=0.009147, over 3039664.70 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:17:54,227 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 08:17:58,906 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.603e+01 9.245e+01 9.992e+01 1.368e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-24 08:18:25,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2752540.0, ans=0.125 2023-11-24 08:18:36,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752606.6666666665, ans=0.1 2023-11-24 08:18:38,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412900 2023-11-24 08:18:54,264 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4100, loss[loss=0.06096, simple_loss=0.07813, pruned_loss=0.01201, audio_tagging_loss=0.009887, over 15548.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09148, pruned_loss=0.01329, audio_tagging_loss=0.00907, over 3042600.33 frames. 
], batch size: 58, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:19:07,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2752806.6666666665, ans=0.125 2023-11-24 08:19:17,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2752806.6666666665, ans=0.125 2023-11-24 08:19:24,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752873.3333333335, ans=0.1 2023-11-24 08:19:33,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2752940.0, ans=0.125 2023-11-24 08:19:40,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2752940.0, ans=0.125 2023-11-24 08:19:41,107 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 412950 2023-11-24 08:19:57,690 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4150, loss[loss=0.07429, simple_loss=0.1028, pruned_loss=0.01678, audio_tagging_loss=0.006088, over 15810.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09135, pruned_loss=0.01332, audio_tagging_loss=0.009056, over 3046446.94 frames. ], batch size: 59, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:20:00,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2753073.3333333335, ans=0.05 2023-11-24 08:20:04,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.354e+01 9.149e+01 1.001e+02 1.406e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 08:20:13,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. limit=10.0 2023-11-24 08:20:26,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-24 08:20:43,174 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 08:20:44,461 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413000 2023-11-24 08:20:45,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2753273.3333333335, ans=0.0 2023-11-24 08:20:50,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2753340.0, ans=0.1 2023-11-24 08:21:00,131 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4200, loss[loss=0.06371, simple_loss=0.09065, pruned_loss=0.009258, audio_tagging_loss=0.009125, over 15033.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09071, pruned_loss=0.01328, audio_tagging_loss=0.008888, over 3042566.66 frames. 
], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:21:05,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2753406.6666666665, ans=0.0 2023-11-24 08:21:15,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2023-11-24 08:21:19,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2753473.3333333335, ans=0.04949747468305833 2023-11-24 08:21:26,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2753540.0, ans=0.125 2023-11-24 08:21:31,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2753540.0, ans=0.125 2023-11-24 08:21:32,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.62 vs. limit=10.0 2023-11-24 08:21:46,638 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413050 2023-11-24 08:21:48,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0 2023-11-24 08:22:01,968 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4250, loss[loss=0.05828, simple_loss=0.07716, pruned_loss=0.01259, audio_tagging_loss=0.007106, over 16913.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09202, pruned_loss=0.01349, audio_tagging_loss=0.008732, over 3044553.19 frames. ], batch size: 63, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:22:11,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.934e+01 8.341e+01 8.897e+01 9.785e+01 1.363e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-24 08:22:32,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2753873.3333333335, ans=0.0 2023-11-24 08:22:37,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2753873.3333333335, ans=0.2 2023-11-24 08:22:48,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413100 2023-11-24 08:23:05,786 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4300, loss[loss=0.07299, simple_loss=0.1019, pruned_loss=0.01437, audio_tagging_loss=0.00768, over 15626.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09186, pruned_loss=0.01337, audio_tagging_loss=0.008674, over 3054726.26 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:23:09,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754073.3333333335, ans=0.1 2023-11-24 08:23:14,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-24 08:23:25,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2754140.0, ans=0.125 2023-11-24 08:23:25,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.38 vs. 
limit=15.0 2023-11-24 08:23:28,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0 2023-11-24 08:23:43,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2754273.3333333335, ans=0.125 2023-11-24 08:23:52,250 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413150 2023-11-24 08:24:07,364 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4350, loss[loss=0.08473, simple_loss=0.1189, pruned_loss=0.01586, audio_tagging_loss=0.009399, over 15825.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09207, pruned_loss=0.01349, audio_tagging_loss=0.008601, over 3054012.87 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0 2023-11-24 08:24:15,735 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.729e+01 9.185e+01 1.017e+02 1.222e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-24 08:24:17,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2754406.6666666665, ans=0.0 2023-11-24 08:24:24,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2754473.3333333335, ans=0.125 2023-11-24 08:24:30,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2754473.3333333335, ans=0.2 2023-11-24 08:24:40,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2754540.0, ans=0.0 2023-11-24 08:24:41,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0 2023-11-24 08:24:53,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413200 2023-11-24 08:25:09,259 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4400, loss[loss=0.0687, simple_loss=0.1012, pruned_loss=0.01164, audio_tagging_loss=0.006471, over 15702.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09218, pruned_loss=0.0135, audio_tagging_loss=0.008617, over 3048661.76 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:25:29,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2754806.6666666665, ans=0.0 2023-11-24 08:25:42,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2754873.3333333335, ans=0.125 2023-11-24 08:25:55,007 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413250 2023-11-24 08:26:12,566 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4450, loss[loss=0.04562, simple_loss=0.05334, pruned_loss=0.008003, audio_tagging_loss=0.01094, over 14337.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09202, pruned_loss=0.01343, audio_tagging_loss=0.008638, over 3045645.20 frames. 
], batch size: 54, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:26:18,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2755073.3333333335, ans=0.1 2023-11-24 08:26:20,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.576e+01 8.416e+01 8.955e+01 9.921e+01 1.330e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-24 08:26:23,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2023-11-24 08:26:25,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=22.5 2023-11-24 08:26:33,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2023-11-24 08:26:46,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=22.5 2023-11-24 08:26:58,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413300 2023-11-24 08:27:05,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2755340.0, ans=10.0 2023-11-24 08:27:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2755340.0, ans=0.02 2023-11-24 08:27:14,667 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4500, loss[loss=0.07203, simple_loss=0.08898, pruned_loss=0.01768, audio_tagging_loss=0.009858, over 15360.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09187, pruned_loss=0.01342, audio_tagging_loss=0.008623, over 3048391.75 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0 2023-11-24 08:27:19,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2755406.6666666665, ans=0.2 2023-11-24 08:27:27,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.44 vs. limit=8.0 2023-11-24 08:27:28,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2755473.3333333335, ans=0.1 2023-11-24 08:27:30,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2023-11-24 08:27:54,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-24 08:28:00,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413350 2023-11-24 08:28:12,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2755673.3333333335, ans=0.0 2023-11-24 08:28:15,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2755740.0, ans=0.125 2023-11-24 08:28:15,964 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4550, loss[loss=0.06218, simple_loss=0.08318, pruned_loss=0.01106, audio_tagging_loss=0.009532, over 15322.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09157, pruned_loss=0.01336, audio_tagging_loss=0.008752, over 3046441.62 frames. 
2023-11-24 08:28:18,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2755740.0, ans=0.2
2023-11-24 08:28:19,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0
2023-11-24 08:28:24,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.588e+01 9.174e+01 1.003e+02 1.201e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-24 08:28:47,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2755873.3333333335, ans=0.1
2023-11-24 08:28:57,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2023-11-24 08:29:02,844 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413400
2023-11-24 08:29:03,983 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 08:29:05,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2756006.6666666665, ans=0.125
2023-11-24 08:29:07,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2756006.6666666665, ans=0.0
2023-11-24 08:29:16,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0
2023-11-24 08:29:19,844 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4600, loss[loss=0.05721, simple_loss=0.07504, pruned_loss=0.01125, audio_tagging_loss=0.008443, over 13935.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09112, pruned_loss=0.01334, audio_tagging_loss=0.008791, over 3037574.08 frames. ], batch size: 55, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:29:39,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2756140.0, ans=0.0
2023-11-24 08:30:04,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2756273.3333333335, ans=0.1
2023-11-24 08:30:06,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413450
2023-11-24 08:30:15,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2756340.0, ans=0.125
2023-11-24 08:30:22,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2756406.6666666665, ans=0.125
2023-11-24 08:30:22,930 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4650, loss[loss=0.07384, simple_loss=0.1017, pruned_loss=0.01449, audio_tagging_loss=0.008509, over 14433.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09094, pruned_loss=0.01333, audio_tagging_loss=0.008887, over 3044306.49 frames. ], batch size: 54, lr: 1.94e-03, grad_scale: 32.0
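Note: the WARNING above drops an AudioSet cut whose placeholder transcript cannot be aligned: after the encoder's 4x convolutional subsampling a 100-frame cut keeps only 23 frames, fewer than its 24 BPE tokens, and a transducer alignment needs at least one frame per token. A sketch of such a filter; the exact subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Zipformer-style frontend: roughly T -> ((T - 7) // 2) // 2,
        # so 100 input frames -> 23, matching "after subsampling" above.
        t_after = ((num_frames - 7) // 2) // 2
        # Transducer loss is undefined when tokens outnumber frames.
        return t_after >= num_tokens

    assert keep_cut(100, 24) is False   # the excluded cut above: 23 < 24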
2023-11-24 08:30:29,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2756406.6666666665, ans=0.125
2023-11-24 08:30:29,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2756406.6666666665, ans=6.0
2023-11-24 08:30:31,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.404e+01 8.988e+01 9.502e+01 1.265e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-24 08:30:37,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2756473.3333333335, ans=0.2
2023-11-24 08:30:40,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2756473.3333333335, ans=0.1
2023-11-24 08:31:08,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413500
2023-11-24 08:31:20,783 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 08:31:24,022 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4700, loss[loss=0.08154, simple_loss=0.107, pruned_loss=0.01818, audio_tagging_loss=0.009857, over 14705.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.09153, pruned_loss=0.01354, audio_tagging_loss=0.00895, over 3051924.13 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:31:25,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0
2023-11-24 08:31:30,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2756740.0, ans=0.05
2023-11-24 08:31:41,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-11-24 08:31:43,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2756806.6666666665, ans=0.125
2023-11-24 08:32:10,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413550
2023-11-24 08:32:14,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2757006.6666666665, ans=0.05
2023-11-24 08:32:18,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2757006.6666666665, ans=0.0
2023-11-24 08:32:26,980 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4750, loss[loss=0.07802, simple_loss=0.1019, pruned_loss=0.018, audio_tagging_loss=0.009083, over 15148.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09074, pruned_loss=0.01345, audio_tagging_loss=0.009125, over 3051015.39 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 8.0
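Note: grad_scale in the batch records tracks mixed-precision loss scaling (this run has use_fp16=True). The rise 16.0 -> 32.0 earlier and the drop to 8.0 at batch 4750 above match the standard AMP behavior: the scale doubles after a stretch of overflow-free steps and halves whenever inf/nan gradients are detected. A minimal sketch of that loop using PyTorch's GradScaler, not the trainer's literal code:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def fp16_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped if inf/nan gradients are found
        scaler.update()            # halves the scale on overflow, doubles it
                                   # every growth_interval clean steps
        return scaler.get_scale()  # the "grad_scale" value in the log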
2023-11-24 08:32:32,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2757073.3333333335, ans=0.09899494936611666
2023-11-24 08:32:38,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.531e+01 9.142e+01 9.868e+01 1.183e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-24 08:32:58,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0
2023-11-24 08:33:00,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2757206.6666666665, ans=0.0
2023-11-24 08:33:07,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2757273.3333333335, ans=0.125
2023-11-24 08:33:10,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2757273.3333333335, ans=0.125
2023-11-24 08:33:13,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413600
2023-11-24 08:33:13,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2757273.3333333335, ans=0.125
2023-11-24 08:33:16,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2757340.0, ans=0.125
2023-11-24 08:33:29,351 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4800, loss[loss=0.06135, simple_loss=0.07701, pruned_loss=0.01457, audio_tagging_loss=0.008277, over 15759.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09129, pruned_loss=0.01353, audio_tagging_loss=0.009243, over 3052057.25 frames. ], batch size: 61, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:33:43,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2757473.3333333335, ans=0.1
2023-11-24 08:33:50,739 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 08:34:04,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2757540.0, ans=0.125
2023-11-24 08:34:10,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2757606.6666666665, ans=0.125
2023-11-24 08:34:11,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0
2023-11-24 08:34:15,815 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413650
2023-11-24 08:34:20,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2757673.3333333335, ans=0.125
2023-11-24 08:34:31,805 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4850, loss[loss=0.07225, simple_loss=0.0911, pruned_loss=0.01386, audio_tagging_loss=0.01285, over 15155.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.09226, pruned_loss=0.01358, audio_tagging_loss=0.009286, over 3054237.24 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 16.0
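Note: the scaling.py:213 lines trace ScheduledFloat parameters, module hyper-parameters (dropout rates, balancer probabilities, skip rates) whose value is a function of batch_count rather than a constant; by batch_count ~2.75M most of them have settled at their final value. A plausible piecewise-linear sketch of such a schedule, only illustrative of the idea (icefall's actual ScheduledFloat has more machinery):

    # Sketch: value interpolated between (batch_count, value) breakpoints.
    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
        return points[-1][1]   # past the last breakpoint: hold steady

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches
    # reads 0.1 for the batch counts logged above:
    assert scheduled_float(2754273.33, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1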
2023-11-24 08:34:32,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2757740.0, ans=0.2
2023-11-24 08:34:35,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2757740.0, ans=0.0
2023-11-24 08:34:42,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.773e+01 9.268e+01 1.019e+02 1.463e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-24 08:34:51,982 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 08:35:10,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2757940.0, ans=0.1
2023-11-24 08:35:13,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2757940.0, ans=0.125
2023-11-24 08:35:18,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413700
2023-11-24 08:35:22,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0
2023-11-24 08:35:29,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2758006.6666666665, ans=0.125
2023-11-24 08:35:34,415 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4900, loss[loss=0.06223, simple_loss=0.09059, pruned_loss=0.01184, audio_tagging_loss=0.005093, over 16221.00 frames. ], tot_loss[loss=0.06861, simple_loss=0.09216, pruned_loss=0.01339, audio_tagging_loss=0.00914, over 3046726.14 frames. ], batch size: 58, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:35:56,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0
2023-11-24 08:36:08,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2758206.6666666665, ans=0.125
2023-11-24 08:36:10,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2758273.3333333335, ans=0.125
2023-11-24 08:36:20,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413750
2023-11-24 08:36:27,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2758340.0, ans=0.125
2023-11-24 08:36:33,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2758340.0, ans=0.0
2023-11-24 08:36:35,235 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 08:36:37,225 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 4950, loss[loss=0.04835, simple_loss=0.05745, pruned_loss=0.01039, audio_tagging_loss=0.009234, over 14232.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09148, pruned_loss=0.01334, audio_tagging_loss=0.008973, over 3047869.16 frames. ], batch size: 55, lr: 1.94e-03, grad_scale: 16.0
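Note: the scaling.py:1022 Whitening lines compare a covariance "peakiness" metric of a module's activations against a limit; a corrective gradient is applied only when the metric exceeds the limit, so records like metric=2.37 vs. limit=15.0 are no-op diagnostics. One way to compute such a metric, an assumption for illustration (the exact icefall formula may differ):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels). Metric >= 1, with equality when
        # the covariance eigenvalues within each group are all equal
        # (i.e. the features are perfectly "white").
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            cov = (g.T @ g) / g.shape[0]
            eigs = torch.linalg.eigvalsh(cov)
            # ratio of mean squared eigenvalue to squared mean eigenvalue
            metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return max(metrics)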
2023-11-24 08:36:38,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2758406.6666666665, ans=0.04949747468305833
2023-11-24 08:36:47,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.724e+01 8.295e+01 8.995e+01 9.901e+01 1.240e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-24 08:36:50,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2758473.3333333335, ans=0.125
2023-11-24 08:36:53,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2758473.3333333335, ans=0.5
2023-11-24 08:37:24,208 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413800
2023-11-24 08:37:28,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5
2023-11-24 08:37:39,928 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5000, loss[loss=0.05666, simple_loss=0.08316, pruned_loss=0.006206, audio_tagging_loss=0.008871, over 15074.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09108, pruned_loss=0.01318, audio_tagging_loss=0.008857, over 3043546.86 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:38:12,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2758873.3333333335, ans=0.0
2023-11-24 08:38:15,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2758873.3333333335, ans=0.125
2023-11-24 08:38:19,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0
2023-11-24 08:38:24,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2758940.0, ans=0.125
2023-11-24 08:38:26,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413850
2023-11-24 08:38:42,455 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5050, loss[loss=0.05508, simple_loss=0.07708, pruned_loss=0.01005, audio_tagging_loss=0.006489, over 15614.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09162, pruned_loss=0.01327, audio_tagging_loss=0.008731, over 3041561.39 frames. ], batch size: 63, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:38:54,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.375e+01 9.094e+01 9.590e+01 1.351e+02, threshold=1.819e+02, percent-clipped=0.0
2023-11-24 08:39:04,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2759140.0, ans=0.125
2023-11-24 08:39:26,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0
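Note: the slowly decaying "lr: 1.94e-03" in the batch records is consistent with the Eden schedule used by icefall's ScaledAdam recipes, given this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. A sketch of the rule with the step/epoch bookkeeping simplified:

    # Sketch of the Eden learning-rate rule; fractional epoch handling omitted.
    def eden_lr(base_lr: float, step: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        step_factor = ((step ** 2 + lr_batches ** 2)
                       / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2)
                        / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    # batch idx ~413150 in epoch 35 gives ~1.9e-03, close to the logged
    # "lr: 1.94e-03" (the exact value depends on the trainer's bookkeeping).
    print(round(eden_lr(0.045, 413150.0, 35.0), 5))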
2023-11-24 08:39:27,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2759273.3333333335, ans=0.1
2023-11-24 08:39:28,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413900
2023-11-24 08:39:45,560 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5100, loss[loss=0.07248, simple_loss=0.09919, pruned_loss=0.01389, audio_tagging_loss=0.008995, over 15458.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09136, pruned_loss=0.01328, audio_tagging_loss=0.008723, over 3037501.72 frames. ], batch size: 60, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:40:31,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 413950
2023-11-24 08:40:46,912 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5150, loss[loss=0.04526, simple_loss=0.05745, pruned_loss=0.007346, audio_tagging_loss=0.009187, over 15053.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09047, pruned_loss=0.01311, audio_tagging_loss=0.008768, over 3045371.43 frames. ], batch size: 59, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:40:48,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2759740.0, ans=0.125
2023-11-24 08:40:53,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2759740.0, ans=0.0
2023-11-24 08:40:57,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2759740.0, ans=0.0
2023-11-24 08:40:58,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.417e+01 9.000e+01 9.770e+01 1.432e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-24 08:41:11,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2759873.3333333335, ans=0.125
2023-11-24 08:41:13,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2759873.3333333335, ans=0.0
2023-11-24 08:41:33,129 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414000
2023-11-24 08:41:43,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2760006.6666666665, ans=0.1
2023-11-24 08:41:49,226 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5200, loss[loss=0.0557, simple_loss=0.07448, pruned_loss=0.007466, audio_tagging_loss=0.01099, over 16909.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09076, pruned_loss=0.01324, audio_tagging_loss=0.008793, over 3048311.27 frames. ], batch size: 66, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:41:51,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0
2023-11-24 08:42:23,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2760206.6666666665, ans=0.1
2023-11-24 08:42:29,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2760273.3333333335, ans=0.125
2023-11-24 08:42:35,973 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414050
2023-11-24 08:42:41,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2760340.0, ans=0.95
2023-11-24 08:42:52,953 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5250, loss[loss=0.05291, simple_loss=0.07184, pruned_loss=0.007423, audio_tagging_loss=0.009571, over 15128.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09084, pruned_loss=0.01328, audio_tagging_loss=0.008858, over 3047805.62 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:42:53,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0
2023-11-24 08:42:54,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2760406.6666666665, ans=0.0
2023-11-24 08:43:01,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2760406.6666666665, ans=0.125
2023-11-24 08:43:03,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.495e+01 9.165e+01 9.828e+01 1.218e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-24 08:43:39,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414100
2023-11-24 08:43:47,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2760673.3333333335, ans=0.2
2023-11-24 08:43:54,371 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5300, loss[loss=0.06496, simple_loss=0.0856, pruned_loss=0.01484, audio_tagging_loss=0.007318, over 15261.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09072, pruned_loss=0.01324, audio_tagging_loss=0.008823, over 3047758.41 frames. ], batch size: 58, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:44:06,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0
2023-11-24 08:44:09,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2760806.6666666665, ans=0.1
2023-11-24 08:44:39,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2760940.0, ans=0.125
2023-11-24 08:44:40,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414150
2023-11-24 08:44:41,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2760940.0, ans=0.125
2023-11-24 08:44:44,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2761006.6666666665, ans=0.125
2023-11-24 08:44:56,017 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5350, loss[loss=0.06732, simple_loss=0.09118, pruned_loss=0.01293, audio_tagging_loss=0.008799, over 16298.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09054, pruned_loss=0.01309, audio_tagging_loss=0.008883, over 3040570.15 frames. ], batch size: 63, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:45:04,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2761073.3333333335, ans=0.02
2023-11-24 08:45:07,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.463e+01 9.143e+01 9.837e+01 1.383e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-24 08:45:14,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2761140.0, ans=0.125
2023-11-24 08:45:15,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0
2023-11-24 08:45:41,494 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414200
2023-11-24 08:45:43,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0
2023-11-24 08:45:58,639 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5400, loss[loss=0.07448, simple_loss=0.1077, pruned_loss=0.01338, audio_tagging_loss=0.007265, over 14783.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08951, pruned_loss=0.0128, audio_tagging_loss=0.008947, over 3031683.19 frames. ], batch size: 52, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:46:12,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2761473.3333333335, ans=0.125
2023-11-24 08:46:16,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2761473.3333333335, ans=0.1
2023-11-24 08:46:30,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2761540.0, ans=0.0
2023-11-24 08:46:43,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2761606.6666666665, ans=0.1
2023-11-24 08:46:44,157 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414250
2023-11-24 08:46:54,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0
2023-11-24 08:47:00,092 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5450, loss[loss=0.07149, simple_loss=0.1009, pruned_loss=0.01292, audio_tagging_loss=0.008109, over 14225.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.0907, pruned_loss=0.01306, audio_tagging_loss=0.008934, over 3036220.58 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:47:01,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2761740.0, ans=0.1
2023-11-24 08:47:04,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0
2023-11-24 08:47:11,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.777e+01 9.198e+01 9.795e+01 1.303e+02, threshold=1.840e+02, percent-clipped=0.0
2023-11-24 08:47:31,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2761873.3333333335, ans=0.0
2023-11-24 08:47:35,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2761873.3333333335, ans=0.125
2023-11-24 08:47:42,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2761940.0, ans=0.125
2023-11-24 08:47:46,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414300
2023-11-24 08:47:51,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2762006.6666666665, ans=0.2
2023-11-24 08:48:01,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2762073.3333333335, ans=0.0
2023-11-24 08:48:01,964 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5500, loss[loss=0.04446, simple_loss=0.0569, pruned_loss=0.007547, audio_tagging_loss=0.008465, over 14553.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09145, pruned_loss=0.0134, audio_tagging_loss=0.008915, over 3039650.08 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:48:02,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2762073.3333333335, ans=0.0
2023-11-24 08:48:40,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2762273.3333333335, ans=0.2
2023-11-24 08:48:43,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5
2023-11-24 08:48:43,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2762273.3333333335, ans=0.0
2023-11-24 08:48:47,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2762273.3333333335, ans=0.125
2023-11-24 08:48:48,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414350
2023-11-24 08:48:53,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2762340.0, ans=0.0
2023-11-24 08:49:05,643 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5550, loss[loss=0.08817, simple_loss=0.1107, pruned_loss=0.02132, audio_tagging_loss=0.01147, over 15039.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09152, pruned_loss=0.01356, audio_tagging_loss=0.009056, over 3036978.96 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:49:06,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.23 vs. limit=10.0
2023-11-24 08:49:17,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.668e+01 8.369e+01 9.115e+01 9.992e+01 1.269e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-24 08:49:20,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2762473.3333333335, ans=0.125
2023-11-24 08:49:51,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414400
2023-11-24 08:49:58,202 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 08:50:04,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2023-11-24 08:50:08,632 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5600, loss[loss=0.0571, simple_loss=0.07605, pruned_loss=0.0118, audio_tagging_loss=0.00727, over 15433.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09172, pruned_loss=0.0135, audio_tagging_loss=0.009111, over 3043977.41 frames. ], batch size: 58, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:50:28,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2762806.6666666665, ans=0.1
2023-11-24 08:50:37,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2762873.3333333335, ans=0.0
2023-11-24 08:50:38,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2762873.3333333335, ans=0.125
2023-11-24 08:50:44,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2762940.0, ans=0.0
2023-11-24 08:50:50,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2762940.0, ans=0.125
2023-11-24 08:50:53,446 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 08:50:54,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414450
2023-11-24 08:51:09,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2763073.3333333335, ans=0.125
2023-11-24 08:51:10,076 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5650, loss[loss=0.06682, simple_loss=0.08467, pruned_loss=0.01324, audio_tagging_loss=0.01125, over 14613.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09143, pruned_loss=0.01335, audio_tagging_loss=0.009188, over 3041359.32 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 32.0
2023-11-24 08:51:11,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5
2023-11-24 08:51:22,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.298e+01 8.960e+01 9.539e+01 1.200e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-24 08:51:29,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2763140.0, ans=0.0
2023-11-24 08:51:32,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2763140.0, ans=0.125
2023-11-24 08:51:42,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2763206.6666666665, ans=0.1
2023-11-24 08:51:45,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=22.5
2023-11-24 08:51:56,278 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414500
2023-11-24 08:51:56,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2763273.3333333335, ans=0.0
2023-11-24 08:52:12,848 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5700, loss[loss=0.08317, simple_loss=0.1174, pruned_loss=0.01724, audio_tagging_loss=0.007209, over 16222.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09098, pruned_loss=0.01338, audio_tagging_loss=0.009169, over 3041473.09 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:52:53,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2763606.6666666665, ans=0.125
2023-11-24 08:52:58,782 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414550
2023-11-24 08:53:04,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2023-11-24 08:53:07,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2763673.3333333335, ans=0.95
2023-11-24 08:53:15,295 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5750, loss[loss=0.05048, simple_loss=0.06404, pruned_loss=0.00977, audio_tagging_loss=0.008691, over 14550.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09056, pruned_loss=0.01334, audio_tagging_loss=0.009124, over 3044262.20 frames. ], batch size: 55, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:53:28,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.483e+01 9.031e+01 9.613e+01 1.136e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-24 08:53:35,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2763806.6666666665, ans=0.015
2023-11-24 08:53:43,733 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 08:53:45,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5
2023-11-24 08:54:01,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414600
2023-11-24 08:54:09,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2764006.6666666665, ans=22.5
2023-11-24 08:54:17,427 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5800, loss[loss=0.06735, simple_loss=0.0976, pruned_loss=0.01222, audio_tagging_loss=0.006327, over 14601.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09094, pruned_loss=0.0135, audio_tagging_loss=0.008951, over 3048236.45 frames. ], batch size: 55, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 08:54:17,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2764073.3333333335, ans=0.0
2023-11-24 08:54:35,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5
2023-11-24 08:55:00,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2764273.3333333335, ans=0.0
2023-11-24 08:55:03,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414650
2023-11-24 08:55:05,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2764273.3333333335, ans=0.0
2023-11-24 08:55:17,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2764340.0, ans=0.125
2023-11-24 08:55:20,525 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5850, loss[loss=0.07294, simple_loss=0.1082, pruned_loss=0.01205, audio_tagging_loss=0.006803, over 15712.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09069, pruned_loss=0.01338, audio_tagging_loss=0.008923, over 3055272.22 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 08:55:26,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0
2023-11-24 08:55:29,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2764406.6666666665, ans=0.2
2023-11-24 08:55:36,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.653e+01 9.084e+01 9.959e+01 1.265e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-24 08:55:40,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=12.0
2023-11-24 08:55:52,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2764540.0, ans=0.125
2023-11-24 08:55:57,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2764606.6666666665, ans=0.125
2023-11-24 08:56:05,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2764606.6666666665, ans=0.125
2023-11-24 08:56:07,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414700
2023-11-24 08:56:10,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2764673.3333333335, ans=0.125
2023-11-24 08:56:15,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2764673.3333333335, ans=0.0
2023-11-24 08:56:22,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2764740.0, ans=0.125
2023-11-24 08:56:23,847 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5900, loss[loss=0.06164, simple_loss=0.07896, pruned_loss=0.01026, audio_tagging_loss=0.0119, over 16671.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09173, pruned_loss=0.01356, audio_tagging_loss=0.008806, over 3052883.24 frames. ], batch size: 64, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 08:56:26,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2764740.0, ans=0.2
2023-11-24 08:56:49,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2764873.3333333335, ans=0.125
2023-11-24 08:57:10,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414750
2023-11-24 08:57:26,291 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 5950, loss[loss=0.06575, simple_loss=0.09135, pruned_loss=0.01324, audio_tagging_loss=0.006842, over 13895.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09143, pruned_loss=0.01344, audio_tagging_loss=0.008863, over 3056878.69 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 08:57:27,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2765073.3333333335, ans=0.07
2023-11-24 08:57:40,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.356e+01 8.865e+01 9.769e+01 1.184e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-24 08:57:47,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2765140.0, ans=0.125
2023-11-24 08:58:12,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414800
2023-11-24 08:58:23,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2765340.0, ans=0.125
2023-11-24 08:58:27,956 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6000, loss[loss=0.06843, simple_loss=0.09205, pruned_loss=0.01426, audio_tagging_loss=0.008145, over 15093.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09186, pruned_loss=0.01351, audio_tagging_loss=0.008751, over 3054590.76 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 08:58:27,957 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 08:58:55,301 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9145, 1.4211, 3.4775, 3.0033, 3.0127, 3.0901, 3.0741, 3.1656], device='cuda:1')
2023-11-24 08:59:07,261 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9770, 3.1746, 2.8774, 3.2266, 3.3853, 2.8288, 3.4517, 2.7341], device='cuda:1')
2023-11-24 08:59:09,814 INFO [train_asr.py:1253] (1/4) Epoch 35, validation: loss=0.05756, simple_loss=0.05084, pruned_loss=0.005093, audio_tagging_loss=0.02705, over 4681554.00 frames.
2023-11-24 08:59:09,815 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 08:59:12,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2765406.6666666665, ans=0.1
2023-11-24 08:59:26,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2765473.3333333335, ans=0.2
2023-11-24 08:59:27,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2765473.3333333335, ans=0.1
2023-11-24 08:59:35,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2765540.0, ans=0.1
2023-11-24 08:59:54,887 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 08:59:56,137 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414850
2023-11-24 09:00:02,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0
2023-11-24 09:00:04,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2765673.3333333335, ans=0.125
2023-11-24 09:00:11,858 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6050, loss[loss=0.06464, simple_loss=0.08659, pruned_loss=0.01337, audio_tagging_loss=0.00797, over 14588.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09166, pruned_loss=0.01334, audio_tagging_loss=0.008809, over 3051741.22 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 16.0
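Note: the block above is the periodic validation pass; with valid_interval=3000 it fires at batch 6000. The trainer averages the same loss terms over the whole validation set and, as a diagnostic, dumps the entropy of selected self-attention weight distributions, one value per head (eight here, matching the 8-head stack). A sketch of that entropy computation, an assumed aggregation (zipformer.py:1873 may average differently):

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len); each row is a softmax
        # distribution over source positions. Returns one mean entropy
        # per head, as in the tensors logged above; low entropy means
        # peaked attention, high entropy means diffuse attention.
        p = attn.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)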
2023-11-24 09:00:21,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2765740.0, ans=0.0
2023-11-24 09:00:26,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.401e+01 9.139e+01 9.786e+01 1.325e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-24 09:00:39,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2765873.3333333335, ans=0.125
2023-11-24 09:00:46,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2765873.3333333335, ans=0.1
2023-11-24 09:00:51,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2765940.0, ans=0.125
2023-11-24 09:00:58,021 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414900
2023-11-24 09:01:04,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2766006.6666666665, ans=0.2
2023-11-24 09:01:13,968 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6100, loss[loss=0.05655, simple_loss=0.07908, pruned_loss=0.007856, audio_tagging_loss=0.009152, over 15402.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09144, pruned_loss=0.01344, audio_tagging_loss=0.008783, over 3048120.55 frames. ], batch size: 57, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 09:01:14,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2766073.3333333335, ans=0.125
2023-11-24 09:01:59,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2766273.3333333335, ans=0.0
2023-11-24 09:02:00,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 414950
2023-11-24 09:02:16,787 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6150, loss[loss=0.07105, simple_loss=0.1054, pruned_loss=0.01044, audio_tagging_loss=0.00789, over 14614.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09062, pruned_loss=0.01327, audio_tagging_loss=0.008831, over 3054089.85 frames. ], batch size: 54, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 09:02:23,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2766406.6666666665, ans=0.125
2023-11-24 09:02:29,135 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-24 09:02:31,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.378e+01 9.060e+01 9.653e+01 1.339e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-24 09:02:45,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2023-11-24 09:03:03,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415000
2023-11-24 09:03:08,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2766673.3333333335, ans=0.125
2023-11-24 09:03:10,635 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 09:03:18,777 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6200, loss[loss=0.05116, simple_loss=0.07122, pruned_loss=0.004427, audio_tagging_loss=0.01112, over 14833.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09039, pruned_loss=0.01327, audio_tagging_loss=0.008846, over 3054222.42 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 09:03:25,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2766740.0, ans=0.025
2023-11-24 09:03:55,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2766940.0, ans=0.125
2023-11-24 09:04:00,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2766940.0, ans=0.025
2023-11-24 09:04:04,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415050
2023-11-24 09:04:21,354 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6250, loss[loss=0.06143, simple_loss=0.07086, pruned_loss=0.01375, audio_tagging_loss=0.01226, over 14838.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09023, pruned_loss=0.0131, audio_tagging_loss=0.008953, over 3054816.26 frames. ], batch size: 59, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 09:04:29,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2767073.3333333335, ans=0.0
2023-11-24 09:04:37,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2767140.0, ans=0.2
2023-11-24 09:04:38,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.234e+01 8.847e+01 9.550e+01 1.216e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-24 09:05:07,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415100
2023-11-24 09:05:18,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2767340.0, ans=0.125
2023-11-24 09:05:24,274 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6300, loss[loss=0.06447, simple_loss=0.08916, pruned_loss=0.01102, audio_tagging_loss=0.008875, over 13916.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09044, pruned_loss=0.01316, audio_tagging_loss=0.009028, over 3056500.62 frames. ], batch size: 56, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 09:05:37,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2767473.3333333335, ans=0.1
2023-11-24 09:05:45,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=15.0
2023-11-24 09:05:52,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0
2023-11-24 09:06:01,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0
2023-11-24 09:06:10,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415150
2023-11-24 09:06:16,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2767673.3333333335, ans=0.2
2023-11-24 09:06:26,032 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6350, loss[loss=0.0716, simple_loss=0.1106, pruned_loss=0.01008, audio_tagging_loss=0.006207, over 16125.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09063, pruned_loss=0.01335, audio_tagging_loss=0.009175, over 3054141.75 frames. ], batch size: 60, lr: 1.94e-03, grad_scale: 8.0
2023-11-24 09:06:41,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.597e+01 9.129e+01 9.711e+01 1.215e+02, threshold=1.826e+02, percent-clipped=0.0
2023-11-24 09:06:41,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2767806.6666666665, ans=0.125
2023-11-24 09:06:58,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2767873.3333333335, ans=0.125
2023-11-24 09:06:59,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2767873.3333333335, ans=0.125
2023-11-24 09:07:11,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415200
2023-11-24 09:07:19,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2768006.6666666665, ans=0.125
2023-11-24 09:07:23,692 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 09:07:27,184 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6400, loss[loss=0.06112, simple_loss=0.08246, pruned_loss=0.01153, audio_tagging_loss=0.00837, over 14777.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09041, pruned_loss=0.01331, audio_tagging_loss=0.009185, over 3051873.52 frames. ], batch size: 53, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 09:07:28,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2768073.3333333335, ans=0.05
2023-11-24 09:07:35,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0
2023-11-24 09:07:40,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0
2023-11-24 09:07:53,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2768206.6666666665, ans=0.125
2023-11-24 09:07:56,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2768206.6666666665, ans=0.125
2023-11-24 09:08:00,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=22.5
2023-11-24 09:08:12,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415250
2023-11-24 09:08:13,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2768273.3333333335, ans=0.04949747468305833
2023-11-24 09:08:13,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2768273.3333333335, ans=0.125
2023-11-24 09:08:30,551 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6450, loss[loss=0.06672, simple_loss=0.08759, pruned_loss=0.01044, audio_tagging_loss=0.01248, over 15674.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09001, pruned_loss=0.01314, audio_tagging_loss=0.009361, over 3051678.99 frames. ], batch size: 59, lr: 1.94e-03, grad_scale: 16.0
2023-11-24 09:08:41,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2768473.3333333335, ans=0.1
2023-11-24 09:08:46,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.512e+01 9.076e+01 9.568e+01 1.780e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-24 09:09:03,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2768540.0, ans=0.2
2023-11-24 09:09:10,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0
2023-11-24 09:09:17,375 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415300
2023-11-24 09:09:32,756 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6500, loss[loss=0.08808, simple_loss=0.1192, pruned_loss=0.01881, audio_tagging_loss=0.00968, over 15368.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09037, pruned_loss=0.0132, audio_tagging_loss=0.009295, over 3051904.16 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 09:09:36,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2768740.0, ans=0.125
2023-11-24 09:09:44,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0
2023-11-24 09:09:50,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2768806.6666666665, ans=0.125
2023-11-24 09:09:57,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2768873.3333333335, ans=0.125
2023-11-24 09:10:02,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=2768873.3333333335, ans=0.2
2023-11-24 09:10:06,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2768873.3333333335, ans=0.0
2023-11-24 09:10:10,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2768940.0, ans=0.2
2023-11-24 09:10:17,977 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415350
2023-11-24 09:10:33,290 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6550, loss[loss=0.07461, simple_loss=0.1035, pruned_loss=0.01342, audio_tagging_loss=0.009451, over 15388.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09113, pruned_loss=0.01319, audio_tagging_loss=0.00912, over 3059016.92 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 16.0
], tot_loss[loss=0.06788, simple_loss=0.09113, pruned_loss=0.01319, audio_tagging_loss=0.00912, over 3059016.92 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:10:34,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-24 09:10:50,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.510e+01 8.986e+01 9.947e+01 1.296e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-24 09:10:59,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2769206.6666666665, ans=0.125 2023-11-24 09:11:19,793 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415400 2023-11-24 09:11:20,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2769273.3333333335, ans=0.0 2023-11-24 09:11:37,601 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6600, loss[loss=0.07001, simple_loss=0.09225, pruned_loss=0.01177, audio_tagging_loss=0.01212, over 14471.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09118, pruned_loss=0.0131, audio_tagging_loss=0.008964, over 3053831.68 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:11:46,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2769406.6666666665, ans=0.0 2023-11-24 09:11:51,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2769473.3333333335, ans=0.1 2023-11-24 09:11:54,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2769473.3333333335, ans=0.0 2023-11-24 09:11:57,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2769473.3333333335, ans=0.0 2023-11-24 09:12:08,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2769540.0, ans=0.2 2023-11-24 09:12:23,252 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415450 2023-11-24 09:12:39,122 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6650, loss[loss=0.07188, simple_loss=0.08934, pruned_loss=0.01619, audio_tagging_loss=0.01101, over 16317.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09062, pruned_loss=0.01313, audio_tagging_loss=0.008911, over 3048352.19 frames. ], batch size: 61, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:12:51,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2769806.6666666665, ans=0.1 2023-11-24 09:12:54,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.429e+01 8.957e+01 9.773e+01 1.353e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-24 09:13:17,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. 
limit=22.5 2023-11-24 09:13:25,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415500 2023-11-24 09:13:27,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2769940.0, ans=0.125 2023-11-24 09:13:41,131 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6700, loss[loss=0.05457, simple_loss=0.07831, pruned_loss=0.006942, audio_tagging_loss=0.008473, over 14955.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09054, pruned_loss=0.01319, audio_tagging_loss=0.00886, over 3047166.14 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:13:42,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2770073.3333333335, ans=0.125 2023-11-24 09:13:47,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2770073.3333333335, ans=0.0 2023-11-24 09:14:18,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2023-11-24 09:14:27,381 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415550 2023-11-24 09:14:29,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2023-11-24 09:14:38,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2770340.0, ans=0.125 2023-11-24 09:14:39,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2770340.0, ans=0.1 2023-11-24 09:14:43,940 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6750, loss[loss=0.06821, simple_loss=0.08962, pruned_loss=0.01492, audio_tagging_loss=0.008483, over 15443.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.0905, pruned_loss=0.01306, audio_tagging_loss=0.008796, over 3038993.11 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:14:57,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.01 vs. limit=15.0 2023-11-24 09:15:00,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.550e+01 9.289e+01 9.814e+01 1.264e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-24 09:15:07,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2770540.0, ans=0.0 2023-11-24 09:15:12,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2770540.0, ans=0.1 2023-11-24 09:15:15,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2770540.0, ans=0.05 2023-11-24 09:15:30,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415600 2023-11-24 09:15:44,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2770673.3333333335, ans=0.025 2023-11-24 09:15:47,262 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6800, loss[loss=0.07468, simple_loss=0.1003, pruned_loss=0.01738, audio_tagging_loss=0.007161, over 14444.00 frames. 
], tot_loss[loss=0.06632, simple_loss=0.08934, pruned_loss=0.01287, audio_tagging_loss=0.008774, over 3035582.73 frames. ], batch size: 53, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:16:23,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2770940.0, ans=0.0 2023-11-24 09:16:26,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2770940.0, ans=0.125 2023-11-24 09:16:33,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415650 2023-11-24 09:16:47,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2771073.3333333335, ans=0.2 2023-11-24 09:16:48,697 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6850, loss[loss=0.08317, simple_loss=0.1143, pruned_loss=0.01781, audio_tagging_loss=0.008216, over 16122.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08957, pruned_loss=0.01289, audio_tagging_loss=0.008763, over 3034115.72 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:17:01,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2771140.0, ans=0.125 2023-11-24 09:17:05,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771140.0, ans=0.1 2023-11-24 09:17:06,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.373e+01 8.924e+01 9.619e+01 1.192e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-24 09:17:29,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2771273.3333333335, ans=0.0 2023-11-24 09:17:34,738 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415700 2023-11-24 09:17:41,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-24 09:17:50,600 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6900, loss[loss=0.08023, simple_loss=0.1041, pruned_loss=0.01864, audio_tagging_loss=0.009563, over 15289.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08972, pruned_loss=0.01281, audio_tagging_loss=0.008864, over 3047097.52 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:18:12,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2771473.3333333335, ans=0.125 2023-11-24 09:18:12,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2771473.3333333335, ans=0.125 2023-11-24 09:18:24,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-24 09:18:27,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2771606.6666666665, ans=0.125 2023-11-24 09:18:28,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. 
limit=15.0 2023-11-24 09:18:36,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415750 2023-11-24 09:18:37,840 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 09:18:40,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=15.0 2023-11-24 09:18:52,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771740.0, ans=0.1 2023-11-24 09:18:53,280 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 6950, loss[loss=0.07043, simple_loss=0.09662, pruned_loss=0.01258, audio_tagging_loss=0.009542, over 15195.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0902, pruned_loss=0.01299, audio_tagging_loss=0.008888, over 3044690.61 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:19:02,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=22.5 2023-11-24 09:19:10,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.481e+01 9.077e+01 9.564e+01 1.596e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-24 09:19:19,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2771873.3333333335, ans=0.0 2023-11-24 09:19:26,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2771873.3333333335, ans=0.2 2023-11-24 09:19:39,847 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415800 2023-11-24 09:19:55,974 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7000, loss[loss=0.09762, simple_loss=0.1287, pruned_loss=0.02355, audio_tagging_loss=0.009738, over 15043.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09051, pruned_loss=0.01311, audio_tagging_loss=0.008946, over 3045993.78 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:19:56,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2772073.3333333335, ans=0.0 2023-11-24 09:19:56,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2772073.3333333335, ans=0.125 2023-11-24 09:20:01,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.44 vs. 
limit=15.0 2023-11-24 09:20:10,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2772140.0, ans=0.05 2023-11-24 09:20:13,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2772140.0, ans=0.0 2023-11-24 09:20:13,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2772140.0, ans=0.125 2023-11-24 09:20:42,077 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415850 2023-11-24 09:20:56,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2772406.6666666665, ans=0.125 2023-11-24 09:20:57,811 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7050, loss[loss=0.0667, simple_loss=0.09437, pruned_loss=0.01191, audio_tagging_loss=0.007598, over 14526.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09027, pruned_loss=0.01317, audio_tagging_loss=0.008976, over 3051261.59 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:21:15,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.341e+01 9.106e+01 9.596e+01 1.490e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 09:21:27,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2772540.0, ans=0.125 2023-11-24 09:21:31,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2772540.0, ans=0.0 2023-11-24 09:21:39,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.55 vs. limit=10.0 2023-11-24 09:21:44,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415900 2023-11-24 09:22:00,636 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7100, loss[loss=0.06615, simple_loss=0.09323, pruned_loss=0.01079, audio_tagging_loss=0.008745, over 15951.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.08998, pruned_loss=0.01323, audio_tagging_loss=0.009087, over 3049675.12 frames. ], batch size: 60, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:22:12,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2772806.6666666665, ans=0.0 2023-11-24 09:22:12,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2772806.6666666665, ans=0.0 2023-11-24 09:22:37,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2772940.0, ans=0.0 2023-11-24 09:22:39,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2772940.0, ans=0.0 2023-11-24 09:22:46,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 415950 2023-11-24 09:23:02,689 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7150, loss[loss=0.07448, simple_loss=0.1013, pruned_loss=0.01456, audio_tagging_loss=0.009266, over 14862.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09065, pruned_loss=0.01321, audio_tagging_loss=0.009074, over 3047453.06 frames. 
], batch size: 54, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:23:04,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2773073.3333333335, ans=0.125 2023-11-24 09:23:19,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.644e+01 9.348e+01 1.022e+02 1.334e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-24 09:23:42,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2773273.3333333335, ans=0.0 2023-11-24 09:23:44,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2773273.3333333335, ans=0.125 2023-11-24 09:23:48,110 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416000 2023-11-24 09:24:09,374 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7200, loss[loss=0.06891, simple_loss=0.08876, pruned_loss=0.01545, audio_tagging_loss=0.009085, over 14246.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09179, pruned_loss=0.01334, audio_tagging_loss=0.009061, over 3049719.50 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:24:22,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2773473.3333333335, ans=0.125 2023-11-24 09:24:24,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2773473.3333333335, ans=0.035 2023-11-24 09:24:25,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2773473.3333333335, ans=0.025 2023-11-24 09:24:34,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2773540.0, ans=0.025 2023-11-24 09:24:38,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-24 09:24:55,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416050 2023-11-24 09:25:12,604 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7250, loss[loss=0.07649, simple_loss=0.1101, pruned_loss=0.01369, audio_tagging_loss=0.007778, over 15930.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09146, pruned_loss=0.01336, audio_tagging_loss=0.009212, over 3047669.92 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:25:21,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-24 09:25:25,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-24 09:25:29,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.813e+01 9.334e+01 1.034e+02 1.372e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-24 09:25:29,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2023-11-24 09:25:32,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. 
limit=15.0 2023-11-24 09:25:44,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2773873.3333333335, ans=0.125 2023-11-24 09:25:58,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416100 2023-11-24 09:26:13,971 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7300, loss[loss=0.05389, simple_loss=0.0677, pruned_loss=0.009699, audio_tagging_loss=0.01034, over 14055.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09213, pruned_loss=0.01339, audio_tagging_loss=0.009015, over 3045512.37 frames. ], batch size: 53, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:26:19,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2774073.3333333335, ans=0.2 2023-11-24 09:26:26,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2774140.0, ans=0.125 2023-11-24 09:26:48,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2774206.6666666665, ans=0.0 2023-11-24 09:26:53,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2774273.3333333335, ans=0.07 2023-11-24 09:26:58,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2774273.3333333335, ans=0.015 2023-11-24 09:27:00,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416150 2023-11-24 09:27:06,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2774340.0, ans=0.0 2023-11-24 09:27:11,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2774340.0, ans=0.5 2023-11-24 09:27:15,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2774406.6666666665, ans=0.0 2023-11-24 09:27:15,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2774406.6666666665, ans=0.125 2023-11-24 09:27:16,238 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7350, loss[loss=0.06132, simple_loss=0.08663, pruned_loss=0.01031, audio_tagging_loss=0.007702, over 16674.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09202, pruned_loss=0.01316, audio_tagging_loss=0.008875, over 3047356.73 frames. ], batch size: 64, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:27:29,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2774473.3333333335, ans=0.2 2023-11-24 09:27:31,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2774473.3333333335, ans=0.0 2023-11-24 09:27:34,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.604e+01 9.108e+01 1.008e+02 1.406e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-24 09:27:39,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2774473.3333333335, ans=0.1 2023-11-24 09:27:39,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. 
limit=15.0 2023-11-24 09:28:02,562 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416200 2023-11-24 09:28:02,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2774606.6666666665, ans=0.0 2023-11-24 09:28:06,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2023-11-24 09:28:18,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2774740.0, ans=0.125 2023-11-24 09:28:19,620 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7400, loss[loss=0.06119, simple_loss=0.07689, pruned_loss=0.01301, audio_tagging_loss=0.009734, over 14180.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09114, pruned_loss=0.01301, audio_tagging_loss=0.008836, over 3042308.05 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:28:31,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2774806.6666666665, ans=0.125 2023-11-24 09:28:37,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2774806.6666666665, ans=0.2 2023-11-24 09:28:43,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2774873.3333333335, ans=0.0 2023-11-24 09:28:53,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2023-11-24 09:28:57,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2774940.0, ans=0.125 2023-11-24 09:29:05,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416250 2023-11-24 09:29:21,073 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7450, loss[loss=0.04759, simple_loss=0.06087, pruned_loss=0.007554, audio_tagging_loss=0.009602, over 15488.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09092, pruned_loss=0.01301, audio_tagging_loss=0.008818, over 3045726.15 frames. ], batch size: 60, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:29:23,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-24 09:29:33,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2775140.0, ans=0.2 2023-11-24 09:29:33,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-11-24 09:29:38,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.622e+01 8.229e+01 9.115e+01 9.575e+01 1.366e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-24 09:29:54,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2775206.6666666665, ans=0.2 2023-11-24 09:30:07,441 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416300 2023-11-24 09:30:14,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.97 vs. 
limit=15.0 2023-11-24 09:30:22,752 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7500, loss[loss=0.06557, simple_loss=0.08744, pruned_loss=0.01463, audio_tagging_loss=0.007219, over 15032.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.0913, pruned_loss=0.01329, audio_tagging_loss=0.008687, over 3055870.62 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:30:23,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2775406.6666666665, ans=0.125 2023-11-24 09:30:24,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2775406.6666666665, ans=0.125 2023-11-24 09:30:47,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2023-11-24 09:30:52,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2775540.0, ans=10.0 2023-11-24 09:31:03,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2775606.6666666665, ans=0.125 2023-11-24 09:31:07,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2775606.6666666665, ans=0.125 2023-11-24 09:31:08,213 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416350 2023-11-24 09:31:18,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2775673.3333333335, ans=0.0 2023-11-24 09:31:25,165 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7550, loss[loss=0.06047, simple_loss=0.08195, pruned_loss=0.01187, audio_tagging_loss=0.007624, over 15014.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09035, pruned_loss=0.01324, audio_tagging_loss=0.00883, over 3052410.15 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:31:25,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2775740.0, ans=0.125 2023-11-24 09:31:33,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2775740.0, ans=0.125 2023-11-24 09:31:41,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.548e+01 9.078e+01 9.572e+01 1.867e+02, threshold=1.816e+02, percent-clipped=1.0 2023-11-24 09:32:07,261 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 09:32:08,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2775940.0, ans=0.1 2023-11-24 09:32:11,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416400 2023-11-24 09:32:23,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2776006.6666666665, ans=0.125 2023-11-24 09:32:26,765 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7600, loss[loss=0.0711, simple_loss=0.1001, pruned_loss=0.014, audio_tagging_loss=0.00706, over 14811.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09104, pruned_loss=0.01335, audio_tagging_loss=0.008756, over 3053478.05 frames. 
], batch size: 55, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:32:28,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2776073.3333333335, ans=0.2 2023-11-24 09:32:31,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2776073.3333333335, ans=0.0 2023-11-24 09:32:31,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2776073.3333333335, ans=0.2 2023-11-24 09:32:53,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=15.0 2023-11-24 09:32:55,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2776206.6666666665, ans=0.125 2023-11-24 09:33:08,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2776273.3333333335, ans=0.07 2023-11-24 09:33:12,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416450 2023-11-24 09:33:27,742 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7650, loss[loss=0.07748, simple_loss=0.1153, pruned_loss=0.0157, audio_tagging_loss=0.004138, over 14158.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09159, pruned_loss=0.01336, audio_tagging_loss=0.008711, over 3048937.20 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:33:39,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2776473.3333333335, ans=0.2 2023-11-24 09:33:42,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2776473.3333333335, ans=15.0 2023-11-24 09:33:47,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.15 vs. 
limit=10.0 2023-11-24 09:33:48,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.355e+01 9.042e+01 9.809e+01 1.325e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-24 09:34:03,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2776540.0, ans=0.0 2023-11-24 09:34:08,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2776606.6666666665, ans=0.125 2023-11-24 09:34:11,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2776606.6666666665, ans=0.125 2023-11-24 09:34:12,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2776606.6666666665, ans=0.125 2023-11-24 09:34:13,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416500 2023-11-24 09:34:16,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2776673.3333333335, ans=0.1 2023-11-24 09:34:21,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2776673.3333333335, ans=0.0 2023-11-24 09:34:27,448 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 09:34:30,124 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7700, loss[loss=0.09072, simple_loss=0.1214, pruned_loss=0.02177, audio_tagging_loss=0.008242, over 16605.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09165, pruned_loss=0.01336, audio_tagging_loss=0.00876, over 3047793.88 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:34:44,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2776806.6666666665, ans=0.125 2023-11-24 09:34:45,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2023-11-24 09:35:04,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2776940.0, ans=0.5 2023-11-24 09:35:11,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2776940.0, ans=0.0 2023-11-24 09:35:14,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416550 2023-11-24 09:35:26,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2777006.6666666665, ans=0.07 2023-11-24 09:35:27,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2777006.6666666665, ans=0.0 2023-11-24 09:35:31,201 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7750, loss[loss=0.04565, simple_loss=0.05948, pruned_loss=0.003977, audio_tagging_loss=0.01193, over 14654.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09241, pruned_loss=0.0134, audio_tagging_loss=0.008793, over 3052675.43 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:35:33,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. 
limit=15.0 2023-11-24 09:35:44,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2777140.0, ans=0.125 2023-11-24 09:35:49,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-24 09:35:50,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.633e+01 8.304e+01 9.008e+01 9.844e+01 1.304e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-24 09:36:17,204 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416600 2023-11-24 09:36:17,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2777273.3333333335, ans=0.125 2023-11-24 09:36:22,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2777340.0, ans=0.125 2023-11-24 09:36:33,267 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7800, loss[loss=0.06037, simple_loss=0.08526, pruned_loss=0.009791, audio_tagging_loss=0.007951, over 14896.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09187, pruned_loss=0.01328, audio_tagging_loss=0.008843, over 3045270.28 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:36:34,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2777406.6666666665, ans=10.0 2023-11-24 09:36:39,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2777406.6666666665, ans=0.125 2023-11-24 09:37:17,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2777606.6666666665, ans=0.2 2023-11-24 09:37:19,367 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416650 2023-11-24 09:37:30,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2777673.3333333335, ans=0.125 2023-11-24 09:37:35,050 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7850, loss[loss=0.07319, simple_loss=0.1011, pruned_loss=0.01439, audio_tagging_loss=0.008247, over 15529.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09172, pruned_loss=0.01331, audio_tagging_loss=0.008927, over 3044843.64 frames. 
], batch size: 56, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:37:39,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2777740.0, ans=0.1 2023-11-24 09:37:47,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2777806.6666666665, ans=0.125 2023-11-24 09:37:51,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2777806.6666666665, ans=0.125 2023-11-24 09:37:55,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.418e+01 9.005e+01 9.795e+01 1.227e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-24 09:37:56,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2777806.6666666665, ans=15.0 2023-11-24 09:38:11,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2777940.0, ans=0.2 2023-11-24 09:38:18,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. limit=10.0 2023-11-24 09:38:21,179 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416700 2023-11-24 09:38:37,565 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7900, loss[loss=0.06657, simple_loss=0.09048, pruned_loss=0.01142, audio_tagging_loss=0.009903, over 15318.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09312, pruned_loss=0.01368, audio_tagging_loss=0.008935, over 3044043.31 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:38:39,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2778073.3333333335, ans=0.125 2023-11-24 09:38:50,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2778140.0, ans=0.1 2023-11-24 09:39:15,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2778273.3333333335, ans=0.2 2023-11-24 09:39:17,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2778273.3333333335, ans=0.0 2023-11-24 09:39:18,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2778273.3333333335, ans=0.0 2023-11-24 09:39:23,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416750 2023-11-24 09:39:24,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2778273.3333333335, ans=0.125 2023-11-24 09:39:29,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2778340.0, ans=0.125 2023-11-24 09:39:38,826 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 7950, loss[loss=0.05797, simple_loss=0.06903, pruned_loss=0.01357, audio_tagging_loss=0.00988, over 16710.00 frames. ], tot_loss[loss=0.06906, simple_loss=0.09285, pruned_loss=0.01356, audio_tagging_loss=0.009075, over 3042159.32 frames. 
], batch size: 65, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:39:51,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2778473.3333333335, ans=0.0 2023-11-24 09:39:53,581 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 09:39:58,835 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.587e+01 9.115e+01 9.680e+01 1.233e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-24 09:40:08,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2778540.0, ans=0.125 2023-11-24 09:40:15,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2778606.6666666665, ans=0.2 2023-11-24 09:40:18,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2778606.6666666665, ans=0.0 2023-11-24 09:40:25,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416800 2023-11-24 09:40:31,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2778673.3333333335, ans=0.2 2023-11-24 09:40:41,544 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8000, loss[loss=0.07237, simple_loss=0.08933, pruned_loss=0.01602, audio_tagging_loss=0.01169, over 15201.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09206, pruned_loss=0.01349, audio_tagging_loss=0.009223, over 3044110.16 frames. ], batch size: 60, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:41:03,916 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 09:41:19,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2778940.0, ans=0.2 2023-11-24 09:41:25,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2778940.0, ans=0.2 2023-11-24 09:41:27,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416850 2023-11-24 09:41:44,420 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8050, loss[loss=0.06625, simple_loss=0.09428, pruned_loss=0.009941, audio_tagging_loss=0.00917, over 15225.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09034, pruned_loss=0.01304, audio_tagging_loss=0.009305, over 3039019.74 frames. 
], batch size: 54, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:41:54,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2779073.3333333335, ans=0.125 2023-11-24 09:42:04,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2779140.0, ans=0.125 2023-11-24 09:42:05,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.535e+01 9.115e+01 9.754e+01 2.166e+02, threshold=1.823e+02, percent-clipped=1.0 2023-11-24 09:42:30,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416900 2023-11-24 09:42:34,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2779340.0, ans=0.125 2023-11-24 09:42:46,444 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8100, loss[loss=0.05801, simple_loss=0.07698, pruned_loss=0.01107, audio_tagging_loss=0.008448, over 14669.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09128, pruned_loss=0.01327, audio_tagging_loss=0.009248, over 3038809.86 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:42:49,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-11-24 09:43:12,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2779540.0, ans=0.125 2023-11-24 09:43:21,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2023-11-24 09:43:31,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2023-11-24 09:43:32,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 416950 2023-11-24 09:43:41,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2023-11-24 09:43:47,834 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8150, loss[loss=0.07757, simple_loss=0.1105, pruned_loss=0.01469, audio_tagging_loss=0.007656, over 15921.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09051, pruned_loss=0.01317, audio_tagging_loss=0.009142, over 3038352.49 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:44:08,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2779806.6666666665, ans=0.125 2023-11-24 09:44:09,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.620e+01 9.310e+01 1.002e+02 1.330e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-24 09:44:33,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2779940.0, ans=0.1 2023-11-24 09:44:34,040 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417000 2023-11-24 09:44:43,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. 
limit=12.0 2023-11-24 09:44:50,740 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8200, loss[loss=0.07744, simple_loss=0.1024, pruned_loss=0.01931, audio_tagging_loss=0.006937, over 14641.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09027, pruned_loss=0.01305, audio_tagging_loss=0.00897, over 3045768.95 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:44:51,959 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 09:45:30,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2780273.3333333335, ans=0.125 2023-11-24 09:45:31,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2780273.3333333335, ans=0.125 2023-11-24 09:45:37,023 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417050 2023-11-24 09:45:52,453 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8250, loss[loss=0.04381, simple_loss=0.0527, pruned_loss=0.00676, audio_tagging_loss=0.0107, over 14733.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09009, pruned_loss=0.01306, audio_tagging_loss=0.009044, over 3049712.89 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:46:07,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-11-24 09:46:13,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.658e+01 9.148e+01 9.760e+01 1.278e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 09:46:22,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-24 09:46:38,654 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417100 2023-11-24 09:46:54,586 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8300, loss[loss=0.096, simple_loss=0.1231, pruned_loss=0.02503, audio_tagging_loss=0.00942, over 15129.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09071, pruned_loss=0.01295, audio_tagging_loss=0.009003, over 3051425.16 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:46:54,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2780740.0, ans=0.2 2023-11-24 09:47:31,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2780940.0, ans=0.0 2023-11-24 09:47:41,282 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417150 2023-11-24 09:47:58,464 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8350, loss[loss=0.05322, simple_loss=0.07229, pruned_loss=0.007681, audio_tagging_loss=0.009397, over 16405.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09047, pruned_loss=0.01298, audio_tagging_loss=0.008958, over 3050317.80 frames. 
], batch size: 62, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:48:17,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2781140.0, ans=0.0 2023-11-24 09:48:18,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.951e+01 8.379e+01 9.010e+01 9.696e+01 1.228e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-24 09:48:29,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2781206.6666666665, ans=0.125 2023-11-24 09:48:44,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417200 2023-11-24 09:48:56,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.49 vs. limit=10.0 2023-11-24 09:48:57,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2781340.0, ans=0.0 2023-11-24 09:49:00,072 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8400, loss[loss=0.07931, simple_loss=0.1132, pruned_loss=0.01505, audio_tagging_loss=0.007646, over 15626.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09083, pruned_loss=0.01317, audio_tagging_loss=0.008809, over 3051350.04 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:49:19,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2781473.3333333335, ans=0.125 2023-11-24 09:49:24,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-11-24 09:49:46,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417250 2023-11-24 09:49:46,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2781606.6666666665, ans=0.125 2023-11-24 09:49:56,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781673.3333333335, ans=0.1 2023-11-24 09:49:59,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2781673.3333333335, ans=0.125 2023-11-24 09:50:02,427 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8450, loss[loss=0.04765, simple_loss=0.05711, pruned_loss=0.009509, audio_tagging_loss=0.009587, over 13975.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09078, pruned_loss=0.01302, audio_tagging_loss=0.008866, over 3055299.23 frames. 
], batch size: 54, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:50:09,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2781740.0, ans=0.125 2023-11-24 09:50:24,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.593e+01 9.155e+01 9.677e+01 1.308e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-24 09:50:41,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2781940.0, ans=0.2 2023-11-24 09:50:49,332 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417300 2023-11-24 09:51:05,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2782073.3333333335, ans=0.0 2023-11-24 09:51:06,315 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8500, loss[loss=0.07148, simple_loss=0.1072, pruned_loss=0.009635, audio_tagging_loss=0.008225, over 15120.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.0914, pruned_loss=0.01315, audio_tagging_loss=0.008839, over 3056778.99 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 09:51:07,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2782073.3333333335, ans=0.125 2023-11-24 09:51:39,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2782206.6666666665, ans=0.125 2023-11-24 09:51:42,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2782273.3333333335, ans=0.125 2023-11-24 09:51:52,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417350 2023-11-24 09:52:07,817 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8550, loss[loss=0.06445, simple_loss=0.08878, pruned_loss=0.01327, audio_tagging_loss=0.006787, over 14502.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09138, pruned_loss=0.01308, audio_tagging_loss=0.008947, over 3059519.50 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 09:52:10,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2023-11-24 09:52:26,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2782473.3333333335, ans=0.125 2023-11-24 09:52:29,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.685e+01 9.326e+01 1.025e+02 1.164e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-24 09:52:49,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0 2023-11-24 09:52:54,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417400 2023-11-24 09:53:02,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.67 vs. limit=10.0 2023-11-24 09:53:10,015 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8600, loss[loss=0.07219, simple_loss=0.09688, pruned_loss=0.01358, audio_tagging_loss=0.01017, over 15525.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09152, pruned_loss=0.01313, audio_tagging_loss=0.008986, over 3059401.04 frames. 
2023-11-24 09:53:35,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2782873.3333333335, ans=0.125
2023-11-24 09:53:36,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2782873.3333333335, ans=0.1
2023-11-24 09:53:51,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2782940.0, ans=0.0
2023-11-24 09:53:56,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417450
2023-11-24 09:54:12,796 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8650, loss[loss=0.0789, simple_loss=0.11, pruned_loss=0.01581, audio_tagging_loss=0.008106, over 14696.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09165, pruned_loss=0.01319, audio_tagging_loss=0.009005, over 3067764.16 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 09:54:27,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2783140.0, ans=0.125
2023-11-24 09:54:35,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.560e+01 9.076e+01 1.003e+02 1.218e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-24 09:54:39,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.12 vs. limit=15.0
2023-11-24 09:54:43,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2783206.6666666665, ans=0.125
2023-11-24 09:54:59,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417500
2023-11-24 09:55:13,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2783340.0, ans=0.125
2023-11-24 09:55:15,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2783406.6666666665, ans=0.0
2023-11-24 09:55:16,149 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8700, loss[loss=0.0361, simple_loss=0.03209, pruned_loss=0.00598, audio_tagging_loss=0.01407, over 12995.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09187, pruned_loss=0.01326, audio_tagging_loss=0.009025, over 3066625.14 frames. ], batch size: 52, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 09:55:52,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2783540.0, ans=0.125
2023-11-24 09:56:02,420 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417550
2023-11-24 09:56:13,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2783673.3333333335, ans=0.09899494936611666
2023-11-24 09:56:17,912 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8750, loss[loss=0.0922, simple_loss=0.1321, pruned_loss=0.02032, audio_tagging_loss=0.00583, over 15116.00 frames. ], tot_loss[loss=0.0686, simple_loss=0.09257, pruned_loss=0.01332, audio_tagging_loss=0.008996, over 3061642.76 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 16.0
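In every optim.py:476 line in this span, the logged threshold is exactly Clipping_scale times the middle of the five grad-norm quartile values, which suggests the clipping threshold tracks a running median of recent gradient norms. A hedged sketch of that relationship (not the actual optimizer code):

```python
from statistics import median

CLIPPING_SCALE = 2.0  # matches Clipping_scale in the log

def clipping_threshold(recent_grad_norms: list[float]) -> float:
    # Assumed rule: threshold = scale * median of recently seen norms;
    # "percent-clipped" would then count batches whose norm exceeds it.
    return CLIPPING_SCALE * median(recent_grad_norms)

# Quartiles logged at 09:54:35 above: 6.880e+01 8.560e+01 9.076e+01
# 1.003e+02 1.218e+02 -> 2.0 * 90.76 = 181.52, i.e. threshold=1.815e+02.
print(clipping_threshold([68.80, 85.60, 90.76, 100.3, 121.8]))
```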
2023-11-24 09:56:20,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2783740.0, ans=0.1
2023-11-24 09:56:29,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2783806.6666666665, ans=0.125
2023-11-24 09:56:29,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5
2023-11-24 09:56:40,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.767e+01 9.281e+01 1.037e+02 1.287e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-24 09:57:03,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417600
2023-11-24 09:57:05,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2784006.6666666665, ans=0.125
2023-11-24 09:57:16,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2784006.6666666665, ans=0.0
2023-11-24 09:57:20,227 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8800, loss[loss=0.06297, simple_loss=0.08956, pruned_loss=0.0124, audio_tagging_loss=0.005796, over 14061.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09183, pruned_loss=0.01318, audio_tagging_loss=0.009081, over 3054397.16 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 09:57:35,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2784140.0, ans=0.125
2023-11-24 09:57:48,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2784206.6666666665, ans=0.0
2023-11-24 09:58:01,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2784273.3333333335, ans=0.125
2023-11-24 09:58:04,474 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 09:58:05,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417650
2023-11-24 09:58:21,981 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8850, loss[loss=0.06432, simple_loss=0.07829, pruned_loss=0.01324, audio_tagging_loss=0.01193, over 15537.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09046, pruned_loss=0.01281, audio_tagging_loss=0.009107, over 3054853.55 frames. ], batch size: 59, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 09:58:32,833 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
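A hedged reading of the Exclude-cut warning above: with the conv front-end length formula commonly used in these recipes, T' = ((T - 7) // 2 + 1) // 2, a 100-frame cut keeps 23 frames after subsampling, while its placeholder transcript tokenizes to 24 BPE pieces; a transducer cannot emit 24 symbols over 23 frames, so the cut is dropped. Both the formula and the exclusion rule are assumptions, but they reproduce the logged numbers:

```python
# Assumed subsampling formula; reproduces "100 frames -> 23 frames".
def frames_after_subsampling(t: int) -> int:
    return ((t - 7) // 2 + 1) // 2

t_before, num_tokens = 100, 24
t_after = frames_after_subsampling(t_before)
print(t_after)               # 23
print(t_after < num_tokens)  # True -> cut excluded from training
```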
2023-11-24 09:58:38,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2784473.3333333335, ans=0.2
2023-11-24 09:58:39,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2784473.3333333335, ans=0.0
2023-11-24 09:58:43,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.550e+01 9.128e+01 9.722e+01 1.464e+02, threshold=1.826e+02, percent-clipped=0.0
2023-11-24 09:58:47,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2784540.0, ans=0.0
2023-11-24 09:58:56,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2784540.0, ans=0.125
2023-11-24 09:59:07,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417700
2023-11-24 09:59:19,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2784673.3333333335, ans=0.125
2023-11-24 09:59:22,984 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8900, loss[loss=0.06869, simple_loss=0.09332, pruned_loss=0.01141, audio_tagging_loss=0.01062, over 15348.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.0916, pruned_loss=0.01315, audio_tagging_loss=0.009027, over 3057299.24 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 09:59:36,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784806.6666666665, ans=0.1
2023-11-24 09:59:37,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784806.6666666665, ans=0.1
2023-11-24 09:59:55,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2784873.3333333335, ans=0.2
2023-11-24 09:59:56,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2023-11-24 10:00:08,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417750
2023-11-24 10:00:10,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2785006.6666666665, ans=0.1
2023-11-24 10:00:15,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2785006.6666666665, ans=0.0
2023-11-24 10:00:19,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2785006.6666666665, ans=0.125
2023-11-24 10:00:24,361 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 8950, loss[loss=0.0936, simple_loss=0.1359, pruned_loss=0.01833, audio_tagging_loss=0.007337, over 15459.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09182, pruned_loss=0.01319, audio_tagging_loss=0.008853, over 3057247.92 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:00:34,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2785073.3333333335, ans=0.125
2023-11-24 10:00:37,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2785140.0, ans=0.125
2023-11-24 10:00:47,813 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.510e+01 9.195e+01 1.012e+02 1.504e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-24 10:00:49,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2785206.6666666665, ans=0.0
2023-11-24 10:01:05,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2785273.3333333335, ans=0.125
2023-11-24 10:01:09,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417800
2023-11-24 10:01:14,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5
2023-11-24 10:01:24,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2785340.0, ans=0.125
2023-11-24 10:01:26,217 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9000, loss[loss=0.07076, simple_loss=0.09114, pruned_loss=0.01791, audio_tagging_loss=0.007285, over 15969.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.0924, pruned_loss=0.01347, audio_tagging_loss=0.008751, over 3063415.51 frames. ], batch size: 61, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:01:26,218 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 10:02:10,229 INFO [train_asr.py:1253] (1/4) Epoch 35, validation: loss=0.05875, simple_loss=0.05079, pruned_loss=0.005122, audio_tagging_loss=0.02823, over 4681554.00 frames.
2023-11-24 10:02:10,231 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 10:02:15,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2785406.6666666665, ans=0.04949747468305833
2023-11-24 10:02:15,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5
2023-11-24 10:02:56,304 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417850
2023-11-24 10:02:58,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2785673.3333333335, ans=0.125
2023-11-24 10:03:11,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2785740.0, ans=0.125
2023-11-24 10:03:12,417 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9050, loss[loss=0.05156, simple_loss=0.06148, pruned_loss=0.0101, audio_tagging_loss=0.01072, over 15094.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09166, pruned_loss=0.01334, audio_tagging_loss=0.008713, over 3063789.33 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:03:12,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2785740.0, ans=0.125
2023-11-24 10:03:18,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2785740.0, ans=0.125
2023-11-24 10:03:24,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0
2023-11-24 10:03:30,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2785806.6666666665, ans=0.0
2023-11-24 10:03:36,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.575e+01 9.161e+01 9.743e+01 1.218e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-24 10:03:54,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0
2023-11-24 10:03:54,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2785940.0, ans=0.0
2023-11-24 10:03:57,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2785940.0, ans=0.0
2023-11-24 10:03:58,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417900
2023-11-24 10:04:12,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2786006.6666666665, ans=0.125
2023-11-24 10:04:14,921 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9100, loss[loss=0.05979, simple_loss=0.08409, pruned_loss=0.009048, audio_tagging_loss=0.0087, over 16343.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.0911, pruned_loss=0.01333, audio_tagging_loss=0.008828, over 3068121.28 frames. ], batch size: 61, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:04:48,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. limit=10.0
2023-11-24 10:04:49,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2786206.6666666665, ans=0.2
2023-11-24 10:04:58,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0
2023-11-24 10:04:59,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2786273.3333333335, ans=0.0
2023-11-24 10:05:00,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 417950
2023-11-24 10:05:13,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2786340.0, ans=0.0
2023-11-24 10:05:15,914 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9150, loss[loss=0.0666, simple_loss=0.09262, pruned_loss=0.009873, audio_tagging_loss=0.01042, over 15619.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.0911, pruned_loss=0.01327, audio_tagging_loss=0.008805, over 3061661.59 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:05:34,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2786473.3333333335, ans=0.125
2023-11-24 10:05:39,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.404e+01 9.029e+01 9.911e+01 1.244e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-24 10:05:50,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2786540.0, ans=0.035
2023-11-24 10:06:00,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2786606.6666666665, ans=0.125
2023-11-24 10:06:01,965 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418000
2023-11-24 10:06:06,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5
2023-11-24 10:06:18,308 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9200, loss[loss=0.06085, simple_loss=0.08282, pruned_loss=0.01192, audio_tagging_loss=0.007518, over 14896.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09062, pruned_loss=0.01319, audio_tagging_loss=0.008821, over 3057183.85 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:06:24,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2786740.0, ans=0.125
2023-11-24 10:06:28,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2786740.0, ans=0.125
2023-11-24 10:06:37,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2786806.6666666665, ans=0.2
2023-11-24 10:06:49,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2786873.3333333335, ans=0.2
2023-11-24 10:07:03,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0
2023-11-24 10:07:04,195 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418050
2023-11-24 10:07:20,421 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9250, loss[loss=0.07493, simple_loss=0.09954, pruned_loss=0.01563, audio_tagging_loss=0.009527, over 15609.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09076, pruned_loss=0.01321, audio_tagging_loss=0.008852, over 3055979.57 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:07:28,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2787073.3333333335, ans=0.0
2023-11-24 10:07:30,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2787073.3333333335, ans=0.125
2023-11-24 10:07:43,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.443e+01 8.965e+01 9.843e+01 1.266e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-24 10:07:43,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2787206.6666666665, ans=0.125
2023-11-24 10:07:43,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2787206.6666666665, ans=0.125
2023-11-24 10:07:56,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2787273.3333333335, ans=0.1
2023-11-24 10:08:06,255 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418100
2023-11-24 10:08:06,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0
2023-11-24 10:08:22,351 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9300, loss[loss=0.0816, simple_loss=0.1158, pruned_loss=0.01724, audio_tagging_loss=0.006464, over 14503.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09072, pruned_loss=0.01304, audio_tagging_loss=0.008835, over 3048658.94 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:08:28,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2787406.6666666665, ans=0.125
2023-11-24 10:08:36,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2787473.3333333335, ans=0.0
2023-11-24 10:08:48,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2787540.0, ans=0.125
2023-11-24 10:08:50,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787540.0, ans=0.1
2023-11-24 10:08:53,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2787540.0, ans=0.125
2023-11-24 10:09:02,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2787606.6666666665, ans=0.2
2023-11-24 10:09:03,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0
2023-11-24 10:09:04,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2787606.6666666665, ans=0.125
2023-11-24 10:09:08,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418150
2023-11-24 10:09:23,910 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9350, loss[loss=0.06592, simple_loss=0.08902, pruned_loss=0.01375, audio_tagging_loss=0.007654, over 14719.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09077, pruned_loss=0.01304, audio_tagging_loss=0.00891, over 3051778.56 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:09:35,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2787806.6666666665, ans=0.125
2023-11-24 10:09:48,171 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 8.452e+01 8.975e+01 9.526e+01 2.573e+02, threshold=1.795e+02, percent-clipped=1.0
2023-11-24 10:09:50,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2787873.3333333335, ans=0.125
2023-11-24 10:10:09,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418200
2023-11-24 10:10:26,214 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9400, loss[loss=0.07578, simple_loss=0.1067, pruned_loss=0.0144, audio_tagging_loss=0.008026, over 14685.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09046, pruned_loss=0.01308, audio_tagging_loss=0.008943, over 3042515.37 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:10:27,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2788073.3333333335, ans=22.5
2023-11-24 10:10:28,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2788073.3333333335, ans=0.0
2023-11-24 10:11:11,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2788273.3333333335, ans=0.125
2023-11-24 10:11:12,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418250
2023-11-24 10:11:22,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2788340.0, ans=0.0
2023-11-24 10:11:25,993 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 10:11:28,914 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9450, loss[loss=0.05352, simple_loss=0.07292, pruned_loss=0.00752, audio_tagging_loss=0.009543, over 13887.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.0912, pruned_loss=0.01334, audio_tagging_loss=0.008986, over 3041417.06 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:11:29,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2788406.6666666665, ans=0.125
2023-11-24 10:11:44,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0
2023-11-24 10:11:54,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.573e+01 9.142e+01 1.013e+02 1.307e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-24 10:12:01,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
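The Whitening lines compare a per-module statistic against a limit; the record just above, metric=2.74 vs. limit=6.0, is comfortably inside its limit, and presumably the constraint only acts when the metric exceeds the limit. One simple statistic with this flavor, shown purely as an illustration (not the exact formula in scaling.py), is the eigenvalue spread of the channel covariance, which is 1.0 for perfectly white activations:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one group.
    cov = (x.T @ x) / x.shape[0]        # channel covariance
    eigs = torch.linalg.eigvalsh(cov)   # real, non-negative eigenvalues
    # 1.0 when all eigenvalues are equal ("white"); grows with imbalance.
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

x = torch.randn(200, 128)               # near-white activations
print(whitening_metric(x))              # modestly above 1.0, far below 6.0
```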
2023-11-24 10:12:14,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418300
2023-11-24 10:12:15,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2788606.6666666665, ans=0.1
2023-11-24 10:12:25,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2788673.3333333335, ans=0.125
2023-11-24 10:12:30,829 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9500, loss[loss=0.08302, simple_loss=0.1147, pruned_loss=0.01723, audio_tagging_loss=0.008425, over 14833.00 frames. ], tot_loss[loss=0.06865, simple_loss=0.09207, pruned_loss=0.0136, audio_tagging_loss=0.00902, over 3042790.51 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:12:34,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2788740.0, ans=0.0
2023-11-24 10:12:36,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2788740.0, ans=0.125
2023-11-24 10:12:58,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2788873.3333333335, ans=0.2
2023-11-24 10:13:15,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2788940.0, ans=0.125
2023-11-24 10:13:17,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418350
2023-11-24 10:13:23,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2789006.6666666665, ans=0.125
2023-11-24 10:13:26,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2789006.6666666665, ans=0.0
2023-11-24 10:13:33,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2789073.3333333335, ans=0.125
2023-11-24 10:13:34,259 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9550, loss[loss=0.04873, simple_loss=0.0605, pruned_loss=0.00562, audio_tagging_loss=0.01286, over 16014.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09147, pruned_loss=0.01337, audio_tagging_loss=0.009121, over 3044864.57 frames. ], batch size: 62, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:13:39,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2789073.3333333335, ans=0.125
2023-11-24 10:13:46,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0
2023-11-24 10:13:47,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2789140.0, ans=0.0
2023-11-24 10:13:58,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.574e+01 9.293e+01 1.001e+02 1.205e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-24 10:14:12,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2789273.3333333335, ans=15.0
2023-11-24 10:14:20,445 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418400
2023-11-24 10:14:33,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0
2023-11-24 10:14:33,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2789340.0, ans=0.0
2023-11-24 10:14:36,074 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9600, loss[loss=0.07907, simple_loss=0.1092, pruned_loss=0.01392, audio_tagging_loss=0.01056, over 15640.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09043, pruned_loss=0.01322, audio_tagging_loss=0.009179, over 3038476.35 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:14:45,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2789406.6666666665, ans=0.125
2023-11-24 10:14:53,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2789473.3333333335, ans=0.125
2023-11-24 10:15:05,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2789540.0, ans=0.1
2023-11-24 10:15:12,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5
2023-11-24 10:15:18,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2789606.6666666665, ans=0.0
2023-11-24 10:15:21,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2789606.6666666665, ans=0.015
2023-11-24 10:15:22,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418450
2023-11-24 10:15:38,340 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9650, loss[loss=0.0719, simple_loss=0.09221, pruned_loss=0.01636, audio_tagging_loss=0.009433, over 15616.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09063, pruned_loss=0.01321, audio_tagging_loss=0.009201, over 3043247.04 frames. ], batch size: 59, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:15:43,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0
2023-11-24 10:16:04,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.395e+01 8.995e+01 9.967e+01 1.274e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-24 10:16:17,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0
2023-11-24 10:16:21,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2789940.0, ans=0.125
2023-11-24 10:16:25,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418500
2023-11-24 10:16:27,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2790006.6666666665, ans=0.0
2023-11-24 10:16:42,581 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9700, loss[loss=0.07369, simple_loss=0.1001, pruned_loss=0.01453, audio_tagging_loss=0.009108, over 14780.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09, pruned_loss=0.01306, audio_tagging_loss=0.009021, over 3040745.99 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:16:45,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2790073.3333333335, ans=0.125
2023-11-24 10:16:45,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2790073.3333333335, ans=0.125
2023-11-24 10:17:05,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2790206.6666666665, ans=0.0
2023-11-24 10:17:15,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2790206.6666666665, ans=0.125
2023-11-24 10:17:18,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0
2023-11-24 10:17:24,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0
2023-11-24 10:17:27,863 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418550
2023-11-24 10:17:30,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2790340.0, ans=0.05
2023-11-24 10:17:31,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5
2023-11-24 10:17:36,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2790340.0, ans=0.2
2023-11-24 10:17:40,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2790340.0, ans=0.125
2023-11-24 10:17:43,644 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9750, loss[loss=0.06075, simple_loss=0.08588, pruned_loss=0.008858, audio_tagging_loss=0.008958, over 15479.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09068, pruned_loss=0.01309, audio_tagging_loss=0.008965, over 3046024.38 frames. ], batch size: 59, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:17:45,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2790406.6666666665, ans=0.0
2023-11-24 10:18:02,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=15.0
2023-11-24 10:18:05,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2790473.3333333335, ans=0.125
2023-11-24 10:18:08,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.345e+01 8.930e+01 9.592e+01 1.228e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-24 10:18:09,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2790540.0, ans=0.95
2023-11-24 10:18:16,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2790540.0, ans=0.1
2023-11-24 10:18:29,687 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418600
2023-11-24 10:18:30,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2790606.6666666665, ans=0.0
2023-11-24 10:18:34,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2790673.3333333335, ans=10.0
2023-11-24 10:18:45,249 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9800, loss[loss=0.05941, simple_loss=0.07585, pruned_loss=0.01269, audio_tagging_loss=0.008791, over 14988.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09154, pruned_loss=0.01315, audio_tagging_loss=0.008809, over 3042109.66 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:18:47,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2790740.0, ans=0.0
2023-11-24 10:18:49,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2790740.0, ans=0.1
2023-11-24 10:19:01,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2790806.6666666665, ans=0.125
2023-11-24 10:19:12,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2790873.3333333335, ans=0.125
2023-11-24 10:19:31,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418650
2023-11-24 10:19:40,635 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 10:19:47,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0
2023-11-24 10:19:48,371 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9850, loss[loss=0.08471, simple_loss=0.1134, pruned_loss=0.02042, audio_tagging_loss=0.007576, over 15165.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09137, pruned_loss=0.01318, audio_tagging_loss=0.008767, over 3048736.06 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:20:01,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2791140.0, ans=0.1
2023-11-24 10:20:12,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.621e+01 9.085e+01 9.764e+01 1.357e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-24 10:20:21,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-11-24 10:20:34,216 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418700
2023-11-24 10:20:40,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2791340.0, ans=0.07
2023-11-24 10:20:50,725 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9900, loss[loss=0.05279, simple_loss=0.06622, pruned_loss=0.01017, audio_tagging_loss=0.009518, over 14354.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09183, pruned_loss=0.01313, audio_tagging_loss=0.008788, over 3047049.42 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:21:05,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2791473.3333333335, ans=0.0
2023-11-24 10:21:09,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2791473.3333333335, ans=0.0
2023-11-24 10:21:11,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2791473.3333333335, ans=0.1
2023-11-24 10:21:24,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0
2023-11-24 10:21:36,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418750
2023-11-24 10:21:38,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2791606.6666666665, ans=0.1
2023-11-24 10:21:52,129 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 9950, loss[loss=0.04962, simple_loss=0.0707, pruned_loss=0.005978, audio_tagging_loss=0.008291, over 15954.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09156, pruned_loss=0.01319, audio_tagging_loss=0.008774, over 3053026.76 frames. ], batch size: 60, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:21:59,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2791740.0, ans=0.125
2023-11-24 10:22:11,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0
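Every loss line in this span reports lr: 1.93e-03. Assuming an Eden-style schedule, lr = base_lr * ((step / lr_batches)^2 + 1)^-0.25 * ((epoch / lr_epochs)^2 + 1)^-0.25, with the run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, the logged value is reproduced within rounding at the batch indices shown above (the scheduler identity and the effective epoch argument are assumptions):

```python
# Hedged reconstruction of the logged learning rate; the Eden formula
# and the epoch value fed to it are assumptions.
def eden_lr(base_lr: float, step: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((step / lr_batches) ** 2 + 1.0) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, step=418750, epoch=34))  # ~1.93e-03
```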
2023-11-24 10:22:18,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.444e+01 9.082e+01 9.757e+01 1.187e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-24 10:22:18,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2791873.3333333335, ans=0.125
2023-11-24 10:22:24,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2791873.3333333335, ans=0.125
2023-11-24 10:22:39,130 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418800
2023-11-24 10:22:43,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0
2023-11-24 10:22:52,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2792006.6666666665, ans=0.125
2023-11-24 10:22:55,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2792073.3333333335, ans=0.0
2023-11-24 10:22:55,828 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10000, loss[loss=0.04647, simple_loss=0.05521, pruned_loss=0.007187, audio_tagging_loss=0.01168, over 14139.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09116, pruned_loss=0.01312, audio_tagging_loss=0.008824, over 3054230.15 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:22:56,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2792073.3333333335, ans=0.125
2023-11-24 10:23:23,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2792206.6666666665, ans=0.0
2023-11-24 10:23:39,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2792273.3333333335, ans=0.1
2023-11-24 10:23:42,087 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418850
2023-11-24 10:23:59,882 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10050, loss[loss=0.0714, simple_loss=0.09428, pruned_loss=0.01715, audio_tagging_loss=0.007112, over 14826.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09013, pruned_loss=0.0129, audio_tagging_loss=0.008897, over 3049397.77 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:24:14,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0
2023-11-24 10:24:22,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2792540.0, ans=0.125
2023-11-24 10:24:25,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.144e+01 8.576e+01 9.106e+01 9.929e+01 1.520e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-24 10:24:28,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2792540.0, ans=0.125
2023-11-24 10:24:29,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2792540.0, ans=0.0
2023-11-24 10:24:30,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2792540.0, ans=0.07
2023-11-24 10:24:33,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2792540.0, ans=0.125
2023-11-24 10:24:36,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792606.6666666665, ans=0.1
2023-11-24 10:24:41,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792606.6666666665, ans=0.1
2023-11-24 10:24:45,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2792606.6666666665, ans=0.1
2023-11-24 10:24:47,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418900
2023-11-24 10:25:02,627 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10100, loss[loss=0.06194, simple_loss=0.08079, pruned_loss=0.01313, audio_tagging_loss=0.008416, over 14070.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09052, pruned_loss=0.01296, audio_tagging_loss=0.008744, over 3053115.77 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:25:08,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2792740.0, ans=0.0
2023-11-24 10:25:11,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2792740.0, ans=0.0
2023-11-24 10:25:12,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2792740.0, ans=0.125
2023-11-24 10:25:16,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2792806.6666666665, ans=0.125
2023-11-24 10:25:44,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0
2023-11-24 10:25:48,990 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 418950
2023-11-24 10:25:51,288 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 10:25:56,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2793006.6666666665, ans=0.125
2023-11-24 10:26:02,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2793006.6666666665, ans=0.125
2023-11-24 10:26:04,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2793073.3333333335, ans=0.125
2023-11-24 10:26:05,127 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10150, loss[loss=0.05753, simple_loss=0.07549, pruned_loss=0.009269, audio_tagging_loss=0.01052, over 16031.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09043, pruned_loss=0.01293, audio_tagging_loss=0.008912, over 3052126.55 frames. ], batch size: 62, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:26:09,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2793073.3333333335, ans=0.2
2023-11-24 10:26:10,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2793073.3333333335, ans=0.125
2023-11-24 10:26:20,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2793140.0, ans=0.0
2023-11-24 10:26:31,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.585e+01 9.069e+01 9.759e+01 1.192e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-24 10:26:33,576 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 10:26:50,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419000
2023-11-24 10:26:57,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0
2023-11-24 10:27:08,901 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10200, loss[loss=0.07573, simple_loss=0.1151, pruned_loss=0.009915, audio_tagging_loss=0.00826, over 15870.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09073, pruned_loss=0.01308, audio_tagging_loss=0.009012, over 3053908.65 frames. ], batch size: 56, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:27:11,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2793406.6666666665, ans=0.07
2023-11-24 10:27:20,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2793473.3333333335, ans=0.125
2023-11-24 10:27:30,982 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 10:27:55,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419050
2023-11-24 10:27:57,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2793606.6666666665, ans=0.125
2023-11-24 10:28:08,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2793673.3333333335, ans=0.125
2023-11-24 10:28:10,829 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10250, loss[loss=0.06911, simple_loss=0.09295, pruned_loss=0.01458, audio_tagging_loss=0.008052, over 15696.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.08913, pruned_loss=0.01289, audio_tagging_loss=0.009233, over 3046852.99 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:28:16,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0
2023-11-24 10:28:18,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2793740.0, ans=0.0
2023-11-24 10:28:20,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=2793740.0, ans=12.0
2023-11-24 10:28:37,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.437e+01 9.083e+01 9.683e+01 1.798e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-24 10:28:57,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419100
2023-11-24 10:28:58,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2793940.0, ans=0.0
2023-11-24 10:29:02,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2794006.6666666665, ans=0.0
2023-11-24 10:29:13,528 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10300, loss[loss=0.06099, simple_loss=0.07764, pruned_loss=0.01176, audio_tagging_loss=0.0104, over 15866.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09017, pruned_loss=0.01315, audio_tagging_loss=0.00927, over 3048902.54 frames. ], batch size: 59, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:29:13,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794073.3333333335, ans=0.1
2023-11-24 10:29:28,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2794140.0, ans=0.04949747468305833
2023-11-24 10:29:28,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0
2023-11-24 10:29:30,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2794140.0, ans=0.125
2023-11-24 10:29:59,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2794273.3333333335, ans=0.125
2023-11-24 10:30:00,352 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419150
2023-11-24 10:30:06,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0
2023-11-24 10:30:16,900 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10350, loss[loss=0.06581, simple_loss=0.08895, pruned_loss=0.01078, audio_tagging_loss=0.01056, over 14101.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09149, pruned_loss=0.01323, audio_tagging_loss=0.009255, over 3046178.97 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:30:42,291 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.615e+01 9.196e+01 1.008e+02 1.353e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-24 10:30:52,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2794606.6666666665, ans=10.0
2023-11-24 10:31:03,022 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419200
2023-11-24 10:31:04,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2794606.6666666665, ans=0.125
2023-11-24 10:31:19,391 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10400, loss[loss=0.06127, simple_loss=0.08405, pruned_loss=0.01123, audio_tagging_loss=0.008014, over 14912.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09083, pruned_loss=0.01323, audio_tagging_loss=0.009381, over 3046914.59 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 32.0
2023-11-24 10:31:57,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2794940.0, ans=0.0
2023-11-24 10:32:05,612 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419250
2023-11-24 10:32:21,632 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10450, loss[loss=0.07308, simple_loss=0.09117, pruned_loss=0.01959, audio_tagging_loss=0.007902, over 15424.00 frames. ], tot_loss[loss=0.0686, simple_loss=0.09187, pruned_loss=0.01345, audio_tagging_loss=0.009215, over 3049305.38 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:32:49,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.462e+01 9.061e+01 9.658e+01 1.208e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-24 10:32:53,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2795206.6666666665, ans=0.125
2023-11-24 10:33:07,155 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419300
2023-11-24 10:33:11,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2795340.0, ans=0.125
2023-11-24 10:33:13,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0
2023-11-24 10:33:15,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0
2023-11-24 10:33:24,649 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10500, loss[loss=0.07642, simple_loss=0.1145, pruned_loss=0.0126, audio_tagging_loss=0.006584, over 15545.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09115, pruned_loss=0.01329, audio_tagging_loss=0.009155, over 3052709.89 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:33:42,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2795473.3333333335, ans=0.125
2023-11-24 10:33:52,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2795540.0, ans=0.125
2023-11-24 10:34:03,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2795606.6666666665, ans=0.125
2023-11-24 10:34:11,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419350
2023-11-24 10:34:15,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2795673.3333333335, ans=0.0
2023-11-24 10:34:19,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0
2023-11-24 10:34:21,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=22.5
2023-11-24 10:34:27,457 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10550, loss[loss=0.07013, simple_loss=0.08901, pruned_loss=0.01467, audio_tagging_loss=0.01096, over 15356.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09085, pruned_loss=0.01324, audio_tagging_loss=0.008987, over 3047517.04 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:34:31,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2795740.0, ans=0.2
2023-11-24 10:34:31,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=2795740.0, ans=12.0
2023-11-24 10:34:35,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0
2023-11-24 10:34:43,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2795806.6666666665, ans=0.125
2023-11-24 10:34:44,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2795806.6666666665, ans=0.125
2023-11-24 10:34:46,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2795806.6666666665, ans=0.125
2023-11-24 10:34:54,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.781e+01 9.299e+01 1.006e+02 1.266e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-24 10:35:10,790 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 10:35:13,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419400
2023-11-24 10:35:16,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2796006.6666666665, ans=0.2
2023-11-24 10:35:23,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2796006.6666666665, ans=0.125
2023-11-24 10:35:29,248 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10600, loss[loss=0.04765, simple_loss=0.06204, pruned_loss=0.0063, audio_tagging_loss=0.01034, over 15242.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09056, pruned_loss=0.01318, audio_tagging_loss=0.008971, over 3050465.41 frames. ], batch size: 58, lr: 1.93e-03, grad_scale: 16.0
2023-11-24 10:35:29,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2796073.3333333335, ans=0.0
2023-11-24 10:35:50,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2796140.0, ans=0.125
2023-11-24 10:36:04,998 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 10:36:15,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419450
2023-11-24 10:36:23,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2796340.0, ans=0.0
2023-11-24 10:36:32,455 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10650, loss[loss=0.05522, simple_loss=0.07908, pruned_loss=0.008326, audio_tagging_loss=0.00735, over 15216.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09135, pruned_loss=0.01338, audio_tagging_loss=0.008888, over 3047496.70 frames. 
], batch size: 60, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 10:36:59,542 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.535e+01 9.205e+01 9.977e+01 1.279e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-24 10:37:18,827 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419500 2023-11-24 10:37:21,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2796673.3333333335, ans=0.02 2023-11-24 10:37:27,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2796673.3333333335, ans=0.09899494936611666 2023-11-24 10:37:34,634 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10700, loss[loss=0.06297, simple_loss=0.07404, pruned_loss=0.01569, audio_tagging_loss=0.01026, over 14062.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09045, pruned_loss=0.01318, audio_tagging_loss=0.008851, over 3048493.47 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 10:37:38,436 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 10:37:45,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2796740.0, ans=0.125 2023-11-24 10:37:51,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0 2023-11-24 10:37:56,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2796806.6666666665, ans=0.125 2023-11-24 10:38:21,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419550 2023-11-24 10:38:26,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2797006.6666666665, ans=0.0 2023-11-24 10:38:37,174 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10750, loss[loss=0.07975, simple_loss=0.1064, pruned_loss=0.02104, audio_tagging_loss=0.00551, over 15546.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.08991, pruned_loss=0.0131, audio_tagging_loss=0.008859, over 3052404.65 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 16.0 2023-11-24 10:38:41,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2797073.3333333335, ans=0.0 2023-11-24 10:38:51,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797140.0, ans=0.1 2023-11-24 10:39:05,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.423e+01 9.138e+01 9.778e+01 1.311e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-24 10:39:24,019 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419600 2023-11-24 10:39:33,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2797340.0, ans=0.2 2023-11-24 10:39:37,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2797340.0, ans=0.0 2023-11-24 10:39:40,625 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10800, loss[loss=0.0802, simple_loss=0.1119, pruned_loss=0.01507, audio_tagging_loss=0.009173, over 16241.00 frames. 
], tot_loss[loss=0.06699, simple_loss=0.09029, pruned_loss=0.01304, audio_tagging_loss=0.008797, over 3056836.35 frames. ], batch size: 60, lr: 1.93e-03, grad_scale: 32.0 2023-11-24 10:39:54,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2797473.3333333335, ans=0.0 2023-11-24 10:40:11,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2797540.0, ans=0.125 2023-11-24 10:40:26,297 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419650 2023-11-24 10:40:35,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2797673.3333333335, ans=0.125 2023-11-24 10:40:42,942 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10850, loss[loss=0.06642, simple_loss=0.08804, pruned_loss=0.01334, audio_tagging_loss=0.00906, over 14533.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09082, pruned_loss=0.01312, audio_tagging_loss=0.008791, over 3058509.68 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:40:43,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797740.0, ans=0.1 2023-11-24 10:40:46,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2797740.0, ans=0.0 2023-11-24 10:40:48,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2797740.0, ans=0.0 2023-11-24 10:40:51,555 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 10:41:09,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.591e+01 9.237e+01 1.014e+02 1.325e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-24 10:41:28,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.20 vs. limit=10.0 2023-11-24 10:41:29,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419700 2023-11-24 10:41:38,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2798006.6666666665, ans=0.125 2023-11-24 10:41:39,725 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 10:41:44,405 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10900, loss[loss=0.08134, simple_loss=0.1144, pruned_loss=0.01836, audio_tagging_loss=0.005781, over 14596.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.0912, pruned_loss=0.01304, audio_tagging_loss=0.008881, over 3061644.27 frames. 
], batch size: 55, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:41:55,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=2798073.3333333335, ans=12.0 2023-11-24 10:42:09,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2798206.6666666665, ans=0.025 2023-11-24 10:42:30,844 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419750 2023-11-24 10:42:37,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2798340.0, ans=0.125 2023-11-24 10:42:47,347 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 10950, loss[loss=0.04746, simple_loss=0.05769, pruned_loss=0.008673, audio_tagging_loss=0.009938, over 15492.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09042, pruned_loss=0.01295, audio_tagging_loss=0.009049, over 3055997.83 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:43:08,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2023-11-24 10:43:14,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.263e+01 9.239e+01 9.893e+01 2.160e+02, threshold=1.848e+02, percent-clipped=1.0 2023-11-24 10:43:16,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2798540.0, ans=0.0 2023-11-24 10:43:33,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419800 2023-11-24 10:43:50,662 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11000, loss[loss=0.04279, simple_loss=0.05283, pruned_loss=0.004246, audio_tagging_loss=0.01213, over 15268.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09148, pruned_loss=0.01304, audio_tagging_loss=0.009087, over 3053869.88 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:43:58,909 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 10:44:05,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-24 10:44:11,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2023-11-24 10:44:31,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2798940.0, ans=0.125 2023-11-24 10:44:37,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419850 2023-11-24 10:44:47,814 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 10:44:52,414 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11050, loss[loss=0.0613, simple_loss=0.08199, pruned_loss=0.01137, audio_tagging_loss=0.008935, over 14995.00 frames. 
], tot_loss[loss=0.06825, simple_loss=0.09201, pruned_loss=0.01317, audio_tagging_loss=0.009081, over 3063424.91 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 16.0 2023-11-24 10:44:53,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2799073.3333333335, ans=0.2 2023-11-24 10:44:55,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-24 10:45:21,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.396e+01 8.466e+01 9.230e+01 9.960e+01 1.939e+02, threshold=1.846e+02, percent-clipped=1.0 2023-11-24 10:45:33,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2799273.3333333335, ans=0.125 2023-11-24 10:45:38,879 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419900 2023-11-24 10:45:46,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=2799340.0, ans=0.1 2023-11-24 10:45:55,572 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11100, loss[loss=0.07372, simple_loss=0.09355, pruned_loss=0.0191, audio_tagging_loss=0.007841, over 14094.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.0921, pruned_loss=0.01328, audio_tagging_loss=0.009098, over 3066747.01 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 16.0 2023-11-24 10:46:11,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2799473.3333333335, ans=0.125 2023-11-24 10:46:21,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2799540.0, ans=0.125 2023-11-24 10:46:41,209 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 419950 2023-11-24 10:46:44,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2023-11-24 10:46:46,681 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 10:46:52,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2799673.3333333335, ans=0.125 2023-11-24 10:46:56,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2799673.3333333335, ans=0.125 2023-11-24 10:46:58,124 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11150, loss[loss=0.07242, simple_loss=0.09538, pruned_loss=0.01502, audio_tagging_loss=0.009712, over 15391.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.0924, pruned_loss=0.0133, audio_tagging_loss=0.009235, over 3071038.64 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 16.0 2023-11-24 10:46:59,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2799740.0, ans=0.07 2023-11-24 10:47:04,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.79 vs. 
limit=15.0 2023-11-24 10:47:07,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=2799740.0, ans=8.0 2023-11-24 10:47:10,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2799806.6666666665, ans=0.0 2023-11-24 10:47:13,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2799806.6666666665, ans=0.025 2023-11-24 10:47:26,046 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.463e+01 9.073e+01 9.645e+01 1.170e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-24 10:47:28,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2799873.3333333335, ans=0.2 2023-11-24 10:47:44,413 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420000 2023-11-24 10:47:50,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800006.6666666665, ans=0.1 2023-11-24 10:48:03,613 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11200, loss[loss=0.06902, simple_loss=0.09536, pruned_loss=0.01332, audio_tagging_loss=0.008017, over 13616.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.0928, pruned_loss=0.01331, audio_tagging_loss=0.009281, over 3064242.60 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:48:03,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2800073.3333333335, ans=0.125 2023-11-24 10:48:06,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2800073.3333333335, ans=0.125 2023-11-24 10:48:07,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2800073.3333333335, ans=0.2 2023-11-24 10:48:16,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2800140.0, ans=0.0 2023-11-24 10:48:16,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2800140.0, ans=0.125 2023-11-24 10:48:50,453 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420050 2023-11-24 10:48:54,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2800340.0, ans=0.125 2023-11-24 10:48:54,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=22.5 2023-11-24 10:48:55,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2800340.0, ans=0.025 2023-11-24 10:49:06,593 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11250, loss[loss=0.0647, simple_loss=0.0795, pruned_loss=0.01237, audio_tagging_loss=0.01259, over 14312.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09146, pruned_loss=0.01319, audio_tagging_loss=0.009302, over 3061739.41 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:49:07,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.25 vs. 
limit=15.0 2023-11-24 10:49:10,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2800406.6666666665, ans=0.2 2023-11-24 10:49:25,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2023-11-24 10:49:32,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2800540.0, ans=0.05 2023-11-24 10:49:35,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.479e+01 9.057e+01 9.752e+01 1.909e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-24 10:49:38,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2800540.0, ans=0.04949747468305833 2023-11-24 10:49:52,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420100 2023-11-24 10:50:04,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2800673.3333333335, ans=0.0 2023-11-24 10:50:09,686 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11300, loss[loss=0.05298, simple_loss=0.07301, pruned_loss=0.007857, audio_tagging_loss=0.008618, over 15813.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09155, pruned_loss=0.01309, audio_tagging_loss=0.009065, over 3064239.70 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:50:24,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-24 10:50:55,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420150 2023-11-24 10:51:02,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2801006.6666666665, ans=0.015 2023-11-24 10:51:10,853 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11350, loss[loss=0.06326, simple_loss=0.08927, pruned_loss=0.008291, audio_tagging_loss=0.01033, over 15387.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09102, pruned_loss=0.01307, audio_tagging_loss=0.008987, over 3048832.58 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:51:20,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2801073.3333333335, ans=0.1 2023-11-24 10:51:20,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-24 10:51:25,811 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 10:51:40,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.545e+01 9.244e+01 9.787e+01 1.195e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-24 10:51:44,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-11-24 10:51:56,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420200 2023-11-24 10:52:11,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.55 vs. 
limit=22.5 2023-11-24 10:52:13,278 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11400, loss[loss=0.08694, simple_loss=0.1128, pruned_loss=0.02039, audio_tagging_loss=0.01014, over 15129.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09125, pruned_loss=0.01312, audio_tagging_loss=0.008882, over 3047556.93 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:52:19,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2801406.6666666665, ans=0.125 2023-11-24 10:52:20,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2801406.6666666665, ans=0.125 2023-11-24 10:52:41,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801540.0, ans=0.1 2023-11-24 10:52:43,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2801540.0, ans=0.125 2023-11-24 10:52:44,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801540.0, ans=0.1 2023-11-24 10:52:50,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2801606.6666666665, ans=0.125 2023-11-24 10:52:58,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2801606.6666666665, ans=0.1 2023-11-24 10:53:00,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420250 2023-11-24 10:53:04,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2801673.3333333335, ans=0.125 2023-11-24 10:53:17,797 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11450, loss[loss=0.06003, simple_loss=0.07656, pruned_loss=0.01084, audio_tagging_loss=0.01091, over 15806.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09196, pruned_loss=0.01335, audio_tagging_loss=0.00878, over 3047856.68 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 16.0 2023-11-24 10:53:32,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2801806.6666666665, ans=0.09899494936611666 2023-11-24 10:53:33,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2801806.6666666665, ans=0.0 2023-11-24 10:53:46,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.489e+01 9.216e+01 1.000e+02 1.338e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-24 10:54:03,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420300 2023-11-24 10:54:16,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802006.6666666665, ans=0.1 2023-11-24 10:54:19,457 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11500, loss[loss=0.08162, simple_loss=0.1133, pruned_loss=0.01546, audio_tagging_loss=0.009542, over 16074.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09171, pruned_loss=0.01336, audio_tagging_loss=0.008778, over 3047231.05 frames. 
], batch size: 60, lr: 1.92e-03, grad_scale: 16.0 2023-11-24 10:54:25,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=12.0 2023-11-24 10:54:35,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2802140.0, ans=0.0 2023-11-24 10:55:01,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-24 10:55:05,378 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420350 2023-11-24 10:55:17,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0 2023-11-24 10:55:20,735 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11550, loss[loss=0.08236, simple_loss=0.1106, pruned_loss=0.02128, audio_tagging_loss=0.005753, over 15962.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09203, pruned_loss=0.01349, audio_tagging_loss=0.008829, over 3045478.31 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 16.0 2023-11-24 10:55:49,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2023-11-24 10:55:50,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2802540.0, ans=0.0 2023-11-24 10:55:50,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.625e+01 9.353e+01 9.944e+01 1.426e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-24 10:55:56,967 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 10:56:03,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2802606.6666666665, ans=0.125 2023-11-24 10:56:03,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2802606.6666666665, ans=0.1 2023-11-24 10:56:05,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2802606.6666666665, ans=0.125 2023-11-24 10:56:06,441 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420400 2023-11-24 10:56:15,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0 2023-11-24 10:56:22,928 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11600, loss[loss=0.06243, simple_loss=0.08133, pruned_loss=0.01263, audio_tagging_loss=0.009136, over 15779.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09108, pruned_loss=0.01327, audio_tagging_loss=0.008852, over 3046377.06 frames. 
], batch size: 64, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:56:30,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2023-11-24 10:56:39,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2802806.6666666665, ans=0.0 2023-11-24 10:56:39,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2802806.6666666665, ans=10.0 2023-11-24 10:56:42,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2802806.6666666665, ans=0.04949747468305833 2023-11-24 10:56:47,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2802873.3333333335, ans=0.125 2023-11-24 10:57:08,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420450 2023-11-24 10:57:14,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2803006.6666666665, ans=0.1 2023-11-24 10:57:23,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2803073.3333333335, ans=0.125 2023-11-24 10:57:24,764 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11650, loss[loss=0.08071, simple_loss=0.1202, pruned_loss=0.0137, audio_tagging_loss=0.00691, over 15722.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09227, pruned_loss=0.01332, audio_tagging_loss=0.008878, over 3052598.53 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:57:34,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-24 10:57:38,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2023-11-24 10:57:54,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.520e+01 9.268e+01 1.018e+02 1.416e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-24 10:57:58,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2803206.6666666665, ans=0.1 2023-11-24 10:58:03,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2803273.3333333335, ans=0.0 2023-11-24 10:58:10,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420500 2023-11-24 10:58:14,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2803340.0, ans=0.125 2023-11-24 10:58:14,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2803340.0, ans=0.1 2023-11-24 10:58:25,798 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11700, loss[loss=0.07945, simple_loss=0.1151, pruned_loss=0.01542, audio_tagging_loss=0.006498, over 16040.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09113, pruned_loss=0.01312, audio_tagging_loss=0.008923, over 3046903.67 frames. 
], batch size: 58, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:58:27,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2803406.6666666665, ans=0.07 2023-11-24 10:58:53,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2803540.0, ans=0.125 2023-11-24 10:59:11,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2023-11-24 10:59:11,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420550 2023-11-24 10:59:19,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.37 vs. limit=15.0 2023-11-24 10:59:28,511 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11750, loss[loss=0.05909, simple_loss=0.08735, pruned_loss=0.009496, audio_tagging_loss=0.005921, over 15588.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09049, pruned_loss=0.01299, audio_tagging_loss=0.008955, over 3050057.82 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 10:59:31,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2023-11-24 10:59:57,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.255e+01 8.962e+01 9.569e+01 1.238e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-24 10:59:59,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2803873.3333333335, ans=0.1 2023-11-24 11:00:02,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2803873.3333333335, ans=0.125 2023-11-24 11:00:07,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2803940.0, ans=0.0 2023-11-24 11:00:13,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420600 2023-11-24 11:00:29,777 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11800, loss[loss=0.06696, simple_loss=0.09323, pruned_loss=0.01183, audio_tagging_loss=0.008516, over 15217.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09067, pruned_loss=0.01294, audio_tagging_loss=0.008975, over 3052297.48 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 11:00:52,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2804140.0, ans=0.0 2023-11-24 11:00:57,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2804206.6666666665, ans=0.125 2023-11-24 11:01:16,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420650 2023-11-24 11:01:21,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2804340.0, ans=0.0 2023-11-24 11:01:29,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2804340.0, ans=0.125 2023-11-24 11:01:32,362 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11850, loss[loss=0.07343, simple_loss=0.09746, pruned_loss=0.01487, audio_tagging_loss=0.00982, over 13546.00 frames. 
], tot_loss[loss=0.06695, simple_loss=0.09009, pruned_loss=0.0129, audio_tagging_loss=0.009005, over 3044457.43 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 11:01:55,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2804473.3333333335, ans=0.125 2023-11-24 11:02:02,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.352e+01 9.070e+01 9.952e+01 1.461e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 11:02:16,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2804606.6666666665, ans=0.125 2023-11-24 11:02:18,530 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420700 2023-11-24 11:02:23,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2804673.3333333335, ans=0.125 2023-11-24 11:02:34,397 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11900, loss[loss=0.07158, simple_loss=0.09672, pruned_loss=0.01175, audio_tagging_loss=0.01146, over 15249.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09026, pruned_loss=0.01284, audio_tagging_loss=0.009062, over 3041599.27 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 11:02:41,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2804740.0, ans=0.07 2023-11-24 11:02:50,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2804806.6666666665, ans=0.0 2023-11-24 11:02:53,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2804806.6666666665, ans=0.0 2023-11-24 11:02:53,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2023-11-24 11:03:20,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420750 2023-11-24 11:03:32,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2805006.6666666665, ans=0.125 2023-11-24 11:03:35,109 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:03:36,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2023-11-24 11:03:37,208 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 11950, loss[loss=0.08402, simple_loss=0.1138, pruned_loss=0.01839, audio_tagging_loss=0.008749, over 15174.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09003, pruned_loss=0.01281, audio_tagging_loss=0.009232, over 3044273.03 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 11:03:54,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2023-11-24 11:04:07,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.367e+01 9.062e+01 9.589e+01 1.237e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-24 11:04:13,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.46 vs. 
limit=22.5 2023-11-24 11:04:20,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2805273.3333333335, ans=0.0 2023-11-24 11:04:22,825 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420800 2023-11-24 11:04:37,828 INFO [train_asr.py:1221] (1/4) Epoch 35, batch 12000, loss[loss=0.08036, simple_loss=0.1058, pruned_loss=0.01712, audio_tagging_loss=0.01036, over 16375.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.0909, pruned_loss=0.01298, audio_tagging_loss=0.009299, over 3053893.27 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 32.0 2023-11-24 11:04:37,829 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 11:05:18,378 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.3973, 3.5423, 4.4084, 3.1948], device='cuda:1') 2023-11-24 11:05:19,862 INFO [train_asr.py:1253] (1/4) Epoch 35, validation: loss=0.0585, simple_loss=0.05078, pruned_loss=0.005085, audio_tagging_loss=0.02803, over 4681554.00 frames. 2023-11-24 11:05:19,863 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 11:05:23,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2805406.6666666665, ans=0.2 2023-11-24 11:05:26,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0 2023-11-24 11:05:30,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2805473.3333333335, ans=0.125 2023-11-24 11:05:36,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2805473.3333333335, ans=0.1 2023-11-24 11:05:39,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2805473.3333333335, ans=0.125 2023-11-24 11:06:25,319 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 0, loss[loss=0.07667, simple_loss=0.1022, pruned_loss=0.008481, audio_tagging_loss=0.01709, over 14985.00 frames. ], tot_loss[loss=0.07667, simple_loss=0.1022, pruned_loss=0.008481, audio_tagging_loss=0.01709, over 14985.00 frames. ], batch size: 54, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:06:25,320 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 11:06:44,403 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7536, 4.8038, 4.7441, 4.7545], device='cuda:1') 2023-11-24 11:07:04,355 INFO [train_asr.py:1253] (1/4) Epoch 36, validation: loss=0.05761, simple_loss=0.0508, pruned_loss=0.005078, audio_tagging_loss=0.02713, over 4681554.00 frames. 
2023-11-24 11:07:04,355 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 11:07:08,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2805553.3333333335, ans=0.125 2023-11-24 11:07:08,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2805553.3333333335, ans=0.125 2023-11-24 11:07:14,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2805553.3333333335, ans=0.125 2023-11-24 11:07:22,267 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420850 2023-11-24 11:07:58,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2805820.0, ans=0.1 2023-11-24 11:08:06,407 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 50, loss[loss=0.07765, simple_loss=0.09495, pruned_loss=0.01297, audio_tagging_loss=0.01721, over 15539.00 frames. ], tot_loss[loss=0.07491, simple_loss=0.08968, pruned_loss=0.013, audio_tagging_loss=0.01707, over 697276.78 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:08:07,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2805886.6666666665, ans=0.2 2023-11-24 11:08:09,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.077e+01 9.826e+01 1.067e+02 1.319e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-24 11:08:14,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2805886.6666666665, ans=0.0 2023-11-24 11:08:15,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2805886.6666666665, ans=15.0 2023-11-24 11:08:18,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2805953.3333333335, ans=0.05 2023-11-24 11:08:25,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420900 2023-11-24 11:08:39,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806020.0, ans=0.1 2023-11-24 11:08:42,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2806020.0, ans=0.125 2023-11-24 11:08:43,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806086.6666666665, ans=0.1 2023-11-24 11:08:45,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806086.6666666665, ans=0.1 2023-11-24 11:09:08,029 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 100, loss[loss=0.09698, simple_loss=0.125, pruned_loss=0.02195, audio_tagging_loss=0.01251, over 15236.00 frames. ], tot_loss[loss=0.07437, simple_loss=0.09058, pruned_loss=0.01282, audio_tagging_loss=0.01625, over 1213471.17 frames. 
], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:09:11,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2806220.0, ans=0.0 2023-11-24 11:09:12,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806220.0, ans=0.1 2023-11-24 11:09:16,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-24 11:09:24,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2806286.6666666665, ans=0.125 2023-11-24 11:09:27,617 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 420950 2023-11-24 11:09:41,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2806353.3333333335, ans=0.125 2023-11-24 11:09:46,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806420.0, ans=0.1 2023-11-24 11:10:11,664 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 150, loss[loss=0.0591, simple_loss=0.07506, pruned_loss=0.00895, audio_tagging_loss=0.01262, over 14832.00 frames. ], tot_loss[loss=0.073, simple_loss=0.09099, pruned_loss=0.01285, audio_tagging_loss=0.01466, over 1621014.60 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:10:15,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.014e+01 9.024e+01 9.585e+01 1.046e+02 2.115e+02, threshold=1.917e+02, percent-clipped=1.0 2023-11-24 11:10:21,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2806553.3333333335, ans=0.125 2023-11-24 11:10:21,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2806553.3333333335, ans=0.2 2023-11-24 11:10:29,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421000 2023-11-24 11:10:32,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2806620.0, ans=0.0 2023-11-24 11:10:32,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-11-24 11:11:04,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806820.0, ans=0.1 2023-11-24 11:11:07,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=15.0 2023-11-24 11:11:12,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2806886.6666666665, ans=0.0 2023-11-24 11:11:13,387 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 200, loss[loss=0.07172, simple_loss=0.08252, pruned_loss=0.01966, audio_tagging_loss=0.0108, over 15220.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.0901, pruned_loss=0.01286, audio_tagging_loss=0.01317, over 1932393.75 frames. 
], batch size: 60, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:11:31,263 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421050 2023-11-24 11:11:43,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2807020.0, ans=0.125 2023-11-24 11:12:02,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-24 11:12:14,918 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 250, loss[loss=0.06259, simple_loss=0.0865, pruned_loss=0.0115, audio_tagging_loss=0.00784, over 14886.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.08949, pruned_loss=0.01273, audio_tagging_loss=0.01208, over 2185476.69 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:12:17,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2807220.0, ans=0.125 2023-11-24 11:12:18,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.652e+01 9.389e+01 9.985e+01 1.276e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-24 11:12:34,653 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421100 2023-11-24 11:12:44,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-24 11:13:03,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-24 11:13:12,658 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:13:17,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2807553.3333333335, ans=0.2 2023-11-24 11:13:18,266 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 300, loss[loss=0.08038, simple_loss=0.1063, pruned_loss=0.01914, audio_tagging_loss=0.008081, over 16005.00 frames. ], tot_loss[loss=0.06993, simple_loss=0.09149, pruned_loss=0.01312, audio_tagging_loss=0.01106, over 2379322.39 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:13:36,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421150 2023-11-24 11:13:38,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2807620.0, ans=0.0 2023-11-24 11:13:40,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2807686.6666666665, ans=0.2 2023-11-24 11:13:59,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2807753.3333333335, ans=0.125 2023-11-24 11:14:19,896 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 350, loss[loss=0.0573, simple_loss=0.07766, pruned_loss=0.00895, audio_tagging_loss=0.009517, over 15635.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09067, pruned_loss=0.01302, audio_tagging_loss=0.01056, over 2524809.69 frames. 
], batch size: 60, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:14:23,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.480e+01 9.093e+01 9.744e+01 1.336e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-24 11:14:28,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2807886.6666666665, ans=0.125 2023-11-24 11:14:29,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2807886.6666666665, ans=0.0 2023-11-24 11:14:38,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421200 2023-11-24 11:14:45,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2808020.0, ans=0.0 2023-11-24 11:15:05,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2808086.6666666665, ans=0.0 2023-11-24 11:15:21,392 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 400, loss[loss=0.05125, simple_loss=0.06402, pruned_loss=0.01045, audio_tagging_loss=0.008785, over 15078.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.08992, pruned_loss=0.01285, audio_tagging_loss=0.0102, over 2643563.54 frames. ], batch size: 58, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:15:34,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2808286.6666666665, ans=0.1 2023-11-24 11:15:39,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2808286.6666666665, ans=0.0 2023-11-24 11:15:41,654 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421250 2023-11-24 11:15:58,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0 2023-11-24 11:16:23,731 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 450, loss[loss=0.06504, simple_loss=0.08906, pruned_loss=0.01252, audio_tagging_loss=0.007993, over 14448.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.08977, pruned_loss=0.01274, audio_tagging_loss=0.009796, over 2736700.55 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:16:27,923 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.299e+01 9.052e+01 9.986e+01 1.410e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-24 11:16:39,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2808620.0, ans=0.0 2023-11-24 11:16:42,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421300 2023-11-24 11:16:47,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2808686.6666666665, ans=0.125 2023-11-24 11:17:02,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2808753.3333333335, ans=0.125 2023-11-24 11:17:06,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. 
limit=6.0 2023-11-24 11:17:12,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2808820.0, ans=0.125 2023-11-24 11:17:26,704 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 500, loss[loss=0.06161, simple_loss=0.08164, pruned_loss=0.01282, audio_tagging_loss=0.007973, over 14186.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.08875, pruned_loss=0.01269, audio_tagging_loss=0.009651, over 2796246.83 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:17:34,086 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:17:37,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2808953.3333333335, ans=0.125 2023-11-24 11:17:44,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421350 2023-11-24 11:17:52,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-24 11:18:09,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2809086.6666666665, ans=0.125 2023-11-24 11:18:28,539 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 550, loss[loss=0.0778, simple_loss=0.1161, pruned_loss=0.01475, audio_tagging_loss=0.00502, over 14494.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09016, pruned_loss=0.01296, audio_tagging_loss=0.009432, over 2849346.49 frames. ], batch size: 54, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:18:28,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2809220.0, ans=0.0 2023-11-24 11:18:32,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.386e+01 8.873e+01 9.796e+01 1.246e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-24 11:18:41,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2809286.6666666665, ans=0.5 2023-11-24 11:18:43,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2809286.6666666665, ans=0.125 2023-11-24 11:18:47,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421400 2023-11-24 11:18:56,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2809353.3333333335, ans=0.125 2023-11-24 11:19:14,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2809420.0, ans=0.0 2023-11-24 11:19:24,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2809486.6666666665, ans=0.2 2023-11-24 11:19:31,263 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 600, loss[loss=0.08916, simple_loss=0.1238, pruned_loss=0.02129, audio_tagging_loss=0.005954, over 16283.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.08935, pruned_loss=0.01272, audio_tagging_loss=0.00938, over 2894423.67 frames. 
], batch size: 59, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:19:42,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2023-11-24 11:19:48,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2809620.0, ans=0.0 2023-11-24 11:19:50,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421450 2023-11-24 11:19:56,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2809686.6666666665, ans=0.125 2023-11-24 11:20:00,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2809686.6666666665, ans=0.0 2023-11-24 11:20:04,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2809686.6666666665, ans=0.125 2023-11-24 11:20:07,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2023-11-24 11:20:13,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2809753.3333333335, ans=0.2 2023-11-24 11:20:24,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2023-11-24 11:20:33,872 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 650, loss[loss=0.07649, simple_loss=0.1087, pruned_loss=0.01467, audio_tagging_loss=0.007475, over 15337.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08867, pruned_loss=0.01266, audio_tagging_loss=0.009391, over 2929802.76 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:20:37,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.541e+01 9.260e+01 1.007e+02 1.291e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 11:20:47,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2809953.3333333335, ans=0.125 2023-11-24 11:20:51,726 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421500 2023-11-24 11:20:57,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2810020.0, ans=0.1 2023-11-24 11:21:30,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2810153.3333333335, ans=0.1 2023-11-24 11:21:35,320 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 700, loss[loss=0.0668, simple_loss=0.09598, pruned_loss=0.01202, audio_tagging_loss=0.006785, over 16288.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.08955, pruned_loss=0.01276, audio_tagging_loss=0.009213, over 2968925.51 frames. ], batch size: 61, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:21:38,073 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:21:50,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. 
limit=15.0 2023-11-24 11:21:54,222 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421550 2023-11-24 11:21:55,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2810286.6666666665, ans=0.025 2023-11-24 11:22:06,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2810353.3333333335, ans=0.0 2023-11-24 11:22:09,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2023-11-24 11:22:12,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2810420.0, ans=0.0 2023-11-24 11:22:27,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2810486.6666666665, ans=0.125 2023-11-24 11:22:37,453 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 750, loss[loss=0.0651, simple_loss=0.08594, pruned_loss=0.009654, audio_tagging_loss=0.01247, over 15714.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08958, pruned_loss=0.01259, audio_tagging_loss=0.009273, over 2990130.59 frames. ], batch size: 60, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:22:41,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.522e+01 9.060e+01 9.732e+01 1.237e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-24 11:22:56,242 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421600 2023-11-24 11:22:59,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810620.0, ans=0.1 2023-11-24 11:23:13,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2810753.3333333335, ans=0.0 2023-11-24 11:23:23,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2810753.3333333335, ans=0.125 2023-11-24 11:23:38,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2810820.0, ans=0.015 2023-11-24 11:23:40,407 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 800, loss[loss=0.06949, simple_loss=0.08874, pruned_loss=0.01406, audio_tagging_loss=0.01105, over 14245.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.0909, pruned_loss=0.0129, audio_tagging_loss=0.00927, over 2990322.79 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:23:51,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2810953.3333333335, ans=0.0 2023-11-24 11:23:57,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. limit=10.0 2023-11-24 11:23:58,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421650 2023-11-24 11:24:26,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2811086.6666666665, ans=0.0 2023-11-24 11:24:31,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. 
limit=15.0 2023-11-24 11:24:33,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2811153.3333333335, ans=0.125 2023-11-24 11:24:41,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2811220.0, ans=0.04949747468305833 2023-11-24 11:24:42,429 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 850, loss[loss=0.06896, simple_loss=0.09823, pruned_loss=0.01315, audio_tagging_loss=0.006693, over 14894.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09131, pruned_loss=0.01302, audio_tagging_loss=0.009295, over 3006938.98 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:24:45,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-24 11:24:47,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.758e+01 9.246e+01 1.015e+02 2.108e+02, threshold=1.849e+02, percent-clipped=1.0 2023-11-24 11:25:01,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421700 2023-11-24 11:25:03,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2023-11-24 11:25:03,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.95 vs. limit=15.0 2023-11-24 11:25:33,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.55 vs. limit=15.0 2023-11-24 11:25:45,248 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 900, loss[loss=0.06876, simple_loss=0.09875, pruned_loss=0.01174, audio_tagging_loss=0.00765, over 15198.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09191, pruned_loss=0.01304, audio_tagging_loss=0.009311, over 3012209.82 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:26:04,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421750 2023-11-24 11:26:20,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2811686.6666666665, ans=0.2 2023-11-24 11:26:36,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0 2023-11-24 11:26:47,783 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 950, loss[loss=0.08148, simple_loss=0.1204, pruned_loss=0.01523, audio_tagging_loss=0.006073, over 14739.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09254, pruned_loss=0.0131, audio_tagging_loss=0.009196, over 3019696.02 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:26:49,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2811886.6666666665, ans=0.125 2023-11-24 11:26:52,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.504e+01 9.122e+01 9.832e+01 1.245e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-24 11:27:06,046 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421800 2023-11-24 11:27:11,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. 
limit=10.0 2023-11-24 11:27:13,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2812020.0, ans=0.1 2023-11-24 11:27:16,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2812020.0, ans=0.0 2023-11-24 11:27:38,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2812153.3333333335, ans=0.2 2023-11-24 11:27:49,561 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1000, loss[loss=0.08085, simple_loss=0.1142, pruned_loss=0.0185, audio_tagging_loss=0.00526, over 15168.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09205, pruned_loss=0.013, audio_tagging_loss=0.008979, over 3028640.77 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:27:56,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-11-24 11:28:08,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421850 2023-11-24 11:28:15,017 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 11:28:18,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2812353.3333333335, ans=0.125 2023-11-24 11:28:24,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2812353.3333333335, ans=0.07 2023-11-24 11:28:25,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2023-11-24 11:28:51,786 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1050, loss[loss=0.0766, simple_loss=0.105, pruned_loss=0.01792, audio_tagging_loss=0.006174, over 15324.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.0916, pruned_loss=0.01297, audio_tagging_loss=0.008952, over 3039348.23 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:28:57,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.446e+01 9.134e+01 9.863e+01 1.540e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 11:29:05,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2812620.0, ans=0.0 2023-11-24 11:29:11,704 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421900 2023-11-24 11:29:39,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2812753.3333333335, ans=0.0 2023-11-24 11:29:41,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. 
limit=10.0 2023-11-24 11:29:54,534 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:29:55,476 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1100, loss[loss=0.06489, simple_loss=0.09184, pruned_loss=0.01145, audio_tagging_loss=0.00752, over 14743.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09193, pruned_loss=0.01309, audio_tagging_loss=0.008852, over 3043344.63 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:29:56,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2812886.6666666665, ans=0.0 2023-11-24 11:29:56,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2812886.6666666665, ans=0.125 2023-11-24 11:29:57,911 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 11:30:13,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 421950 2023-11-24 11:30:24,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2813020.0, ans=15.0 2023-11-24 11:30:47,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2813153.3333333335, ans=0.125 2023-11-24 11:30:49,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2813153.3333333335, ans=0.125 2023-11-24 11:30:53,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2813153.3333333335, ans=0.0 2023-11-24 11:30:56,384 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1150, loss[loss=0.067, simple_loss=0.09396, pruned_loss=0.01156, audio_tagging_loss=0.00846, over 14570.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09184, pruned_loss=0.01309, audio_tagging_loss=0.008774, over 3035983.95 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:30:56,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2813220.0, ans=0.125 2023-11-24 11:31:01,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.827e+01 8.442e+01 9.003e+01 9.685e+01 1.147e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-24 11:31:05,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2813220.0, ans=0.09899494936611666 2023-11-24 11:31:10,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.67 vs. 
limit=15.0 2023-11-24 11:31:13,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2813286.6666666665, ans=0.0 2023-11-24 11:31:15,491 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422000 2023-11-24 11:31:19,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2813286.6666666665, ans=0.125 2023-11-24 11:31:33,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2813420.0, ans=0.2 2023-11-24 11:31:42,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2813420.0, ans=0.125 2023-11-24 11:31:55,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=22.5 2023-11-24 11:31:58,427 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1200, loss[loss=0.06868, simple_loss=0.09559, pruned_loss=0.01273, audio_tagging_loss=0.008154, over 15738.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09152, pruned_loss=0.01297, audio_tagging_loss=0.008888, over 3041900.63 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:31:58,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2813553.3333333335, ans=0.125 2023-11-24 11:32:02,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2813553.3333333335, ans=0.0 2023-11-24 11:32:17,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422050 2023-11-24 11:32:28,601 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:32:36,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2813753.3333333335, ans=0.0 2023-11-24 11:32:37,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2813753.3333333335, ans=0.2 2023-11-24 11:32:40,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2023-11-24 11:33:00,663 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1250, loss[loss=0.06396, simple_loss=0.08668, pruned_loss=0.01011, audio_tagging_loss=0.01052, over 14225.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09126, pruned_loss=0.01298, audio_tagging_loss=0.008811, over 3043608.91 frames. 
], batch size: 53, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:33:07,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.565e+01 9.272e+01 1.023e+02 1.182e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-24 11:33:18,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422100 2023-11-24 11:33:19,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2813953.3333333335, ans=0.125 2023-11-24 11:33:36,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2814086.6666666665, ans=0.125 2023-11-24 11:33:39,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2814086.6666666665, ans=0.125 2023-11-24 11:33:41,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2814086.6666666665, ans=0.125 2023-11-24 11:33:52,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2814153.3333333335, ans=0.125 2023-11-24 11:34:01,838 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1300, loss[loss=0.07035, simple_loss=0.096, pruned_loss=0.01432, audio_tagging_loss=0.008025, over 15655.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09113, pruned_loss=0.01306, audio_tagging_loss=0.008798, over 3043504.00 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:34:19,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422150 2023-11-24 11:34:31,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2814353.3333333335, ans=0.0 2023-11-24 11:34:38,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2814420.0, ans=0.125 2023-11-24 11:34:47,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2023-11-24 11:34:56,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=22.5 2023-11-24 11:35:00,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814486.6666666665, ans=0.1 2023-11-24 11:35:04,212 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1350, loss[loss=0.08251, simple_loss=0.1082, pruned_loss=0.01979, audio_tagging_loss=0.008589, over 15731.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09081, pruned_loss=0.01286, audio_tagging_loss=0.008847, over 3046139.01 frames. 
], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:35:08,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2814553.3333333335, ans=0.0 2023-11-24 11:35:10,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.582e+01 9.240e+01 9.907e+01 1.176e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-24 11:35:24,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422200 2023-11-24 11:35:24,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2814620.0, ans=0.125 2023-11-24 11:35:25,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814620.0, ans=0.1 2023-11-24 11:35:31,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2023-11-24 11:35:45,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2814753.3333333335, ans=0.2 2023-11-24 11:35:49,793 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 11:36:04,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2814820.0, ans=0.125 2023-11-24 11:36:08,401 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1400, loss[loss=0.06253, simple_loss=0.07589, pruned_loss=0.01356, audio_tagging_loss=0.01103, over 14493.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09015, pruned_loss=0.01272, audio_tagging_loss=0.009015, over 3045153.86 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:36:09,944 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:36:24,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2814953.3333333335, ans=0.2 2023-11-24 11:36:26,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422250 2023-11-24 11:36:28,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2814953.3333333335, ans=0.125 2023-11-24 11:37:04,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2815153.3333333335, ans=0.0 2023-11-24 11:37:10,637 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1450, loss[loss=0.06333, simple_loss=0.08576, pruned_loss=0.0129, audio_tagging_loss=0.007544, over 14701.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08923, pruned_loss=0.01268, audio_tagging_loss=0.009006, over 3040659.41 frames. 
], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:37:11,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2815220.0, ans=0.2 2023-11-24 11:37:15,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2815220.0, ans=0.1 2023-11-24 11:37:16,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.536e+01 9.281e+01 1.016e+02 1.384e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-24 11:37:21,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2815286.6666666665, ans=0.0 2023-11-24 11:37:28,235 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422300 2023-11-24 11:37:32,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2815286.6666666665, ans=0.04949747468305833 2023-11-24 11:37:38,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2815353.3333333335, ans=0.125 2023-11-24 11:37:42,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2815353.3333333335, ans=10.0 2023-11-24 11:38:11,598 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1500, loss[loss=0.04959, simple_loss=0.06834, pruned_loss=0.00849, audio_tagging_loss=0.006929, over 15920.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08919, pruned_loss=0.0126, audio_tagging_loss=0.008999, over 3048395.80 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:38:15,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2815553.3333333335, ans=0.125 2023-11-24 11:38:20,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2815553.3333333335, ans=0.0 2023-11-24 11:38:20,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2815553.3333333335, ans=0.1 2023-11-24 11:38:31,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422350 2023-11-24 11:38:35,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=12.0 2023-11-24 11:38:40,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2815686.6666666665, ans=0.2 2023-11-24 11:38:49,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2815753.3333333335, ans=0.125 2023-11-24 11:38:52,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2023-11-24 11:39:13,894 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1550, loss[loss=0.07174, simple_loss=0.08966, pruned_loss=0.01541, audio_tagging_loss=0.0115, over 16133.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09021, pruned_loss=0.01286, audio_tagging_loss=0.009069, over 3051578.24 frames. 
], batch size: 61, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:39:19,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2815886.6666666665, ans=0.025 2023-11-24 11:39:20,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-24 11:39:20,779 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.494e+01 9.231e+01 9.838e+01 1.696e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-24 11:39:32,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422400 2023-11-24 11:39:34,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-24 11:39:53,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2816086.6666666665, ans=0.125 2023-11-24 11:39:57,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2816086.6666666665, ans=0.2 2023-11-24 11:40:02,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2816086.6666666665, ans=0.125 2023-11-24 11:40:10,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2816153.3333333335, ans=0.2 2023-11-24 11:40:15,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2816153.3333333335, ans=0.0 2023-11-24 11:40:17,309 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1600, loss[loss=0.06142, simple_loss=0.09214, pruned_loss=0.008022, audio_tagging_loss=0.007325, over 15525.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.08967, pruned_loss=0.01283, audio_tagging_loss=0.009129, over 3049001.93 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:40:27,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2023-11-24 11:40:35,179 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422450 2023-11-24 11:41:06,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2816486.6666666665, ans=0.125 2023-11-24 11:41:18,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2816553.3333333335, ans=0.125 2023-11-24 11:41:18,985 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1650, loss[loss=0.05818, simple_loss=0.07129, pruned_loss=0.009606, audio_tagging_loss=0.01294, over 14660.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09078, pruned_loss=0.01304, audio_tagging_loss=0.009185, over 3053116.40 frames. 
], batch size: 58, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:41:25,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.620e+01 9.407e+01 1.041e+02 1.351e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-24 11:41:37,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422500 2023-11-24 11:41:51,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2816686.6666666665, ans=0.0 2023-11-24 11:42:17,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2816820.0, ans=0.0 2023-11-24 11:42:20,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2816886.6666666665, ans=0.125 2023-11-24 11:42:20,964 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1700, loss[loss=0.06697, simple_loss=0.08206, pruned_loss=0.01066, audio_tagging_loss=0.01528, over 15349.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09058, pruned_loss=0.01294, audio_tagging_loss=0.00923, over 3049119.53 frames. ], batch size: 58, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:42:33,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2816953.3333333335, ans=0.0 2023-11-24 11:42:40,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422550 2023-11-24 11:42:40,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2816953.3333333335, ans=0.125 2023-11-24 11:42:55,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2817020.0, ans=0.125 2023-11-24 11:43:01,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2817086.6666666665, ans=0.07 2023-11-24 11:43:10,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2817153.3333333335, ans=0.125 2023-11-24 11:43:11,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2817153.3333333335, ans=0.125 2023-11-24 11:43:23,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.50 vs. limit=10.0 2023-11-24 11:43:24,388 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1750, loss[loss=0.06971, simple_loss=0.1007, pruned_loss=0.01261, audio_tagging_loss=0.006743, over 15328.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09189, pruned_loss=0.01315, audio_tagging_loss=0.009153, over 3047433.76 frames. 
], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:43:25,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2817220.0, ans=0.0 2023-11-24 11:43:29,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2817220.0, ans=0.0 2023-11-24 11:43:31,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.743e+01 9.239e+01 9.812e+01 1.237e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-24 11:43:35,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2817286.6666666665, ans=0.125 2023-11-24 11:43:42,394 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422600 2023-11-24 11:43:46,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2817286.6666666665, ans=0.2 2023-11-24 11:44:16,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2817486.6666666665, ans=0.125 2023-11-24 11:44:18,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2817486.6666666665, ans=0.125 2023-11-24 11:44:21,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2817486.6666666665, ans=0.125 2023-11-24 11:44:26,823 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1800, loss[loss=0.07087, simple_loss=0.0992, pruned_loss=0.009993, audio_tagging_loss=0.01127, over 14585.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.0915, pruned_loss=0.01307, audio_tagging_loss=0.009048, over 3048430.12 frames. ], batch size: 54, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:44:27,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2817553.3333333335, ans=0.07 2023-11-24 11:44:35,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2817553.3333333335, ans=0.125 2023-11-24 11:44:45,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422650 2023-11-24 11:44:46,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2817620.0, ans=0.0 2023-11-24 11:44:48,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2023-11-24 11:44:56,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2817686.6666666665, ans=0.0 2023-11-24 11:45:01,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2023-11-24 11:45:06,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. 
limit=22.5 2023-11-24 11:45:23,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2817820.0, ans=0.125 2023-11-24 11:45:28,827 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1850, loss[loss=0.0612, simple_loss=0.08214, pruned_loss=0.01305, audio_tagging_loss=0.007074, over 14898.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09147, pruned_loss=0.01309, audio_tagging_loss=0.008945, over 3045755.35 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:45:33,252 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:45:36,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.376e+01 8.960e+01 9.835e+01 1.450e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-24 11:45:41,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2817953.3333333335, ans=0.0 2023-11-24 11:45:48,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422700 2023-11-24 11:45:59,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2818020.0, ans=0.0 2023-11-24 11:46:00,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=22.5 2023-11-24 11:46:09,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2818086.6666666665, ans=0.125 2023-11-24 11:46:14,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=12.0 2023-11-24 11:46:26,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2818153.3333333335, ans=0.2 2023-11-24 11:46:29,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2818153.3333333335, ans=0.1 2023-11-24 11:46:31,721 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1900, loss[loss=0.06071, simple_loss=0.07612, pruned_loss=0.01261, audio_tagging_loss=0.01004, over 15843.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09116, pruned_loss=0.01297, audio_tagging_loss=0.008827, over 3054978.08 frames. ], batch size: 61, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 11:46:50,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422750 2023-11-24 11:46:52,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2818286.6666666665, ans=0.0 2023-11-24 11:47:01,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. 
limit=15.0 2023-11-24 11:47:02,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2818353.3333333335, ans=0.1 2023-11-24 11:47:31,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2818486.6666666665, ans=0.125 2023-11-24 11:47:32,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2818553.3333333335, ans=0.2 2023-11-24 11:47:33,685 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 1950, loss[loss=0.06255, simple_loss=0.08369, pruned_loss=0.008902, audio_tagging_loss=0.0118, over 16226.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.0909, pruned_loss=0.0129, audio_tagging_loss=0.008855, over 3054780.34 frames. ], batch size: 60, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 11:47:39,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2818553.3333333335, ans=0.07 2023-11-24 11:47:41,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.719e+01 9.311e+01 9.932e+01 1.598e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-24 11:47:51,887 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422800 2023-11-24 11:47:54,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2818620.0, ans=0.0 2023-11-24 11:47:56,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2818620.0, ans=0.125 2023-11-24 11:48:04,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2818686.6666666665, ans=0.125 2023-11-24 11:48:19,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2818753.3333333335, ans=0.125 2023-11-24 11:48:26,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2818820.0, ans=0.1 2023-11-24 11:48:34,866 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2000, loss[loss=0.08156, simple_loss=0.1019, pruned_loss=0.02194, audio_tagging_loss=0.008671, over 14754.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09023, pruned_loss=0.01294, audio_tagging_loss=0.008909, over 3046643.22 frames. ], batch size: 54, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:48:47,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2023-11-24 11:48:54,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422850 2023-11-24 11:49:12,760 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 11:49:25,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2819153.3333333335, ans=0.125 2023-11-24 11:49:38,132 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2050, loss[loss=0.07219, simple_loss=0.09356, pruned_loss=0.01505, audio_tagging_loss=0.01036, over 14387.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09115, pruned_loss=0.01306, audio_tagging_loss=0.008875, over 3046805.76 frames. 
], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:49:41,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2819220.0, ans=0.1 2023-11-24 11:49:46,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 8.908e+01 9.390e+01 1.017e+02 1.320e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-24 11:49:56,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422900 2023-11-24 11:50:28,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2819486.6666666665, ans=0.125 2023-11-24 11:50:29,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-11-24 11:50:30,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2819486.6666666665, ans=0.0 2023-11-24 11:50:34,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819486.6666666665, ans=0.1 2023-11-24 11:50:40,025 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2100, loss[loss=0.07031, simple_loss=0.09553, pruned_loss=0.01333, audio_tagging_loss=0.009216, over 14408.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09117, pruned_loss=0.01317, audio_tagging_loss=0.008897, over 3047002.74 frames. ], batch size: 53, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:50:41,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2023-11-24 11:50:46,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-24 11:50:53,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819620.0, ans=0.1 2023-11-24 11:50:58,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 422950 2023-11-24 11:51:03,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2819686.6666666665, ans=0.0 2023-11-24 11:51:17,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. 
limit=22.5 2023-11-24 11:51:19,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2819753.3333333335, ans=0.0 2023-11-24 11:51:19,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2819753.3333333335, ans=0.2 2023-11-24 11:51:19,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2819753.3333333335, ans=0.125 2023-11-24 11:51:24,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2819753.3333333335, ans=0.1 2023-11-24 11:51:27,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2819753.3333333335, ans=0.1 2023-11-24 11:51:35,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2819820.0, ans=0.125 2023-11-24 11:51:42,231 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2150, loss[loss=0.07447, simple_loss=0.08733, pruned_loss=0.01977, audio_tagging_loss=0.01103, over 14150.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09142, pruned_loss=0.01325, audio_tagging_loss=0.008917, over 3049449.93 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:51:49,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2819886.6666666665, ans=0.2 2023-11-24 11:51:51,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.463e+01 9.382e+01 1.013e+02 1.297e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-24 11:51:54,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0 2023-11-24 11:52:01,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423000 2023-11-24 11:52:18,811 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 11:52:19,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2820086.6666666665, ans=0.125 2023-11-24 11:52:22,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-24 11:52:45,868 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2200, loss[loss=0.0658, simple_loss=0.09279, pruned_loss=0.01008, audio_tagging_loss=0.009326, over 14693.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09097, pruned_loss=0.01316, audio_tagging_loss=0.008829, over 3055000.08 frames. 
], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:52:47,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2820220.0, ans=0.0 2023-11-24 11:53:03,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423050 2023-11-24 11:53:13,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2820353.3333333335, ans=0.07 2023-11-24 11:53:13,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2820353.3333333335, ans=0.125 2023-11-24 11:53:47,544 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2250, loss[loss=0.05864, simple_loss=0.06744, pruned_loss=0.01251, audio_tagging_loss=0.01241, over 16453.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.0912, pruned_loss=0.01319, audio_tagging_loss=0.008894, over 3050280.68 frames. ], batch size: 64, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:53:55,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.644e+01 9.254e+01 1.001e+02 1.291e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-24 11:53:59,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2820620.0, ans=0.1 2023-11-24 11:54:01,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2820620.0, ans=0.0 2023-11-24 11:54:06,252 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423100 2023-11-24 11:54:21,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2820686.6666666665, ans=0.1 2023-11-24 11:54:49,829 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2300, loss[loss=0.07102, simple_loss=0.09432, pruned_loss=0.01449, audio_tagging_loss=0.009371, over 15387.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.0916, pruned_loss=0.01335, audio_tagging_loss=0.008826, over 3048427.51 frames. ], batch size: 60, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:54:55,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=2820886.6666666665, ans=0.02 2023-11-24 11:54:59,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2023-11-24 11:55:09,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423150 2023-11-24 11:55:09,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2820953.3333333335, ans=0.125 2023-11-24 11:55:12,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.11 vs. 
limit=10.0 2023-11-24 11:55:27,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2821086.6666666665, ans=0.125 2023-11-24 11:55:30,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821086.6666666665, ans=0.125 2023-11-24 11:55:31,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2821086.6666666665, ans=0.2 2023-11-24 11:55:40,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2821153.3333333335, ans=0.125 2023-11-24 11:55:45,404 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 11:55:52,930 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2350, loss[loss=0.07128, simple_loss=0.09528, pruned_loss=0.01431, audio_tagging_loss=0.009331, over 14360.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.0919, pruned_loss=0.01332, audio_tagging_loss=0.00898, over 3039475.90 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 11:55:58,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2821220.0, ans=0.2 2023-11-24 11:56:01,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.333e+01 8.924e+01 9.646e+01 1.536e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-24 11:56:09,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=12.0 2023-11-24 11:56:11,493 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423200 2023-11-24 11:56:29,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2023-11-24 11:56:42,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2821486.6666666665, ans=0.0 2023-11-24 11:56:51,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=12.0 2023-11-24 11:56:53,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2821486.6666666665, ans=0.0 2023-11-24 11:56:55,517 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2400, loss[loss=0.06126, simple_loss=0.08056, pruned_loss=0.01294, audio_tagging_loss=0.008046, over 14786.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.0916, pruned_loss=0.01321, audio_tagging_loss=0.008957, over 3046433.54 frames. 
], batch size: 54, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:57:06,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821620.0, ans=0.125 2023-11-24 11:57:13,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423250 2023-11-24 11:57:20,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2821686.6666666665, ans=0.2 2023-11-24 11:57:29,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2821686.6666666665, ans=0.125 2023-11-24 11:57:50,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2821820.0, ans=0.125 2023-11-24 11:57:57,322 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2450, loss[loss=0.05599, simple_loss=0.07104, pruned_loss=0.00785, audio_tagging_loss=0.01262, over 14769.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.0914, pruned_loss=0.01302, audio_tagging_loss=0.009019, over 3052899.65 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:58:06,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.359e+01 8.901e+01 9.611e+01 1.267e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-24 11:58:16,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423300 2023-11-24 11:58:28,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2822020.0, ans=0.125 2023-11-24 11:59:00,582 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2500, loss[loss=0.06734, simple_loss=0.08958, pruned_loss=0.01524, audio_tagging_loss=0.007316, over 14495.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09179, pruned_loss=0.01314, audio_tagging_loss=0.008993, over 3053773.33 frames. ], batch size: 54, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 11:59:17,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2822286.6666666665, ans=0.0 2023-11-24 11:59:18,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423350 2023-11-24 11:59:19,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2822286.6666666665, ans=0.125 2023-11-24 11:59:41,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2822420.0, ans=0.1 2023-11-24 11:59:55,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2822486.6666666665, ans=0.1 2023-11-24 12:00:00,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2822486.6666666665, ans=0.025 2023-11-24 12:00:02,393 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2550, loss[loss=0.05683, simple_loss=0.06816, pruned_loss=0.01287, audio_tagging_loss=0.009879, over 14941.00 frames. ], tot_loss[loss=0.0686, simple_loss=0.09247, pruned_loss=0.0134, audio_tagging_loss=0.008961, over 3043614.32 frames. 
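The loss[...] / tot_loss[...] summaries above decompose the objective into the two pruned-RNN-T terms (simple_loss from the simple joiner that supplies the pruning bounds, pruned_loss from the full joiner evaluated inside those bounds) plus the auxiliary audio_tagging_loss; loss[...] is the current batch and tot_loss[...] a running aggregate over roughly 3M frames. The printed totals are consistent with a 0.5 weight on the simple loss and unit weight on the other two terms, e.g. at batch 2450 above: 0.5 * 0.0914 + 0.01302 + 0.009019 ≈ 0.06774. A sketch with the weights inferred from the printed numbers rather than read out of the recipe:

```python
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, tagging_scale=1.0):
    # Weights inferred from the logged numbers, not taken from train_asr.py.
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# tot_loss at epoch 36, batch 2450 (logged just above):
assert abs(combined_loss(0.0914, 0.01302, 0.009019) - 0.06774) < 1e-4
```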
], batch size: 56, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:00:07,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2822553.3333333335, ans=0.025 2023-11-24 12:00:10,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.497e+01 9.131e+01 9.713e+01 1.199e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 12:00:20,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423400 2023-11-24 12:00:32,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2822686.6666666665, ans=0.1 2023-11-24 12:00:39,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=12.0 2023-11-24 12:00:50,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2822753.3333333335, ans=0.0 2023-11-24 12:01:04,229 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2600, loss[loss=0.06682, simple_loss=0.08971, pruned_loss=0.01346, audio_tagging_loss=0.008503, over 16490.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09205, pruned_loss=0.01333, audio_tagging_loss=0.008838, over 3043906.33 frames. ], batch size: 63, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:01:23,849 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423450 2023-11-24 12:01:25,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2822953.3333333335, ans=0.2 2023-11-24 12:01:48,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2823086.6666666665, ans=0.125 2023-11-24 12:01:55,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2023-11-24 12:02:02,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2823153.3333333335, ans=0.07 2023-11-24 12:02:07,932 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2650, loss[loss=0.07363, simple_loss=0.09374, pruned_loss=0.01775, audio_tagging_loss=0.009019, over 15435.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09214, pruned_loss=0.01323, audio_tagging_loss=0.008799, over 3047946.89 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:02:09,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2823220.0, ans=0.0 2023-11-24 12:02:11,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. 
limit=15.0 2023-11-24 12:02:13,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2823220.0, ans=0.0 2023-11-24 12:02:15,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2823220.0, ans=0.0 2023-11-24 12:02:18,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.376e+01 8.976e+01 9.750e+01 1.206e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-24 12:02:24,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-24 12:02:27,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423500 2023-11-24 12:02:54,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2823420.0, ans=0.05 2023-11-24 12:03:02,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2823486.6666666665, ans=0.125 2023-11-24 12:03:03,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2823486.6666666665, ans=0.0 2023-11-24 12:03:10,459 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2700, loss[loss=0.05645, simple_loss=0.07771, pruned_loss=0.009746, audio_tagging_loss=0.007845, over 14863.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09124, pruned_loss=0.01305, audio_tagging_loss=0.008794, over 3049422.98 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:03:27,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2823620.0, ans=0.0 2023-11-24 12:03:28,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423550 2023-11-24 12:03:39,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2823686.6666666665, ans=0.0 2023-11-24 12:04:11,623 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2750, loss[loss=0.07471, simple_loss=0.1019, pruned_loss=0.01286, audio_tagging_loss=0.01088, over 16595.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09005, pruned_loss=0.01294, audio_tagging_loss=0.008848, over 3049871.37 frames. ], batch size: 63, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:04:20,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.328e+01 8.931e+01 9.724e+01 1.117e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-24 12:04:22,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2823953.3333333335, ans=0.1 2023-11-24 12:04:23,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2823953.3333333335, ans=0.125 2023-11-24 12:04:30,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423600 2023-11-24 12:04:59,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2824086.6666666665, ans=0.2 2023-11-24 12:05:03,804 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 12:05:10,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2824153.3333333335, ans=0.125 2023-11-24 12:05:13,995 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2800, loss[loss=0.06434, simple_loss=0.0909, pruned_loss=0.01038, audio_tagging_loss=0.008507, over 15268.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.08978, pruned_loss=0.01293, audio_tagging_loss=0.008783, over 3052099.63 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:05:14,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2023-11-24 12:05:24,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-11-24 12:05:29,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2824286.6666666665, ans=0.125 2023-11-24 12:05:33,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423650 2023-11-24 12:05:39,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2824353.3333333335, ans=0.125 2023-11-24 12:05:53,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2824420.0, ans=0.2 2023-11-24 12:06:00,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2824420.0, ans=0.1 2023-11-24 12:06:17,161 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2850, loss[loss=0.06192, simple_loss=0.0826, pruned_loss=0.01096, audio_tagging_loss=0.009652, over 15371.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08972, pruned_loss=0.01305, audio_tagging_loss=0.008749, over 3050020.94 frames. 
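Each Clipping_scale=2.0 line from optim.py reports the min / 25% / 50% / 75% / max of recently observed gradient norms together with the active clipping threshold, and on every such line in this section the threshold equals 2.0 times the logged median (2.0 * 9.263e+01 = 1.853e+02 just above), suggesting a threshold of clipping_scale times a running median; percent-clipped would then be the share of recent batches whose norm exceeded it. A sketch of that relation only; the surrounding bookkeeping is assumed:

```python
import statistics

def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    # Relation observed on every quartile line here: threshold = scale * median.
    return clipping_scale * statistics.median(recent_grad_norms)

def percent_clipped(recent_grad_norms, threshold):
    # Share of recent batches whose gradient norm exceeded the threshold.
    return 100.0 * sum(g > threshold for g in recent_grad_norms) / len(recent_grad_norms)

assert abs(clip_threshold([92.63]) - 185.26) < 1e-9
```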
], batch size: 57, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:06:17,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2824553.3333333335, ans=0.125 2023-11-24 12:06:18,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2824553.3333333335, ans=0.0 2023-11-24 12:06:26,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.601e+01 9.263e+01 1.001e+02 1.378e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-24 12:06:35,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423700 2023-11-24 12:06:42,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2824686.6666666665, ans=0.1 2023-11-24 12:06:43,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2824686.6666666665, ans=0.04949747468305833 2023-11-24 12:06:44,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2824686.6666666665, ans=0.125 2023-11-24 12:06:54,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2824753.3333333335, ans=0.125 2023-11-24 12:06:57,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2824753.3333333335, ans=0.125 2023-11-24 12:07:05,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2023-11-24 12:07:13,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2824820.0, ans=0.125 2023-11-24 12:07:18,754 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2900, loss[loss=0.08339, simple_loss=0.1136, pruned_loss=0.01934, audio_tagging_loss=0.007248, over 15123.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08989, pruned_loss=0.01307, audio_tagging_loss=0.00875, over 3042606.86 frames. ], batch size: 58, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:07:19,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2824886.6666666665, ans=0.2 2023-11-24 12:07:37,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423750 2023-11-24 12:07:41,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2824953.3333333335, ans=0.1 2023-11-24 12:07:58,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2825086.6666666665, ans=0.2 2023-11-24 12:08:02,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. 
limit=15.0 2023-11-24 12:08:04,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2825086.6666666665, ans=0.0 2023-11-24 12:08:08,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2825153.3333333335, ans=0.2 2023-11-24 12:08:18,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2825153.3333333335, ans=0.125 2023-11-24 12:08:21,112 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 2950, loss[loss=0.06036, simple_loss=0.08298, pruned_loss=0.01019, audio_tagging_loss=0.008678, over 14871.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09062, pruned_loss=0.01298, audio_tagging_loss=0.00878, over 3045896.12 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:08:31,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.518e+01 9.259e+01 1.017e+02 1.246e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 12:08:35,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2825286.6666666665, ans=0.2 2023-11-24 12:08:41,117 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423800 2023-11-24 12:08:49,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2825353.3333333335, ans=10.0 2023-11-24 12:08:53,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2023-11-24 12:08:59,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-24 12:09:06,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825420.0, ans=0.1 2023-11-24 12:09:07,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2825420.0, ans=0.125 2023-11-24 12:09:20,834 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 12:09:24,616 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3000, loss[loss=0.05119, simple_loss=0.05347, pruned_loss=0.007797, audio_tagging_loss=0.01666, over 14972.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.0901, pruned_loss=0.013, audio_tagging_loss=0.008905, over 3054448.41 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:09:24,617 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 12:09:57,220 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7763, 5.7891, 5.8877, 5.8606], device='cuda:1') 2023-11-24 12:10:06,929 INFO [train_asr.py:1253] (1/4) Epoch 36, validation: loss=0.05726, simple_loss=0.05083, pruned_loss=0.005098, audio_tagging_loss=0.02675, over 4681554.00 frames. 
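The validation pass above also prints an attn_weights_entropy tensor from zipformer.py; the four values near 5.8 are plausibly mean entropies of the softmaxed self-attention weights, one per attention head, a standard diagnostic for attention collapse (entropy near zero would mean each query attends to a single key). A guess at the computation, not the exact zipformer.py code:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, num_queries, num_keys), rows summing to 1.
    # Returns one mean-entropy value per head, matching the
    # 4-element tensor printed above.
    eps = 1.0e-20
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=-1)
```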
2023-11-24 12:10:06,930 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 12:10:26,566 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423850 2023-11-24 12:10:44,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-24 12:11:04,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2825820.0, ans=0.125 2023-11-24 12:11:10,536 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3050, loss[loss=0.07409, simple_loss=0.1083, pruned_loss=0.01128, audio_tagging_loss=0.00864, over 16382.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09198, pruned_loss=0.01331, audio_tagging_loss=0.008861, over 3057389.56 frames. ], batch size: 58, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:11:20,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.750e+01 9.306e+01 1.016e+02 1.344e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-24 12:11:27,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2825953.3333333335, ans=0.07 2023-11-24 12:11:29,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423900 2023-11-24 12:11:30,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2825953.3333333335, ans=15.0 2023-11-24 12:11:37,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2826020.0, ans=0.125 2023-11-24 12:11:46,854 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 12:12:06,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826153.3333333335, ans=0.1 2023-11-24 12:12:07,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2826153.3333333335, ans=0.2 2023-11-24 12:12:13,622 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3100, loss[loss=0.07598, simple_loss=0.1033, pruned_loss=0.01647, audio_tagging_loss=0.007891, over 16356.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09181, pruned_loss=0.0132, audio_tagging_loss=0.008972, over 3057527.12 frames. 
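The "Maximum memory allocated so far is 25607MB" record printed right after validation is a peak-GPU-allocation report; PyTorch tracks this counter directly. A minimal equivalent using the real torch API (only the message wording is mimicked from the log):

```python
import torch

def log_peak_gpu_memory(device="cuda:1"):
    # Peak bytes ever allocated by tensors on `device` since program start
    # (or since the last torch.cuda.reset_peak_memory_stats).
    peak_mb = torch.cuda.max_memory_allocated(device) // 2**20
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```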
], batch size: 62, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:12:16,405 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 12:12:17,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2826220.0, ans=0.2 2023-11-24 12:12:32,256 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 423950 2023-11-24 12:12:44,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2826353.3333333335, ans=0.1 2023-11-24 12:12:50,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2826420.0, ans=0.09899494936611666 2023-11-24 12:13:08,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2826486.6666666665, ans=0.0 2023-11-24 12:13:16,559 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3150, loss[loss=0.05259, simple_loss=0.06255, pruned_loss=0.01128, audio_tagging_loss=0.01003, over 15521.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09228, pruned_loss=0.0133, audio_tagging_loss=0.008979, over 3055560.61 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:13:25,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.600e+01 9.080e+01 9.861e+01 1.236e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-24 12:13:29,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2023-11-24 12:13:34,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424000 2023-11-24 12:14:15,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2826820.0, ans=0.0 2023-11-24 12:14:16,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2826820.0, ans=0.05 2023-11-24 12:14:21,318 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3200, loss[loss=0.06217, simple_loss=0.08603, pruned_loss=0.00859, audio_tagging_loss=0.01057, over 13662.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.09248, pruned_loss=0.01324, audio_tagging_loss=0.009097, over 3052095.67 frames. ], batch size: 54, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:14:32,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2826886.6666666665, ans=0.0 2023-11-24 12:14:39,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826953.3333333335, ans=0.1 2023-11-24 12:14:41,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424050 2023-11-24 12:14:49,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. 
limit=22.5 2023-11-24 12:15:00,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2827086.6666666665, ans=0.0 2023-11-24 12:15:02,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2827086.6666666665, ans=0.125 2023-11-24 12:15:07,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2827086.6666666665, ans=0.2 2023-11-24 12:15:11,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2827153.3333333335, ans=0.0 2023-11-24 12:15:17,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-11-24 12:15:25,380 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3250, loss[loss=0.06067, simple_loss=0.07829, pruned_loss=0.01033, audio_tagging_loss=0.0112, over 14520.00 frames. ], tot_loss[loss=0.0682, simple_loss=0.09174, pruned_loss=0.0132, audio_tagging_loss=0.009128, over 3044127.39 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:15:34,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.544e+01 9.313e+01 9.948e+01 1.268e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-24 12:15:43,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424100 2023-11-24 12:15:47,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2023-11-24 12:16:04,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2827420.0, ans=0.125 2023-11-24 12:16:05,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2827420.0, ans=0.125 2023-11-24 12:16:16,677 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 12:16:27,549 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3300, loss[loss=0.06316, simple_loss=0.08225, pruned_loss=0.01366, audio_tagging_loss=0.008372, over 15295.00 frames. ], tot_loss[loss=0.06831, simple_loss=0.09183, pruned_loss=0.01317, audio_tagging_loss=0.009218, over 3041767.55 frames. ], batch size: 58, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:16:27,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2827553.3333333335, ans=0.125 2023-11-24 12:16:30,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2827553.3333333335, ans=0.125 2023-11-24 12:16:45,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424150 2023-11-24 12:16:48,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2827620.0, ans=0.125 2023-11-24 12:17:27,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2827820.0, ans=0.0 2023-11-24 12:17:29,994 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3350, loss[loss=0.08785, simple_loss=0.1103, pruned_loss=0.02235, audio_tagging_loss=0.01034, over 15284.00 frames. 
], tot_loss[loss=0.06767, simple_loss=0.09083, pruned_loss=0.01304, audio_tagging_loss=0.009221, over 3035338.87 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:17:39,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2827886.6666666665, ans=0.125 2023-11-24 12:17:40,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.571e+01 9.014e+01 9.710e+01 1.291e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-24 12:17:40,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2827886.6666666665, ans=0.125 2023-11-24 12:17:49,605 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424200 2023-11-24 12:17:56,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=22.5 2023-11-24 12:18:08,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2828086.6666666665, ans=0.1 2023-11-24 12:18:13,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2828086.6666666665, ans=0.0 2023-11-24 12:18:13,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.77 vs. limit=15.0 2023-11-24 12:18:14,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2828086.6666666665, ans=0.125 2023-11-24 12:18:23,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2828153.3333333335, ans=0.0 2023-11-24 12:18:33,254 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3400, loss[loss=0.08862, simple_loss=0.1197, pruned_loss=0.02076, audio_tagging_loss=0.007991, over 16025.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09083, pruned_loss=0.01299, audio_tagging_loss=0.009067, over 3045027.06 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 32.0 2023-11-24 12:18:35,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2828220.0, ans=0.0 2023-11-24 12:18:47,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2828286.6666666665, ans=0.125 2023-11-24 12:18:49,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2828286.6666666665, ans=0.125 2023-11-24 12:18:51,917 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424250 2023-11-24 12:19:03,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2023-11-24 12:19:15,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2828420.0, ans=0.0 2023-11-24 12:19:15,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2828420.0, ans=0.2 2023-11-24 12:19:35,821 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3450, loss[loss=0.0521, simple_loss=0.07292, pruned_loss=0.008689, audio_tagging_loss=0.006954, over 14304.00 frames. 
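Most of the volume in this log is ScheduledFloat traffic from scaling.py: regularization hyper-parameters (dropout_p, *_skip_rate, balancer probs, bypass scale_min, and so on) that are functions of the global batch count, periodically evaluated and printed as ans. The printouts are consistent with a piecewise-linear schedule keyed on batch_count; a sketch with made-up breakpoints, purely to illustrate the shape:

```python
class ScheduledFloatSketch:
    """Piecewise-linear hyper-parameter schedule over batch count.

    Matches the shape of lines like `name=...dropout_p,
    batch_count=2828086.6666666665, ans=0.1`; the breakpoints below are
    illustrative, not the ones used in scaling.py.
    """

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(2828086.6666666665) == 0.1  # far past the ramp: final value
```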
], tot_loss[loss=0.06694, simple_loss=0.09017, pruned_loss=0.0129, audio_tagging_loss=0.008955, over 3037274.32 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:19:47,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.469e+01 9.145e+01 9.844e+01 1.227e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-24 12:19:50,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.30 vs. limit=5.0 2023-11-24 12:19:54,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424300 2023-11-24 12:19:54,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-24 12:20:14,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2828753.3333333335, ans=0.125 2023-11-24 12:20:36,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2828820.0, ans=0.125 2023-11-24 12:20:38,440 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3500, loss[loss=0.08976, simple_loss=0.1196, pruned_loss=0.02221, audio_tagging_loss=0.007762, over 15218.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08901, pruned_loss=0.01289, audio_tagging_loss=0.008962, over 3038073.85 frames. ], batch size: 54, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:20:58,016 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424350 2023-11-24 12:21:01,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2828953.3333333335, ans=0.1 2023-11-24 12:21:11,071 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 12:21:14,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2829086.6666666665, ans=0.0 2023-11-24 12:21:23,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2829086.6666666665, ans=0.125 2023-11-24 12:21:30,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2829153.3333333335, ans=0.1 2023-11-24 12:21:35,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2829153.3333333335, ans=0.0 2023-11-24 12:21:41,487 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3550, loss[loss=0.06953, simple_loss=0.0987, pruned_loss=0.01444, audio_tagging_loss=0.005742, over 14645.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.08969, pruned_loss=0.01297, audio_tagging_loss=0.008859, over 3041512.67 frames. ], batch size: 53, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:21:47,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.26 vs. 
limit=15.0 2023-11-24 12:21:52,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.784e+01 8.449e+01 8.981e+01 9.607e+01 1.173e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-24 12:21:58,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2829286.6666666665, ans=0.0 2023-11-24 12:21:59,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2829286.6666666665, ans=0.125 2023-11-24 12:22:00,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424400 2023-11-24 12:22:30,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.11 vs. limit=15.0 2023-11-24 12:22:35,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2829486.6666666665, ans=0.2 2023-11-24 12:22:36,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2829486.6666666665, ans=0.0 2023-11-24 12:22:44,313 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3600, loss[loss=0.06542, simple_loss=0.08509, pruned_loss=0.01334, audio_tagging_loss=0.009541, over 15320.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09, pruned_loss=0.01297, audio_tagging_loss=0.008822, over 3042417.08 frames. ], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:22:56,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2829620.0, ans=0.0 2023-11-24 12:22:57,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2829620.0, ans=0.1 2023-11-24 12:23:03,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424450 2023-11-24 12:23:04,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2829620.0, ans=0.2 2023-11-24 12:23:42,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0 2023-11-24 12:23:46,401 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3650, loss[loss=0.08262, simple_loss=0.1116, pruned_loss=0.01656, audio_tagging_loss=0.01027, over 15142.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09084, pruned_loss=0.01311, audio_tagging_loss=0.008782, over 3040424.06 frames. 
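The Whitening lines compare a per-module metric against a limit (e.g. metric=14.11 vs. limit=15.0 above; every value printed in this section is still below its limit). The metric is plausibly an eigenvalue-spread measure of the feature covariance: 1.0 when the covariance is already white and growing as variance concentrates in a few directions, with a correction applied only once the limit is crossed. One such measure, offered as an assumption rather than the scaling.py formula:

```python
import torch

def whiteness_metric(feats: torch.Tensor) -> float:
    # feats: (num_frames, num_channels). Returns 1.0 for perfectly white
    # (isotropic) features, larger as the covariance becomes lopsided.
    feats = feats - feats.mean(dim=0, keepdim=True)
    cov = feats.T @ feats / feats.shape[0]  # (C, C) covariance
    num_channels = cov.shape[0]
    # trace(cov @ cov) is the sum of squared eigenvalues;
    # trace(cov) ** 2 is the squared sum of eigenvalues.
    return (num_channels * (cov @ cov).trace() / cov.trace() ** 2).item()
```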
], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:23:50,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2829886.6666666665, ans=0.125 2023-11-24 12:23:54,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2829886.6666666665, ans=0.0 2023-11-24 12:24:00,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.304e+01 8.721e+01 9.501e+01 1.428e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-24 12:24:02,614 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 12:24:06,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424500 2023-11-24 12:24:06,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2829953.3333333335, ans=0.0 2023-11-24 12:24:09,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2023-11-24 12:24:17,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2830020.0, ans=0.0 2023-11-24 12:24:24,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2830086.6666666665, ans=0.0 2023-11-24 12:24:40,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2830153.3333333335, ans=0.1 2023-11-24 12:24:49,728 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3700, loss[loss=0.07121, simple_loss=0.0924, pruned_loss=0.01589, audio_tagging_loss=0.009116, over 15766.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09172, pruned_loss=0.01317, audio_tagging_loss=0.008773, over 3040216.28 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:25:03,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2830286.6666666665, ans=0.07 2023-11-24 12:25:08,381 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424550 2023-11-24 12:25:09,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2830286.6666666665, ans=0.125 2023-11-24 12:25:22,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2830353.3333333335, ans=0.125 2023-11-24 12:25:37,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2830420.0, ans=0.05 2023-11-24 12:25:39,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2830486.6666666665, ans=0.0 2023-11-24 12:25:51,802 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3750, loss[loss=0.07003, simple_loss=0.09298, pruned_loss=0.01357, audio_tagging_loss=0.009972, over 17505.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09194, pruned_loss=0.01324, audio_tagging_loss=0.008848, over 3051102.84 frames. 
], batch size: 63, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:26:03,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.897e+01 9.293e+01 9.941e+01 1.332e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-24 12:26:09,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424600 2023-11-24 12:26:35,385 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 12:26:38,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2830753.3333333335, ans=0.0 2023-11-24 12:26:43,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2830820.0, ans=0.125 2023-11-24 12:26:43,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2830820.0, ans=0.125 2023-11-24 12:26:53,211 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3800, loss[loss=0.07683, simple_loss=0.1076, pruned_loss=0.01521, audio_tagging_loss=0.007804, over 15471.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09218, pruned_loss=0.01335, audio_tagging_loss=0.008908, over 3057572.88 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:27:12,751 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424650 2023-11-24 12:27:30,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2831086.6666666665, ans=0.125 2023-11-24 12:27:32,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2831086.6666666665, ans=0.0 2023-11-24 12:27:43,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2831153.3333333335, ans=0.0 2023-11-24 12:27:47,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2831153.3333333335, ans=0.0 2023-11-24 12:27:53,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2831153.3333333335, ans=0.125 2023-11-24 12:27:56,861 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3850, loss[loss=0.08356, simple_loss=0.1189, pruned_loss=0.01734, audio_tagging_loss=0.006761, over 16286.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.0918, pruned_loss=0.01298, audio_tagging_loss=0.008916, over 3056856.98 frames. 
], batch size: 57, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:28:00,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2831220.0, ans=0.1 2023-11-24 12:28:06,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2831220.0, ans=0.2 2023-11-24 12:28:09,280 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.465e+01 9.054e+01 9.529e+01 1.182e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-24 12:28:15,432 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424700 2023-11-24 12:28:24,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2831353.3333333335, ans=0.125 2023-11-24 12:28:32,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2023-11-24 12:28:45,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2831486.6666666665, ans=0.0 2023-11-24 12:28:58,697 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3900, loss[loss=0.06753, simple_loss=0.09156, pruned_loss=0.01283, audio_tagging_loss=0.008922, over 15218.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09105, pruned_loss=0.01292, audio_tagging_loss=0.008987, over 3049813.85 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:29:05,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2831553.3333333335, ans=0.0 2023-11-24 12:29:07,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.16 vs. limit=10.0 2023-11-24 12:29:16,383 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424750 2023-11-24 12:29:42,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-11-24 12:30:00,517 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 3950, loss[loss=0.08524, simple_loss=0.1197, pruned_loss=0.01957, audio_tagging_loss=0.005831, over 16027.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09117, pruned_loss=0.01294, audio_tagging_loss=0.009004, over 3050352.93 frames. 
], batch size: 58, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:30:14,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.514e+01 9.052e+01 9.990e+01 1.654e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-24 12:30:20,002 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424800 2023-11-24 12:30:25,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2832020.0, ans=0.0 2023-11-24 12:30:59,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2832153.3333333335, ans=0.125 2023-11-24 12:31:01,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2832153.3333333335, ans=0.125 2023-11-24 12:31:02,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2832220.0, ans=0.125 2023-11-24 12:31:03,779 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4000, loss[loss=0.07062, simple_loss=0.08496, pruned_loss=0.01579, audio_tagging_loss=0.01234, over 14363.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09147, pruned_loss=0.01313, audio_tagging_loss=0.00909, over 3049733.68 frames. ], batch size: 58, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:31:05,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2832220.0, ans=0.125 2023-11-24 12:31:10,661 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 12:31:20,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2832286.6666666665, ans=0.0 2023-11-24 12:31:23,633 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424850 2023-11-24 12:31:24,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-24 12:31:37,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2832353.3333333335, ans=0.125 2023-11-24 12:31:42,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2832420.0, ans=0.1 2023-11-24 12:31:44,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2023-11-24 12:32:07,615 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4050, loss[loss=0.08221, simple_loss=0.1138, pruned_loss=0.01634, audio_tagging_loss=0.00898, over 14074.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09167, pruned_loss=0.01315, audio_tagging_loss=0.009089, over 3056947.36 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:32:11,193 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 12:32:13,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2832553.3333333335, ans=0.125 2023-11-24 12:32:21,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.871e+01 9.409e+01 1.005e+02 1.323e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-24 12:32:25,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424900 2023-11-24 12:32:26,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2832620.0, ans=0.2 2023-11-24 12:32:30,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2832686.6666666665, ans=0.0 2023-11-24 12:32:39,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2832686.6666666665, ans=0.125 2023-11-24 12:32:56,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2832820.0, ans=0.125 2023-11-24 12:33:09,314 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4100, loss[loss=0.04302, simple_loss=0.06094, pruned_loss=0.00531, audio_tagging_loss=0.007236, over 15167.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09255, pruned_loss=0.0132, audio_tagging_loss=0.009023, over 3056196.60 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:33:12,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=12.0 2023-11-24 12:33:17,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=22.5 2023-11-24 12:33:18,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2832886.6666666665, ans=0.125 2023-11-24 12:33:19,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2832886.6666666665, ans=0.1 2023-11-24 12:33:27,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2832953.3333333335, ans=0.125 2023-11-24 12:33:28,337 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 424950 2023-11-24 12:33:49,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2833086.6666666665, ans=0.05 2023-11-24 12:34:12,398 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4150, loss[loss=0.0744, simple_loss=0.1043, pruned_loss=0.01604, audio_tagging_loss=0.006214, over 15219.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09293, pruned_loss=0.01339, audio_tagging_loss=0.00889, over 3060959.58 frames. 
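The grad_scale value at the end of each batch summary drifts across this section (16.0 early on, 32.0 from batch 2400, back to 16.0 at batch 3450, down to 8.0 by batch 3950, and up again later), the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and halves whenever an inf/nan gradient forces a skipped step. A generic sketch using PyTorch's stock scaler, not the recipe's own scaler:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,    # matches the scale seen early in this section
    growth_factor=2.0,  # doubles after `growth_interval` clean steps
    backoff_factor=0.5, # halves on an inf/nan gradient
    growth_interval=2000,
)

def fp16_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # silently skips the step on inf/nan grads
    scaler.update()                # grow or back off -> the logged grad_scale
```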
], batch size: 56, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:34:12,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2833220.0, ans=0.0 2023-11-24 12:34:27,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.028e+01 8.532e+01 9.065e+01 9.755e+01 1.245e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-24 12:34:31,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425000 2023-11-24 12:34:32,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2833286.6666666665, ans=0.125 2023-11-24 12:34:46,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2833353.3333333335, ans=0.025 2023-11-24 12:34:54,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2833420.0, ans=0.0 2023-11-24 12:34:57,028 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 12:35:03,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2833486.6666666665, ans=0.125 2023-11-24 12:35:14,959 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4200, loss[loss=0.0493, simple_loss=0.06472, pruned_loss=0.006514, audio_tagging_loss=0.01043, over 14686.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09227, pruned_loss=0.01326, audio_tagging_loss=0.008689, over 3056512.75 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:35:27,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2833620.0, ans=0.0 2023-11-24 12:35:34,451 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425050 2023-11-24 12:36:08,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2833820.0, ans=0.1 2023-11-24 12:36:18,757 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4250, loss[loss=0.06875, simple_loss=0.09431, pruned_loss=0.01137, audio_tagging_loss=0.01023, over 15964.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09275, pruned_loss=0.01326, audio_tagging_loss=0.008693, over 3064280.99 frames. ], batch size: 59, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:36:32,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.749e+01 9.313e+01 9.960e+01 2.008e+02, threshold=1.863e+02, percent-clipped=1.0 2023-11-24 12:36:34,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=22.5 2023-11-24 12:36:37,218 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425100 2023-11-24 12:36:45,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. 
limit=15.0 2023-11-24 12:36:46,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2834020.0, ans=0.125 2023-11-24 12:37:00,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2834086.6666666665, ans=0.0 2023-11-24 12:37:07,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.73 vs. limit=10.0 2023-11-24 12:37:20,820 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4300, loss[loss=0.08268, simple_loss=0.1135, pruned_loss=0.01772, audio_tagging_loss=0.008211, over 15508.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09371, pruned_loss=0.01352, audio_tagging_loss=0.008609, over 3057745.80 frames. ], batch size: 56, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:37:40,035 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425150 2023-11-24 12:38:01,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2834420.0, ans=0.125 2023-11-24 12:38:04,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2834420.0, ans=0.1 2023-11-24 12:38:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2834486.6666666665, ans=0.0 2023-11-24 12:38:24,199 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4350, loss[loss=0.07739, simple_loss=0.1023, pruned_loss=0.01489, audio_tagging_loss=0.01133, over 14903.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09281, pruned_loss=0.01332, audio_tagging_loss=0.008765, over 3058511.20 frames. ], batch size: 55, lr: 1.89e-03, grad_scale: 8.0 2023-11-24 12:38:26,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2834553.3333333335, ans=0.05 2023-11-24 12:38:36,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2834620.0, ans=0.125 2023-11-24 12:38:39,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.655e+01 9.322e+01 1.008e+02 1.169e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-24 12:38:43,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425200 2023-11-24 12:39:23,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2023-11-24 12:39:27,396 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4400, loss[loss=0.05057, simple_loss=0.0611, pruned_loss=0.009276, audio_tagging_loss=0.01074, over 14264.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09135, pruned_loss=0.01304, audio_tagging_loss=0.008908, over 3061013.15 frames. 
], batch size: 56, lr: 1.89e-03, grad_scale: 16.0 2023-11-24 12:39:45,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425250 2023-11-24 12:40:02,599 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 12:40:11,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2835086.6666666665, ans=0.1 2023-11-24 12:40:29,271 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4450, loss[loss=0.0696, simple_loss=0.09753, pruned_loss=0.01376, audio_tagging_loss=0.007077, over 14745.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09219, pruned_loss=0.01331, audio_tagging_loss=0.008838, over 3063062.24 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:40:37,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2835220.0, ans=0.125 2023-11-24 12:40:44,624 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.586e+01 9.357e+01 9.969e+01 1.625e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-24 12:40:46,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2835286.6666666665, ans=0.125 2023-11-24 12:40:48,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425300 2023-11-24 12:41:14,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2835420.0, ans=0.0 2023-11-24 12:41:23,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2835486.6666666665, ans=0.125 2023-11-24 12:41:32,469 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4500, loss[loss=0.05375, simple_loss=0.06596, pruned_loss=0.008429, audio_tagging_loss=0.01234, over 14748.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09106, pruned_loss=0.01313, audio_tagging_loss=0.008808, over 3063503.68 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:41:34,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2835553.3333333335, ans=0.0 2023-11-24 12:41:48,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.13 vs. limit=10.0 2023-11-24 12:41:48,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-24 12:41:51,043 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425350 2023-11-24 12:42:09,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2023-11-24 12:42:35,598 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4550, loss[loss=0.04859, simple_loss=0.05705, pruned_loss=0.008607, audio_tagging_loss=0.01146, over 14874.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09099, pruned_loss=0.01303, audio_tagging_loss=0.008784, over 3057355.42 frames. 
], batch size: 60, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:42:50,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.325e+01 9.085e+01 9.707e+01 1.236e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 12:42:53,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2835953.3333333335, ans=0.125 2023-11-24 12:42:54,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425400 2023-11-24 12:42:56,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2835953.3333333335, ans=0.125 2023-11-24 12:42:58,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2835953.3333333335, ans=0.0 2023-11-24 12:43:23,424 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 12:43:27,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2836153.3333333335, ans=0.125 2023-11-24 12:43:38,250 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4600, loss[loss=0.0774, simple_loss=0.1089, pruned_loss=0.01373, audio_tagging_loss=0.009214, over 17034.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08985, pruned_loss=0.01283, audio_tagging_loss=0.008859, over 3054578.07 frames. ], batch size: 61, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:43:53,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2836286.6666666665, ans=0.125 2023-11-24 12:43:57,393 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425450 2023-11-24 12:44:20,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-11-24 12:44:34,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836486.6666666665, ans=0.1 2023-11-24 12:44:41,245 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4650, loss[loss=0.06498, simple_loss=0.07767, pruned_loss=0.01566, audio_tagging_loss=0.01049, over 14630.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08949, pruned_loss=0.01295, audio_tagging_loss=0.008946, over 3051371.08 frames. 
], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:44:43,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2836553.3333333335, ans=0.125 2023-11-24 12:44:55,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.503e+01 9.255e+01 1.001e+02 1.285e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-24 12:44:59,685 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425500 2023-11-24 12:45:01,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2836620.0, ans=0.125 2023-11-24 12:45:02,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2836620.0, ans=0.0 2023-11-24 12:45:17,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2836753.3333333335, ans=0.0 2023-11-24 12:45:20,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2836753.3333333335, ans=0.0 2023-11-24 12:45:32,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2836820.0, ans=0.0 2023-11-24 12:45:43,855 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4700, loss[loss=0.04652, simple_loss=0.06284, pruned_loss=0.004629, audio_tagging_loss=0.01047, over 14689.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09003, pruned_loss=0.01319, audio_tagging_loss=0.00908, over 3053947.06 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:45:51,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836886.6666666665, ans=0.1 2023-11-24 12:45:55,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. limit=10.0 2023-11-24 12:45:59,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2836953.3333333335, ans=0.0 2023-11-24 12:46:02,026 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425550 2023-11-24 12:46:27,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2837086.6666666665, ans=0.0 2023-11-24 12:46:28,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2837086.6666666665, ans=0.125 2023-11-24 12:46:45,790 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4750, loss[loss=0.05297, simple_loss=0.06412, pruned_loss=0.01058, audio_tagging_loss=0.01034, over 14243.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08946, pruned_loss=0.01294, audio_tagging_loss=0.009092, over 3050290.03 frames. 
], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:46:59,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2837286.6666666665, ans=0.125 2023-11-24 12:47:01,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.647e+01 9.446e+01 1.032e+02 1.298e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-24 12:47:05,671 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425600 2023-11-24 12:47:07,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=10.0 2023-11-24 12:47:34,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2837420.0, ans=0.07 2023-11-24 12:47:49,898 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4800, loss[loss=0.05729, simple_loss=0.07002, pruned_loss=0.0101, audio_tagging_loss=0.01218, over 14210.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0889, pruned_loss=0.01287, audio_tagging_loss=0.009249, over 3054245.86 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:48:09,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425650 2023-11-24 12:48:26,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2837753.3333333335, ans=0.125 2023-11-24 12:48:27,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2837753.3333333335, ans=0.1 2023-11-24 12:48:45,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2837820.0, ans=0.125 2023-11-24 12:48:54,397 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4850, loss[loss=0.0669, simple_loss=0.09343, pruned_loss=0.01101, audio_tagging_loss=0.009171, over 14844.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.08876, pruned_loss=0.01282, audio_tagging_loss=0.009421, over 3050216.57 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:49:04,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. 
limit=6.0 2023-11-24 12:49:08,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.570e+01 9.281e+01 9.821e+01 1.175e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-24 12:49:08,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2837953.3333333335, ans=0.0 2023-11-24 12:49:12,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425700 2023-11-24 12:49:30,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2838086.6666666665, ans=0.0 2023-11-24 12:49:40,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2838086.6666666665, ans=0.015 2023-11-24 12:49:43,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2838153.3333333335, ans=0.125 2023-11-24 12:49:51,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2838153.3333333335, ans=0.0 2023-11-24 12:49:56,086 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4900, loss[loss=0.08096, simple_loss=0.1135, pruned_loss=0.01718, audio_tagging_loss=0.007037, over 16054.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.08894, pruned_loss=0.01298, audio_tagging_loss=0.009415, over 3035112.72 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:50:00,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2838220.0, ans=0.125 2023-11-24 12:50:05,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2838220.0, ans=0.125 2023-11-24 12:50:15,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425750 2023-11-24 12:50:19,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2838286.6666666665, ans=0.125 2023-11-24 12:50:19,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2838286.6666666665, ans=0.0 2023-11-24 12:50:22,966 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 12:50:27,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2838353.3333333335, ans=0.1 2023-11-24 12:50:41,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2838420.0, ans=0.125 2023-11-24 12:50:45,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2838486.6666666665, ans=0.0 2023-11-24 12:50:58,318 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 4950, loss[loss=0.08648, simple_loss=0.1173, pruned_loss=0.02123, audio_tagging_loss=0.00659, over 15089.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.08978, pruned_loss=0.01309, audio_tagging_loss=0.009186, over 3036421.53 frames. 
], batch size: 54, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:51:05,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2838553.3333333335, ans=0.125 2023-11-24 12:51:14,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.587e+01 9.264e+01 9.833e+01 1.255e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-24 12:51:16,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2838620.0, ans=0.2 2023-11-24 12:51:16,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2838620.0, ans=0.125 2023-11-24 12:51:17,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425800 2023-11-24 12:51:47,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-11-24 12:52:02,002 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5000, loss[loss=0.04013, simple_loss=0.04683, pruned_loss=0.006888, audio_tagging_loss=0.009828, over 15392.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09003, pruned_loss=0.01315, audio_tagging_loss=0.008983, over 3036479.68 frames. ], batch size: 60, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:52:19,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425850 2023-11-24 12:52:24,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2839020.0, ans=0.09899494936611666 2023-11-24 12:52:58,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2839153.3333333335, ans=0.035 2023-11-24 12:52:59,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.12 vs. limit=12.0 2023-11-24 12:53:03,492 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5050, loss[loss=0.09273, simple_loss=0.1264, pruned_loss=0.02339, audio_tagging_loss=0.006152, over 14636.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09097, pruned_loss=0.01315, audio_tagging_loss=0.008792, over 3041804.51 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:53:12,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2839220.0, ans=0.0 2023-11-24 12:53:17,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.266e+01 8.922e+01 9.681e+01 1.367e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-24 12:53:22,174 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425900 2023-11-24 12:53:39,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0 2023-11-24 12:54:03,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2839486.6666666665, ans=0.125 2023-11-24 12:54:03,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2023-11-24 12:54:06,742 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5100, loss[loss=0.05874, simple_loss=0.07508, pruned_loss=0.009064, audio_tagging_loss=0.01214, over 16634.00 frames. 
], tot_loss[loss=0.06709, simple_loss=0.09055, pruned_loss=0.01306, audio_tagging_loss=0.008751, over 3048397.40 frames. ], batch size: 62, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:54:11,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2023-11-24 12:54:22,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2839620.0, ans=0.0 2023-11-24 12:54:26,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 425950 2023-11-24 12:54:51,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2839753.3333333335, ans=0.125 2023-11-24 12:55:11,204 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5150, loss[loss=0.08596, simple_loss=0.1224, pruned_loss=0.01992, audio_tagging_loss=0.004859, over 15718.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.0898, pruned_loss=0.01284, audio_tagging_loss=0.008771, over 3045192.41 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:55:16,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2839886.6666666665, ans=0.0 2023-11-24 12:55:26,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.369e+01 9.000e+01 9.837e+01 1.217e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-24 12:55:29,898 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426000 2023-11-24 12:55:53,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=12.0 2023-11-24 12:56:14,382 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5200, loss[loss=0.08004, simple_loss=0.1094, pruned_loss=0.01759, audio_tagging_loss=0.007773, over 14731.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09008, pruned_loss=0.01278, audio_tagging_loss=0.008851, over 3038481.15 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 12:56:15,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2840220.0, ans=0.0 2023-11-24 12:56:19,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2840220.0, ans=0.125 2023-11-24 12:56:19,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.23 vs. 
limit=15.0 2023-11-24 12:56:32,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426050 2023-11-24 12:56:34,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2840286.6666666665, ans=0.1 2023-11-24 12:56:40,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2840353.3333333335, ans=0.125 2023-11-24 12:57:03,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2840486.6666666665, ans=0.0 2023-11-24 12:57:03,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2840486.6666666665, ans=0.0 2023-11-24 12:57:07,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2840486.6666666665, ans=0.0 2023-11-24 12:57:09,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2840486.6666666665, ans=0.2 2023-11-24 12:57:12,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2840486.6666666665, ans=0.0 2023-11-24 12:57:15,459 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5250, loss[loss=0.06079, simple_loss=0.07911, pruned_loss=0.01121, audio_tagging_loss=0.01002, over 15731.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09214, pruned_loss=0.01324, audio_tagging_loss=0.008724, over 3043825.94 frames. ], batch size: 58, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:57:32,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.528e+01 9.139e+01 1.004e+02 1.210e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-24 12:57:35,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426100 2023-11-24 12:57:35,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2840620.0, ans=0.125 2023-11-24 12:57:57,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2840753.3333333335, ans=0.125 2023-11-24 12:58:19,032 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5300, loss[loss=0.06252, simple_loss=0.08139, pruned_loss=0.01333, audio_tagging_loss=0.008493, over 15194.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09214, pruned_loss=0.01316, audio_tagging_loss=0.008743, over 3044380.68 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:58:19,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2840886.6666666665, ans=0.125 2023-11-24 12:58:32,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2840953.3333333335, ans=0.0 2023-11-24 12:58:38,032 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426150 2023-11-24 12:58:39,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2840953.3333333335, ans=0.0 2023-11-24 12:58:43,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.51 vs. 
limit=15.0 2023-11-24 12:58:44,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2841020.0, ans=0.1 2023-11-24 12:58:49,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2841020.0, ans=0.125 2023-11-24 12:59:01,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-11-24 12:59:05,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2841086.6666666665, ans=0.125 2023-11-24 12:59:12,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2841153.3333333335, ans=0.0 2023-11-24 12:59:14,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2841153.3333333335, ans=0.125 2023-11-24 12:59:22,041 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5350, loss[loss=0.06382, simple_loss=0.08247, pruned_loss=0.01472, audio_tagging_loss=0.007867, over 15527.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09222, pruned_loss=0.01306, audio_tagging_loss=0.008694, over 3042153.52 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 12:59:28,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-24 12:59:30,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2841220.0, ans=0.125 2023-11-24 12:59:37,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2023-11-24 12:59:37,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.528e+01 9.137e+01 9.887e+01 1.205e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 12:59:40,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426200 2023-11-24 12:59:44,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2841286.6666666665, ans=0.125 2023-11-24 12:59:45,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841353.3333333335, ans=0.1 2023-11-24 12:59:56,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2841353.3333333335, ans=0.09899494936611666 2023-11-24 13:00:10,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2841420.0, ans=0.09899494936611666 2023-11-24 13:00:20,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2841486.6666666665, ans=15.0 2023-11-24 13:00:24,514 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5400, loss[loss=0.04711, simple_loss=0.06158, pruned_loss=0.006768, audio_tagging_loss=0.009551, over 15331.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09142, pruned_loss=0.01301, audio_tagging_loss=0.008824, over 3049515.06 frames. 
], batch size: 58, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:00:40,193 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:00:43,699 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426250 2023-11-24 13:00:53,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2841686.6666666665, ans=0.2 2023-11-24 13:00:59,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2841686.6666666665, ans=0.025 2023-11-24 13:00:59,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2841686.6666666665, ans=0.07 2023-11-24 13:01:22,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2841820.0, ans=0.125 2023-11-24 13:01:27,625 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5450, loss[loss=0.07068, simple_loss=0.09074, pruned_loss=0.0157, audio_tagging_loss=0.009611, over 13653.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.09211, pruned_loss=0.01319, audio_tagging_loss=0.00887, over 3047296.06 frames. ], batch size: 52, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:01:43,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.576e+01 9.179e+01 9.817e+01 1.405e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-24 13:01:46,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426300 2023-11-24 13:01:58,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=12.0 2023-11-24 13:02:08,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2842086.6666666665, ans=0.2 2023-11-24 13:02:09,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-11-24 13:02:16,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2842153.3333333335, ans=0.125 2023-11-24 13:02:30,296 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5500, loss[loss=0.05756, simple_loss=0.06928, pruned_loss=0.01352, audio_tagging_loss=0.009391, over 15432.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.0913, pruned_loss=0.0131, audio_tagging_loss=0.008911, over 3048436.90 frames. ], batch size: 61, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:02:30,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2842220.0, ans=0.07 2023-11-24 13:02:48,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426350 2023-11-24 13:02:59,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2842353.3333333335, ans=0.2 2023-11-24 13:03:23,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2842486.6666666665, ans=0.0 2023-11-24 13:03:32,883 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5550, loss[loss=0.06702, simple_loss=0.08207, pruned_loss=0.01627, audio_tagging_loss=0.009708, over 14497.00 frames. 
], tot_loss[loss=0.06814, simple_loss=0.09153, pruned_loss=0.01332, audio_tagging_loss=0.009056, over 3048229.92 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 8.0 2023-11-24 13:03:50,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.724e+01 9.337e+01 1.024e+02 1.345e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-24 13:03:52,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426400 2023-11-24 13:03:57,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2842686.6666666665, ans=0.07 2023-11-24 13:04:05,009 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:04:10,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2842753.3333333335, ans=0.07 2023-11-24 13:04:28,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0 2023-11-24 13:04:36,528 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5600, loss[loss=0.0685, simple_loss=0.09721, pruned_loss=0.01263, audio_tagging_loss=0.007262, over 15639.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09279, pruned_loss=0.01343, audio_tagging_loss=0.009104, over 3056005.67 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:04:40,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2842886.6666666665, ans=0.125 2023-11-24 13:04:45,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2842886.6666666665, ans=0.1 2023-11-24 13:04:55,202 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426450 2023-11-24 13:05:03,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2023-11-24 13:05:10,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2843020.0, ans=0.0 2023-11-24 13:05:16,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2843086.6666666665, ans=15.0 2023-11-24 13:05:19,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2843086.6666666665, ans=0.2 2023-11-24 13:05:20,371 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 13:05:22,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=12.0 2023-11-24 13:05:39,335 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5650, loss[loss=0.06521, simple_loss=0.09095, pruned_loss=0.01015, audio_tagging_loss=0.009578, over 13721.00 frames. 
], tot_loss[loss=0.06908, simple_loss=0.09302, pruned_loss=0.01344, audio_tagging_loss=0.009126, over 3060128.86 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:05:44,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2843220.0, ans=0.125 2023-11-24 13:05:44,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2843220.0, ans=0.125 2023-11-24 13:05:56,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.631e+01 8.507e+01 8.997e+01 9.661e+01 1.393e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-24 13:05:57,879 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426500 2023-11-24 13:06:04,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2843353.3333333335, ans=0.1 2023-11-24 13:06:11,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2843353.3333333335, ans=10.0 2023-11-24 13:06:11,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2843353.3333333335, ans=0.125 2023-11-24 13:06:18,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2843420.0, ans=0.0 2023-11-24 13:06:23,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2843420.0, ans=0.125 2023-11-24 13:06:34,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2843486.6666666665, ans=0.125 2023-11-24 13:06:41,757 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5700, loss[loss=0.07803, simple_loss=0.09531, pruned_loss=0.02121, audio_tagging_loss=0.009165, over 13653.00 frames. ], tot_loss[loss=0.06931, simple_loss=0.09321, pruned_loss=0.01369, audio_tagging_loss=0.00902, over 3049977.99 frames. ], batch size: 52, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:06:49,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2843553.3333333335, ans=0.1 2023-11-24 13:07:01,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426550 2023-11-24 13:07:11,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2843686.6666666665, ans=0.2 2023-11-24 13:07:12,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2843686.6666666665, ans=0.125 2023-11-24 13:07:21,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2843753.3333333335, ans=0.125 2023-11-24 13:07:42,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2023-11-24 13:07:44,423 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5750, loss[loss=0.04341, simple_loss=0.05128, pruned_loss=0.006942, audio_tagging_loss=0.01083, over 14708.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09186, pruned_loss=0.01344, audio_tagging_loss=0.008971, over 3046115.10 frames. 
], batch size: 58, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:08:00,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2843953.3333333335, ans=0.0 2023-11-24 13:08:02,008 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.384e+01 8.873e+01 9.530e+01 1.243e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-24 13:08:03,325 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426600 2023-11-24 13:08:03,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2843953.3333333335, ans=0.125 2023-11-24 13:08:48,082 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5800, loss[loss=0.06506, simple_loss=0.09238, pruned_loss=0.00966, audio_tagging_loss=0.009211, over 14017.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09111, pruned_loss=0.01328, audio_tagging_loss=0.008825, over 3053073.70 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:09:04,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2844286.6666666665, ans=0.125 2023-11-24 13:09:05,925 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426650 2023-11-24 13:09:31,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2844420.0, ans=0.1 2023-11-24 13:09:47,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2844486.6666666665, ans=0.0 2023-11-24 13:09:48,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2844553.3333333335, ans=0.0 2023-11-24 13:09:48,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2844553.3333333335, ans=0.07 2023-11-24 13:09:49,426 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5850, loss[loss=0.09716, simple_loss=0.1298, pruned_loss=0.025, audio_tagging_loss=0.007259, over 16901.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09113, pruned_loss=0.01333, audio_tagging_loss=0.008851, over 3046650.40 frames. 
], batch size: 61, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:09:54,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2844553.3333333335, ans=0.0 2023-11-24 13:09:57,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2844553.3333333335, ans=0.0 2023-11-24 13:10:07,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.013e+01 8.489e+01 9.123e+01 9.809e+01 1.343e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-24 13:10:08,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426700 2023-11-24 13:10:08,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2844620.0, ans=0.025 2023-11-24 13:10:26,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2844753.3333333335, ans=0.0 2023-11-24 13:10:27,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2844753.3333333335, ans=0.125 2023-11-24 13:10:52,123 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5900, loss[loss=0.06245, simple_loss=0.08174, pruned_loss=0.0125, audio_tagging_loss=0.009083, over 15192.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09244, pruned_loss=0.01342, audio_tagging_loss=0.00868, over 3045205.56 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:10:52,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2844886.6666666665, ans=0.0 2023-11-24 13:11:06,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2023-11-24 13:11:06,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.18 vs. limit=22.5 2023-11-24 13:11:11,053 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426750 2023-11-24 13:11:27,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2023-11-24 13:11:30,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2845086.6666666665, ans=0.0 2023-11-24 13:11:37,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2023-11-24 13:11:41,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2023-11-24 13:11:48,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2845153.3333333335, ans=0.125 2023-11-24 13:11:53,910 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 5950, loss[loss=0.05953, simple_loss=0.0764, pruned_loss=0.01001, audio_tagging_loss=0.01132, over 15214.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09183, pruned_loss=0.0133, audio_tagging_loss=0.008717, over 3046769.05 frames. 
], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:11:55,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2845220.0, ans=0.0 2023-11-24 13:12:01,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2845220.0, ans=0.125 2023-11-24 13:12:01,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2845220.0, ans=0.05 2023-11-24 13:12:11,009 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.386e+01 9.064e+01 9.655e+01 1.412e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-24 13:12:12,328 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426800 2023-11-24 13:12:21,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2845353.3333333335, ans=0.125 2023-11-24 13:12:30,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2845420.0, ans=0.0 2023-11-24 13:12:32,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2845420.0, ans=0.0 2023-11-24 13:12:56,163 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6000, loss[loss=0.08846, simple_loss=0.1282, pruned_loss=0.01684, audio_tagging_loss=0.00753, over 16394.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09194, pruned_loss=0.0133, audio_tagging_loss=0.008678, over 3050424.10 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:12:56,164 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 13:13:26,495 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5972, 3.5636, 3.8825, 3.4427], device='cuda:1') 2023-11-24 13:13:32,151 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8695, 1.6320, 3.4329, 3.0593, 2.9762, 3.0669, 3.0320, 3.1603], device='cuda:1') 2023-11-24 13:13:36,415 INFO [train_asr.py:1253] (1/4) Epoch 36, validation: loss=0.05813, simple_loss=0.0509, pruned_loss=0.005269, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-24 13:13:36,415 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 13:13:47,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2845620.0, ans=0.0 2023-11-24 13:13:54,686 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426850 2023-11-24 13:14:01,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2845686.6666666665, ans=0.0 2023-11-24 13:14:02,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2023-11-24 13:14:17,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2845753.3333333335, ans=0.125 2023-11-24 13:14:20,379 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 13:14:20,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2845753.3333333335, ans=0.2 2023-11-24 13:14:24,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2845753.3333333335, ans=0.125 2023-11-24 13:14:27,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2845820.0, ans=0.1 2023-11-24 13:14:28,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-11-24 13:14:38,875 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6050, loss[loss=0.05373, simple_loss=0.07795, pruned_loss=0.007657, audio_tagging_loss=0.007098, over 16409.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09198, pruned_loss=0.01328, audio_tagging_loss=0.008709, over 3054141.81 frames. ], batch size: 62, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:14:55,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.477e+01 9.229e+01 1.011e+02 1.265e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-24 13:14:55,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2845953.3333333335, ans=0.0 2023-11-24 13:14:56,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426900 2023-11-24 13:14:59,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2845953.3333333335, ans=0.0 2023-11-24 13:15:01,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2846020.0, ans=0.125 2023-11-24 13:15:10,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2846020.0, ans=0.1 2023-11-24 13:15:18,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2846086.6666666665, ans=0.2 2023-11-24 13:15:21,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2846086.6666666665, ans=0.0 2023-11-24 13:15:39,756 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6100, loss[loss=0.06173, simple_loss=0.08572, pruned_loss=0.01148, audio_tagging_loss=0.007383, over 16291.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09118, pruned_loss=0.01328, audio_tagging_loss=0.008732, over 3058040.57 frames. ], batch size: 61, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:15:51,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. 
limit=15.0 2023-11-24 13:15:55,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2846286.6666666665, ans=0.09899494936611666 2023-11-24 13:15:59,097 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 426950 2023-11-24 13:16:24,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2846420.0, ans=10.0 2023-11-24 13:16:36,222 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.516e-03 2023-11-24 13:16:38,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2846486.6666666665, ans=0.125 2023-11-24 13:16:42,233 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6150, loss[loss=0.05917, simple_loss=0.08216, pruned_loss=0.009987, audio_tagging_loss=0.008106, over 16539.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09082, pruned_loss=0.01313, audio_tagging_loss=0.008759, over 3063392.54 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:16:57,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2846620.0, ans=0.125 2023-11-24 13:17:00,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.380e+01 9.080e+01 9.869e+01 1.269e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-24 13:17:01,753 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427000 2023-11-24 13:17:07,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0 2023-11-24 13:17:17,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2846686.6666666665, ans=0.025 2023-11-24 13:17:37,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:17:46,196 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6200, loss[loss=0.06139, simple_loss=0.08612, pruned_loss=0.01144, audio_tagging_loss=0.006888, over 15055.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09121, pruned_loss=0.01318, audio_tagging_loss=0.008834, over 3064075.78 frames. ], batch size: 58, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:18:03,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-24 13:18:04,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427050 2023-11-24 13:18:13,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-11-24 13:18:18,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2847020.0, ans=0.125 2023-11-24 13:18:39,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2847153.3333333335, ans=0.0 2023-11-24 13:18:46,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.15 vs. 
limit=15.0 2023-11-24 13:18:48,609 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6250, loss[loss=0.05999, simple_loss=0.08488, pruned_loss=0.00763, audio_tagging_loss=0.009919, over 14699.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09093, pruned_loss=0.01305, audio_tagging_loss=0.008936, over 3059817.59 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:19:03,160 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:19:04,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2847286.6666666665, ans=0.125 2023-11-24 13:19:07,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.748e+01 9.375e+01 1.005e+02 1.259e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-24 13:19:07,251 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427100 2023-11-24 13:19:22,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2847353.3333333335, ans=0.125 2023-11-24 13:19:29,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2847420.0, ans=0.0 2023-11-24 13:19:51,306 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6300, loss[loss=0.05936, simple_loss=0.08115, pruned_loss=0.009938, audio_tagging_loss=0.00885, over 16869.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09073, pruned_loss=0.01311, audio_tagging_loss=0.009054, over 3063731.19 frames. ], batch size: 66, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:19:55,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2847553.3333333335, ans=0.1 2023-11-24 13:20:02,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2847553.3333333335, ans=0.0 2023-11-24 13:20:11,574 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427150 2023-11-24 13:20:21,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2847686.6666666665, ans=0.125 2023-11-24 13:20:38,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2847753.3333333335, ans=0.125 2023-11-24 13:20:55,132 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6350, loss[loss=0.07549, simple_loss=0.09989, pruned_loss=0.0182, audio_tagging_loss=0.007339, over 15176.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09107, pruned_loss=0.01308, audio_tagging_loss=0.008997, over 3060262.90 frames. 
], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:20:58,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2847886.6666666665, ans=0.125 2023-11-24 13:21:05,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=2847886.6666666665, ans=22.5 2023-11-24 13:21:13,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.901e+01 8.568e+01 9.156e+01 9.751e+01 1.252e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-24 13:21:13,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427200 2023-11-24 13:21:38,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-24 13:21:38,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2848086.6666666665, ans=0.125 2023-11-24 13:21:40,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2848086.6666666665, ans=0.2 2023-11-24 13:21:57,639 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6400, loss[loss=0.0626, simple_loss=0.08418, pruned_loss=0.01434, audio_tagging_loss=0.006169, over 15167.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09139, pruned_loss=0.01319, audio_tagging_loss=0.009053, over 3056866.99 frames. ], batch size: 58, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:22:07,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2848220.0, ans=0.0 2023-11-24 13:22:10,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-24 13:22:15,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427250 2023-11-24 13:22:52,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2848486.6666666665, ans=0.5 2023-11-24 13:22:59,733 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6450, loss[loss=0.06412, simple_loss=0.09014, pruned_loss=0.01044, audio_tagging_loss=0.008619, over 14992.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09105, pruned_loss=0.01307, audio_tagging_loss=0.009159, over 3049535.46 frames. 
], batch size: 55, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:23:01,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2848553.3333333335, ans=0.1 2023-11-24 13:23:05,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2848553.3333333335, ans=0.0 2023-11-24 13:23:18,987 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.325e+01 9.052e+01 9.909e+01 1.282e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-24 13:23:19,759 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427300 2023-11-24 13:23:19,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2848620.0, ans=0.2 2023-11-24 13:23:21,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2848620.0, ans=0.0 2023-11-24 13:23:43,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. limit=6.0 2023-11-24 13:23:45,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2848753.3333333335, ans=0.0 2023-11-24 13:24:03,307 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6500, loss[loss=0.06315, simple_loss=0.09582, pruned_loss=0.007757, audio_tagging_loss=0.007483, over 16165.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09161, pruned_loss=0.01319, audio_tagging_loss=0.009044, over 3047880.79 frames. ], batch size: 62, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:24:06,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2848886.6666666665, ans=0.125 2023-11-24 13:24:10,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2023-11-24 13:24:22,745 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427350 2023-11-24 13:24:26,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2848953.3333333335, ans=0.0 2023-11-24 13:24:28,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2849020.0, ans=0.1 2023-11-24 13:24:50,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2849086.6666666665, ans=0.0 2023-11-24 13:25:06,996 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6550, loss[loss=0.07934, simple_loss=0.1124, pruned_loss=0.01468, audio_tagging_loss=0.008476, over 14813.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09179, pruned_loss=0.01316, audio_tagging_loss=0.008947, over 3049025.66 frames. 
], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:25:15,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2849220.0, ans=0.125 2023-11-24 13:25:25,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427400 2023-11-24 13:25:26,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 8.536e+01 9.192e+01 9.836e+01 1.277e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-24 13:25:35,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2849353.3333333335, ans=0.125 2023-11-24 13:25:36,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2849353.3333333335, ans=0.04949747468305833 2023-11-24 13:25:48,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2023-11-24 13:26:09,474 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6600, loss[loss=0.07045, simple_loss=0.1016, pruned_loss=0.01463, audio_tagging_loss=0.005003, over 15802.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09128, pruned_loss=0.01287, audio_tagging_loss=0.008805, over 3039549.57 frames. ], batch size: 60, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:26:25,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2849620.0, ans=0.0 2023-11-24 13:26:28,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427450 2023-11-24 13:26:29,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2023-11-24 13:26:30,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2849620.0, ans=0.0 2023-11-24 13:26:40,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2849686.6666666665, ans=0.1 2023-11-24 13:26:46,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2849753.3333333335, ans=10.0 2023-11-24 13:26:59,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2849820.0, ans=0.1 2023-11-24 13:27:02,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2849820.0, ans=0.125 2023-11-24 13:27:11,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2023-11-24 13:27:13,525 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6650, loss[loss=0.06193, simple_loss=0.08425, pruned_loss=0.01246, audio_tagging_loss=0.007341, over 15294.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09058, pruned_loss=0.01298, audio_tagging_loss=0.008868, over 3035030.01 frames. 
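The optim.py:476 entries print five grad-norm quantiles (min, Q1, median, Q3, max) and a clipping threshold, and in every entry here the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.192e+01 = 1.838e+02 just above. A minimal sketch of that relationship, assuming the quantiles are computed over a window of recent per-batch gradient norms (the windowing itself is an assumption):

import torch

def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    # Quantiles over a rolling window of gradient norms; the threshold is
    # modelled as clipping_scale * median, which reproduces every threshold
    # printed in this section.
    norms = torch.tensor(recent_grad_norms)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

Under this reading, percent-clipped=0.0 throughout the section means no recent batch produced a gradient norm above twice the median, i.e. the run is comfortably inside the clipping regime.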
], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:27:31,965 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427500 2023-11-24 13:27:33,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.517e+01 9.117e+01 9.877e+01 1.434e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-24 13:27:47,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2850020.0, ans=0.0 2023-11-24 13:27:52,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2850086.6666666665, ans=0.05 2023-11-24 13:28:16,072 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6700, loss[loss=0.07202, simple_loss=0.09502, pruned_loss=0.01515, audio_tagging_loss=0.00936, over 15198.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09201, pruned_loss=0.01318, audio_tagging_loss=0.008824, over 3032114.19 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:28:26,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2850220.0, ans=0.0 2023-11-24 13:28:34,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427550 2023-11-24 13:28:42,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2850353.3333333335, ans=0.125 2023-11-24 13:29:11,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2850486.6666666665, ans=0.0 2023-11-24 13:29:19,038 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6750, loss[loss=0.06311, simple_loss=0.08271, pruned_loss=0.01345, audio_tagging_loss=0.008303, over 15894.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.0914, pruned_loss=0.01309, audio_tagging_loss=0.008827, over 3026380.63 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:29:20,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2850553.3333333335, ans=0.02 2023-11-24 13:29:23,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2850553.3333333335, ans=0.2 2023-11-24 13:29:24,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2850553.3333333335, ans=0.125 2023-11-24 13:29:38,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427600 2023-11-24 13:29:38,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2850620.0, ans=0.1 2023-11-24 13:29:39,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.434e+01 8.966e+01 9.987e+01 1.535e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-24 13:30:17,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2850820.0, ans=0.2 2023-11-24 13:30:22,661 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6800, loss[loss=0.1023, simple_loss=0.1388, pruned_loss=0.02514, audio_tagging_loss=0.007791, over 15475.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09168, pruned_loss=0.01321, audio_tagging_loss=0.008798, over 3028790.23 frames. 
], batch size: 53, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:30:28,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-24 13:30:35,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2023-11-24 13:30:41,066 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427650 2023-11-24 13:30:57,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2851020.0, ans=0.125 2023-11-24 13:31:02,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2851086.6666666665, ans=0.09899494936611666 2023-11-24 13:31:17,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=12.0 2023-11-24 13:31:24,673 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6850, loss[loss=0.07446, simple_loss=0.1067, pruned_loss=0.01574, audio_tagging_loss=0.005379, over 16187.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.0919, pruned_loss=0.01329, audio_tagging_loss=0.008626, over 3026714.13 frames. ], batch size: 60, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:31:30,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2851220.0, ans=0.04949747468305833 2023-11-24 13:31:32,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851220.0, ans=0.1 2023-11-24 13:31:40,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2851286.6666666665, ans=0.0 2023-11-24 13:31:43,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427700 2023-11-24 13:31:44,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.309e+01 9.090e+01 9.871e+01 1.187e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-24 13:32:00,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851420.0, ans=0.1 2023-11-24 13:32:02,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-11-24 13:32:08,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2851420.0, ans=0.05 2023-11-24 13:32:11,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=12.0 2023-11-24 13:32:17,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-11-24 13:32:24,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851486.6666666665, ans=0.1 2023-11-24 13:32:26,252 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6900, loss[loss=0.08429, simple_loss=0.1154, pruned_loss=0.01835, audio_tagging_loss=0.008256, over 15466.00 frames. 
], tot_loss[loss=0.0679, simple_loss=0.09222, pruned_loss=0.01318, audio_tagging_loss=0.008609, over 3029673.84 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:32:45,900 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427750 2023-11-24 13:33:05,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2851753.3333333335, ans=0.125 2023-11-24 13:33:08,843 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:33:11,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-24 13:33:14,579 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 13:33:17,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2851820.0, ans=0.125 2023-11-24 13:33:25,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2851820.0, ans=0.2 2023-11-24 13:33:28,825 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 6950, loss[loss=0.06112, simple_loss=0.08366, pruned_loss=0.01032, audio_tagging_loss=0.008967, over 16589.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09314, pruned_loss=0.01341, audio_tagging_loss=0.008657, over 3033992.78 frames. ], batch size: 63, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:33:47,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427800 2023-11-24 13:33:50,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.545e+01 9.109e+01 9.801e+01 1.234e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-24 13:33:51,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851953.3333333335, ans=0.1 2023-11-24 13:34:19,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.78 vs. limit=10.0 2023-11-24 13:34:31,853 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7000, loss[loss=0.08603, simple_loss=0.126, pruned_loss=0.01719, audio_tagging_loss=0.005817, over 15486.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09239, pruned_loss=0.01328, audio_tagging_loss=0.00878, over 3046638.78 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:34:38,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.62 vs. 
limit=15.0 2023-11-24 13:34:41,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2852220.0, ans=0.1 2023-11-24 13:34:49,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427850 2023-11-24 13:35:08,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2852420.0, ans=0.125 2023-11-24 13:35:11,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2852420.0, ans=0.125 2023-11-24 13:35:24,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2023-11-24 13:35:29,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2852486.6666666665, ans=0.0 2023-11-24 13:35:33,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2852553.3333333335, ans=0.0 2023-11-24 13:35:34,277 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7050, loss[loss=0.06952, simple_loss=0.08928, pruned_loss=0.01418, audio_tagging_loss=0.0107, over 15087.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09214, pruned_loss=0.01328, audio_tagging_loss=0.00889, over 3045948.10 frames. ], batch size: 58, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:35:48,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2852620.0, ans=0.04949747468305833 2023-11-24 13:35:52,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2852620.0, ans=0.125 2023-11-24 13:35:53,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427900 2023-11-24 13:35:57,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.577e+01 9.267e+01 9.852e+01 1.175e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-24 13:35:58,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2852686.6666666665, ans=0.125 2023-11-24 13:36:01,376 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:36:14,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2023-11-24 13:36:27,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2852820.0, ans=0.125 2023-11-24 13:36:28,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2852820.0, ans=0.0 2023-11-24 13:36:30,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2852820.0, ans=0.0 2023-11-24 13:36:37,823 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7100, loss[loss=0.08559, simple_loss=0.114, pruned_loss=0.02069, audio_tagging_loss=0.007898, over 15770.00 frames. ], tot_loss[loss=0.06843, simple_loss=0.09211, pruned_loss=0.01343, audio_tagging_loss=0.008947, over 3043217.13 frames. 
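Each scaling.py:213 entry gives a parameter name, the global batch_count, and the value currently in effect (ans). A ScheduledFloat in this style is a scalar hyperparameter that changes with training progress; the piecewise-linear form and the breakpoints below are illustrative assumptions, chosen only so that a skip rate anneals to the ans=0.0 seen throughout this section:

def scheduled_float(batch_count, schedule):
    # schedule: sorted [(batch_count, value), ...] breakpoints. The value is
    # held constant outside the range and linearly interpolated inside it.
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    if batch_count >= schedule[-1][0]:
        return schedule[-1][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# Hypothetical schedule: by batch_count ~2.85e6 the last breakpoint is long
# past, so every *_skip_rate entry above reports its final value of 0.0.
print(scheduled_float(2852886.6666666665, [(0.0, 0.5), (50000.0, 0.0)]))  # 0.0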
], batch size: 58, lr: 1.88e-03, grad_scale: 8.0 2023-11-24 13:36:45,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2852886.6666666665, ans=0.0 2023-11-24 13:36:52,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0 2023-11-24 13:36:56,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 427950 2023-11-24 13:37:33,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2853153.3333333335, ans=0.125 2023-11-24 13:37:40,813 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7150, loss[loss=0.08383, simple_loss=0.1182, pruned_loss=0.01693, audio_tagging_loss=0.007772, over 15514.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.09293, pruned_loss=0.01354, audio_tagging_loss=0.008948, over 3043145.40 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 8.0 2023-11-24 13:37:52,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2853286.6666666665, ans=0.125 2023-11-24 13:37:57,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2853286.6666666665, ans=0.0 2023-11-24 13:37:59,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428000 2023-11-24 13:38:05,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.644e+01 9.347e+01 1.029e+02 1.240e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-24 13:38:06,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2853286.6666666665, ans=0.125 2023-11-24 13:38:08,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=22.5 2023-11-24 13:38:11,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2023-11-24 13:38:23,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2023-11-24 13:38:46,522 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7200, loss[loss=0.05703, simple_loss=0.07069, pruned_loss=0.009559, audio_tagging_loss=0.01212, over 14448.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.09177, pruned_loss=0.01332, audio_tagging_loss=0.009045, over 3044514.07 frames. 
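The grad_scale field in the tot_loss lines is the fp16 loss-scaling factor, and it moves between 8.0, 16.0 and 32.0 across this stretch (8.0 at batch 7100, 16.0 by 7200, 32.0 by 7600). That pattern matches standard dynamic loss scaling in the spirit of torch.cuda.amp.GradScaler; the update rule and constants below are assumptions, not the recipe's actual code:

def update_grad_scale(scale, found_inf, good_steps, growth_interval=2000):
    # Halve on overflow and restart the counter; double after a run of
    # overflow-free steps. Factors and interval are illustrative only.
    if found_inf:
        return scale * 0.5, 0
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0
    return scale, good_steps

Under such a rule, the dip to 8.0 followed by recovery to 32.0 reads as an overflow event followed by a few thousand clean optimizer steps.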
], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:38:49,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2853553.3333333335, ans=0.1 2023-11-24 13:38:57,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2853620.0, ans=0.0 2023-11-24 13:38:59,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2853620.0, ans=0.0 2023-11-24 13:39:03,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2853620.0, ans=0.0 2023-11-24 13:39:04,915 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428050 2023-11-24 13:39:16,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2853686.6666666665, ans=0.125 2023-11-24 13:39:35,342 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:39:39,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2853820.0, ans=0.2 2023-11-24 13:39:46,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-24 13:39:48,300 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7250, loss[loss=0.06423, simple_loss=0.08757, pruned_loss=0.01196, audio_tagging_loss=0.008491, over 14879.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09117, pruned_loss=0.01329, audio_tagging_loss=0.009089, over 3040581.89 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:40:08,053 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428100 2023-11-24 13:40:08,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2853953.3333333335, ans=0.125 2023-11-24 13:40:09,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2853953.3333333335, ans=0.0 2023-11-24 13:40:12,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.374e+01 8.915e+01 9.867e+01 1.264e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-24 13:40:51,994 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7300, loss[loss=0.09122, simple_loss=0.1284, pruned_loss=0.02088, audio_tagging_loss=0.00615, over 15096.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09092, pruned_loss=0.01308, audio_tagging_loss=0.008938, over 3041676.46 frames. 
], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:41:10,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428150 2023-11-24 13:41:10,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2854286.6666666665, ans=0.2 2023-11-24 13:41:13,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2854286.6666666665, ans=0.125 2023-11-24 13:41:14,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2854286.6666666665, ans=0.125 2023-11-24 13:41:15,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0 2023-11-24 13:41:16,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2854353.3333333335, ans=0.125 2023-11-24 13:41:16,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2854353.3333333335, ans=0.2 2023-11-24 13:41:27,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2854353.3333333335, ans=0.0 2023-11-24 13:41:53,636 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7350, loss[loss=0.07396, simple_loss=0.1056, pruned_loss=0.01371, audio_tagging_loss=0.007475, over 14450.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09033, pruned_loss=0.01295, audio_tagging_loss=0.008928, over 3038697.74 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:41:55,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.19 vs. 
limit=22.5 2023-11-24 13:42:02,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2854553.3333333335, ans=0.1 2023-11-24 13:42:08,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2854620.0, ans=0.0 2023-11-24 13:42:12,265 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428200 2023-11-24 13:42:15,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.426e+01 8.548e+01 9.171e+01 1.029e+02 1.460e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-24 13:42:29,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=2854686.6666666665, ans=0.1 2023-11-24 13:42:30,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2854753.3333333335, ans=10.0 2023-11-24 13:42:33,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2854753.3333333335, ans=0.5 2023-11-24 13:42:35,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2854753.3333333335, ans=0.125 2023-11-24 13:42:37,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2854753.3333333335, ans=0.125 2023-11-24 13:42:37,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2854753.3333333335, ans=0.0 2023-11-24 13:42:50,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2854820.0, ans=0.125 2023-11-24 13:42:55,285 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7400, loss[loss=0.07134, simple_loss=0.09328, pruned_loss=0.0146, audio_tagging_loss=0.0101, over 13279.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09054, pruned_loss=0.01299, audio_tagging_loss=0.008878, over 3038705.85 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:43:03,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2854886.6666666665, ans=0.125 2023-11-24 13:43:05,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2854886.6666666665, ans=0.125 2023-11-24 13:43:14,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428250 2023-11-24 13:43:37,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2855086.6666666665, ans=0.125 2023-11-24 13:43:58,286 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7450, loss[loss=0.07141, simple_loss=0.1022, pruned_loss=0.01204, audio_tagging_loss=0.00827, over 14780.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09074, pruned_loss=0.01308, audio_tagging_loss=0.008896, over 3037354.67 frames. 
], batch size: 54, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:44:05,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2855220.0, ans=0.2 2023-11-24 13:44:17,757 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428300 2023-11-24 13:44:21,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.702e+01 9.284e+01 1.003e+02 1.240e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-24 13:44:22,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2855353.3333333335, ans=0.125 2023-11-24 13:44:47,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2855486.6666666665, ans=0.0 2023-11-24 13:44:51,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2855486.6666666665, ans=0.2 2023-11-24 13:45:01,220 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7500, loss[loss=0.06429, simple_loss=0.07291, pruned_loss=0.01481, audio_tagging_loss=0.01302, over 14820.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09063, pruned_loss=0.01293, audio_tagging_loss=0.008846, over 3044132.02 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:45:19,270 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428350 2023-11-24 13:45:48,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2855753.3333333335, ans=0.125 2023-11-24 13:46:02,700 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7550, loss[loss=0.04752, simple_loss=0.06055, pruned_loss=0.007306, audio_tagging_loss=0.009934, over 14395.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.0903, pruned_loss=0.01313, audio_tagging_loss=0.008858, over 3035609.17 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:46:21,629 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428400 2023-11-24 13:46:26,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.555e+01 9.258e+01 9.894e+01 1.443e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 13:46:47,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-11-24 13:46:48,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2856086.6666666665, ans=0.125 2023-11-24 13:47:05,640 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7600, loss[loss=0.06501, simple_loss=0.09083, pruned_loss=0.01229, audio_tagging_loss=0.007301, over 14416.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.08996, pruned_loss=0.01292, audio_tagging_loss=0.008874, over 3039600.11 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:47:13,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. 
limit=6.0 2023-11-24 13:47:24,810 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428450 2023-11-24 13:48:02,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2856486.6666666665, ans=0.125 2023-11-24 13:48:08,986 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7650, loss[loss=0.07184, simple_loss=0.1014, pruned_loss=0.01314, audio_tagging_loss=0.008, over 16443.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09092, pruned_loss=0.01318, audio_tagging_loss=0.008776, over 3050134.03 frames. ], batch size: 60, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:48:20,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2856620.0, ans=0.0 2023-11-24 13:48:27,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428500 2023-11-24 13:48:30,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.493e+01 9.050e+01 9.587e+01 1.815e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-24 13:48:55,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2856753.3333333335, ans=0.125 2023-11-24 13:48:58,167 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:49:12,501 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7700, loss[loss=0.06984, simple_loss=0.09911, pruned_loss=0.0139, audio_tagging_loss=0.006382, over 13884.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09044, pruned_loss=0.01305, audio_tagging_loss=0.008831, over 3041242.37 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:49:12,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2856886.6666666665, ans=0.0 2023-11-24 13:49:31,663 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428550 2023-11-24 13:49:31,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2856953.3333333335, ans=0.125 2023-11-24 13:49:42,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2857020.0, ans=0.125 2023-11-24 13:49:44,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2857020.0, ans=0.0 2023-11-24 13:49:58,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2857086.6666666665, ans=0.2 2023-11-24 13:50:15,599 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7750, loss[loss=0.07507, simple_loss=0.1022, pruned_loss=0.01735, audio_tagging_loss=0.006617, over 15598.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.0906, pruned_loss=0.01293, audio_tagging_loss=0.008791, over 3043051.53 frames. 
], batch size: 59, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:50:19,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2857220.0, ans=0.0 2023-11-24 13:50:21,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2857220.0, ans=0.125 2023-11-24 13:50:34,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428600 2023-11-24 13:50:36,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2857286.6666666665, ans=0.09899494936611666 2023-11-24 13:50:38,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.354e+01 9.227e+01 9.861e+01 1.458e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-24 13:51:18,256 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7800, loss[loss=0.06653, simple_loss=0.08723, pruned_loss=0.0136, audio_tagging_loss=0.009315, over 15617.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09034, pruned_loss=0.01296, audio_tagging_loss=0.008891, over 3044172.84 frames. ], batch size: 61, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:51:36,756 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428650 2023-11-24 13:52:03,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2857753.3333333335, ans=0.2 2023-11-24 13:52:04,596 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:52:20,574 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7850, loss[loss=0.0839, simple_loss=0.1048, pruned_loss=0.02149, audio_tagging_loss=0.01002, over 15470.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08949, pruned_loss=0.01292, audio_tagging_loss=0.008997, over 3041582.61 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:52:24,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2857886.6666666665, ans=0.125 2023-11-24 13:52:26,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2857886.6666666665, ans=0.0 2023-11-24 13:52:39,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428700 2023-11-24 13:52:43,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 8.512e+01 9.149e+01 9.887e+01 1.408e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 13:52:45,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2858020.0, ans=0.0 2023-11-24 13:52:54,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2858020.0, ans=0.0 2023-11-24 13:52:59,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2023-11-24 13:53:23,072 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7900, loss[loss=0.06448, simple_loss=0.09085, pruned_loss=0.009345, audio_tagging_loss=0.009708, over 14661.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.08956, pruned_loss=0.01288, audio_tagging_loss=0.009135, over 3039536.87 frames. 
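The scaling.py:1022 Whitening entries compare a per-module statistic against a limit (metric=20.37 vs. limit=22.5 above) and leave the module alone while the metric stays below it. A plausible form for such a metric, written from scratch rather than copied from scaling.py, measures how far the activation covariance is from a scaled identity: it is 1.0 for perfectly decorrelated ("white") channels and grows as they correlate:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    mean_diag = cov.diagonal().mean()
    # Frobenius energy of cov relative to a scaled identity with the same
    # trace; equals 1.0 exactly when cov is a multiple of the identity.
    return float((cov ** 2).sum() / (mean_diag ** 2 * x.shape[1]))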
], batch size: 55, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:53:24,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2023-11-24 13:53:26,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2858220.0, ans=0.0 2023-11-24 13:53:31,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2858220.0, ans=0.0 2023-11-24 13:53:37,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2858286.6666666665, ans=0.2 2023-11-24 13:53:42,159 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428750 2023-11-24 13:53:47,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2858353.3333333335, ans=0.125 2023-11-24 13:53:48,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2858353.3333333335, ans=0.125 2023-11-24 13:54:02,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2858420.0, ans=0.125 2023-11-24 13:54:09,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2858420.0, ans=0.1 2023-11-24 13:54:12,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2858486.6666666665, ans=0.0 2023-11-24 13:54:26,265 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 7950, loss[loss=0.07324, simple_loss=0.09948, pruned_loss=0.01459, audio_tagging_loss=0.008907, over 16152.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.08967, pruned_loss=0.01286, audio_tagging_loss=0.009163, over 3039390.61 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:54:31,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2023-11-24 13:54:32,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2858553.3333333335, ans=0.125 2023-11-24 13:54:41,146 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 13:54:44,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428800 2023-11-24 13:54:49,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.877e+01 9.547e+01 1.019e+02 1.290e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-24 13:54:53,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2858686.6666666665, ans=0.2 2023-11-24 13:55:16,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2858820.0, ans=0.0 2023-11-24 13:55:22,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2858820.0, ans=0.2 2023-11-24 13:55:23,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2858820.0, ans=0.0 2023-11-24 13:55:28,585 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8000, loss[loss=0.05427, simple_loss=0.07806, pruned_loss=0.00497, audio_tagging_loss=0.01027, over 14925.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.08981, pruned_loss=0.01286, audio_tagging_loss=0.009184, over 3041574.67 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 32.0 2023-11-24 13:55:34,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2858886.6666666665, ans=0.05 2023-11-24 13:55:46,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428850 2023-11-24 13:56:15,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2859086.6666666665, ans=0.2 2023-11-24 13:56:21,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2859153.3333333335, ans=0.125 2023-11-24 13:56:24,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2859153.3333333335, ans=0.0 2023-11-24 13:56:27,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2859153.3333333335, ans=0.0 2023-11-24 13:56:30,593 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8050, loss[loss=0.05607, simple_loss=0.06755, pruned_loss=0.01038, audio_tagging_loss=0.01192, over 14968.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.0889, pruned_loss=0.01281, audio_tagging_loss=0.009251, over 3042176.34 frames. 
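The train_asr.py:1462 WARNING entries all follow one pattern: a 1-second AudioSet clip carrying the dummy placeholder transcript yields 23 encoder frames after subsampling but 24 BPE tokens, and the cut is excluded. That is consistent with a filter that drops any cut whose token count exceeds its subsampled frame count, since a transducer alignment needs at least one frame per emitted symbol. A sketch of that filter, with the subsampling arithmetic chosen to reproduce the logged 100 -> 23 frame count (an assumption about the exact convolutional front end):

def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    # Assumed two-stage conv subsampling: ((100 - 7) // 2 + 1) // 2 == 23,
    # matching the before/after frame counts printed in the WARNING lines.
    t = ((num_frames_before_subsampling - 7) // 2 + 1) // 2
    return t >= num_tokens

# 23 subsampled frames < 24 placeholder tokens -> the cut is excluded.
assert keep_cut(100, 24) is False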
], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:56:49,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428900 2023-11-24 13:56:55,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.620e+01 9.250e+01 9.827e+01 1.123e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 13:57:05,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2859353.3333333335, ans=0.5 2023-11-24 13:57:24,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2859486.6666666665, ans=0.04949747468305833 2023-11-24 13:57:28,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2859486.6666666665, ans=0.125 2023-11-24 13:57:32,540 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8100, loss[loss=0.07143, simple_loss=0.0947, pruned_loss=0.01465, audio_tagging_loss=0.009429, over 15726.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.0884, pruned_loss=0.01286, audio_tagging_loss=0.009221, over 3040015.12 frames. ], batch size: 61, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:57:45,400 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 13:57:51,010 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 428950 2023-11-24 13:58:06,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2859686.6666666665, ans=0.125 2023-11-24 13:58:15,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2859753.3333333335, ans=0.1 2023-11-24 13:58:20,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2859820.0, ans=0.0 2023-11-24 13:58:31,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2859820.0, ans=0.2 2023-11-24 13:58:32,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2859820.0, ans=0.0 2023-11-24 13:58:34,212 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8150, loss[loss=0.07655, simple_loss=0.1093, pruned_loss=0.0132, audio_tagging_loss=0.008721, over 16543.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08874, pruned_loss=0.01285, audio_tagging_loss=0.009084, over 3040274.92 frames. 
], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:58:53,435 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429000 2023-11-24 13:59:00,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 8.661e+01 9.369e+01 1.006e+02 1.730e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-24 13:59:15,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2860086.6666666665, ans=0.0 2023-11-24 13:59:23,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2860153.3333333335, ans=0.0 2023-11-24 13:59:27,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2860153.3333333335, ans=0.2 2023-11-24 13:59:35,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2860220.0, ans=0.125 2023-11-24 13:59:36,945 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8200, loss[loss=0.05931, simple_loss=0.0754, pruned_loss=0.009337, audio_tagging_loss=0.01227, over 14698.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.0902, pruned_loss=0.01294, audio_tagging_loss=0.009045, over 3042505.36 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 13:59:37,014 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 13:59:37,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2860220.0, ans=0.0 2023-11-24 13:59:38,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2023-11-24 13:59:48,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2023-11-24 13:59:52,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2860286.6666666665, ans=0.125 2023-11-24 13:59:55,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429050 2023-11-24 14:00:09,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-11-24 14:00:17,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2860420.0, ans=0.0 2023-11-24 14:00:39,588 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8250, loss[loss=0.0537, simple_loss=0.06981, pruned_loss=0.009463, audio_tagging_loss=0.009327, over 14960.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08962, pruned_loss=0.01293, audio_tagging_loss=0.00902, over 3040430.21 frames. 
], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 14:00:57,984 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429100 2023-11-24 14:01:03,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.868e+01 8.495e+01 9.085e+01 9.878e+01 1.172e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 14:01:12,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2860686.6666666665, ans=0.125 2023-11-24 14:01:40,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2860886.6666666665, ans=0.125 2023-11-24 14:01:41,634 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8300, loss[loss=0.05925, simple_loss=0.07486, pruned_loss=0.01196, audio_tagging_loss=0.009849, over 14654.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.0894, pruned_loss=0.01276, audio_tagging_loss=0.009031, over 3043210.99 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 8.0 2023-11-24 14:01:53,804 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 14:01:57,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2860953.3333333335, ans=0.125 2023-11-24 14:01:57,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2860953.3333333335, ans=0.04949747468305833 2023-11-24 14:01:59,624 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429150 2023-11-24 14:02:14,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2861020.0, ans=0.125 2023-11-24 14:02:38,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2861153.3333333335, ans=0.0 2023-11-24 14:02:38,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2023-11-24 14:02:39,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2861153.3333333335, ans=0.2 2023-11-24 14:02:42,966 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8350, loss[loss=0.05588, simple_loss=0.07703, pruned_loss=0.009932, audio_tagging_loss=0.00743, over 15604.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08977, pruned_loss=0.01289, audio_tagging_loss=0.008942, over 3047303.05 frames. ], batch size: 60, lr: 1.88e-03, grad_scale: 8.0 2023-11-24 14:02:44,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. 
limit=22.5 2023-11-24 14:02:48,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2861220.0, ans=15.0 2023-11-24 14:02:51,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2861220.0, ans=0.125 2023-11-24 14:03:00,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2861286.6666666665, ans=0.125 2023-11-24 14:03:02,482 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429200 2023-11-24 14:03:03,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2861286.6666666665, ans=0.125 2023-11-24 14:03:03,895 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 14:03:10,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.055e+01 8.776e+01 9.274e+01 1.012e+02 1.289e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-24 14:03:15,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=2861353.3333333335, ans=12.0 2023-11-24 14:03:26,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2861420.0, ans=0.125 2023-11-24 14:03:46,360 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8400, loss[loss=0.07221, simple_loss=0.102, pruned_loss=0.01362, audio_tagging_loss=0.007598, over 14795.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08961, pruned_loss=0.01281, audio_tagging_loss=0.008889, over 3043696.75 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2023-11-24 14:03:49,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2861553.3333333335, ans=0.07 2023-11-24 14:03:54,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2861553.3333333335, ans=0.125 2023-11-24 14:04:04,682 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429250 2023-11-24 14:04:23,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0 2023-11-24 14:04:25,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2861753.3333333335, ans=0.125 2023-11-24 14:04:29,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2861753.3333333335, ans=0.0 2023-11-24 14:04:47,751 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8450, loss[loss=0.06209, simple_loss=0.08151, pruned_loss=0.009029, audio_tagging_loss=0.01231, over 15934.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08966, pruned_loss=0.01278, audio_tagging_loss=0.008902, over 3044168.17 frames. 
2023-11-24 14:04:49,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2861886.6666666665, ans=0.125
2023-11-24 14:05:05,636 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429300
2023-11-24 14:05:13,169 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.551e+01 9.201e+01 9.862e+01 3.144e+02, threshold=1.840e+02, percent-clipped=1.0
2023-11-24 14:05:14,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2862020.0, ans=0.0
2023-11-24 14:05:34,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2862086.6666666665, ans=0.125
2023-11-24 14:05:35,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0
2023-11-24 14:05:48,431 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8500, loss[loss=0.048, simple_loss=0.06194, pruned_loss=0.00663, audio_tagging_loss=0.0104, over 16129.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09, pruned_loss=0.01275, audio_tagging_loss=0.008927, over 3052313.08 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 16.0
2023-11-24 14:05:50,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0
2023-11-24 14:05:55,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2862220.0, ans=0.125
2023-11-24 14:06:06,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2862286.6666666665, ans=0.1
2023-11-24 14:06:08,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429350
2023-11-24 14:06:17,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2862353.3333333335, ans=0.05
2023-11-24 14:06:18,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2862353.3333333335, ans=0.2
2023-11-24 14:06:20,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=22.5
2023-11-24 14:06:25,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2862420.0, ans=0.0
2023-11-24 14:06:27,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2862420.0, ans=0.125
2023-11-24 14:06:50,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2862553.3333333335, ans=0.0
2023-11-24 14:06:51,106 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8550, loss[loss=0.05566, simple_loss=0.0758, pruned_loss=0.009725, audio_tagging_loss=0.008041, over 14752.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08972, pruned_loss=0.01281, audio_tagging_loss=0.008994, over 3052573.17 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 16.0
2023-11-24 14:06:59,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2862553.3333333335, ans=0.0
2023-11-24 14:07:10,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429400
2023-11-24 14:07:17,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 8.644e+01 9.213e+01 9.720e+01 1.244e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-24 14:07:19,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2862686.6666666665, ans=0.0
2023-11-24 14:07:21,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0
2023-11-24 14:07:25,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2862686.6666666665, ans=0.0
2023-11-24 14:07:38,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2862753.3333333335, ans=0.1
2023-11-24 14:07:54,006 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8600, loss[loss=0.05182, simple_loss=0.06867, pruned_loss=0.009052, audio_tagging_loss=0.008438, over 16936.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.08968, pruned_loss=0.01296, audio_tagging_loss=0.009049, over 3046023.79 frames. ], batch size: 66, lr: 1.88e-03, grad_scale: 16.0
2023-11-24 14:08:02,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2862886.6666666665, ans=0.125
2023-11-24 14:08:08,719 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-24 14:08:12,094 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429450
2023-11-24 14:08:17,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2863020.0, ans=0.1
2023-11-24 14:08:26,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2863020.0, ans=0.125
2023-11-24 14:08:44,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2863153.3333333335, ans=0.5
2023-11-24 14:08:47,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2863153.3333333335, ans=0.125
2023-11-24 14:08:55,749 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8650, loss[loss=0.0655, simple_loss=0.08298, pruned_loss=0.0132, audio_tagging_loss=0.01081, over 15637.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09031, pruned_loss=0.01293, audio_tagging_loss=0.008955, over 3045461.94 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 16.0
2023-11-24 14:08:56,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0
2023-11-24 14:09:14,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429500
2023-11-24 14:09:22,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.469e+01 9.080e+01 9.893e+01 1.324e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-24 14:09:28,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0
2023-11-24 14:09:33,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2863420.0, ans=0.125
2023-11-24 14:09:38,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2863420.0, ans=0.125
2023-11-24 14:09:57,742 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8700, loss[loss=0.05838, simple_loss=0.08051, pruned_loss=0.01041, audio_tagging_loss=0.007715, over 14823.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09078, pruned_loss=0.01301, audio_tagging_loss=0.009019, over 3043495.62 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0
2023-11-24 14:10:16,350 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 14:10:17,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429550
2023-11-24 14:10:21,138 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-24 14:10:36,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2863753.3333333335, ans=0.125
2023-11-24 14:11:00,198 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8750, loss[loss=0.08794, simple_loss=0.1165, pruned_loss=0.01994, audio_tagging_loss=0.009734, over 14486.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09164, pruned_loss=0.01311, audio_tagging_loss=0.009086, over 3044363.14 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 16.0
2023-11-24 14:11:00,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2863886.6666666665, ans=0.125
2023-11-24 14:11:17,935 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429600
2023-11-24 14:11:25,143 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 8.680e+01 9.303e+01 1.028e+02 1.677e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-24 14:11:29,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2864020.0, ans=0.1
2023-11-24 14:11:40,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2864086.6666666665, ans=0.1
2023-11-24 14:11:44,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2864086.6666666665, ans=0.09899494936611666
2023-11-24 14:11:45,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0
2023-11-24 14:11:56,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2864153.3333333335, ans=0.2
2023-11-24 14:11:57,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2864153.3333333335, ans=0.125
2023-11-24 14:12:02,023 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8800, loss[loss=0.06606, simple_loss=0.09597, pruned_loss=0.01107, audio_tagging_loss=0.007004, over 15457.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09271, pruned_loss=0.01332, audio_tagging_loss=0.00905, over 3048965.68 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 32.0
2023-11-24 14:12:16,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2864286.6666666665, ans=0.0
2023-11-24 14:12:21,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429650
2023-11-24 14:13:04,071 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8850, loss[loss=0.05556, simple_loss=0.07304, pruned_loss=0.01119, audio_tagging_loss=0.00785, over 14688.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.09268, pruned_loss=0.01334, audio_tagging_loss=0.009155, over 3051217.57 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 32.0
2023-11-24 14:13:06,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2864553.3333333335, ans=0.125
2023-11-24 14:13:09,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=22.5
2023-11-24 14:13:16,431 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 14:13:23,691 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429700
2023-11-24 14:13:30,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2864686.6666666665, ans=0.0
2023-11-24 14:13:30,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.562e+01 9.103e+01 9.836e+01 1.222e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-24 14:13:43,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2864753.3333333335, ans=0.2
2023-11-24 14:13:43,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2864753.3333333335, ans=0.1
2023-11-24 14:13:43,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0
2023-11-24 14:14:07,397 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8900, loss[loss=0.06492, simple_loss=0.08847, pruned_loss=0.0103, audio_tagging_loss=0.01038, over 15349.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09203, pruned_loss=0.0133, audio_tagging_loss=0.009116, over 3047598.04 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 32.0
2023-11-24 14:14:08,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2864886.6666666665, ans=0.125
2023-11-24 14:14:14,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2864886.6666666665, ans=0.035
2023-11-24 14:14:25,790 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429750
2023-11-24 14:14:33,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2865020.0, ans=0.125
2023-11-24 14:15:01,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2865153.3333333335, ans=0.09899494936611666
2023-11-24 14:15:02,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2865153.3333333335, ans=0.2
2023-11-24 14:15:05,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2865153.3333333335, ans=0.0
2023-11-24 14:15:08,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2865220.0, ans=0.0
2023-11-24 14:15:09,576 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 8950, loss[loss=0.06191, simple_loss=0.08076, pruned_loss=0.01359, audio_tagging_loss=0.007942, over 13747.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09224, pruned_loss=0.01313, audio_tagging_loss=0.009005, over 3049539.10 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 32.0
2023-11-24 14:15:10,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2865220.0, ans=0.05
2023-11-24 14:15:15,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2865220.0, ans=0.125
2023-11-24 14:15:27,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429800
2023-11-24 14:15:35,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.550e+01 9.254e+01 9.924e+01 1.359e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-24 14:15:39,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865353.3333333335, ans=0.1
2023-11-24 14:16:11,695 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9000, loss[loss=0.07878, simple_loss=0.111, pruned_loss=0.01354, audio_tagging_loss=0.009725, over 14593.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09158, pruned_loss=0.01303, audio_tagging_loss=0.008926, over 3042113.46 frames. ], batch size: 52, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:16:11,696 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 14:16:43,964 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8091, 5.8545, 5.8923, 5.8856], device='cuda:1')
2023-11-24 14:16:46,037 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4392, 3.7265, 4.3912, 3.4006], device='cuda:1')
2023-11-24 14:16:50,245 INFO [train_asr.py:1253] (1/4) Epoch 36, validation: loss=0.05864, simple_loss=0.05081, pruned_loss=0.005226, audio_tagging_loss=0.02801, over 4681554.00 frames.
2023-11-24 14:16:50,246 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 14:16:53,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2865553.3333333335, ans=0.125
2023-11-24 14:17:08,454 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429850
2023-11-24 14:17:51,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865886.6666666665, ans=0.1
2023-11-24 14:17:52,338 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9050, loss[loss=0.06916, simple_loss=0.08898, pruned_loss=0.01373, audio_tagging_loss=0.01094, over 15595.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09125, pruned_loss=0.01298, audio_tagging_loss=0.008855, over 3040811.07 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:17:53,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2865886.6666666665, ans=0.125
2023-11-24 14:18:09,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0
2023-11-24 14:18:10,808 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429900
2023-11-24 14:18:16,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2866020.0, ans=0.125
2023-11-24 14:18:17,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2866020.0, ans=0.1
2023-11-24 14:18:19,340 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.711e+01 9.316e+01 9.957e+01 1.805e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-24 14:18:24,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2866020.0, ans=0.2
2023-11-24 14:18:36,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2866086.6666666665, ans=0.125
2023-11-24 14:18:37,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2866086.6666666665, ans=0.125
2023-11-24 14:18:41,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2866153.3333333335, ans=0.125
2023-11-24 14:18:53,960 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9100, loss[loss=0.04095, simple_loss=0.05608, pruned_loss=0.005424, audio_tagging_loss=0.007482, over 14888.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09095, pruned_loss=0.01285, audio_tagging_loss=0.008745, over 3040454.65 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:19:11,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2866286.6666666665, ans=0.125
2023-11-24 14:19:13,297 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 429950
2023-11-24 14:19:15,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2866286.6666666665, ans=0.1
2023-11-24 14:19:20,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2866353.3333333335, ans=0.125
2023-11-24 14:19:25,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2023-11-24 14:19:26,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0
2023-11-24 14:19:39,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2866420.0, ans=0.1
2023-11-24 14:19:53,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2866486.6666666665, ans=0.2
2023-11-24 14:19:55,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2866553.3333333335, ans=0.0
2023-11-24 14:19:56,572 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9150, loss[loss=0.06094, simple_loss=0.08457, pruned_loss=0.008807, audio_tagging_loss=0.009848, over 15466.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09111, pruned_loss=0.01287, audio_tagging_loss=0.008766, over 3036767.07 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:20:15,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430000
2023-11-24 14:20:23,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.685e+01 9.383e+01 1.043e+02 1.576e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-24 14:20:31,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2866686.6666666665, ans=0.0
2023-11-24 14:20:32,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.26 vs. limit=10.0
2023-11-24 14:20:37,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0
2023-11-24 14:20:40,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2866753.3333333335, ans=0.125
2023-11-24 14:20:44,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2866753.3333333335, ans=0.0
2023-11-24 14:20:51,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0
2023-11-24 14:20:53,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2866820.0, ans=0.2
2023-11-24 14:20:58,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2866886.6666666665, ans=0.0
2023-11-24 14:20:59,761 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9200, loss[loss=0.0576, simple_loss=0.07851, pruned_loss=0.01005, audio_tagging_loss=0.008295, over 15249.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09113, pruned_loss=0.01293, audio_tagging_loss=0.008814, over 3039409.04 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:21:01,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2866886.6666666665, ans=0.0
2023-11-24 14:21:02,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2866886.6666666665, ans=0.0
2023-11-24 14:21:13,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2866953.3333333335, ans=0.2
2023-11-24 14:21:18,553 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430050
2023-11-24 14:21:23,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2867020.0, ans=0.0
2023-11-24 14:21:37,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2867086.6666666665, ans=0.1
2023-11-24 14:21:41,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2867086.6666666665, ans=0.1
2023-11-24 14:22:02,248 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9250, loss[loss=0.07427, simple_loss=0.1014, pruned_loss=0.01615, audio_tagging_loss=0.007417, over 15197.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09156, pruned_loss=0.01313, audio_tagging_loss=0.008714, over 3040922.80 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:22:08,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2867220.0, ans=0.125
2023-11-24 14:22:12,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2867220.0, ans=0.1
2023-11-24 14:22:21,294 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430100
2023-11-24 14:22:22,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0
2023-11-24 14:22:30,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 8.538e+01 9.195e+01 9.828e+01 1.249e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-24 14:22:36,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2867353.3333333335, ans=0.2
2023-11-24 14:22:52,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2867486.6666666665, ans=0.07
2023-11-24 14:23:04,562 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9300, loss[loss=0.0665, simple_loss=0.0867, pruned_loss=0.0115, audio_tagging_loss=0.01165, over 14239.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.0914, pruned_loss=0.01305, audio_tagging_loss=0.008726, over 3040459.31 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:23:22,966 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430150
2023-11-24 14:23:44,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2867753.3333333335, ans=0.0
2023-11-24 14:23:49,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2867753.3333333335, ans=0.1
2023-11-24 14:23:58,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2867820.0, ans=0.125
2023-11-24 14:24:06,045 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9350, loss[loss=0.07204, simple_loss=0.09994, pruned_loss=0.0107, audio_tagging_loss=0.01137, over 15130.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09166, pruned_loss=0.01308, audio_tagging_loss=0.00874, over 3042219.23 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:24:13,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2867886.6666666665, ans=0.125
2023-11-24 14:24:24,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0
2023-11-24 14:24:25,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430200
2023-11-24 14:24:35,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.521e+01 8.556e+01 8.988e+01 9.800e+01 1.410e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-24 14:24:51,731 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 14:24:59,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2868153.3333333335, ans=6.0
2023-11-24 14:25:07,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2868220.0, ans=0.07
2023-11-24 14:25:08,884 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9400, loss[loss=0.05907, simple_loss=0.06813, pruned_loss=0.01036, audio_tagging_loss=0.01465, over 15056.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09108, pruned_loss=0.01312, audio_tagging_loss=0.008896, over 3046738.43 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:25:13,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2868220.0, ans=0.125
2023-11-24 14:25:16,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2868220.0, ans=0.0
2023-11-24 14:25:21,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=22.5
2023-11-24 14:25:27,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430250
2023-11-24 14:25:30,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5
2023-11-24 14:25:47,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2868420.0, ans=0.125
2023-11-24 14:25:47,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2868420.0, ans=0.2
2023-11-24 14:26:07,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2868486.6666666665, ans=0.025
2023-11-24 14:26:10,691 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 14:26:11,853 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9450, loss[loss=0.05726, simple_loss=0.0677, pruned_loss=0.01406, audio_tagging_loss=0.009353, over 15985.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09125, pruned_loss=0.01327, audio_tagging_loss=0.008946, over 3053565.25 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:26:25,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. limit=5.0
2023-11-24 14:26:30,261 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430300
2023-11-24 14:26:39,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.666e+01 9.464e+01 1.022e+02 1.388e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-24 14:26:58,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2868753.3333333335, ans=0.07
2023-11-24 14:27:13,070 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9500, loss[loss=0.07346, simple_loss=0.09775, pruned_loss=0.01577, audio_tagging_loss=0.008807, over 15800.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09147, pruned_loss=0.01326, audio_tagging_loss=0.008942, over 3049786.45 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:27:18,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2868886.6666666665, ans=0.0
2023-11-24 14:27:26,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2868953.3333333335, ans=0.1
2023-11-24 14:27:31,256 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430350
2023-11-24 14:27:57,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2869086.6666666665, ans=0.125
2023-11-24 14:28:14,250 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9550, loss[loss=0.06575, simple_loss=0.08747, pruned_loss=0.01173, audio_tagging_loss=0.01028, over 15656.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09132, pruned_loss=0.01317, audio_tagging_loss=0.008994, over 3055231.83 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:28:18,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0
2023-11-24 14:28:27,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2869286.6666666665, ans=0.125
2023-11-24 14:28:33,379 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430400
2023-11-24 14:28:43,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.513e+01 9.117e+01 9.826e+01 1.260e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-24 14:29:10,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2869486.6666666665, ans=0.1
2023-11-24 14:29:16,937 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9600, loss[loss=0.06611, simple_loss=0.08611, pruned_loss=0.01163, audio_tagging_loss=0.01143, over 14968.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09146, pruned_loss=0.01309, audio_tagging_loss=0.00908, over 3051591.46 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:29:35,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430450
2023-11-24 14:29:44,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2869686.6666666665, ans=0.02
2023-11-24 14:29:55,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0
2023-11-24 14:30:00,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2869753.3333333335, ans=0.2
2023-11-24 14:30:13,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2869820.0, ans=0.1
2023-11-24 14:30:19,691 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9650, loss[loss=0.04579, simple_loss=0.05613, pruned_loss=0.005761, audio_tagging_loss=0.01196, over 16815.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09085, pruned_loss=0.01293, audio_tagging_loss=0.009116, over 3050680.72 frames. ], batch size: 67, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:30:19,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2869886.6666666665, ans=0.0
2023-11-24 14:30:34,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2869953.3333333335, ans=0.125
2023-11-24 14:30:37,847 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430500
2023-11-24 14:30:49,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.447e+01 9.267e+01 9.901e+01 1.317e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-24 14:31:05,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2870086.6666666665, ans=0.125
2023-11-24 14:31:06,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0
2023-11-24 14:31:17,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2870153.3333333335, ans=0.0
2023-11-24 14:31:18,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2870153.3333333335, ans=0.125
2023-11-24 14:31:19,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2870153.3333333335, ans=0.07
2023-11-24 14:31:22,783 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9700, loss[loss=0.08036, simple_loss=0.1127, pruned_loss=0.01529, audio_tagging_loss=0.008715, over 15857.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.0911, pruned_loss=0.01297, audio_tagging_loss=0.008961, over 3052686.71 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:31:42,490 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430550
2023-11-24 14:31:43,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2870286.6666666665, ans=0.2
2023-11-24 14:31:48,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2870353.3333333335, ans=0.125
2023-11-24 14:32:10,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0
2023-11-24 14:32:25,791 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9750, loss[loss=0.05439, simple_loss=0.08156, pruned_loss=0.006866, audio_tagging_loss=0.006738, over 14050.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09135, pruned_loss=0.01305, audio_tagging_loss=0.008818, over 3050815.04 frames. ], batch size: 54, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:32:38,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2870620.0, ans=0.1
2023-11-24 14:32:46,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430600
2023-11-24 14:32:54,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2870686.6666666665, ans=0.035
2023-11-24 14:32:54,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2870686.6666666665, ans=0.125
2023-11-24 14:32:56,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2870686.6666666665, ans=0.125
2023-11-24 14:32:57,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.594e+01 9.328e+01 1.010e+02 1.236e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-24 14:33:14,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2870753.3333333335, ans=0.1
2023-11-24 14:33:18,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2870820.0, ans=0.125
2023-11-24 14:33:31,847 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9800, loss[loss=0.04853, simple_loss=0.06441, pruned_loss=0.008569, audio_tagging_loss=0.007753, over 13985.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.0913, pruned_loss=0.01314, audio_tagging_loss=0.008727, over 3053812.47 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:33:50,291 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430650
2023-11-24 14:33:51,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2870953.3333333335, ans=0.125
2023-11-24 14:34:08,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2871086.6666666665, ans=0.125
2023-11-24 14:34:26,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2871153.3333333335, ans=0.125
2023-11-24 14:34:26,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2871153.3333333335, ans=0.125
2023-11-24 14:34:28,051 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 14:34:28,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2871153.3333333335, ans=0.0
2023-11-24 14:34:34,251 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9850, loss[loss=0.05083, simple_loss=0.06412, pruned_loss=0.01082, audio_tagging_loss=0.007948, over 15402.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09184, pruned_loss=0.01321, audio_tagging_loss=0.008606, over 3053911.99 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:34:43,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0
2023-11-24 14:34:48,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2871286.6666666665, ans=0.2
2023-11-24 14:34:53,805 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430700
2023-11-24 14:34:54,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2871286.6666666665, ans=0.125
2023-11-24 14:34:55,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2871286.6666666665, ans=0.035
2023-11-24 14:35:05,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.366e+01 9.070e+01 9.726e+01 1.220e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-24 14:35:11,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2871420.0, ans=0.0
2023-11-24 14:35:37,174 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9900, loss[loss=0.07893, simple_loss=0.1109, pruned_loss=0.01496, audio_tagging_loss=0.008508, over 16207.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09211, pruned_loss=0.01332, audio_tagging_loss=0.008635, over 3057621.67 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:35:44,121 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 14:35:51,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2871620.0, ans=0.1
2023-11-24 14:35:54,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2871620.0, ans=0.125
2023-11-24 14:35:56,919 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430750
2023-11-24 14:35:57,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2871620.0, ans=10.0
2023-11-24 14:36:07,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2871686.6666666665, ans=0.125
2023-11-24 14:36:12,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2871686.6666666665, ans=0.125
2023-11-24 14:36:24,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2871753.3333333335, ans=0.125
2023-11-24 14:36:30,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.28 vs. limit=22.5
2023-11-24 14:36:35,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2871820.0, ans=0.125
2023-11-24 14:36:38,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2871820.0, ans=0.125
2023-11-24 14:36:40,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2871886.6666666665, ans=0.125
2023-11-24 14:36:40,992 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 9950, loss[loss=0.04939, simple_loss=0.06545, pruned_loss=0.007269, audio_tagging_loss=0.009401, over 15677.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09217, pruned_loss=0.0133, audio_tagging_loss=0.008538, over 3054170.33 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 16.0
2023-11-24 14:36:43,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0
2023-11-24 14:36:59,242 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430800
2023-11-24 14:36:59,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=15.0
2023-11-24 14:37:03,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2871953.3333333335, ans=0.125
2023-11-24 14:37:10,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2872020.0, ans=0.125
2023-11-24 14:37:11,011 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.477e+01 9.166e+01 9.822e+01 1.194e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-24 14:37:15,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2872020.0, ans=0.125
2023-11-24 14:37:16,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2872020.0, ans=0.035
2023-11-24 14:37:19,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2872086.6666666665, ans=0.1
2023-11-24 14:37:44,588 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10000, loss[loss=0.07639, simple_loss=0.0918, pruned_loss=0.02267, audio_tagging_loss=0.007822, over 14528.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09165, pruned_loss=0.0133, audio_tagging_loss=0.008631, over 3053847.63 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:37:58,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2872286.6666666665, ans=0.0
2023-11-24 14:38:04,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430850
2023-11-24 14:38:05,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2872286.6666666665, ans=0.125
2023-11-24 14:38:16,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2872353.3333333335, ans=0.0
2023-11-24 14:38:31,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2872420.0, ans=0.2
2023-11-24 14:38:42,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2872486.6666666665, ans=0.0
2023-11-24 14:38:45,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0
2023-11-24 14:38:50,174 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10050, loss[loss=0.06824, simple_loss=0.09247, pruned_loss=0.0121, audio_tagging_loss=0.009906, over 16201.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09078, pruned_loss=0.0131, audio_tagging_loss=0.00867, over 3050681.07 frames. ], batch size: 61, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:39:04,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2872620.0, ans=0.125
2023-11-24 14:39:09,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430900
2023-11-24 14:39:18,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2872686.6666666665, ans=0.125
2023-11-24 14:39:20,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.558e+01 9.038e+01 9.635e+01 1.199e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-24 14:39:21,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2872686.6666666665, ans=0.2
2023-11-24 14:39:40,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2872820.0, ans=0.125
2023-11-24 14:39:46,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2872820.0, ans=0.0
2023-11-24 14:39:54,132 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10100, loss[loss=0.05157, simple_loss=0.07386, pruned_loss=0.006361, audio_tagging_loss=0.008283, over 14692.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.091, pruned_loss=0.01304, audio_tagging_loss=0.008767, over 3053243.29 frames. ], batch size: 53, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:40:01,172 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 14:40:05,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2872886.6666666665, ans=0.0
2023-11-24 14:40:12,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0
2023-11-24 14:40:13,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 430950
2023-11-24 14:40:32,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0
2023-11-24 14:40:35,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2873086.6666666665, ans=0.0
2023-11-24 14:40:40,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2873086.6666666665, ans=0.0
2023-11-24 14:40:46,839 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 14:40:47,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=15.0
2023-11-24 14:40:58,562 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10150, loss[loss=0.06132, simple_loss=0.08468, pruned_loss=0.01035, audio_tagging_loss=0.008631, over 14837.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09178, pruned_loss=0.01315, audio_tagging_loss=0.008717, over 3054515.72 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:40:59,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0
2023-11-24 14:41:01,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0
2023-11-24 14:41:07,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-11-24 14:41:09,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2873286.6666666665, ans=0.0
2023-11-24 14:41:10,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2873286.6666666665, ans=0.125
2023-11-24 14:41:17,703 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431000
2023-11-24 14:41:27,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2873353.3333333335, ans=0.0
2023-11-24 14:41:29,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.738e+01 9.315e+01 1.037e+02 1.393e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-24 14:41:29,483 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 14:41:36,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0
2023-11-24 14:42:00,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2873486.6666666665, ans=0.125
2023-11-24 14:42:00,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0
2023-11-24 14:42:03,130 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10200, loss[loss=0.07215, simple_loss=0.09736, pruned_loss=0.01438, audio_tagging_loss=0.00909, over 15194.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09116, pruned_loss=0.01295, audio_tagging_loss=0.008952, over 3044571.96 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0
2023-11-24 14:42:04,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2873553.3333333335, ans=0.125
2023-11-24 14:42:22,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0
2023-11-24 14:42:22,813 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431050
2023-11-24 14:42:27,538 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 14:42:27,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2873686.6666666665, ans=0.0
2023-11-24 14:42:33,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2873686.6666666665, ans=0.07
2023-11-24 14:42:41,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873753.3333333335, ans=0.1
2023-11-24 14:42:42,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2873753.3333333335, ans=0.125
2023-11-24 14:42:57,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2873820.0, ans=0.2
2023-11-24 14:43:06,409 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10250, loss[loss=0.06038, simple_loss=0.07357, pruned_loss=0.01208, audio_tagging_loss=0.01151, over 15819.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09156, pruned_loss=0.01301, audio_tagging_loss=0.009041, over 3055740.11 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 8.0
2023-11-24 14:43:07,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2873886.6666666665, ans=0.125
2023-11-24 14:43:25,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431100
2023-11-24 14:43:35,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2874020.0, ans=0.125
2023-11-24 14:43:36,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2874020.0, ans=0.1
2023-11-24 14:43:38,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=12.0
2023-11-24 14:43:39,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.593e+01 9.288e+01 9.901e+01 1.156e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-24 14:43:43,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2874086.6666666665, ans=0.125
2023-11-24 14:44:10,532 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10300, loss[loss=0.07742, simple_loss=0.1088, pruned_loss=0.01335, audio_tagging_loss=0.009654, over 15601.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09049, pruned_loss=0.01293, audio_tagging_loss=0.00918, over 3048115.39 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 8.0
], batch size: 58, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 14:44:14,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874220.0, ans=0.1 2023-11-24 14:44:29,051 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431150 2023-11-24 14:44:34,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2874353.3333333335, ans=0.2 2023-11-24 14:44:45,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2874353.3333333335, ans=10.0 2023-11-24 14:45:12,585 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10350, loss[loss=0.08463, simple_loss=0.1125, pruned_loss=0.02135, audio_tagging_loss=0.007046, over 14844.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09079, pruned_loss=0.01299, audio_tagging_loss=0.009154, over 3044627.38 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 14:45:32,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431200 2023-11-24 14:45:45,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2874686.6666666665, ans=0.125 2023-11-24 14:45:46,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 8.642e+01 9.293e+01 9.950e+01 1.145e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-24 14:46:06,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0 2023-11-24 14:46:17,061 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10400, loss[loss=0.06468, simple_loss=0.08631, pruned_loss=0.01319, audio_tagging_loss=0.008341, over 15746.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09066, pruned_loss=0.01297, audio_tagging_loss=0.009241, over 3042997.43 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:46:17,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2874886.6666666665, ans=0.125 2023-11-24 14:46:20,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=2874886.6666666665, ans=15.0 2023-11-24 14:46:36,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431250 2023-11-24 14:46:37,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2874953.3333333335, ans=0.125 2023-11-24 14:46:53,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. 
limit=15.0 2023-11-24 14:47:05,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2875086.6666666665, ans=0.0 2023-11-24 14:47:06,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2875086.6666666665, ans=0.0 2023-11-24 14:47:13,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2875153.3333333335, ans=0.0 2023-11-24 14:47:17,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2875153.3333333335, ans=0.125 2023-11-24 14:47:21,091 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10450, loss[loss=0.06822, simple_loss=0.09366, pruned_loss=0.01266, audio_tagging_loss=0.008729, over 13886.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09061, pruned_loss=0.01308, audio_tagging_loss=0.009327, over 3048162.07 frames. ], batch size: 53, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:47:23,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875220.0, ans=0.1 2023-11-24 14:47:27,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2875220.0, ans=0.07 2023-11-24 14:47:39,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431300 2023-11-24 14:47:50,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2875353.3333333335, ans=0.125 2023-11-24 14:47:54,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.463e+01 9.315e+01 9.862e+01 1.314e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-24 14:48:23,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2875553.3333333335, ans=0.125 2023-11-24 14:48:24,020 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10500, loss[loss=0.04902, simple_loss=0.06121, pruned_loss=0.01019, audio_tagging_loss=0.008226, over 13757.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09098, pruned_loss=0.01331, audio_tagging_loss=0.009126, over 3047161.58 frames. ], batch size: 54, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:48:42,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431350 2023-11-24 14:48:51,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2875686.6666666665, ans=0.125 2023-11-24 14:48:55,525 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 14:49:04,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2875753.3333333335, ans=0.125 2023-11-24 14:49:26,972 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10550, loss[loss=0.06133, simple_loss=0.09189, pruned_loss=0.01065, audio_tagging_loss=0.004741, over 15698.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09207, pruned_loss=0.0134, audio_tagging_loss=0.008904, over 3050583.60 frames. 
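The loss components printed in each record are consistent with a fixed weighted sum: half of the simple (linear-interpolation) transducer loss, plus the full pruned transducer loss, plus the audio-tagging distillation loss at unit weight. A sketch, with the 0.5 and 1.0 weights inferred from the logged numbers rather than read from the recipe:

```python
# A sketch of how the printed loss decomposes; the 0.5 and 1.0 weights
# are inferred from the numbers in this log, not taken from the code.
def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float,
               simple_loss_scale: float = 0.5,
               audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

total_loss(0.09207, 0.0134, 0.008904)  # ~0.06834, matching the
                                       # batch 10550 aggregate above
```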
], batch size: 58, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:49:27,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2875886.6666666665, ans=0.125 2023-11-24 14:49:38,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2875953.3333333335, ans=0.125 2023-11-24 14:49:39,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2875953.3333333335, ans=0.125 2023-11-24 14:49:40,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2875953.3333333335, ans=0.0 2023-11-24 14:49:42,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0 2023-11-24 14:49:45,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431400 2023-11-24 14:49:58,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.652e+01 9.234e+01 9.954e+01 1.298e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-24 14:50:13,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2876086.6666666665, ans=0.1 2023-11-24 14:50:29,154 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10600, loss[loss=0.05893, simple_loss=0.08048, pruned_loss=0.01284, audio_tagging_loss=0.005851, over 14955.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09273, pruned_loss=0.01345, audio_tagging_loss=0.00873, over 3051418.50 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:50:39,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2876220.0, ans=0.2 2023-11-24 14:50:48,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431450 2023-11-24 14:50:52,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2876286.6666666665, ans=0.125 2023-11-24 14:51:15,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2876420.0, ans=0.125 2023-11-24 14:51:31,316 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10650, loss[loss=0.05986, simple_loss=0.07693, pruned_loss=0.01124, audio_tagging_loss=0.01016, over 14659.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09278, pruned_loss=0.01334, audio_tagging_loss=0.008655, over 3048411.41 frames. 
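Each optim.py line reports a five-number summary (min, 25%, median, 75%, max) of recent gradient norms, and the clipping threshold tracks the median: with Clipping_scale=2.0, e.g. 2.0 * 9.234e+01 = 1.847e+02 in the entry above. percent-clipped=0.0 says no batch in the interval exceeded that threshold. A sketch of the relation; how the norm history is accumulated is an assumption, not read from optim.py:

```python
# Sketch: threshold = Clipping_scale * median of recent gradient norms.
import torch

def clip_threshold(recent_grad_norms: torch.Tensor,
                   clipping_scale: float = 2.0) -> torch.Tensor:
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return clipping_scale * q[2]  # scale the median
```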
], batch size: 55, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:51:31,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2876553.3333333335, ans=0.2 2023-11-24 14:51:37,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2876553.3333333335, ans=0.125 2023-11-24 14:51:51,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431500 2023-11-24 14:52:04,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.499e+01 9.165e+01 9.904e+01 1.171e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-24 14:52:26,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2876820.0, ans=0.0 2023-11-24 14:52:26,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2876820.0, ans=0.125 2023-11-24 14:52:33,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-24 14:52:36,285 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10700, loss[loss=0.06984, simple_loss=0.09389, pruned_loss=0.01723, audio_tagging_loss=0.005659, over 14608.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09212, pruned_loss=0.01314, audio_tagging_loss=0.00865, over 3050716.55 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:52:48,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2876953.3333333335, ans=0.125 2023-11-24 14:52:48,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2876953.3333333335, ans=0.0 2023-11-24 14:52:53,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2876953.3333333335, ans=0.1 2023-11-24 14:52:55,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431550 2023-11-24 14:53:30,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2877153.3333333335, ans=0.025 2023-11-24 14:53:40,852 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10750, loss[loss=0.05617, simple_loss=0.07257, pruned_loss=0.01042, audio_tagging_loss=0.009462, over 13497.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09111, pruned_loss=0.01297, audio_tagging_loss=0.008685, over 3054715.44 frames. ], batch size: 52, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 14:53:59,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431600 2023-11-24 14:54:14,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.628e+01 9.351e+01 9.677e+01 1.250e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-24 14:54:20,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2877420.0, ans=0.09899494936611666 2023-11-24 14:54:38,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. 
limit=15.0 2023-11-24 14:54:43,926 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10800, loss[loss=0.06915, simple_loss=0.09802, pruned_loss=0.01248, audio_tagging_loss=0.007652, over 15034.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09207, pruned_loss=0.0132, audio_tagging_loss=0.008606, over 3058502.28 frames. ], batch size: 54, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 14:55:00,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2877620.0, ans=0.0 2023-11-24 14:55:02,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431650 2023-11-24 14:55:13,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2877686.6666666665, ans=0.125 2023-11-24 14:55:13,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2877686.6666666665, ans=0.125 2023-11-24 14:55:20,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2877753.3333333335, ans=0.125 2023-11-24 14:55:20,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2877753.3333333335, ans=0.125 2023-11-24 14:55:24,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2877753.3333333335, ans=0.0 2023-11-24 14:55:35,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2877820.0, ans=0.125 2023-11-24 14:55:44,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2877820.0, ans=0.1 2023-11-24 14:55:46,752 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10850, loss[loss=0.07188, simple_loss=0.09566, pruned_loss=0.01241, audio_tagging_loss=0.01165, over 15129.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09215, pruned_loss=0.0132, audio_tagging_loss=0.00874, over 3059359.25 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 14:56:03,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2877953.3333333335, ans=0.1 2023-11-24 14:56:05,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431700 2023-11-24 14:56:16,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2878020.0, ans=0.07 2023-11-24 14:56:18,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.559e+01 9.138e+01 9.969e+01 1.288e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-24 14:56:36,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-24 14:56:37,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2878153.3333333335, ans=0.125 2023-11-24 14:56:45,765 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 14:56:49,787 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10900, loss[loss=0.07695, simple_loss=0.1095, pruned_loss=0.014, audio_tagging_loss=0.008209, over 15644.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09277, pruned_loss=0.01327, audio_tagging_loss=0.008784, over 3055050.62 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 14:57:07,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431750 2023-11-24 14:57:34,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2878420.0, ans=0.2 2023-11-24 14:57:44,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878486.6666666665, ans=0.1 2023-11-24 14:57:51,580 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 10950, loss[loss=0.06228, simple_loss=0.08594, pruned_loss=0.01269, audio_tagging_loss=0.006616, over 14360.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09172, pruned_loss=0.01313, audio_tagging_loss=0.008891, over 3054600.36 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 14:58:00,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2878553.3333333335, ans=0.125 2023-11-24 14:58:02,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2878620.0, ans=0.125 2023-11-24 14:58:10,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431800 2023-11-24 14:58:15,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2878686.6666666665, ans=0.125 2023-11-24 14:58:20,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2878686.6666666665, ans=0.125 2023-11-24 14:58:24,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.538e+01 9.236e+01 1.007e+02 1.326e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-24 14:58:32,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0 2023-11-24 14:58:53,761 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11000, loss[loss=0.06196, simple_loss=0.07998, pruned_loss=0.01265, audio_tagging_loss=0.009317, over 14873.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09143, pruned_loss=0.01307, audio_tagging_loss=0.008881, over 3049419.31 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 14:58:59,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-24 14:59:05,658 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 14:59:05,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2878953.3333333335, ans=0.95 2023-11-24 14:59:13,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431850 2023-11-24 14:59:17,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2023-11-24 14:59:18,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2879020.0, ans=15.0 2023-11-24 14:59:20,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2879020.0, ans=0.0 2023-11-24 14:59:40,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2879086.6666666665, ans=0.1 2023-11-24 14:59:55,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2879220.0, ans=0.125 2023-11-24 14:59:57,351 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11050, loss[loss=0.07461, simple_loss=0.1024, pruned_loss=0.01446, audio_tagging_loss=0.008945, over 14066.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09236, pruned_loss=0.01328, audio_tagging_loss=0.008926, over 3043113.79 frames. ], batch size: 52, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:00:08,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2879286.6666666665, ans=0.125 2023-11-24 15:00:10,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2879286.6666666665, ans=0.125 2023-11-24 15:00:15,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431900 2023-11-24 15:00:27,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-24 15:00:28,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.634e+01 9.324e+01 1.017e+02 1.244e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-24 15:00:44,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0 2023-11-24 15:00:52,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2879486.6666666665, ans=0.1 2023-11-24 15:00:54,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2879486.6666666665, ans=0.125 2023-11-24 15:00:59,014 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11100, loss[loss=0.0484, simple_loss=0.06098, pruned_loss=0.01029, audio_tagging_loss=0.00762, over 14502.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.0915, pruned_loss=0.01305, audio_tagging_loss=0.009101, over 3046451.88 frames. 
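The ScheduledFloat lines trace hyper-parameters (dropout probabilities, skip rates, balancer probabilities) that are functions of batch_count rather than constants, which is why each entry prints the current batch_count alongside the value. A minimal sketch of such a schedule, piecewise-linear between (batch, value) breakpoints; the breakpoints below are illustrative only:

```python
# Minimal sketch of a ScheduledFloat-style value: piecewise-linear in
# batch_count between (point, value) pairs, held constant outside them.
def scheduled_float(batch_count: float, schedule) -> float:
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return schedule[-1][1]

# Late in training (batch_count ~2.88e6) most such values sit at their
# final breakpoint, e.g. the ubiquitous ans=0.125:
scheduled_float(2.88e6, [(0.0, 0.3), (20000.0, 0.125)])  # -> 0.125
```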
], batch size: 56, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:01:08,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2879553.3333333335, ans=0.125 2023-11-24 15:01:13,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2879620.0, ans=0.2 2023-11-24 15:01:17,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 431950 2023-11-24 15:01:21,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2879620.0, ans=0.2 2023-11-24 15:01:36,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-11-24 15:01:57,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2879820.0, ans=0.2 2023-11-24 15:02:00,902 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11150, loss[loss=0.0784, simple_loss=0.1079, pruned_loss=0.01653, audio_tagging_loss=0.007924, over 16323.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09141, pruned_loss=0.01306, audio_tagging_loss=0.009123, over 3049492.43 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:02:06,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2879886.6666666665, ans=0.125 2023-11-24 15:02:14,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2879953.3333333335, ans=0.125 2023-11-24 15:02:21,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432000 2023-11-24 15:02:21,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2879953.3333333335, ans=0.2 2023-11-24 15:02:37,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.446e+01 9.218e+01 9.993e+01 1.214e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 15:02:42,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2880086.6666666665, ans=0.125 2023-11-24 15:02:42,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2880086.6666666665, ans=0.0 2023-11-24 15:02:48,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2880086.6666666665, ans=0.2 2023-11-24 15:02:56,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-24 15:03:07,427 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11200, loss[loss=0.06307, simple_loss=0.08544, pruned_loss=0.01024, audio_tagging_loss=0.0101, over 14828.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09033, pruned_loss=0.01281, audio_tagging_loss=0.009147, over 3041095.35 frames. 
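The Whitening lines compare a per-module metric against a limit, and a penalty applies only when the metric exceeds it. The metric is about 1.0 when the within-group channel covariance is proportional to the identity and grows as the covariance becomes anisotropic. A hedged sketch of one such metric; the exact normalization in scaling.py may differ:

```python
# Sketch of a whitening metric: ~1.0 for white (isotropic) features,
# larger when a few directions dominate the covariance.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations of one whitening group
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    num_channels = cov.shape[0]
    return (cov ** 2).sum() / (num_channels * cov.diag().mean() ** 2)
```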
], batch size: 56, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:03:25,993 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432050 2023-11-24 15:03:32,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2880353.3333333335, ans=0.125 2023-11-24 15:04:01,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2880486.6666666665, ans=0.125 2023-11-24 15:04:07,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2880486.6666666665, ans=0.125 2023-11-24 15:04:09,703 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11250, loss[loss=0.05675, simple_loss=0.07833, pruned_loss=0.01078, audio_tagging_loss=0.006805, over 14830.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09051, pruned_loss=0.01285, audio_tagging_loss=0.009158, over 3045189.99 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:04:16,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2023-11-24 15:04:19,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2880553.3333333335, ans=0.07 2023-11-24 15:04:22,941 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 15:04:26,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2880620.0, ans=0.125 2023-11-24 15:04:27,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432100 2023-11-24 15:04:41,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.402e+01 9.085e+01 9.896e+01 1.491e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 15:04:46,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2880753.3333333335, ans=0.95 2023-11-24 15:04:47,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=22.5 2023-11-24 15:04:54,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2880753.3333333335, ans=0.0 2023-11-24 15:05:10,589 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11300, loss[loss=0.04667, simple_loss=0.04885, pruned_loss=0.008831, audio_tagging_loss=0.01341, over 15572.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09067, pruned_loss=0.01297, audio_tagging_loss=0.009019, over 3052722.21 frames. ], batch size: 62, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:05:11,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.34 vs. 
limit=15.0 2023-11-24 15:05:13,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2880886.6666666665, ans=0.0 2023-11-24 15:05:17,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2880886.6666666665, ans=0.125 2023-11-24 15:05:17,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2880886.6666666665, ans=0.0 2023-11-24 15:05:29,598 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432150 2023-11-24 15:05:30,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2880953.3333333335, ans=0.05 2023-11-24 15:05:47,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2881086.6666666665, ans=0.2 2023-11-24 15:05:54,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2881086.6666666665, ans=0.0 2023-11-24 15:06:03,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.58 vs. limit=22.5 2023-11-24 15:06:13,221 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11350, loss[loss=0.07348, simple_loss=0.1017, pruned_loss=0.01528, audio_tagging_loss=0.007345, over 15654.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09012, pruned_loss=0.01303, audio_tagging_loss=0.00895, over 3054847.66 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:06:21,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2881220.0, ans=0.125 2023-11-24 15:06:25,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2881286.6666666665, ans=0.125 2023-11-24 15:06:28,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2881286.6666666665, ans=0.2 2023-11-24 15:06:32,167 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432200 2023-11-24 15:06:38,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2881353.3333333335, ans=0.2 2023-11-24 15:06:45,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.945e+01 8.608e+01 9.253e+01 9.930e+01 1.315e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-24 15:07:14,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2881486.6666666665, ans=0.125 2023-11-24 15:07:16,358 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11400, loss[loss=0.06102, simple_loss=0.08683, pruned_loss=0.007722, audio_tagging_loss=0.009878, over 15613.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09102, pruned_loss=0.01315, audio_tagging_loss=0.008799, over 3057531.81 frames. 
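The WithLoss entries report an auxiliary loss attached directly to the attention weights (loss-sum=0.000e+00 throughout this stretch). One way to attach such a term is an autograd identity that injects the auxiliary gradient on the backward pass; this is an assumed mechanism, shown only as a sketch, not the actual implementation in scaling.py:

```python
# Hedged sketch: forward is the identity on x, backward additionally
# feeds back the gradient of scale * aux_loss into aux_loss.
import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor, scale: float):
        ctx.save_for_backward(aux_loss)
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad: torch.Tensor):
        (aux_loss,) = ctx.saved_tensors
        # d(scale * aux_loss.sum()) / d(aux_loss) is simply scale
        return grad, torch.full_like(aux_loss, ctx.scale), None
```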
], batch size: 59, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:07:34,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432250 2023-11-24 15:07:58,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2881753.3333333335, ans=0.025 2023-11-24 15:08:05,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2023-11-24 15:08:06,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2881820.0, ans=0.0 2023-11-24 15:08:18,292 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11450, loss[loss=0.0668, simple_loss=0.08913, pruned_loss=0.01306, audio_tagging_loss=0.009175, over 14662.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.0908, pruned_loss=0.01315, audio_tagging_loss=0.008761, over 3053441.13 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:08:25,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2881886.6666666665, ans=0.125 2023-11-24 15:08:37,394 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432300 2023-11-24 15:08:51,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.606e+01 9.263e+01 9.916e+01 1.290e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-24 15:08:53,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2882020.0, ans=0.0 2023-11-24 15:09:03,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2882086.6666666665, ans=0.05 2023-11-24 15:09:08,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0 2023-11-24 15:09:15,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882153.3333333335, ans=0.1 2023-11-24 15:09:20,623 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11500, loss[loss=0.06132, simple_loss=0.07738, pruned_loss=0.01028, audio_tagging_loss=0.01235, over 15511.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.0908, pruned_loss=0.01331, audio_tagging_loss=0.008746, over 3054559.79 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 15:09:39,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432350 2023-11-24 15:09:57,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2882420.0, ans=0.035 2023-11-24 15:10:10,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2882486.6666666665, ans=0.125 2023-11-24 15:10:21,965 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11550, loss[loss=0.05712, simple_loss=0.07756, pruned_loss=0.007521, audio_tagging_loss=0.01082, over 14964.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09016, pruned_loss=0.01297, audio_tagging_loss=0.008878, over 3058744.57 frames. 
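Many of the scheduled values are *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff3_skip_rate, bypass.skip_rate): with that probability the corresponding sub-module is bypassed for the whole batch during training, a stochastic-depth-style regularizer. A sketch, with the residual form assumed:

```python
# Sketch of a skip-rate gate: skip the sub-module with probability
# skip_rate at training time, otherwise apply it residually.
import torch

def maybe_apply(module, x: torch.Tensor, skip_rate: float,
                training: bool = True) -> torch.Tensor:
    if training and torch.rand(()).item() < skip_rate:
        return x              # module skipped this step
    return x + module(x)      # residual application (assumed form)
```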
], batch size: 56, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 15:10:24,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2882553.3333333335, ans=0.07 2023-11-24 15:10:39,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2882620.0, ans=0.0 2023-11-24 15:10:40,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432400 2023-11-24 15:10:41,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2882620.0, ans=0.0 2023-11-24 15:10:55,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.536e+01 9.084e+01 9.886e+01 1.170e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 15:11:00,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2882753.3333333335, ans=0.125 2023-11-24 15:11:01,154 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 15:11:07,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2882753.3333333335, ans=0.125 2023-11-24 15:11:24,101 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11600, loss[loss=0.08614, simple_loss=0.1087, pruned_loss=0.02505, audio_tagging_loss=0.006726, over 14648.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09111, pruned_loss=0.01321, audio_tagging_loss=0.008861, over 3056936.36 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 32.0 2023-11-24 15:11:33,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2882886.6666666665, ans=0.125 2023-11-24 15:11:38,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2882953.3333333335, ans=0.0 2023-11-24 15:11:42,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2882953.3333333335, ans=0.1 2023-11-24 15:11:43,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432450 2023-11-24 15:11:51,057 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 15:11:53,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2883020.0, ans=0.125 2023-11-24 15:11:53,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2883020.0, ans=0.125 2023-11-24 15:12:26,850 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11650, loss[loss=0.07887, simple_loss=0.1076, pruned_loss=0.01808, audio_tagging_loss=0.006978, over 15473.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09118, pruned_loss=0.01322, audio_tagging_loss=0.008775, over 3050106.55 frames. 
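The grad_scale column fluctuates (32 -> 8 -> 16 -> 32 across the batches above) in the usual dynamic fp16 loss-scaling pattern: halve on overflow, grow back after a run of clean steps. A hedged sketch of such a policy; the constants and the exact growth rule are illustrative, not read from the training code:

```python
# Sketch of dynamic loss scaling consistent with the grad_scale column.
class LossScaler:
    def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:                       # overflow: back off quickly
            self.scale /= 2.0
            self._good_steps = 0
        else:                               # stable: grow slowly
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= 2.0
                self._good_steps = 0
```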
], batch size: 58, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 15:12:29,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2883220.0, ans=0.2 2023-11-24 15:12:45,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432500 2023-11-24 15:12:53,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2883353.3333333335, ans=0.125 2023-11-24 15:13:01,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.463e+01 9.086e+01 9.673e+01 1.142e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 15:13:24,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2883486.6666666665, ans=0.0 2023-11-24 15:13:28,779 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11700, loss[loss=0.07281, simple_loss=0.092, pruned_loss=0.01634, audio_tagging_loss=0.01046, over 14972.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09095, pruned_loss=0.01315, audio_tagging_loss=0.008878, over 3044542.12 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 15:13:36,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2883553.3333333335, ans=0.125 2023-11-24 15:13:47,043 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432550 2023-11-24 15:13:47,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2883620.0, ans=0.2 2023-11-24 15:13:49,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2883620.0, ans=0.125 2023-11-24 15:14:09,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2883753.3333333335, ans=0.125 2023-11-24 15:14:15,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2883753.3333333335, ans=0.0 2023-11-24 15:14:31,231 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11750, loss[loss=0.05891, simple_loss=0.07972, pruned_loss=0.01154, audio_tagging_loss=0.007513, over 14887.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.08981, pruned_loss=0.01297, audio_tagging_loss=0.008993, over 3037780.24 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 15:14:40,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2883886.6666666665, ans=0.2 2023-11-24 15:14:43,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. 
limit=22.5 2023-11-24 15:14:47,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2883953.3333333335, ans=0.125 2023-11-24 15:14:49,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2883953.3333333335, ans=0.0 2023-11-24 15:14:50,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432600 2023-11-24 15:15:07,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.692e+01 8.388e+01 9.250e+01 9.905e+01 1.592e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 15:15:07,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2884086.6666666665, ans=0.2 2023-11-24 15:15:34,420 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11800, loss[loss=0.05534, simple_loss=0.07097, pruned_loss=0.01093, audio_tagging_loss=0.00893, over 15074.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.08957, pruned_loss=0.01306, audio_tagging_loss=0.009028, over 3033538.09 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 15:15:52,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432650 2023-11-24 15:16:03,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2884353.3333333335, ans=0.0 2023-11-24 15:16:28,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2884486.6666666665, ans=0.09899494936611666 2023-11-24 15:16:30,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2884486.6666666665, ans=0.125 2023-11-24 15:16:36,947 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11850, loss[loss=0.06334, simple_loss=0.08438, pruned_loss=0.01088, audio_tagging_loss=0.01028, over 14885.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.08974, pruned_loss=0.013, audio_tagging_loss=0.00908, over 3036072.18 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 15:16:48,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2023-11-24 15:16:54,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432700 2023-11-24 15:16:56,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2884620.0, ans=0.07 2023-11-24 15:17:11,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2884686.6666666665, ans=0.2 2023-11-24 15:17:13,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 8.479e+01 9.210e+01 9.675e+01 1.207e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-24 15:17:15,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0 2023-11-24 15:17:22,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-24 15:17:22,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2023-11-24 15:17:24,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2884753.3333333335, ans=0.07 2023-11-24 15:17:27,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2884820.0, ans=0.125 2023-11-24 15:17:33,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-24 15:17:34,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2884820.0, ans=0.125 2023-11-24 15:17:35,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2884820.0, ans=0.025 2023-11-24 15:17:38,782 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11900, loss[loss=0.07315, simple_loss=0.1005, pruned_loss=0.01355, audio_tagging_loss=0.009373, over 15614.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09032, pruned_loss=0.01295, audio_tagging_loss=0.009251, over 3040456.15 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 15:17:40,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=12.0 2023-11-24 15:17:55,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2884953.3333333335, ans=0.04949747468305833 2023-11-24 15:17:58,170 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432750 2023-11-24 15:18:13,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2885020.0, ans=0.125 2023-11-24 15:18:32,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-24 15:18:35,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-24 15:18:40,531 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 11950, loss[loss=0.04903, simple_loss=0.06858, pruned_loss=0.006785, audio_tagging_loss=0.007955, over 16035.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08947, pruned_loss=0.01276, audio_tagging_loss=0.009222, over 3043825.16 frames. ], batch size: 62, lr: 1.87e-03, grad_scale: 8.0 2023-11-24 15:19:00,168 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432800 2023-11-24 15:19:05,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2885353.3333333335, ans=0.2 2023-11-24 15:19:16,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.459e+01 8.985e+01 9.590e+01 1.349e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-24 15:19:26,548 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 15:19:38,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2023-11-24 15:19:41,561 INFO [train_asr.py:1221] (1/4) Epoch 36, batch 12000, loss[loss=0.07693, simple_loss=0.1011, pruned_loss=0.0164, audio_tagging_loss=0.009956, over 16295.00 frames. 
], tot_loss[loss=0.06769, simple_loss=0.09091, pruned_loss=0.01302, audio_tagging_loss=0.009215, over 3055558.62 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 16.0 2023-11-24 15:19:41,562 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 15:20:09,322 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4458, 3.7712, 2.9372, 3.7230], device='cuda:1') 2023-11-24 15:20:23,119 INFO [train_asr.py:1253] (1/4) Epoch 36, validation: loss=0.05822, simple_loss=0.05085, pruned_loss=0.005219, audio_tagging_loss=0.02757, over 4681554.00 frames. 2023-11-24 15:20:23,120 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 15:20:28,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2885553.3333333335, ans=0.2 2023-11-24 15:20:39,982 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432850 2023-11-24 15:20:48,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.90 vs. limit=22.5 2023-11-24 15:21:26,918 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 0, loss[loss=0.09097, simple_loss=0.1196, pruned_loss=0.01529, audio_tagging_loss=0.01586, over 16389.00 frames. ], tot_loss[loss=0.09097, simple_loss=0.1196, pruned_loss=0.01529, audio_tagging_loss=0.01586, over 16389.00 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:21:26,919 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 15:21:48,129 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3496, 5.0215, 4.7174, 5.1889], device='cuda:1') 2023-11-24 15:22:03,067 INFO [train_asr.py:1253] (1/4) Epoch 37, validation: loss=0.05797, simple_loss=0.05085, pruned_loss=0.005252, audio_tagging_loss=0.02729, over 4681554.00 frames. 2023-11-24 15:22:03,067 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 15:22:07,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2885720.0, ans=0.04949747468305833 2023-11-24 15:22:09,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2885720.0, ans=0.2 2023-11-24 15:22:18,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2885786.6666666665, ans=0.09899494936611666 2023-11-24 15:22:24,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2885786.6666666665, ans=0.125 2023-11-24 15:22:26,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.90 vs. 
limit=10.0 2023-11-24 15:22:44,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2885920.0, ans=0.1 2023-11-24 15:22:53,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432900 2023-11-24 15:23:01,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2885986.6666666665, ans=0.2 2023-11-24 15:23:06,059 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 50, loss[loss=0.06729, simple_loss=0.07683, pruned_loss=0.008663, audio_tagging_loss=0.02021, over 14868.00 frames. ], tot_loss[loss=0.07344, simple_loss=0.08694, pruned_loss=0.01213, audio_tagging_loss=0.01784, over 686284.38 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:23:06,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2886053.3333333335, ans=0.1 2023-11-24 15:23:10,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 9.190e+01 9.778e+01 1.072e+02 1.495e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-24 15:23:10,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2886053.3333333335, ans=0.0 2023-11-24 15:23:27,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2886120.0, ans=0.125 2023-11-24 15:23:37,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2023-11-24 15:23:51,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2886253.3333333335, ans=0.0 2023-11-24 15:23:55,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 432950 2023-11-24 15:24:07,950 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 100, loss[loss=0.08439, simple_loss=0.1037, pruned_loss=0.01597, audio_tagging_loss=0.01655, over 15216.00 frames. ], tot_loss[loss=0.07395, simple_loss=0.08867, pruned_loss=0.01256, audio_tagging_loss=0.01706, over 1205487.96 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:24:11,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2886386.6666666665, ans=0.0 2023-11-24 15:24:14,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2886386.6666666665, ans=0.1 2023-11-24 15:24:51,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2023-11-24 15:24:53,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2886586.6666666665, ans=0.1 2023-11-24 15:24:55,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2886586.6666666665, ans=0.0 2023-11-24 15:24:57,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433000 2023-11-24 15:25:09,898 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 150, loss[loss=0.06603, simple_loss=0.07717, pruned_loss=0.01249, audio_tagging_loss=0.01496, over 15105.00 frames. 
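The attn_weights_entropy tensors printed during the validation passes above are an attention-health diagnostic; the four values per module plausibly correspond to the attention heads. A sketch assuming the weights are softmax distributions over keys and the reduction is a mean over queries (shape and reduction are assumptions):

```python
# Sketch: per-head entropy of the attention distribution, averaged
# over queries; low values indicate peaked (near one-hot) attention.
import torch

def attn_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, num_queries, num_keys), rows sum to 1
    p = attn_weights.clamp(min=1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # one value per head
```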
], tot_loss[loss=0.07245, simple_loss=0.08973, pruned_loss=0.01258, audio_tagging_loss=0.01501, over 1609829.57 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:25:11,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0 2023-11-24 15:25:14,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2023-11-24 15:25:16,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.996e+01 9.560e+01 1.019e+02 1.193e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-24 15:25:45,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2886853.3333333335, ans=0.0 2023-11-24 15:25:45,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2886853.3333333335, ans=0.5 2023-11-24 15:25:59,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433050 2023-11-24 15:26:02,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-11-24 15:26:03,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2886986.6666666665, ans=0.2 2023-11-24 15:26:05,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2023-11-24 15:26:12,771 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 200, loss[loss=0.06912, simple_loss=0.09247, pruned_loss=0.01539, audio_tagging_loss=0.007493, over 14456.00 frames. ], tot_loss[loss=0.07222, simple_loss=0.09211, pruned_loss=0.01305, audio_tagging_loss=0.01311, over 1926041.41 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:26:14,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2887053.3333333335, ans=0.2 2023-11-24 15:26:14,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-24 15:26:29,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. limit=10.0 2023-11-24 15:26:36,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2887186.6666666665, ans=10.0 2023-11-24 15:26:38,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2023-11-24 15:27:02,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433100 2023-11-24 15:27:04,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2887320.0, ans=0.125 2023-11-24 15:27:15,136 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 250, loss[loss=0.05613, simple_loss=0.0702, pruned_loss=0.009105, audio_tagging_loss=0.01193, over 15910.00 frames. 
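The "over N frames" figure attached to each tot_loss is informative in itself: during epoch 36 it plateaus near 3.0e6 frames, while after the epoch 37 restart it climbs back up (686284 -> 1205488 -> 1609830 -> 1926041 -> 2175256 -> ...). That is consistent with a decayed running aggregate whose effective window is about 200 batches of ~15k frames, which also explains why tot_loss is noisier and dominated by the fresh audio-tagging batches early in an epoch. A sketch of that bookkeeping, with the reset interval inferred from the frame counts rather than read from the code:

```python
# Sketch of a decayed aggregate consistent with the frame counts above:
# loss sums and frame counts both decay by (1 - 1/reset_interval) each
# batch, so the steady state covers ~200 * ~15k frames ~= 3.0e6.
def update_tot(tot: dict, batch: dict, reset_interval: int = 200) -> dict:
    decay = 1.0 - 1.0 / reset_interval
    return {k: decay * tot.get(k, 0.0) + batch.get(k, 0.0)
            for k in set(tot) | set(batch)}

tot = {}
for _ in range(50):                      # ~batch 50 of a fresh epoch
    tot = update_tot(tot, {"frames": 15000.0, "loss": 15000.0 * 0.07})
print(tot["frames"])                     # ~6.7e5, cf. 686284.38 above
print(tot["loss"] / tot["frames"])       # the value that gets printed
```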
2023-11-24 15:27:15,136 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 250, loss[loss=0.05613, simple_loss=0.0702, pruned_loss=0.009105, audio_tagging_loss=0.01193, over 15910.00 frames. ], tot_loss[loss=0.07099, simple_loss=0.0921, pruned_loss=0.01306, audio_tagging_loss=0.01189, over 2175256.39 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:27:20,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.812e+01 9.417e+01 1.032e+02 1.666e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-24 15:27:42,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2887520.0, ans=0.0
2023-11-24 15:27:46,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2887520.0, ans=0.0
2023-11-24 15:28:04,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433150
2023-11-24 15:28:16,217 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 300, loss[loss=0.07189, simple_loss=0.1023, pruned_loss=0.01453, audio_tagging_loss=0.006201, over 15194.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.09312, pruned_loss=0.01342, audio_tagging_loss=0.01098, over 2377565.25 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:28:20,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2023-11-24 15:28:25,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2887720.0, ans=0.0
2023-11-24 15:28:31,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2887786.6666666665, ans=0.0
2023-11-24 15:28:35,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2887786.6666666665, ans=0.0
2023-11-24 15:28:46,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2887853.3333333335, ans=0.1
2023-11-24 15:28:54,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2887920.0, ans=0.05
2023-11-24 15:29:02,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2887920.0, ans=0.125
2023-11-24 15:29:05,471 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433200
2023-11-24 15:29:08,445 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-24 15:29:08,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2887986.6666666665, ans=0.125
2023-11-24 15:29:15,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=8.0
2023-11-24 15:29:16,112 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
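Aside on the [scaling.py:213] lines: ScheduledFloat prints a module hyperparameter (dropout_p, skip rates, balancer bounds) as a function of the global batch_count. A minimal sketch, assuming the value is piecewise-linear in batch_count between fixed breakpoints and constant outside them (the real class in icefall's scaling.py carries extra machinery such as defaults and arithmetic composition, so this is an approximation):

    # Minimal sketch of a batch-count-driven scheduled float (assumed piecewise-linear).
    import bisect


    class ScheduledFloat:
        def __init__(self, *points):  # e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]
            self.batch_count = 0.0  # updated by the training loop

        def __float__(self) -> float:
            x = self.batch_count
            if x <= self.xs[0]:
                return self.ys[0]
            if x >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, x)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


    # Hypothetical breakpoints for a dropout schedule, for illustration only.
    dropout = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout.batch_count = 2886053.0  # far past the last breakpoint
    assert abs(float(dropout) - 0.1) < 1e-9  # matches the repeated ans=0.1 above

At batch_count near 2.89e6, far beyond any plausible schedule breakpoint, every scheduled value has settled at its final constant, which is why each name logs the same ans over and over in this excerpt.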
2023-11-24 15:29:18,702 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 350, loss[loss=0.08476, simple_loss=0.109, pruned_loss=0.02121, audio_tagging_loss=0.009068, over 15454.00 frames. ], tot_loss[loss=0.06985, simple_loss=0.09248, pruned_loss=0.01331, audio_tagging_loss=0.01029, over 2526867.10 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:29:21,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2888053.3333333335, ans=0.125
2023-11-24 15:29:23,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2888053.3333333335, ans=0.07
2023-11-24 15:29:25,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.531e+01 9.197e+01 9.925e+01 1.241e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-24 15:29:35,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2888120.0, ans=0.125
2023-11-24 15:29:42,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2888186.6666666665, ans=0.125
2023-11-24 15:29:51,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2888186.6666666665, ans=0.125
2023-11-24 15:29:55,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2888253.3333333335, ans=0.125
2023-11-24 15:30:09,229 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433250
2023-11-24 15:30:11,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2888320.0, ans=0.0
2023-11-24 15:30:16,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888320.0, ans=0.1
2023-11-24 15:30:16,527 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-24 15:30:21,525 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 400, loss[loss=0.05123, simple_loss=0.07255, pruned_loss=0.00646, audio_tagging_loss=0.008491, over 15908.00 frames. ], tot_loss[loss=0.06936, simple_loss=0.09272, pruned_loss=0.01316, audio_tagging_loss=0.009836, over 2647979.46 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 32.0
2023-11-24 15:30:21,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2888386.6666666665, ans=0.1
2023-11-24 15:30:23,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=12.0
2023-11-24 15:30:39,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888453.3333333335, ans=0.1
2023-11-24 15:30:50,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2888520.0, ans=0.0
2023-11-24 15:31:11,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433300
2023-11-24 15:31:11,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2888653.3333333335, ans=0.125
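Aside on the [train_asr.py:1221] lines: each prints the current batch's loss components ("loss[... over N frames.]") and a running aggregate ("tot_loss[... over M frames.]"). The aggregate frame count M climbs from ~0.7M at batch 50 toward a plateau near 3.0M, which is consistent with a decayed frame-weighted sum (an effective window of roughly 200 batches of ~15k frames) rather than a plain cumulative average. A sketch under that assumption; the actual code uses icefall's MetricsTracker, which this only approximates:

    # Sketch of the running "tot_loss[... over M frames]" statistics. Assumption:
    # each component keeps a decayed sum, new = old * (1 - 1/reset_interval) + batch,
    # and the printed value is component_sum / frame_sum.
    class TotLoss:
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.frames = 0.0
            self.sums: dict = {}

        def update(self, frames: float, **losses: float) -> None:
            self.frames = self.frames * self.decay + frames
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) * self.decay + value * frames

        def averages(self) -> dict:
            return {k: v / self.frames for k, v in self.sums.items()}


    tot = TotLoss()
    for _ in range(1000):
        tot.update(15000.0, loss=0.068)
    print(round(tot.frames))  # ~3.0e6, like "over 3024308.21 frames." further below

This would explain why the early tot_loss values are dominated by the large audio_tagging_loss of the first batches and then drift down as the window fills with more typical batches.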
2023-11-24 15:31:23,313 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 450, loss[loss=0.09179, simple_loss=0.129, pruned_loss=0.01942, audio_tagging_loss=0.007849, over 16635.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09188, pruned_loss=0.01301, audio_tagging_loss=0.009627, over 2741350.79 frames. ], batch size: 62, lr: 1.84e-03, grad_scale: 32.0
2023-11-24 15:31:30,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 8.409e+01 9.030e+01 9.754e+01 1.188e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-24 15:31:55,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0
2023-11-24 15:32:12,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433350
2023-11-24 15:32:23,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0
2023-11-24 15:32:25,459 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 500, loss[loss=0.07715, simple_loss=0.1204, pruned_loss=0.01179, audio_tagging_loss=0.005173, over 15753.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.0908, pruned_loss=0.01279, audio_tagging_loss=0.009558, over 2806636.67 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:32:26,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2889053.3333333335, ans=0.0
2023-11-24 15:32:33,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2889053.3333333335, ans=0.125
2023-11-24 15:32:33,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2889053.3333333335, ans=0.0
2023-11-24 15:32:50,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2889186.6666666665, ans=0.2
2023-11-24 15:33:16,191 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433400
2023-11-24 15:33:22,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2889320.0, ans=0.0
2023-11-24 15:33:28,897 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 550, loss[loss=0.0504, simple_loss=0.06377, pruned_loss=0.008629, audio_tagging_loss=0.009879, over 14917.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09096, pruned_loss=0.01287, audio_tagging_loss=0.009392, over 2852013.77 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:33:35,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2889386.6666666665, ans=0.04949747468305833
2023-11-24 15:33:36,625 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.673e+01 8.403e+01 9.014e+01 9.757e+01 1.245e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-24 15:33:56,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2889520.0, ans=0.0
2023-11-24 15:34:18,812 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433450
2023-11-24 15:34:20,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2889653.3333333335, ans=0.2
2023-11-24 15:34:30,538 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 600, loss[loss=0.07006, simple_loss=0.09742, pruned_loss=0.01402, audio_tagging_loss=0.007323, over 15114.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09177, pruned_loss=0.01314, audio_tagging_loss=0.009324, over 2888569.89 frames.
], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:34:59,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2889853.3333333335, ans=0.0 2023-11-24 15:35:04,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2889853.3333333335, ans=0.09899494936611666 2023-11-24 15:35:08,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2889920.0, ans=0.1 2023-11-24 15:35:10,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0 2023-11-24 15:35:14,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2889920.0, ans=0.0 2023-11-24 15:35:20,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433500 2023-11-24 15:35:33,123 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 650, loss[loss=0.08862, simple_loss=0.1292, pruned_loss=0.0174, audio_tagging_loss=0.006629, over 15876.00 frames. ], tot_loss[loss=0.06862, simple_loss=0.09239, pruned_loss=0.01321, audio_tagging_loss=0.009217, over 2928448.04 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:35:38,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2890053.3333333335, ans=0.125 2023-11-24 15:35:40,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0 2023-11-24 15:35:40,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.381e+01 9.134e+01 9.904e+01 1.320e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 15:35:56,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2890120.0, ans=0.2 2023-11-24 15:36:23,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433550 2023-11-24 15:36:29,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2890320.0, ans=0.0 2023-11-24 15:36:32,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2023-11-24 15:36:35,534 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 700, loss[loss=0.04819, simple_loss=0.06105, pruned_loss=0.008283, audio_tagging_loss=0.009386, over 15065.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09192, pruned_loss=0.01305, audio_tagging_loss=0.009179, over 2952601.21 frames. 
], batch size: 58, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:36:38,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2890386.6666666665, ans=0.125 2023-11-24 15:36:47,917 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 15:36:49,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2890453.3333333335, ans=0.2 2023-11-24 15:37:25,572 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433600 2023-11-24 15:37:28,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2890653.3333333335, ans=0.0 2023-11-24 15:37:38,628 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 750, loss[loss=0.07761, simple_loss=0.1147, pruned_loss=0.01267, audio_tagging_loss=0.007599, over 15649.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.0919, pruned_loss=0.01303, audio_tagging_loss=0.009181, over 2978805.02 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:37:42,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2890720.0, ans=0.125 2023-11-24 15:37:43,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2890720.0, ans=0.2 2023-11-24 15:37:45,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.622e+01 9.264e+01 1.020e+02 2.352e+02, threshold=1.853e+02, percent-clipped=1.0 2023-11-24 15:37:46,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2890720.0, ans=0.1 2023-11-24 15:38:07,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2023-11-24 15:38:28,486 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433650 2023-11-24 15:38:40,553 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 800, loss[loss=0.05492, simple_loss=0.07106, pruned_loss=0.01016, audio_tagging_loss=0.00923, over 14776.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09136, pruned_loss=0.01296, audio_tagging_loss=0.009232, over 2998343.10 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:38:46,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.44 vs. limit=10.0 2023-11-24 15:38:51,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2891053.3333333335, ans=0.0 2023-11-24 15:38:59,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. 
limit=15.0 2023-11-24 15:39:14,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2891186.6666666665, ans=10.0 2023-11-24 15:39:17,936 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 15:39:31,229 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433700 2023-11-24 15:39:36,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2891320.0, ans=0.1 2023-11-24 15:39:43,397 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 850, loss[loss=0.07651, simple_loss=0.1049, pruned_loss=0.01329, audio_tagging_loss=0.01077, over 15660.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.0914, pruned_loss=0.01294, audio_tagging_loss=0.009256, over 3005009.28 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:39:43,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2891386.6666666665, ans=0.0 2023-11-24 15:39:51,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.494e+01 9.223e+01 9.765e+01 1.207e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-24 15:40:06,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=8.0 2023-11-24 15:40:15,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2891520.0, ans=0.125 2023-11-24 15:40:21,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2891586.6666666665, ans=0.1 2023-11-24 15:40:24,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2891586.6666666665, ans=0.125 2023-11-24 15:40:33,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433750 2023-11-24 15:40:45,660 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 900, loss[loss=0.07069, simple_loss=0.09748, pruned_loss=0.01414, audio_tagging_loss=0.007808, over 14648.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09101, pruned_loss=0.01311, audio_tagging_loss=0.00922, over 3007766.57 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:40:54,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2891720.0, ans=0.1 2023-11-24 15:41:22,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2891920.0, ans=0.0 2023-11-24 15:41:35,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433800 2023-11-24 15:41:48,025 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 950, loss[loss=0.06429, simple_loss=0.08193, pruned_loss=0.01323, audio_tagging_loss=0.0101, over 14104.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09187, pruned_loss=0.01314, audio_tagging_loss=0.009094, over 3022990.31 frames. 
], batch size: 53, lr: 1.84e-03, grad_scale: 32.0
2023-11-24 15:41:57,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.595e+01 9.194e+01 9.952e+01 1.132e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-24 15:42:15,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2892186.6666666665, ans=0.125
2023-11-24 15:42:21,046 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 15:42:28,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.90 vs. limit=6.0
2023-11-24 15:42:31,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2892253.3333333335, ans=0.125
2023-11-24 15:42:35,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0
2023-11-24 15:42:38,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433850
2023-11-24 15:42:51,321 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1000, loss[loss=0.05241, simple_loss=0.07344, pruned_loss=0.008231, audio_tagging_loss=0.007458, over 15034.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09093, pruned_loss=0.01298, audio_tagging_loss=0.009021, over 3024308.21 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:42:51,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2892386.6666666665, ans=0.0
2023-11-24 15:43:02,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2892453.3333333335, ans=0.125
2023-11-24 15:43:05,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2892453.3333333335, ans=0.125
2023-11-24 15:43:17,440 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 15:43:40,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433900
2023-11-24 15:43:52,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2892720.0, ans=0.125
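Aside on the WARNING at 15:43:17,440: a one-second AudioSet clip ("0.000_1.000.wav") is excluded because its 100 feature frames shrink to 23 after the encoder's convolutional subsampling, while its dummy transcript has 24 BPE tokens; the pruned-transducer loss apparently requires at least as many encoder frames as tokens. A short sketch of such a filter, assuming the subsampling formula ((T - 7) // 2 + 1) // 2, which does reproduce 100 -> 23 (the real criterion lives in train_asr.py around line 1462 and may differ in detail):

    # Sketch of the cut filter behind the WARNING above. Assumptions: the encoder
    # maps T input frames to ((T - 7) // 2 + 1) // 2 output frames (matches 100 -> 23),
    # and a cut is excluded when it has fewer output frames than BPE tokens.
    def frames_after_subsampling(t: int) -> int:
        return ((t - 7) // 2 + 1) // 2


    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens


    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the 1-second dummy-text cuts here are dropped

The same check fires again below for the other one-second dummy-text cuts (AWHnJAqurec, XdmbboqRBmQ, XkQ8YVd8u38, mx9RcUz8sr0).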
2023-11-24 15:43:52,916 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1050, loss[loss=0.05363, simple_loss=0.06529, pruned_loss=0.007692, audio_tagging_loss=0.01329, over 14265.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08966, pruned_loss=0.01276, audio_tagging_loss=0.008992, over 3030802.39 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:43:53,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2892720.0, ans=0.015
2023-11-24 15:44:01,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 8.504e+01 9.256e+01 1.007e+02 1.365e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-24 15:44:08,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0
2023-11-24 15:44:30,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2892920.0, ans=0.1
2023-11-24 15:44:34,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2892920.0, ans=0.125
2023-11-24 15:44:42,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 433950
2023-11-24 15:44:53,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5
2023-11-24 15:44:55,273 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1100, loss[loss=0.06159, simple_loss=0.09347, pruned_loss=0.007658, audio_tagging_loss=0.007198, over 15998.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09063, pruned_loss=0.01299, audio_tagging_loss=0.00883, over 3039379.17 frames. ], batch size: 63, lr: 1.84e-03, grad_scale: 16.0
2023-11-24 15:44:57,709 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 15:45:04,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-24 15:45:11,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2893120.0, ans=0.2
2023-11-24 15:45:45,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434000
2023-11-24 15:45:46,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2893320.0, ans=0.125
2023-11-24 15:45:58,259 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1150, loss[loss=0.06192, simple_loss=0.08271, pruned_loss=0.01074, audio_tagging_loss=0.009824, over 16401.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09042, pruned_loss=0.01287, audio_tagging_loss=0.008793, over 3042604.68 frames.
], batch size: 61, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:46:06,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.608e+01 9.158e+01 9.709e+01 1.539e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 15:46:18,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2893453.3333333335, ans=15.0 2023-11-24 15:46:40,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2893586.6666666665, ans=0.125 2023-11-24 15:46:48,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434050 2023-11-24 15:46:50,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2893653.3333333335, ans=0.2 2023-11-24 15:46:54,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2893653.3333333335, ans=0.125 2023-11-24 15:47:00,392 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1200, loss[loss=0.0579, simple_loss=0.08368, pruned_loss=0.008223, audio_tagging_loss=0.00784, over 14849.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09029, pruned_loss=0.01282, audio_tagging_loss=0.00877, over 3041744.29 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:47:11,286 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 15:47:50,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434100 2023-11-24 15:48:01,995 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1250, loss[loss=0.08265, simple_loss=0.114, pruned_loss=0.01856, audio_tagging_loss=0.007093, over 15370.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09102, pruned_loss=0.01309, audio_tagging_loss=0.008766, over 3039810.34 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:48:03,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2894053.3333333335, ans=0.125 2023-11-24 15:48:11,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.496e+01 8.998e+01 9.780e+01 2.107e+02, threshold=1.800e+02, percent-clipped=1.0 2023-11-24 15:48:51,834 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434150 2023-11-24 15:49:02,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2894320.0, ans=0.025 2023-11-24 15:49:05,241 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1300, loss[loss=0.05776, simple_loss=0.07039, pruned_loss=0.01303, audio_tagging_loss=0.009533, over 15110.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09081, pruned_loss=0.0131, audio_tagging_loss=0.008626, over 3034304.97 frames. 
], batch size: 55, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:49:24,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2894453.3333333335, ans=0.0 2023-11-24 15:49:34,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2894520.0, ans=0.0 2023-11-24 15:49:49,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2894586.6666666665, ans=0.125 2023-11-24 15:49:52,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2894586.6666666665, ans=10.0 2023-11-24 15:49:54,812 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434200 2023-11-24 15:50:07,479 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1350, loss[loss=0.05413, simple_loss=0.07518, pruned_loss=0.00845, audio_tagging_loss=0.008093, over 14576.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09031, pruned_loss=0.01301, audio_tagging_loss=0.008643, over 3039065.91 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:50:07,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2894720.0, ans=0.2 2023-11-24 15:50:15,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.208e+01 8.839e+01 9.831e+01 1.172e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-24 15:50:17,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2894720.0, ans=0.125 2023-11-24 15:50:21,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2894786.6666666665, ans=0.1 2023-11-24 15:50:35,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2894853.3333333335, ans=0.125 2023-11-24 15:50:41,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0 2023-11-24 15:50:51,213 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 15:50:52,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2894920.0, ans=0.125 2023-11-24 15:50:56,490 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434250 2023-11-24 15:50:56,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2894986.6666666665, ans=0.0 2023-11-24 15:51:01,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. 
limit=15.0 2023-11-24 15:51:02,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2894986.6666666665, ans=0.2 2023-11-24 15:51:08,230 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1400, loss[loss=0.08402, simple_loss=0.1192, pruned_loss=0.0167, audio_tagging_loss=0.007714, over 16224.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09104, pruned_loss=0.01303, audio_tagging_loss=0.008663, over 3038075.08 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:51:19,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2895120.0, ans=0.0 2023-11-24 15:51:29,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2895120.0, ans=0.0 2023-11-24 15:51:32,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0 2023-11-24 15:51:37,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2895186.6666666665, ans=0.125 2023-11-24 15:51:42,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-24 15:51:53,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2895253.3333333335, ans=0.025 2023-11-24 15:51:57,612 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434300 2023-11-24 15:52:10,530 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1450, loss[loss=0.07219, simple_loss=0.09684, pruned_loss=0.0118, audio_tagging_loss=0.01197, over 16159.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09079, pruned_loss=0.01319, audio_tagging_loss=0.008815, over 3040789.10 frames. ], batch size: 63, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:52:19,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2023-11-24 15:52:20,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.122e+01 8.571e+01 9.242e+01 1.000e+02 1.352e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-24 15:52:37,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2895520.0, ans=0.0 2023-11-24 15:53:00,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434350 2023-11-24 15:53:02,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2895653.3333333335, ans=0.0 2023-11-24 15:53:02,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2895653.3333333335, ans=0.1 2023-11-24 15:53:05,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2895653.3333333335, ans=0.125 2023-11-24 15:53:11,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2895720.0, ans=0.125 2023-11-24 15:53:12,006 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1500, loss[loss=0.08736, simple_loss=0.1206, pruned_loss=0.02056, audio_tagging_loss=0.006482, over 15397.00 frames. 
], tot_loss[loss=0.0674, simple_loss=0.09038, pruned_loss=0.01323, audio_tagging_loss=0.008977, over 3039072.86 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:53:12,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2895720.0, ans=0.5 2023-11-24 15:53:32,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2895786.6666666665, ans=6.0 2023-11-24 15:53:37,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2895853.3333333335, ans=0.2 2023-11-24 15:53:43,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2895853.3333333335, ans=0.125 2023-11-24 15:53:43,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2895853.3333333335, ans=0.125 2023-11-24 15:53:55,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2895920.0, ans=0.0 2023-11-24 15:53:57,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2895920.0, ans=0.125 2023-11-24 15:54:01,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434400 2023-11-24 15:54:09,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2895986.6666666665, ans=0.0 2023-11-24 15:54:14,014 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1550, loss[loss=0.04877, simple_loss=0.06206, pruned_loss=0.007291, audio_tagging_loss=0.01045, over 14521.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.0908, pruned_loss=0.01323, audio_tagging_loss=0.00896, over 3041607.09 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:54:23,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.776e+01 9.350e+01 1.010e+02 1.250e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-24 15:54:40,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2896186.6666666665, ans=0.1 2023-11-24 15:54:54,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2896253.3333333335, ans=10.0 2023-11-24 15:55:03,748 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434450 2023-11-24 15:55:16,191 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1600, loss[loss=0.08484, simple_loss=0.1214, pruned_loss=0.01452, audio_tagging_loss=0.009604, over 15660.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09102, pruned_loss=0.0131, audio_tagging_loss=0.009022, over 3043774.70 frames. 
], batch size: 56, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:55:20,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2896386.6666666665, ans=0.1 2023-11-24 15:55:55,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2896586.6666666665, ans=0.1 2023-11-24 15:55:56,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2896586.6666666665, ans=0.125 2023-11-24 15:55:57,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2896586.6666666665, ans=0.125 2023-11-24 15:56:05,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434500 2023-11-24 15:56:17,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2023-11-24 15:56:18,320 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1650, loss[loss=0.05484, simple_loss=0.0771, pruned_loss=0.008972, audio_tagging_loss=0.007321, over 15243.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09068, pruned_loss=0.01295, audio_tagging_loss=0.009114, over 3046609.91 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 15:56:19,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2896720.0, ans=0.125 2023-11-24 15:56:28,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.575e+01 9.140e+01 1.003e+02 1.202e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-24 15:56:32,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2896786.6666666665, ans=0.125 2023-11-24 15:56:45,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2896853.3333333335, ans=10.0 2023-11-24 15:57:05,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2023-11-24 15:57:08,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434550 2023-11-24 15:57:14,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2896986.6666666665, ans=0.1 2023-11-24 15:57:20,450 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1700, loss[loss=0.06946, simple_loss=0.09469, pruned_loss=0.01072, audio_tagging_loss=0.01139, over 15396.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09173, pruned_loss=0.01297, audio_tagging_loss=0.009075, over 3051119.59 frames. 
], batch size: 58, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:57:56,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2897253.3333333335, ans=0.125 2023-11-24 15:58:01,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2897253.3333333335, ans=0.125 2023-11-24 15:58:06,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2897253.3333333335, ans=0.1 2023-11-24 15:58:10,028 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434600 2023-11-24 15:58:19,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2897320.0, ans=0.125 2023-11-24 15:58:22,740 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1750, loss[loss=0.0608, simple_loss=0.07903, pruned_loss=0.01202, audio_tagging_loss=0.009265, over 16426.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09216, pruned_loss=0.01303, audio_tagging_loss=0.008972, over 3062037.32 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:58:31,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2897386.6666666665, ans=0.04949747468305833 2023-11-24 15:58:33,876 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.384e+01 9.014e+01 9.814e+01 1.863e+02, threshold=1.803e+02, percent-clipped=1.0 2023-11-24 15:58:35,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2897453.3333333335, ans=0.2 2023-11-24 15:58:43,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2897453.3333333335, ans=0.125 2023-11-24 15:58:44,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-24 15:58:47,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2897520.0, ans=0.125 2023-11-24 15:58:51,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=22.5 2023-11-24 15:58:54,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2897520.0, ans=0.2 2023-11-24 15:59:12,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434650 2023-11-24 15:59:16,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2897653.3333333335, ans=0.035 2023-11-24 15:59:21,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2897653.3333333335, ans=0.125 2023-11-24 15:59:24,851 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1800, loss[loss=0.06041, simple_loss=0.0769, pruned_loss=0.01309, audio_tagging_loss=0.00887, over 14360.00 frames. ], tot_loss[loss=0.06836, simple_loss=0.09266, pruned_loss=0.01316, audio_tagging_loss=0.008863, over 3054675.56 frames. 
], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 15:59:39,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2897786.6666666665, ans=0.125 2023-11-24 15:59:55,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2897853.3333333335, ans=0.125 2023-11-24 15:59:57,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2023-11-24 16:00:01,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2897920.0, ans=0.125 2023-11-24 16:00:04,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=22.5 2023-11-24 16:00:09,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-24 16:00:10,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. limit=10.0 2023-11-24 16:00:12,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2897920.0, ans=0.0 2023-11-24 16:00:15,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434700 2023-11-24 16:00:18,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2897986.6666666665, ans=0.2 2023-11-24 16:00:27,533 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1850, loss[loss=0.04082, simple_loss=0.04425, pruned_loss=0.007202, audio_tagging_loss=0.01149, over 14623.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09238, pruned_loss=0.01311, audio_tagging_loss=0.008879, over 3057040.83 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:00:35,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2898053.3333333335, ans=0.0 2023-11-24 16:00:38,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.698e+01 9.278e+01 9.936e+01 1.415e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-24 16:00:42,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2898120.0, ans=0.125 2023-11-24 16:01:01,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.65 vs. limit=15.0 2023-11-24 16:01:15,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2898253.3333333335, ans=0.125 2023-11-24 16:01:17,889 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434750 2023-11-24 16:01:21,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2898320.0, ans=0.0 2023-11-24 16:01:29,989 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1900, loss[loss=0.0781, simple_loss=0.102, pruned_loss=0.01771, audio_tagging_loss=0.009396, over 14558.00 frames. 
], tot_loss[loss=0.06794, simple_loss=0.0924, pruned_loss=0.01297, audio_tagging_loss=0.00877, over 3057204.11 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:01:35,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2898386.6666666665, ans=0.0 2023-11-24 16:01:37,933 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:02:20,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434800 2023-11-24 16:02:20,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2898653.3333333335, ans=0.125 2023-11-24 16:02:32,975 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 1950, loss[loss=0.05951, simple_loss=0.07653, pruned_loss=0.01028, audio_tagging_loss=0.01096, over 14378.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09213, pruned_loss=0.01314, audio_tagging_loss=0.008744, over 3054840.17 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:02:44,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.590e+01 9.514e+01 1.026e+02 1.248e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-24 16:03:00,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2023-11-24 16:03:05,055 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:03:22,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434850 2023-11-24 16:03:35,530 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2000, loss[loss=0.06527, simple_loss=0.09241, pruned_loss=0.01036, audio_tagging_loss=0.008705, over 15374.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09096, pruned_loss=0.0131, audio_tagging_loss=0.008783, over 3053859.87 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 16:03:55,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2899120.0, ans=0.1 2023-11-24 16:04:24,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2023-11-24 16:04:25,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434900 2023-11-24 16:04:34,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2899320.0, ans=0.125 2023-11-24 16:04:35,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2023-11-24 16:04:36,757 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2050, loss[loss=0.0657, simple_loss=0.09165, pruned_loss=0.01087, audio_tagging_loss=0.009001, over 15358.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09152, pruned_loss=0.01311, audio_tagging_loss=0.008802, over 3050801.16 frames. 
], batch size: 60, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:04:39,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2899386.6666666665, ans=0.0 2023-11-24 16:04:49,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.636e+01 9.159e+01 9.953e+01 1.239e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 16:04:51,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2899453.3333333335, ans=0.0 2023-11-24 16:04:51,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2899453.3333333335, ans=0.125 2023-11-24 16:04:53,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2899453.3333333335, ans=0.0 2023-11-24 16:04:59,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2899453.3333333335, ans=0.0 2023-11-24 16:05:06,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2899520.0, ans=0.125 2023-11-24 16:05:10,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2899520.0, ans=10.0 2023-11-24 16:05:27,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 434950 2023-11-24 16:05:39,836 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2100, loss[loss=0.09231, simple_loss=0.142, pruned_loss=0.0169, audio_tagging_loss=0.004383, over 16162.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09149, pruned_loss=0.01311, audio_tagging_loss=0.008772, over 3046415.69 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:05:43,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. limit=5.0 2023-11-24 16:05:50,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2899786.6666666665, ans=0.125 2023-11-24 16:06:21,027 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:06:23,309 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:06:26,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2899920.0, ans=0.125 2023-11-24 16:06:29,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435000 2023-11-24 16:06:42,424 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2150, loss[loss=0.07049, simple_loss=0.09502, pruned_loss=0.01448, audio_tagging_loss=0.008495, over 15061.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09195, pruned_loss=0.01323, audio_tagging_loss=0.008773, over 3050819.95 frames. 
], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:06:52,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2900053.3333333335, ans=0.1 2023-11-24 16:06:54,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.565e+01 9.008e+01 9.643e+01 1.315e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-24 16:06:55,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-24 16:07:10,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2023-11-24 16:07:16,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2900186.6666666665, ans=0.015 2023-11-24 16:07:19,830 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:07:32,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435050 2023-11-24 16:07:44,812 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2200, loss[loss=0.07485, simple_loss=0.09843, pruned_loss=0.01574, audio_tagging_loss=0.009896, over 16253.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.0919, pruned_loss=0.01334, audio_tagging_loss=0.008783, over 3048799.35 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:07:57,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2900453.3333333335, ans=0.0 2023-11-24 16:08:34,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435100 2023-11-24 16:08:42,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2023-11-24 16:08:47,283 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2250, loss[loss=0.06865, simple_loss=0.09283, pruned_loss=0.01195, audio_tagging_loss=0.01028, over 15186.00 frames. ], tot_loss[loss=0.06881, simple_loss=0.09312, pruned_loss=0.01348, audio_tagging_loss=0.008775, over 3049567.42 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:08:57,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2900720.0, ans=0.0 2023-11-24 16:08:59,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 8.499e+01 9.314e+01 1.008e+02 1.259e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-24 16:09:18,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-11-24 16:09:20,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. 
limit=15.0 2023-11-24 16:09:21,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2900853.3333333335, ans=0.125 2023-11-24 16:09:21,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2900853.3333333335, ans=0.0 2023-11-24 16:09:29,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2900920.0, ans=0.125 2023-11-24 16:09:37,543 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435150 2023-11-24 16:09:49,107 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2300, loss[loss=0.06308, simple_loss=0.07632, pruned_loss=0.01167, audio_tagging_loss=0.01324, over 15399.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.09383, pruned_loss=0.0137, audio_tagging_loss=0.008814, over 3051093.16 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:10:19,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2023-11-24 16:10:39,100 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435200 2023-11-24 16:10:44,189 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:10:51,733 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2350, loss[loss=0.05658, simple_loss=0.07193, pruned_loss=0.008157, audio_tagging_loss=0.01245, over 14565.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09262, pruned_loss=0.01353, audio_tagging_loss=0.008964, over 3044133.06 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:11:04,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.491e+01 9.098e+01 9.735e+01 1.165e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-24 16:11:13,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2901453.3333333335, ans=0.2 2023-11-24 16:11:18,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2901520.0, ans=0.125 2023-11-24 16:11:20,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2901520.0, ans=0.0 2023-11-24 16:11:26,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2901520.0, ans=0.125 2023-11-24 16:11:41,195 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435250 2023-11-24 16:11:43,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. 
limit=6.0 2023-11-24 16:11:47,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2901653.3333333335, ans=0.1 2023-11-24 16:11:53,568 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2400, loss[loss=0.05347, simple_loss=0.06883, pruned_loss=0.009777, audio_tagging_loss=0.009277, over 15206.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09081, pruned_loss=0.01305, audio_tagging_loss=0.009092, over 3046561.26 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:11:54,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5 2023-11-24 16:12:06,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2901786.6666666665, ans=0.0 2023-11-24 16:12:11,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2901786.6666666665, ans=0.0 2023-11-24 16:12:15,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2901786.6666666665, ans=0.0 2023-11-24 16:12:18,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2901853.3333333335, ans=0.0 2023-11-24 16:12:25,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2901853.3333333335, ans=0.1 2023-11-24 16:12:30,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2901920.0, ans=0.125 2023-11-24 16:12:44,351 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435300 2023-11-24 16:12:55,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2023-11-24 16:12:56,199 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2450, loss[loss=0.0692, simple_loss=0.1011, pruned_loss=0.01029, audio_tagging_loss=0.00838, over 14987.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.0913, pruned_loss=0.01304, audio_tagging_loss=0.009165, over 3047229.07 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:13:01,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2902053.3333333335, ans=0.0 2023-11-24 16:13:06,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.54 vs. 
limit=15.0 2023-11-24 16:13:09,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.651e+01 9.072e+01 9.865e+01 1.248e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 16:13:20,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2902186.6666666665, ans=0.0 2023-11-24 16:13:46,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435350 2023-11-24 16:13:50,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2902320.0, ans=0.07 2023-11-24 16:13:51,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2902320.0, ans=22.5 2023-11-24 16:13:53,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2902320.0, ans=0.125 2023-11-24 16:13:58,427 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2500, loss[loss=0.07143, simple_loss=0.09058, pruned_loss=0.01867, audio_tagging_loss=0.007473, over 14630.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09101, pruned_loss=0.01292, audio_tagging_loss=0.009194, over 3046937.66 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:14:06,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2902386.6666666665, ans=0.0 2023-11-24 16:14:08,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2902386.6666666665, ans=0.2 2023-11-24 16:14:19,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2902453.3333333335, ans=0.0 2023-11-24 16:14:22,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2902520.0, ans=0.0 2023-11-24 16:14:31,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2902520.0, ans=0.0 2023-11-24 16:14:48,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435400 2023-11-24 16:14:49,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2902653.3333333335, ans=0.2 2023-11-24 16:14:59,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2902653.3333333335, ans=0.125 2023-11-24 16:15:01,950 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2550, loss[loss=0.07448, simple_loss=0.1063, pruned_loss=0.01037, audio_tagging_loss=0.01096, over 15495.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09016, pruned_loss=0.01289, audio_tagging_loss=0.009073, over 3043790.72 frames. 
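In each `optim.py:476` record the five grad-norm numbers are the (min, 25%, 50%, 75%, max) of recently observed gradient norms, and `threshold` is consistently `Clipping_scale` times the logged median (in the record above, 2.0 * 9.072e+01 = 1.814e+02); `percent-clipped` is the fraction of recent batches whose norm exceeded that threshold. A hedged sketch of such a clipper follows; the class and parameter names are hypothetical and the real optim.py bookkeeping may differ.

```python
from collections import deque
import statistics

class QuartileClipper:
    """Clip gradients at clipping_scale * median of recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)  # rolling window of grad norms
        self.clipped = 0                    # batches clipped since start
        self.seen = 0                       # batches observed since start

    def threshold(self) -> float:
        # Matches the logged relation: threshold = Clipping_scale * median.
        return self.scale * statistics.median(self.norms)

    def step(self, grad_norm: float) -> float:
        """Return the factor (<= 1.0) to scale this batch's gradients by."""
        self.norms.append(grad_norm)
        self.seen += 1
        t = self.threshold()
        if grad_norm > t:
            self.clipped += 1               # feeds the percent-clipped stat
            return t / grad_norm
        return 1.0
```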
], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:15:02,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2902720.0, ans=0.125 2023-11-24 16:15:14,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2902786.6666666665, ans=0.1 2023-11-24 16:15:15,543 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.712e+01 9.254e+01 9.868e+01 2.546e+02, threshold=1.851e+02, percent-clipped=1.0 2023-11-24 16:15:28,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902853.3333333335, ans=0.1 2023-11-24 16:15:51,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435450 2023-11-24 16:16:03,749 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2600, loss[loss=0.06104, simple_loss=0.08177, pruned_loss=0.01316, audio_tagging_loss=0.006998, over 14554.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09073, pruned_loss=0.01298, audio_tagging_loss=0.008834, over 3038303.08 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:16:06,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2903053.3333333335, ans=0.2 2023-11-24 16:16:07,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2903053.3333333335, ans=0.2 2023-11-24 16:16:09,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-11-24 16:16:15,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2903120.0, ans=0.125 2023-11-24 16:16:36,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2903186.6666666665, ans=0.125 2023-11-24 16:16:53,224 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435500 2023-11-24 16:17:04,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2903386.6666666665, ans=0.125 2023-11-24 16:17:05,538 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2650, loss[loss=0.05917, simple_loss=0.08021, pruned_loss=0.009902, audio_tagging_loss=0.009159, over 15564.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09099, pruned_loss=0.01302, audio_tagging_loss=0.008801, over 3039907.50 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:17:15,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2903386.6666666665, ans=0.125 2023-11-24 16:17:18,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.439e+01 8.953e+01 1.003e+02 3.059e+02, threshold=1.791e+02, percent-clipped=1.0 2023-11-24 16:17:53,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2903653.3333333335, ans=0.0 2023-11-24 16:17:54,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435550 2023-11-24 16:18:06,099 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2700, loss[loss=0.05433, simple_loss=0.06551, pruned_loss=0.0106, audio_tagging_loss=0.01097, over 14333.00 frames. 
], tot_loss[loss=0.06732, simple_loss=0.09085, pruned_loss=0.01306, audio_tagging_loss=0.00884, over 3050915.45 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:18:06,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2903720.0, ans=0.5 2023-11-24 16:18:09,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2903720.0, ans=0.0 2023-11-24 16:18:16,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2903720.0, ans=0.025 2023-11-24 16:18:17,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2903720.0, ans=0.0 2023-11-24 16:18:33,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2903853.3333333335, ans=0.0 2023-11-24 16:18:51,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5 2023-11-24 16:18:56,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435600 2023-11-24 16:18:58,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2903986.6666666665, ans=0.125 2023-11-24 16:19:09,895 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2750, loss[loss=0.08939, simple_loss=0.1248, pruned_loss=0.01944, audio_tagging_loss=0.00752, over 14791.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09092, pruned_loss=0.01321, audio_tagging_loss=0.008792, over 3060314.14 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 8.0 2023-11-24 16:19:17,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.83 vs. limit=15.0 2023-11-24 16:19:24,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.630e+01 9.216e+01 9.892e+01 1.188e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-24 16:19:28,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2023-11-24 16:19:30,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2904120.0, ans=0.125 2023-11-24 16:19:33,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2904186.6666666665, ans=0.2 2023-11-24 16:19:37,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2904186.6666666665, ans=0.125 2023-11-24 16:19:48,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2904253.3333333335, ans=0.125 2023-11-24 16:19:57,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2904253.3333333335, ans=0.1 2023-11-24 16:19:59,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435650 2023-11-24 16:20:02,104 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:20:10,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2904320.0, ans=0.125 2023-11-24 16:20:12,152 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2800, loss[loss=0.04441, simple_loss=0.04766, pruned_loss=0.008928, audio_tagging_loss=0.01165, over 15613.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09014, pruned_loss=0.01307, audio_tagging_loss=0.008845, over 3053549.28 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:20:13,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2904386.6666666665, ans=0.0 2023-11-24 16:20:20,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2904386.6666666665, ans=0.2 2023-11-24 16:20:23,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2904453.3333333335, ans=0.0 2023-11-24 16:20:34,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2904453.3333333335, ans=0.09899494936611666 2023-11-24 16:21:01,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435700 2023-11-24 16:21:05,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2904653.3333333335, ans=0.0 2023-11-24 16:21:13,365 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2850, loss[loss=0.05686, simple_loss=0.08158, pruned_loss=0.009123, audio_tagging_loss=0.00695, over 14625.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08963, pruned_loss=0.01293, audio_tagging_loss=0.008814, over 3053810.00 frames. 
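The recurring WARNINGs exclude AudioSet placeholder cuts because, after the frontend's subsampling, 100 input frames shrink to 23, fewer than the 24 BPE tokens, so no transducer alignment exists. A sketch of that validity check is below, assuming a convolutional-frontend formula that does reproduce the logged 100 -> 23 mapping; the exact check in train_asr.py may differ.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed conv-frontend formula; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one frame per token, so 23 < 24 -> exclude.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False   # the case the WARNINGs report
```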
], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:21:13,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2904720.0, ans=0.125 2023-11-24 16:21:29,326 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.333e+01 9.111e+01 9.785e+01 2.308e+02, threshold=1.822e+02, percent-clipped=1.0 2023-11-24 16:21:35,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2904786.6666666665, ans=0.125 2023-11-24 16:21:39,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2904853.3333333335, ans=0.125 2023-11-24 16:21:49,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2904853.3333333335, ans=0.0 2023-11-24 16:21:52,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2904920.0, ans=0.0 2023-11-24 16:21:53,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2904920.0, ans=0.0 2023-11-24 16:21:54,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2904920.0, ans=0.2 2023-11-24 16:21:58,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2904920.0, ans=0.0 2023-11-24 16:22:03,265 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435750 2023-11-24 16:22:16,932 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2900, loss[loss=0.05824, simple_loss=0.07043, pruned_loss=0.01266, audio_tagging_loss=0.01036, over 14741.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09008, pruned_loss=0.01299, audio_tagging_loss=0.008815, over 3050144.73 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:22:38,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-24 16:22:48,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2905186.6666666665, ans=0.09899494936611666 2023-11-24 16:23:00,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2905253.3333333335, ans=0.125 2023-11-24 16:23:06,463 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435800 2023-11-24 16:23:09,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2905320.0, ans=0.125 2023-11-24 16:23:19,019 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 2950, loss[loss=0.0504, simple_loss=0.071, pruned_loss=0.006974, audio_tagging_loss=0.00793, over 14989.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09071, pruned_loss=0.01305, audio_tagging_loss=0.008808, over 3047938.42 frames. 
], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:23:27,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2905386.6666666665, ans=0.2 2023-11-24 16:23:33,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.426e+01 9.166e+01 9.683e+01 1.219e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-24 16:23:35,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2905453.3333333335, ans=0.1 2023-11-24 16:23:38,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2905453.3333333335, ans=0.125 2023-11-24 16:23:41,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2905453.3333333335, ans=0.125 2023-11-24 16:23:42,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0 2023-11-24 16:23:54,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2905520.0, ans=0.0 2023-11-24 16:24:03,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2905586.6666666665, ans=0.1 2023-11-24 16:24:08,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2905653.3333333335, ans=0.2 2023-11-24 16:24:09,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435850 2023-11-24 16:24:15,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2023-11-24 16:24:21,007 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3000, loss[loss=0.05759, simple_loss=0.07702, pruned_loss=0.008291, audio_tagging_loss=0.01079, over 14056.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09036, pruned_loss=0.0129, audio_tagging_loss=0.008905, over 3044360.43 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:24:21,008 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 16:24:39,917 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7456, 4.4074, 4.1065, 4.1994], device='cuda:1') 2023-11-24 16:25:01,970 INFO [train_asr.py:1253] (1/4) Epoch 37, validation: loss=0.05757, simple_loss=0.05085, pruned_loss=0.005185, audio_tagging_loss=0.02697, over 4681554.00 frames. 
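The ubiquitous `scaling.py:213` records print the current value (`ans`) of a ScheduledFloat: a scalar hyperparameter (a dropout probability, skip rate, balancer bound, whitening limit, ...) annealed as a function of the global `batch_count`. Here is a minimal piecewise-linear version, assuming zipformer-style schedule semantics; it is a sketch, not the scaling.py implementation.

```python
class ScheduledFloat:
    """A float interpolated from (batch_count, value) schedule points."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)        # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:           # linear interpolation segment
                w = (batch_count - x0) / (x1 - x0)
                return y0 + w * (y1 - y0)
        return pts[-1][1]                   # past the last point: constant

# Hypothetical example: a skip rate that decays to 0.0 early in training
# and stays there, consistent with the many "ans=0.0" entries at a
# batch_count near 2.9e6:
skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
assert skip_rate.value_at(2899453.3) == 0.0
```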
2023-11-24 16:25:01,971 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 16:25:03,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2905720.0, ans=0.1 2023-11-24 16:25:03,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2905720.0, ans=0.0 2023-11-24 16:25:08,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2905720.0, ans=0.0 2023-11-24 16:25:12,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2905720.0, ans=0.125 2023-11-24 16:25:30,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2905853.3333333335, ans=0.95 2023-11-24 16:25:40,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2905920.0, ans=0.125 2023-11-24 16:25:50,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2905986.6666666665, ans=0.125 2023-11-24 16:25:51,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435900 2023-11-24 16:26:04,079 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3050, loss[loss=0.05169, simple_loss=0.06112, pruned_loss=0.007642, audio_tagging_loss=0.01349, over 15369.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09068, pruned_loss=0.0129, audio_tagging_loss=0.008934, over 3045746.22 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:26:04,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2906053.3333333335, ans=0.125 2023-11-24 16:26:06,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2906053.3333333335, ans=0.0 2023-11-24 16:26:16,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2906120.0, ans=0.0 2023-11-24 16:26:18,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.443e+01 9.096e+01 9.825e+01 1.321e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-24 16:26:20,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-24 16:26:36,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2906186.6666666665, ans=0.0 2023-11-24 16:26:39,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2906186.6666666665, ans=0.2 2023-11-24 16:26:40,223 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 16:26:53,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-24 16:26:54,062 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 435950 2023-11-24 16:26:54,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2906320.0, ans=0.125 2023-11-24 16:27:05,772 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3100, loss[loss=0.06911, simple_loss=0.08995, pruned_loss=0.01266, audio_tagging_loss=0.01148, over 15437.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09095, pruned_loss=0.01295, audio_tagging_loss=0.008956, over 3046857.36 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:27:28,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-24 16:27:32,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-24 16:27:49,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2906586.6666666665, ans=0.125 2023-11-24 16:27:51,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2906586.6666666665, ans=0.0 2023-11-24 16:27:55,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436000 2023-11-24 16:28:12,029 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3150, loss[loss=0.05444, simple_loss=0.07349, pruned_loss=0.008679, audio_tagging_loss=0.00902, over 15598.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09093, pruned_loss=0.01297, audio_tagging_loss=0.008966, over 3055195.48 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:28:14,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2906720.0, ans=10.0 2023-11-24 16:28:27,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 8.617e+01 9.268e+01 9.907e+01 1.441e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-24 16:28:29,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2906786.6666666665, ans=0.5 2023-11-24 16:28:35,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2906853.3333333335, ans=0.125 2023-11-24 16:28:45,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2906853.3333333335, ans=10.0 2023-11-24 16:28:46,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2906853.3333333335, ans=0.04949747468305833 2023-11-24 16:28:47,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2906920.0, ans=0.125 2023-11-24 16:28:54,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.93 vs. 
limit=10.0 2023-11-24 16:29:01,666 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436050 2023-11-24 16:29:11,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2023-11-24 16:29:14,526 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3200, loss[loss=0.06945, simple_loss=0.09898, pruned_loss=0.01304, audio_tagging_loss=0.006919, over 14920.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09103, pruned_loss=0.01303, audio_tagging_loss=0.009091, over 3054428.71 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 16:29:35,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2023-11-24 16:29:51,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2907253.3333333335, ans=0.0 2023-11-24 16:30:04,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436100 2023-11-24 16:30:06,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2907320.0, ans=0.125 2023-11-24 16:30:08,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2907320.0, ans=0.125 2023-11-24 16:30:08,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2907320.0, ans=0.2 2023-11-24 16:30:16,181 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3250, loss[loss=0.05516, simple_loss=0.07485, pruned_loss=0.006996, audio_tagging_loss=0.01073, over 16748.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09091, pruned_loss=0.01308, audio_tagging_loss=0.009181, over 3044856.85 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 16:30:17,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2907386.6666666665, ans=0.5 2023-11-24 16:30:17,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2907386.6666666665, ans=0.1 2023-11-24 16:30:24,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5 2023-11-24 16:30:26,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2907386.6666666665, ans=0.2 2023-11-24 16:30:31,297 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.491e+01 9.014e+01 9.810e+01 1.302e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-24 16:30:51,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2907520.0, ans=0.125 2023-11-24 16:31:05,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436150 2023-11-24 16:31:18,081 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3300, loss[loss=0.0554, simple_loss=0.07569, pruned_loss=0.01059, audio_tagging_loss=0.006967, over 15715.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09109, pruned_loss=0.01312, audio_tagging_loss=0.0092, over 3049533.09 frames. 
], batch size: 58, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 16:31:57,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2907920.0, ans=0.05 2023-11-24 16:32:07,867 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436200 2023-11-24 16:32:17,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2907986.6666666665, ans=0.2 2023-11-24 16:32:21,580 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3350, loss[loss=0.07655, simple_loss=0.1062, pruned_loss=0.01501, audio_tagging_loss=0.008439, over 15989.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09063, pruned_loss=0.0131, audio_tagging_loss=0.009119, over 3048351.49 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 16:32:29,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2908053.3333333335, ans=0.2 2023-11-24 16:32:35,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.766e+01 9.367e+01 1.008e+02 1.183e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-24 16:32:56,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2908186.6666666665, ans=0.125 2023-11-24 16:33:11,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436250 2023-11-24 16:33:23,270 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3400, loss[loss=0.06405, simple_loss=0.09008, pruned_loss=0.009814, audio_tagging_loss=0.00919, over 14592.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.0909, pruned_loss=0.01308, audio_tagging_loss=0.008976, over 3048645.36 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 32.0 2023-11-24 16:33:32,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-24 16:33:38,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2908453.3333333335, ans=0.0 2023-11-24 16:33:45,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=8.0 2023-11-24 16:33:56,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2908520.0, ans=0.125 2023-11-24 16:34:07,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2908586.6666666665, ans=0.2 2023-11-24 16:34:13,198 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436300 2023-11-24 16:34:18,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2908653.3333333335, ans=0.125 2023-11-24 16:34:20,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2023-11-24 16:34:21,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2908653.3333333335, ans=0.125 2023-11-24 16:34:26,313 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3450, loss[loss=0.06254, simple_loss=0.0848, pruned_loss=0.01155, audio_tagging_loss=0.008582, over 14603.00 frames. 
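The `scaling.py:1022` Whitening records compare a whiteness statistic of a layer's activations against a limit and intervene only when it is exceeded (so `metric=7.28 vs. limit=15.0` means no correction was applied). One plausible metric with the right behaviour is sketched below: it equals 1.0 when the channel covariance is isotropic ("white") and grows as variance concentrates in a few directions. The exact formula in scaling.py may differ.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """1.0 for isotropic channel covariance; larger when energy concentrates."""
    x = x.reshape(-1, x.shape[-1])          # (N, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]            # channel covariance matrix
    d = cov.shape[0]
    # trace(C @ C) * d / trace(C)**2 == 1 exactly when C is a multiple of I.
    return (cov @ cov).diagonal().sum() * d / cov.diagonal().sum() ** 2

# Isotropic noise is approximately white:
# whitening_metric(torch.randn(1000, 128)) is close to 1.0
```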
], tot_loss[loss=0.06753, simple_loss=0.09132, pruned_loss=0.01304, audio_tagging_loss=0.008828, over 3058754.00 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:34:38,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.62 vs. limit=10.0 2023-11-24 16:34:42,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.653e+01 9.241e+01 9.942e+01 2.012e+02, threshold=1.848e+02, percent-clipped=1.0 2023-11-24 16:34:54,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2908853.3333333335, ans=0.0 2023-11-24 16:35:06,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2908920.0, ans=0.0 2023-11-24 16:35:16,717 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436350 2023-11-24 16:35:24,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2908986.6666666665, ans=0.125 2023-11-24 16:35:29,770 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3500, loss[loss=0.05158, simple_loss=0.07584, pruned_loss=0.005509, audio_tagging_loss=0.008156, over 14926.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09162, pruned_loss=0.01308, audio_tagging_loss=0.008771, over 3052663.96 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2023-11-24 16:35:52,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2909120.0, ans=0.0 2023-11-24 16:36:01,000 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:36:01,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2023-11-24 16:36:03,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2909186.6666666665, ans=0.125 2023-11-24 16:36:08,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2909253.3333333335, ans=0.125 2023-11-24 16:36:20,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436400 2023-11-24 16:36:33,042 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3550, loss[loss=0.07481, simple_loss=0.1045, pruned_loss=0.01552, audio_tagging_loss=0.007054, over 14931.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09138, pruned_loss=0.01319, audio_tagging_loss=0.008773, over 3054686.03 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:36:34,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.73 vs. 
limit=22.5 2023-11-24 16:36:39,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2909386.6666666665, ans=0.125 2023-11-24 16:36:49,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.760e+01 9.470e+01 1.011e+02 1.264e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-24 16:36:57,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2023-11-24 16:37:04,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2909520.0, ans=0.1 2023-11-24 16:37:22,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436450 2023-11-24 16:37:35,203 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3600, loss[loss=0.07459, simple_loss=0.08949, pruned_loss=0.01805, audio_tagging_loss=0.01179, over 14372.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08935, pruned_loss=0.0128, audio_tagging_loss=0.008827, over 3049014.07 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 16:38:25,040 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436500 2023-11-24 16:38:29,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-24 16:38:37,884 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3650, loss[loss=0.07646, simple_loss=0.1029, pruned_loss=0.01598, audio_tagging_loss=0.009056, over 16100.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08945, pruned_loss=0.01274, audio_tagging_loss=0.008854, over 3056405.08 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 16:38:44,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2910053.3333333335, ans=0.125 2023-11-24 16:38:45,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2910053.3333333335, ans=0.0 2023-11-24 16:38:46,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2910053.3333333335, ans=0.125 2023-11-24 16:38:54,062 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.295e+01 9.068e+01 9.649e+01 1.086e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 16:38:57,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2910120.0, ans=0.125 2023-11-24 16:39:27,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436550 2023-11-24 16:39:34,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-24 16:39:39,411 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3700, loss[loss=0.06241, simple_loss=0.08253, pruned_loss=0.01094, audio_tagging_loss=0.01021, over 14967.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09068, pruned_loss=0.01298, audio_tagging_loss=0.008818, over 3058154.68 frames. 
], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:39:57,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2910453.3333333335, ans=0.2 2023-11-24 16:40:05,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2910520.0, ans=0.2 2023-11-24 16:40:16,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2023-11-24 16:40:17,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2910586.6666666665, ans=0.0 2023-11-24 16:40:25,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2910586.6666666665, ans=15.0 2023-11-24 16:40:28,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2910653.3333333335, ans=0.125 2023-11-24 16:40:29,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436600 2023-11-24 16:40:41,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2910720.0, ans=0.1 2023-11-24 16:40:43,384 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3750, loss[loss=0.06683, simple_loss=0.09, pruned_loss=0.0132, audio_tagging_loss=0.008624, over 15000.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09041, pruned_loss=0.0128, audio_tagging_loss=0.008763, over 3051378.69 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:40:50,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2910720.0, ans=0.2 2023-11-24 16:40:56,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2910786.6666666665, ans=0.125 2023-11-24 16:41:01,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.745e+01 9.267e+01 9.947e+01 1.281e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-24 16:41:04,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2910786.6666666665, ans=0.125 2023-11-24 16:41:06,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2910786.6666666665, ans=0.125 2023-11-24 16:41:08,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2910853.3333333335, ans=0.0 2023-11-24 16:41:13,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2910853.3333333335, ans=0.0 2023-11-24 16:41:19,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2910920.0, ans=0.125 2023-11-24 16:41:25,463 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:41:33,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436650 2023-11-24 16:41:36,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. limit=10.0 2023-11-24 16:41:45,767 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3800, loss[loss=0.07528, simple_loss=0.108, pruned_loss=0.01343, audio_tagging_loss=0.007846, over 16420.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09095, pruned_loss=0.01298, audio_tagging_loss=0.008893, over 3054181.41 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:41:50,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2911053.3333333335, ans=0.1 2023-11-24 16:42:05,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2911120.0, ans=0.0 2023-11-24 16:42:32,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2911253.3333333335, ans=0.125 2023-11-24 16:42:36,208 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436700 2023-11-24 16:42:37,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2911320.0, ans=0.2 2023-11-24 16:42:48,428 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3850, loss[loss=0.06137, simple_loss=0.08478, pruned_loss=0.01089, audio_tagging_loss=0.008092, over 14932.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09107, pruned_loss=0.01283, audio_tagging_loss=0.00892, over 3055980.08 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:42:53,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2911386.6666666665, ans=0.125 2023-11-24 16:43:06,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 8.511e+01 9.199e+01 9.867e+01 1.160e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-24 16:43:13,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2911520.0, ans=0.0 2023-11-24 16:43:35,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2911586.6666666665, ans=6.0 2023-11-24 16:43:38,391 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436750 2023-11-24 16:43:40,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2911653.3333333335, ans=0.2 2023-11-24 16:43:41,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2911653.3333333335, ans=0.0 2023-11-24 16:43:50,755 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3900, loss[loss=0.06315, simple_loss=0.08379, pruned_loss=0.01425, audio_tagging_loss=0.006999, over 13161.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09129, pruned_loss=0.01291, audio_tagging_loss=0.008954, over 3041534.04 frames. 
], batch size: 53, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:43:56,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2911720.0, ans=0.125 2023-11-24 16:44:13,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2911786.6666666665, ans=0.2 2023-11-24 16:44:41,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436800 2023-11-24 16:44:54,261 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 3950, loss[loss=0.0614, simple_loss=0.08386, pruned_loss=0.008721, audio_tagging_loss=0.01075, over 15229.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09183, pruned_loss=0.01312, audio_tagging_loss=0.008986, over 3043156.26 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:45:11,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.639e+01 9.043e+01 9.863e+01 1.208e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-24 16:45:11,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2912120.0, ans=0.125 2023-11-24 16:45:29,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2912186.6666666665, ans=0.1 2023-11-24 16:45:42,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2912253.3333333335, ans=0.1 2023-11-24 16:45:44,216 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436850 2023-11-24 16:45:49,633 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:45:56,543 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4000, loss[loss=0.05764, simple_loss=0.0839, pruned_loss=0.008338, audio_tagging_loss=0.007355, over 14750.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.09252, pruned_loss=0.01331, audio_tagging_loss=0.00901, over 3040385.87 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 16:46:13,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2912453.3333333335, ans=0.0 2023-11-24 16:46:19,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2912453.3333333335, ans=0.125 2023-11-24 16:46:27,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2912520.0, ans=0.95 2023-11-24 16:46:35,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=15.0 2023-11-24 16:46:46,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436900 2023-11-24 16:46:47,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=12.0 2023-11-24 16:46:58,169 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4050, loss[loss=0.05903, simple_loss=0.08028, pruned_loss=0.009459, audio_tagging_loss=0.009423, over 14952.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.09266, pruned_loss=0.01335, audio_tagging_loss=0.009048, over 3039124.25 frames. 
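The `scaling.py:1118` lines report modules wrapped with a WithLoss hook: the forward pass is the identity on the activations while an auxiliary penalty is spliced into the backward pass, and the log prints the accumulated penalty (here 0.000e+00, i.e. inactive). Below is a sketch of that autograd pattern; the real scaling.py class carries more bookkeeping, so treat names and details as assumptions.

```python
import torch

class WithLoss(torch.autograd.Function):
    """Identity in forward; adds an auxiliary penalty into backward."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, loss: torch.Tensor, name: str):
        ctx.loss_shape = loss.shape
        ctx.name = name          # e.g. "...self_attn_weights", for logging
        return x                 # activations pass through unchanged

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        # A gradient of 1 w.r.t. the penalty makes it act like "+ loss" in
        # the objective without altering the forward activations.
        return (grad_out,
                torch.ones(ctx.loss_shape, device=grad_out.device),
                None)

# Hypothetical usage:
# attn_weights = WithLoss.apply(attn_weights, penalty, "self_attn_weights")
```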
], batch size: 54, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 16:47:00,518 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:47:08,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2912720.0, ans=0.1 2023-11-24 16:47:16,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.690e+01 9.276e+01 1.004e+02 1.184e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-24 16:47:23,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2023-11-24 16:47:44,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2912920.0, ans=0.0 2023-11-24 16:47:47,928 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 436950 2023-11-24 16:47:59,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-11-24 16:48:01,365 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4100, loss[loss=0.06352, simple_loss=0.08036, pruned_loss=0.01397, audio_tagging_loss=0.009367, over 15597.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09174, pruned_loss=0.01315, audio_tagging_loss=0.009076, over 3044014.67 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 16:48:05,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2913053.3333333335, ans=0.0 2023-11-24 16:48:16,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2913120.0, ans=0.125 2023-11-24 16:48:38,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-24 16:48:51,490 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437000 2023-11-24 16:48:56,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2913320.0, ans=0.125 2023-11-24 16:49:04,129 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4150, loss[loss=0.05842, simple_loss=0.07538, pruned_loss=0.01122, audio_tagging_loss=0.009518, over 14711.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09093, pruned_loss=0.013, audio_tagging_loss=0.009052, over 3040093.59 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:49:22,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.740e+01 9.379e+01 9.988e+01 1.190e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-24 16:49:44,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2913586.6666666665, ans=0.5 2023-11-24 16:49:47,810 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:49:53,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437050 2023-11-24 16:49:55,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2913653.3333333335, ans=0.1 2023-11-24 16:49:56,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2913653.3333333335, ans=0.125 2023-11-24 16:49:56,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2913653.3333333335, ans=0.125 2023-11-24 16:50:05,828 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4200, loss[loss=0.05955, simple_loss=0.07553, pruned_loss=0.01281, audio_tagging_loss=0.008973, over 14709.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09044, pruned_loss=0.01286, audio_tagging_loss=0.008954, over 3039648.64 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:50:12,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2913720.0, ans=0.07 2023-11-24 16:50:23,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.10 vs. limit=15.0 2023-11-24 16:50:27,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2913786.6666666665, ans=0.125 2023-11-24 16:50:38,529 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:50:40,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2913853.3333333335, ans=0.2 2023-11-24 16:50:46,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2913920.0, ans=0.1 2023-11-24 16:50:55,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2913986.6666666665, ans=0.125 2023-11-24 16:50:56,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437100 2023-11-24 16:51:07,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2914053.3333333335, ans=0.1 2023-11-24 16:51:08,509 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4250, loss[loss=0.07048, simple_loss=0.09701, pruned_loss=0.01169, audio_tagging_loss=0.01029, over 15427.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09096, pruned_loss=0.01287, audio_tagging_loss=0.008907, over 3041041.52 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:51:12,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2914053.3333333335, ans=0.0 2023-11-24 16:51:22,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. 
limit=15.0 2023-11-24 16:51:27,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.707e+01 9.618e+01 1.026e+02 1.343e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-24 16:51:27,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914120.0, ans=0.1 2023-11-24 16:51:31,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2023-11-24 16:51:35,890 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:51:44,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.31 vs. limit=10.0 2023-11-24 16:51:55,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2914253.3333333335, ans=0.125 2023-11-24 16:51:57,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437150 2023-11-24 16:52:03,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2914320.0, ans=0.0 2023-11-24 16:52:10,474 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4300, loss[loss=0.05573, simple_loss=0.07504, pruned_loss=0.01201, audio_tagging_loss=0.0062, over 14570.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09145, pruned_loss=0.01311, audio_tagging_loss=0.008785, over 3034652.11 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 16:52:56,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2914586.6666666665, ans=0.1 2023-11-24 16:52:58,396 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:52:59,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2914653.3333333335, ans=15.0 2023-11-24 16:53:00,513 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437200 2023-11-24 16:53:07,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-11-24 16:53:12,542 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4350, loss[loss=0.06985, simple_loss=0.1001, pruned_loss=0.0117, audio_tagging_loss=0.008087, over 15389.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09205, pruned_loss=0.01313, audio_tagging_loss=0.008684, over 3043080.48 frames. 
], batch size: 57, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 16:53:17,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914720.0, ans=0.1 2023-11-24 16:53:24,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2914786.6666666665, ans=0.0 2023-11-24 16:53:32,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.693e+01 9.378e+01 1.028e+02 1.311e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-24 16:53:58,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2914920.0, ans=0.125 2023-11-24 16:54:01,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437250 2023-11-24 16:54:03,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2914986.6666666665, ans=0.0 2023-11-24 16:54:13,955 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4400, loss[loss=0.05152, simple_loss=0.06691, pruned_loss=0.009264, audio_tagging_loss=0.0088, over 15354.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09177, pruned_loss=0.01297, audio_tagging_loss=0.008673, over 3042181.37 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 16:54:15,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5 2023-11-24 16:54:21,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2915053.3333333335, ans=0.1 2023-11-24 16:54:53,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-24 16:54:57,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2915253.3333333335, ans=0.0 2023-11-24 16:55:03,449 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437300 2023-11-24 16:55:16,828 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4450, loss[loss=0.08151, simple_loss=0.1155, pruned_loss=0.01686, audio_tagging_loss=0.006915, over 15516.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09181, pruned_loss=0.01297, audio_tagging_loss=0.008598, over 3050850.67 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 16:55:19,513 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:55:21,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2915386.6666666665, ans=0.0 2023-11-24 16:55:29,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.53 vs. limit=10.0 2023-11-24 16:55:36,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.463e+01 9.107e+01 9.644e+01 1.325e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 16:55:47,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. 
limit=15.0 2023-11-24 16:55:50,642 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 16:56:06,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437350 2023-11-24 16:56:13,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2915653.3333333335, ans=0.07 2023-11-24 16:56:18,383 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4500, loss[loss=0.06771, simple_loss=0.09337, pruned_loss=0.01215, audio_tagging_loss=0.008878, over 15081.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09229, pruned_loss=0.01301, audio_tagging_loss=0.008619, over 3053546.37 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 16:56:43,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2915853.3333333335, ans=0.0 2023-11-24 16:56:44,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2915853.3333333335, ans=0.0 2023-11-24 16:56:49,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2915853.3333333335, ans=0.0 2023-11-24 16:57:07,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437400 2023-11-24 16:57:07,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2915986.6666666665, ans=0.125 2023-11-24 16:57:13,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2915986.6666666665, ans=0.2 2023-11-24 16:57:16,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2915986.6666666665, ans=0.125 2023-11-24 16:57:20,276 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4550, loss[loss=0.06345, simple_loss=0.08226, pruned_loss=0.01207, audio_tagging_loss=0.01024, over 14789.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09154, pruned_loss=0.01292, audio_tagging_loss=0.008673, over 3055115.90 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 4.0 2023-11-24 16:57:21,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2916053.3333333335, ans=0.1 2023-11-24 16:57:43,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.634e+01 9.216e+01 9.790e+01 1.251e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-24 16:57:51,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-24 16:57:55,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2916186.6666666665, ans=0.95 2023-11-24 16:57:57,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2023-11-24 16:58:06,740 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 16:58:07,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-24 16:58:10,354 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437450 2023-11-24 16:58:23,347 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4600, loss[loss=0.07504, simple_loss=0.09554, pruned_loss=0.01652, audio_tagging_loss=0.01074, over 15775.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0912, pruned_loss=0.01278, audio_tagging_loss=0.008749, over 3056453.70 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 16:58:31,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-24 16:58:49,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-24 16:59:05,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2916586.6666666665, ans=0.0 2023-11-24 16:59:05,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2916586.6666666665, ans=0.1 2023-11-24 16:59:07,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2916586.6666666665, ans=0.1 2023-11-24 16:59:08,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.75 vs. limit=22.5 2023-11-24 16:59:12,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437500 2023-11-24 16:59:25,197 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4650, loss[loss=0.08036, simple_loss=0.1081, pruned_loss=0.01665, audio_tagging_loss=0.009649, over 14044.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09087, pruned_loss=0.01286, audio_tagging_loss=0.008852, over 3053620.01 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 16:59:33,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2916720.0, ans=0.125 2023-11-24 16:59:43,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2916786.6666666665, ans=0.09899494936611666 2023-11-24 16:59:46,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=12.0 2023-11-24 16:59:46,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.291e+01 9.050e+01 9.890e+01 1.291e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-24 16:59:52,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2916853.3333333335, ans=0.125 2023-11-24 16:59:55,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. 
limit=15.0 2023-11-24 17:00:14,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437550 2023-11-24 17:00:23,250 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:00:26,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2917053.3333333335, ans=0.1 2023-11-24 17:00:27,364 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4700, loss[loss=0.07502, simple_loss=0.1028, pruned_loss=0.01571, audio_tagging_loss=0.007921, over 15515.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08977, pruned_loss=0.01287, audio_tagging_loss=0.008932, over 3051605.64 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:00:37,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2917053.3333333335, ans=0.125 2023-11-24 17:01:17,471 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437600 2023-11-24 17:01:30,294 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4750, loss[loss=0.04913, simple_loss=0.06158, pruned_loss=0.008011, audio_tagging_loss=0.01033, over 15135.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09017, pruned_loss=0.01303, audio_tagging_loss=0.008924, over 3045913.39 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:01:38,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2023-11-24 17:01:52,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.751e+01 9.417e+01 1.046e+02 1.255e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-24 17:02:11,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2917586.6666666665, ans=0.125 2023-11-24 17:02:15,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2917586.6666666665, ans=0.04949747468305833 2023-11-24 17:02:20,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437650 2023-11-24 17:02:31,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2917720.0, ans=0.125 2023-11-24 17:02:32,349 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4800, loss[loss=0.05924, simple_loss=0.06207, pruned_loss=0.01473, audio_tagging_loss=0.01348, over 15269.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09091, pruned_loss=0.01321, audio_tagging_loss=0.009009, over 3052407.75 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:02:52,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-24 17:03:13,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2917920.0, ans=0.125 2023-11-24 17:03:14,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. 
limit=22.5 2023-11-24 17:03:15,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2917920.0, ans=0.1 2023-11-24 17:03:21,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437700 2023-11-24 17:03:34,098 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4850, loss[loss=0.06015, simple_loss=0.08597, pruned_loss=0.007953, audio_tagging_loss=0.009209, over 14272.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09089, pruned_loss=0.01308, audio_tagging_loss=0.009064, over 3049873.76 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:03:48,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2023-11-24 17:03:58,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.659e+01 9.301e+01 9.915e+01 1.482e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-24 17:04:24,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437750 2023-11-24 17:04:36,850 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4900, loss[loss=0.06832, simple_loss=0.09276, pruned_loss=0.01094, audio_tagging_loss=0.011, over 15507.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09042, pruned_loss=0.01289, audio_tagging_loss=0.009104, over 3043072.25 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:04:39,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2023-11-24 17:04:40,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2918386.6666666665, ans=0.015 2023-11-24 17:04:47,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2918386.6666666665, ans=0.125 2023-11-24 17:04:51,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2918453.3333333335, ans=0.0 2023-11-24 17:04:53,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. 
limit=15.0 2023-11-24 17:04:57,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2918453.3333333335, ans=0.0 2023-11-24 17:04:58,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2918453.3333333335, ans=0.2 2023-11-24 17:05:10,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2918520.0, ans=0.035 2023-11-24 17:05:19,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2918586.6666666665, ans=0.0 2023-11-24 17:05:24,207 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:05:27,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437800 2023-11-24 17:05:35,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2918653.3333333335, ans=0.125 2023-11-24 17:05:39,981 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 4950, loss[loss=0.04504, simple_loss=0.05459, pruned_loss=0.006163, audio_tagging_loss=0.01158, over 13860.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.0902, pruned_loss=0.01279, audio_tagging_loss=0.008931, over 3047078.63 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:05:40,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2918720.0, ans=0.0 2023-11-24 17:05:52,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2918786.6666666665, ans=0.125 2023-11-24 17:06:04,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.424e+01 9.084e+01 9.605e+01 1.394e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 17:06:30,257 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437850 2023-11-24 17:06:42,627 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5000, loss[loss=0.06049, simple_loss=0.08028, pruned_loss=0.01205, audio_tagging_loss=0.008301, over 14805.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08996, pruned_loss=0.01283, audio_tagging_loss=0.008817, over 3040076.64 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:07:05,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2919120.0, ans=0.125 2023-11-24 17:07:32,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437900 2023-11-24 17:07:40,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2919320.0, ans=0.125 2023-11-24 17:07:40,572 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:07:45,490 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5050, loss[loss=0.05835, simple_loss=0.08144, pruned_loss=0.009966, audio_tagging_loss=0.007664, over 13793.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09001, pruned_loss=0.013, audio_tagging_loss=0.008709, over 3045108.72 frames. 
], batch size: 51, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:07:49,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2919386.6666666665, ans=0.1 2023-11-24 17:08:08,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 8.595e+01 9.107e+01 9.818e+01 1.374e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 17:08:27,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2919586.6666666665, ans=0.0 2023-11-24 17:08:27,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2919586.6666666665, ans=0.125 2023-11-24 17:08:35,351 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 437950 2023-11-24 17:08:45,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2919653.3333333335, ans=0.125 2023-11-24 17:08:47,794 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5100, loss[loss=0.06558, simple_loss=0.09217, pruned_loss=0.007442, audio_tagging_loss=0.01205, over 15113.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09037, pruned_loss=0.01304, audio_tagging_loss=0.008681, over 3051744.18 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:09:37,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438000 2023-11-24 17:09:37,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-24 17:09:45,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2919986.6666666665, ans=0.125 2023-11-24 17:09:49,268 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5150, loss[loss=0.07343, simple_loss=0.1023, pruned_loss=0.01363, audio_tagging_loss=0.008654, over 14783.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09002, pruned_loss=0.01301, audio_tagging_loss=0.008775, over 3051579.03 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:09:49,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2920053.3333333335, ans=0.2 2023-11-24 17:10:13,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.605e+01 9.201e+01 1.005e+02 1.322e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-24 17:10:39,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438050 2023-11-24 17:10:51,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2920386.6666666665, ans=0.125 2023-11-24 17:10:51,995 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5200, loss[loss=0.05377, simple_loss=0.06977, pruned_loss=0.01113, audio_tagging_loss=0.007753, over 15419.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09012, pruned_loss=0.01305, audio_tagging_loss=0.008781, over 3049945.19 frames. 
], batch size: 60, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:11:05,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2920453.3333333335, ans=0.05 2023-11-24 17:11:10,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2920453.3333333335, ans=0.2 2023-11-24 17:11:21,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2920520.0, ans=0.125 2023-11-24 17:11:42,540 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438100 2023-11-24 17:11:46,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2920653.3333333335, ans=0.2 2023-11-24 17:11:50,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2920653.3333333335, ans=0.125 2023-11-24 17:11:55,028 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5250, loss[loss=0.06561, simple_loss=0.08624, pruned_loss=0.01347, audio_tagging_loss=0.009019, over 14657.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09067, pruned_loss=0.01312, audio_tagging_loss=0.008769, over 3045224.82 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:12:17,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.449e+01 8.931e+01 9.765e+01 1.225e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-24 17:12:44,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2920986.6666666665, ans=0.1 2023-11-24 17:12:45,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438150 2023-11-24 17:12:51,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2023-11-24 17:12:57,270 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5300, loss[loss=0.0797, simple_loss=0.1184, pruned_loss=0.01433, audio_tagging_loss=0.006195, over 15785.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09173, pruned_loss=0.01316, audio_tagging_loss=0.008711, over 3046228.82 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:13:00,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2023-11-24 17:13:07,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. 
limit=10.0 2023-11-24 17:13:35,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2921253.3333333335, ans=0.125 2023-11-24 17:13:46,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2921320.0, ans=0.0 2023-11-24 17:13:47,404 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438200 2023-11-24 17:13:47,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2921320.0, ans=0.125 2023-11-24 17:13:56,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2921320.0, ans=0.0 2023-11-24 17:14:00,114 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5350, loss[loss=0.1025, simple_loss=0.1386, pruned_loss=0.02756, audio_tagging_loss=0.005664, over 15700.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09161, pruned_loss=0.01313, audio_tagging_loss=0.008742, over 3049188.21 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:14:17,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2921453.3333333335, ans=0.125 2023-11-24 17:14:24,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.013e+01 8.487e+01 9.193e+01 9.991e+01 1.472e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-24 17:14:31,922 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:14:46,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2921586.6666666665, ans=0.1 2023-11-24 17:14:47,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=22.5 2023-11-24 17:14:49,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438250 2023-11-24 17:14:59,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2921653.3333333335, ans=0.0 2023-11-24 17:15:03,521 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5400, loss[loss=0.08472, simple_loss=0.1054, pruned_loss=0.02071, audio_tagging_loss=0.01133, over 15205.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.092, pruned_loss=0.01325, audio_tagging_loss=0.008892, over 3042440.16 frames. 
], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:15:07,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2921720.0, ans=10.0 2023-11-24 17:15:09,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2921720.0, ans=0.125 2023-11-24 17:15:13,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2921720.0, ans=0.1 2023-11-24 17:15:16,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2921786.6666666665, ans=0.125 2023-11-24 17:15:33,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2921853.3333333335, ans=0.1 2023-11-24 17:15:34,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2921853.3333333335, ans=0.125 2023-11-24 17:15:48,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2023-11-24 17:15:53,245 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438300 2023-11-24 17:16:04,903 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5450, loss[loss=0.06733, simple_loss=0.09739, pruned_loss=0.01042, audio_tagging_loss=0.008219, over 14969.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09221, pruned_loss=0.01321, audio_tagging_loss=0.008989, over 3041092.75 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:16:28,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.393e+01 9.258e+01 1.002e+02 1.486e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 17:16:33,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2922186.6666666665, ans=0.0 2023-11-24 17:16:43,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2922253.3333333335, ans=0.125 2023-11-24 17:16:46,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2922253.3333333335, ans=0.1 2023-11-24 17:16:55,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438350 2023-11-24 17:17:08,718 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5500, loss[loss=0.06863, simple_loss=0.08805, pruned_loss=0.01382, audio_tagging_loss=0.01079, over 14922.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.09272, pruned_loss=0.01336, audio_tagging_loss=0.008949, over 3042943.49 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:17:10,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0 2023-11-24 17:17:31,974 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:17:58,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438400 2023-11-24 17:18:11,628 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5550, loss[loss=0.06805, simple_loss=0.09246, pruned_loss=0.01242, audio_tagging_loss=0.0094, over 15370.00 frames. 
], tot_loss[loss=0.06859, simple_loss=0.09242, pruned_loss=0.01331, audio_tagging_loss=0.009075, over 3051822.39 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:18:18,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2922720.0, ans=0.1 2023-11-24 17:18:29,117 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:18:34,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.681e+01 9.449e+01 1.002e+02 1.118e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-24 17:18:42,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2922853.3333333335, ans=0.0 2023-11-24 17:18:49,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2922920.0, ans=0.0 2023-11-24 17:18:54,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2922920.0, ans=0.0 2023-11-24 17:19:01,993 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438450 2023-11-24 17:19:13,616 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5600, loss[loss=0.05633, simple_loss=0.07478, pruned_loss=0.01068, audio_tagging_loss=0.008269, over 15263.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.09184, pruned_loss=0.01316, audio_tagging_loss=0.009079, over 3045953.45 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:19:20,958 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:19:25,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2923120.0, ans=0.125 2023-11-24 17:19:36,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5 2023-11-24 17:19:46,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2923186.6666666665, ans=0.2 2023-11-24 17:19:57,411 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 17:20:03,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438500 2023-11-24 17:20:08,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2923320.0, ans=0.0 2023-11-24 17:20:15,352 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5650, loss[loss=0.09046, simple_loss=0.1178, pruned_loss=0.02134, audio_tagging_loss=0.0102, over 15732.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09159, pruned_loss=0.01312, audio_tagging_loss=0.009086, over 3049119.98 frames. 
], batch size: 57, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:20:21,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2923386.6666666665, ans=0.125 2023-11-24 17:20:26,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2923386.6666666665, ans=0.125 2023-11-24 17:20:40,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.503e+01 9.097e+01 9.762e+01 1.252e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-24 17:21:05,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438550 2023-11-24 17:21:07,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2923653.3333333335, ans=0.125 2023-11-24 17:21:16,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2923653.3333333335, ans=0.0 2023-11-24 17:21:17,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0 2023-11-24 17:21:18,234 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5700, loss[loss=0.07828, simple_loss=0.1098, pruned_loss=0.0166, audio_tagging_loss=0.006786, over 16016.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09128, pruned_loss=0.01305, audio_tagging_loss=0.00901, over 3039390.17 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:21:23,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2923720.0, ans=0.2 2023-11-24 17:21:26,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2923720.0, ans=0.125 2023-11-24 17:21:27,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2923720.0, ans=0.125 2023-11-24 17:21:32,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-11-24 17:21:41,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2923786.6666666665, ans=0.125 2023-11-24 17:21:41,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2923786.6666666665, ans=0.125 2023-11-24 17:21:43,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.30 vs. 
limit=12.0 2023-11-24 17:21:45,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2923853.3333333335, ans=0.125 2023-11-24 17:22:01,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2923920.0, ans=0.0 2023-11-24 17:22:05,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2923920.0, ans=0.1 2023-11-24 17:22:08,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438600 2023-11-24 17:22:10,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2923986.6666666665, ans=0.125 2023-11-24 17:22:21,769 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5750, loss[loss=0.06273, simple_loss=0.0771, pruned_loss=0.01097, audio_tagging_loss=0.01321, over 14790.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09041, pruned_loss=0.01286, audio_tagging_loss=0.009004, over 3045339.94 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:22:27,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2924053.3333333335, ans=0.125 2023-11-24 17:22:32,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2924120.0, ans=0.0 2023-11-24 17:22:37,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2924120.0, ans=0.0 2023-11-24 17:22:44,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.615e+01 9.261e+01 9.857e+01 1.309e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 17:22:45,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2924186.6666666665, ans=0.2 2023-11-24 17:22:47,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2924186.6666666665, ans=0.025 2023-11-24 17:22:49,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2924186.6666666665, ans=0.125 2023-11-24 17:22:57,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2924253.3333333335, ans=0.125 2023-11-24 17:23:11,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438650 2023-11-24 17:23:12,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2924320.0, ans=0.125 2023-11-24 17:23:21,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2924320.0, ans=0.125 2023-11-24 17:23:23,085 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5800, loss[loss=0.07413, simple_loss=0.09602, pruned_loss=0.01849, audio_tagging_loss=0.007637, over 15432.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09008, pruned_loss=0.01293, audio_tagging_loss=0.008917, over 3038708.03 frames. 
], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:23:23,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2924386.6666666665, ans=0.125 2023-11-24 17:23:32,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2924386.6666666665, ans=0.0 2023-11-24 17:23:37,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2924453.3333333335, ans=0.2 2023-11-24 17:23:59,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2924586.6666666665, ans=0.125 2023-11-24 17:24:06,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2924586.6666666665, ans=0.1 2023-11-24 17:24:12,470 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438700 2023-11-24 17:24:16,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2924653.3333333335, ans=0.125 2023-11-24 17:24:23,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-24 17:24:25,407 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5850, loss[loss=0.07278, simple_loss=0.1069, pruned_loss=0.01315, audio_tagging_loss=0.006172, over 16294.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09041, pruned_loss=0.013, audio_tagging_loss=0.008807, over 3037487.96 frames. ], batch size: 62, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:24:36,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2924786.6666666665, ans=0.2 2023-11-24 17:24:37,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2924786.6666666665, ans=0.125 2023-11-24 17:24:50,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.501e+01 8.938e+01 9.729e+01 1.237e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-24 17:25:01,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2924920.0, ans=0.125 2023-11-24 17:25:07,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2924920.0, ans=0.125 2023-11-24 17:25:15,072 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438750 2023-11-24 17:25:27,318 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5900, loss[loss=0.05686, simple_loss=0.06845, pruned_loss=0.01239, audio_tagging_loss=0.01025, over 14205.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09074, pruned_loss=0.013, audio_tagging_loss=0.008883, over 3033143.01 frames. 
], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:25:34,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2925053.3333333335, ans=0.0 2023-11-24 17:25:43,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:26:16,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438800 2023-11-24 17:26:16,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2925320.0, ans=0.1 2023-11-24 17:26:17,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2925320.0, ans=0.125 2023-11-24 17:26:21,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2925320.0, ans=0.0 2023-11-24 17:26:29,245 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 5950, loss[loss=0.07041, simple_loss=0.09437, pruned_loss=0.01441, audio_tagging_loss=0.008818, over 14611.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09061, pruned_loss=0.01293, audio_tagging_loss=0.008896, over 3036839.62 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:26:38,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2925386.6666666665, ans=0.125 2023-11-24 17:26:47,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2023-11-24 17:26:54,497 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.647e+01 9.220e+01 9.867e+01 1.210e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 17:27:07,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2925586.6666666665, ans=0.125 2023-11-24 17:27:18,867 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438850 2023-11-24 17:27:31,683 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6000, loss[loss=0.0494, simple_loss=0.07126, pruned_loss=0.005944, audio_tagging_loss=0.007823, over 15562.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09058, pruned_loss=0.0129, audio_tagging_loss=0.008843, over 3041712.43 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:27:31,684 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 17:28:13,997 INFO [train_asr.py:1253] (1/4) Epoch 37, validation: loss=0.05829, simple_loss=0.05083, pruned_loss=0.00526, audio_tagging_loss=0.02761, over 4681554.00 frames. 2023-11-24 17:28:13,998 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 17:28:24,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2925720.0, ans=0.0 2023-11-24 17:28:37,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. 
limit=6.0 2023-11-24 17:28:39,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2925853.3333333335, ans=0.125 2023-11-24 17:28:53,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-24 17:28:59,041 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 17:28:59,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2925920.0, ans=0.125 2023-11-24 17:29:03,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438900 2023-11-24 17:29:16,132 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6050, loss[loss=0.069, simple_loss=0.09249, pruned_loss=0.01303, audio_tagging_loss=0.009733, over 15574.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09084, pruned_loss=0.01299, audio_tagging_loss=0.008774, over 3039123.92 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:29:29,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2926120.0, ans=0.04949747468305833 2023-11-24 17:29:35,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2926120.0, ans=0.0 2023-11-24 17:29:37,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2926120.0, ans=0.0 2023-11-24 17:29:41,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.485e+01 9.097e+01 9.869e+01 1.305e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-24 17:29:51,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2926186.6666666665, ans=0.125 2023-11-24 17:29:56,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2926253.3333333335, ans=0.2 2023-11-24 17:30:06,415 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 438950 2023-11-24 17:30:18,679 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6100, loss[loss=0.08752, simple_loss=0.1205, pruned_loss=0.02005, audio_tagging_loss=0.007219, over 14829.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09033, pruned_loss=0.013, audio_tagging_loss=0.008863, over 3039058.59 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:30:23,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2926386.6666666665, ans=0.125 2023-11-24 17:31:08,389 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439000 2023-11-24 17:31:21,184 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6150, loss[loss=0.07356, simple_loss=0.09668, pruned_loss=0.01588, audio_tagging_loss=0.009343, over 15419.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09045, pruned_loss=0.01304, audio_tagging_loss=0.008895, over 3042025.92 frames. 
], batch size: 56, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:31:36,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2926786.6666666665, ans=10.0 2023-11-24 17:31:46,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.702e+01 9.525e+01 1.018e+02 1.166e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-24 17:31:51,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2926853.3333333335, ans=0.1 2023-11-24 17:31:51,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2926853.3333333335, ans=0.1 2023-11-24 17:31:55,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2926853.3333333335, ans=0.0 2023-11-24 17:32:10,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439050 2023-11-24 17:32:16,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2926986.6666666665, ans=0.125 2023-11-24 17:32:20,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-11-24 17:32:22,650 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6200, loss[loss=0.07327, simple_loss=0.09265, pruned_loss=0.01619, audio_tagging_loss=0.01075, over 14953.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09028, pruned_loss=0.01295, audio_tagging_loss=0.008946, over 3041920.09 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 17:32:24,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2927053.3333333335, ans=0.0 2023-11-24 17:32:33,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2927053.3333333335, ans=0.0 2023-11-24 17:32:37,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2927120.0, ans=0.125 2023-11-24 17:32:41,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2927120.0, ans=0.0 2023-11-24 17:33:06,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2927253.3333333335, ans=0.04949747468305833 2023-11-24 17:33:12,150 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439100 2023-11-24 17:33:19,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2927320.0, ans=0.125 2023-11-24 17:33:20,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-24 17:33:25,882 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6250, loss[loss=0.07587, simple_loss=0.1044, pruned_loss=0.01532, audio_tagging_loss=0.008375, over 14947.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09022, pruned_loss=0.01286, audio_tagging_loss=0.008966, over 3037127.50 frames. 
2023-11-24 17:33:34,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2927386.6666666665, ans=0.125 2023-11-24 17:33:50,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.139e+01 8.621e+01 9.212e+01 1.010e+02 1.296e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-24 17:34:01,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2927586.6666666665, ans=0.125 2023-11-24 17:34:05,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.14 vs. limit=22.5 2023-11-24 17:34:13,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2023-11-24 17:34:15,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439150 2023-11-24 17:34:27,450 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6300, loss[loss=0.05957, simple_loss=0.07075, pruned_loss=0.01197, audio_tagging_loss=0.01222, over 14040.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08922, pruned_loss=0.0128, audio_tagging_loss=0.009092, over 3031488.94 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:34:29,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-24 17:34:38,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2927786.6666666665, ans=0.1 2023-11-24 17:34:41,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2927786.6666666665, ans=0.125 2023-11-24 17:34:51,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2927853.3333333335, ans=0.125 2023-11-24 17:34:53,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2023-11-24 17:35:05,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2927920.0, ans=0.125 2023-11-24 17:35:17,219 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439200 2023-11-24 17:35:29,263 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6350, loss[loss=0.07, simple_loss=0.09377, pruned_loss=0.0139, audio_tagging_loss=0.009214, over 15460.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08931, pruned_loss=0.01265, audio_tagging_loss=0.009221, over 3032456.14 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:35:31,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2928053.3333333335, ans=0.1 2023-11-24 17:35:37,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2928053.3333333335, ans=0.2 2023-11-24 17:35:45,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.83 vs. limit=15.0
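
The optim.py Clipping_scale entries above report the min/25%/50%/75%/max of recent gradient norms together with a clipping threshold; in each entry the threshold equals clipping_scale times the logged median (e.g. 1.842e+02 = 2.0 * 9.212e+01 just above). A minimal sketch of that behaviour, assuming a window of recent per-step norms; the names are illustrative, not icefall's optimizer API:

```python
# Minimal sketch: clip a gradient against clipping_scale * median of recent
# gradient norms, as the logged quartile/threshold pairs suggest.
import torch

def clip_grad(grad: torch.Tensor, recent_norms: torch.Tensor,
              clipping_scale: float = 2.0) -> torch.Tensor:
    # quartiles as logged: min, 25%, median, 75%, max
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2.0 x median, matching the log
    norm = grad.norm()
    if norm > threshold:               # such batches count toward percent-clipped
        grad = grad * (threshold / norm)
    return grad
```
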
2023-11-24 17:35:56,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.531e+01 9.107e+01 9.829e+01 2.915e+02, threshold=1.821e+02, percent-clipped=1.0 2023-11-24 17:36:01,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2928186.6666666665, ans=0.0 2023-11-24 17:36:04,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2928186.6666666665, ans=0.125 2023-11-24 17:36:08,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2928253.3333333335, ans=0.1 2023-11-24 17:36:16,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2928253.3333333335, ans=0.1 2023-11-24 17:36:18,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439250 2023-11-24 17:36:22,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2928320.0, ans=0.125 2023-11-24 17:36:24,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2928320.0, ans=0.05 2023-11-24 17:36:30,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2928386.6666666665, ans=0.1 2023-11-24 17:36:31,542 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6400, loss[loss=0.06738, simple_loss=0.09296, pruned_loss=0.01164, audio_tagging_loss=0.009252, over 15793.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08917, pruned_loss=0.01258, audio_tagging_loss=0.009235, over 3038049.35 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:36:52,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2928453.3333333335, ans=0.0 2023-11-24 17:36:55,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2928520.0, ans=0.125 2023-11-24 17:37:06,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2928586.6666666665, ans=0.09899494936611666 2023-11-24 17:37:07,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-11-24 17:37:13,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2928586.6666666665, ans=0.0 2023-11-24 17:37:21,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439300 2023-11-24 17:37:25,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-24 17:37:29,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2928653.3333333335, ans=0.125 2023-11-24 17:37:33,833 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6450, loss[loss=0.04742, simple_loss=0.06118, pruned_loss=0.007137, audio_tagging_loss=0.009692, over 15732.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09015, pruned_loss=0.01277, audio_tagging_loss=0.009196, over 3039470.65 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 16.0
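
The scaling.py:213 lines above each print a ScheduledFloat: a module hyper-parameter (dropout_p, skip_rate, balancer prob, ...) whose current value is a function of the global batch count, logged as ans=... . A simplified stand-in that treats the schedule as piecewise-linear between (batch_count, value) breakpoints; this is an assumption for illustration, not the scaling.py implementation:

```python
# Hedged sketch of a batch-count-driven schedule. By ~2.9M batches most
# schedules have presumably reached their final constant value, which would
# explain why the same names keep logging identical ans=... values.
def scheduled_float(batch_count: float, *points: tuple[float, float]) -> float:
    (x0, y0), *rest = points
    if batch_count <= x0:
        return y0
    for x1, y1 in rest:
        if batch_count <= x1:
            # linear interpolation inside the current segment
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # constant after the last breakpoint

# e.g. a skip-rate decaying from 0.3 to 0.0 over the first 20k batches has
# long since hit 0.0 at the batch counts seen in this log:
assert scheduled_float(2928653.33, (0.0, 0.3), (20000.0, 0.0)) == 0.0
```
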
2023-11-24 17:37:36,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2928720.0, ans=0.125 2023-11-24 17:37:41,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2928720.0, ans=0.0 2023-11-24 17:37:42,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2928720.0, ans=0.125 2023-11-24 17:37:43,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2928720.0, ans=0.0 2023-11-24 17:37:43,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2928720.0, ans=0.0 2023-11-24 17:37:56,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2928853.3333333335, ans=0.125 2023-11-24 17:38:00,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.603e+01 9.218e+01 1.012e+02 1.215e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 17:38:04,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2928853.3333333335, ans=0.125 2023-11-24 17:38:22,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439350 2023-11-24 17:38:31,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2928986.6666666665, ans=0.125 2023-11-24 17:38:34,343 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6500, loss[loss=0.06603, simple_loss=0.08428, pruned_loss=0.0144, audio_tagging_loss=0.009489, over 14938.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09015, pruned_loss=0.01282, audio_tagging_loss=0.009249, over 3032548.41 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:38:36,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2929053.3333333335, ans=0.125 2023-11-24 17:38:42,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2929053.3333333335, ans=0.2 2023-11-24 17:38:53,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0
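
The "Exclude cut ..." warnings in this log (e.g. unbalanced/NoNxFjwXuuc_0.000_1.000.wav earlier) drop 1-second AudioSet cuts whose placeholder transcript is longer than the encoder output: 100 input frames survive as only 23 frames after subsampling, fewer than the 24 BPE tokens, so the transducer loss cannot align them. A hedged sketch of such a filter follows; the subsampling arithmetic is an assumption chosen because it reproduces the logged 100 -> 23, not the exact train_asr.py code:

```python
# Illustrative cut filter matching the logged warnings, not train_asr.py itself.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # assumed convolutional subsampling arithmetic: 100 frames -> 23 frames
    t_after = ((num_frames - 7) // 2 + 1) // 2
    return t_after >= num_tokens  # need at least one output frame per token

assert keep_cut(100, 24) is False  # the excluded dummy-text cuts above
```
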
2023-11-24 17:38:55,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2929120.0, ans=0.0 2023-11-24 17:38:57,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2929120.0, ans=0.025 2023-11-24 17:39:23,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2929320.0, ans=0.07 2023-11-24 17:39:24,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439400 2023-11-24 17:39:27,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2929320.0, ans=0.0 2023-11-24 17:39:28,113 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:39:34,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2929320.0, ans=0.0 2023-11-24 17:39:36,738 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6550, loss[loss=0.08427, simple_loss=0.1082, pruned_loss=0.02406, audio_tagging_loss=0.006124, over 16078.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09198, pruned_loss=0.01325, audio_tagging_loss=0.008988, over 3035097.35 frames. ], batch size: 60, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:39:43,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2929386.6666666665, ans=0.0 2023-11-24 17:39:50,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2929453.3333333335, ans=0.025 2023-11-24 17:40:05,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.772e+01 9.362e+01 9.895e+01 1.833e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-24 17:40:17,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2929586.6666666665, ans=0.125 2023-11-24 17:40:19,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2929586.6666666665, ans=0.2 2023-11-24 17:40:22,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2023-11-24 17:40:26,641 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439450 2023-11-24 17:40:38,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2929720.0, ans=0.125 2023-11-24 17:40:39,576 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6600, loss[loss=0.04715, simple_loss=0.0639, pruned_loss=0.005866, audio_tagging_loss=0.009335, over 14936.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09141, pruned_loss=0.01311, audio_tagging_loss=0.008862, over 3032925.39 frames.
], batch size: 56, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:40:44,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2929720.0, ans=0.025 2023-11-24 17:40:56,868 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 17:41:08,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2929853.3333333335, ans=0.09899494936611666 2023-11-24 17:41:19,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2929920.0, ans=0.1 2023-11-24 17:41:29,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439500 2023-11-24 17:41:41,327 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6650, loss[loss=0.05712, simple_loss=0.08062, pruned_loss=0.008006, audio_tagging_loss=0.008797, over 15711.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09111, pruned_loss=0.013, audio_tagging_loss=0.008788, over 3034832.77 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:41:41,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2930053.3333333335, ans=0.125 2023-11-24 17:41:46,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2023-11-24 17:41:47,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2930053.3333333335, ans=0.0 2023-11-24 17:42:01,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2930120.0, ans=0.125 2023-11-24 17:42:03,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2930120.0, ans=0.125 2023-11-24 17:42:09,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2930186.6666666665, ans=0.0 2023-11-24 17:42:10,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.532e+01 9.137e+01 1.001e+02 1.205e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-24 17:42:31,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439550 2023-11-24 17:42:44,153 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6700, loss[loss=0.0844, simple_loss=0.1194, pruned_loss=0.01793, audio_tagging_loss=0.006768, over 14865.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09098, pruned_loss=0.01295, audio_tagging_loss=0.008738, over 3040613.27 frames. 
], batch size: 55, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:43:02,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2930453.3333333335, ans=0.025 2023-11-24 17:43:04,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2930453.3333333335, ans=6.0 2023-11-24 17:43:26,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2930586.6666666665, ans=0.125 2023-11-24 17:43:33,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439600 2023-11-24 17:43:46,502 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6750, loss[loss=0.07869, simple_loss=0.105, pruned_loss=0.0175, audio_tagging_loss=0.008659, over 15396.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09131, pruned_loss=0.013, audio_tagging_loss=0.008799, over 3035412.50 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 8.0 2023-11-24 17:43:50,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2930720.0, ans=0.04949747468305833 2023-11-24 17:44:15,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.778e+01 8.260e+01 8.901e+01 9.551e+01 1.159e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-24 17:44:18,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2023-11-24 17:44:37,002 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439650 2023-11-24 17:44:42,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2930986.6666666665, ans=0.1 2023-11-24 17:44:44,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2930986.6666666665, ans=0.07 2023-11-24 17:44:49,440 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6800, loss[loss=0.07582, simple_loss=0.1044, pruned_loss=0.01597, audio_tagging_loss=0.007641, over 15327.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09024, pruned_loss=0.01282, audio_tagging_loss=0.008858, over 3041089.56 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:44:49,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2931053.3333333335, ans=10.0 2023-11-24 17:45:33,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2931253.3333333335, ans=10.0 2023-11-24 17:45:34,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2931253.3333333335, ans=6.0 2023-11-24 17:45:39,653 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439700 2023-11-24 17:45:51,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2931386.6666666665, ans=0.1 2023-11-24 17:45:51,916 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6850, loss[loss=0.06914, simple_loss=0.09957, pruned_loss=0.009019, audio_tagging_loss=0.01034, over 14860.00 frames. 
], tot_loss[loss=0.06685, simple_loss=0.09032, pruned_loss=0.01282, audio_tagging_loss=0.008875, over 3042619.03 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:46:16,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2931520.0, ans=0.1 2023-11-24 17:46:17,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2931520.0, ans=0.1 2023-11-24 17:46:20,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2931520.0, ans=0.125 2023-11-24 17:46:21,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.185e+01 8.495e+01 8.934e+01 9.864e+01 1.145e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-24 17:46:28,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2931586.6666666665, ans=0.0 2023-11-24 17:46:37,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2931586.6666666665, ans=0.0 2023-11-24 17:46:41,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2931653.3333333335, ans=0.125 2023-11-24 17:46:42,314 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439750 2023-11-24 17:46:55,045 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6900, loss[loss=0.05962, simple_loss=0.07261, pruned_loss=0.01132, audio_tagging_loss=0.01199, over 16861.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09089, pruned_loss=0.0129, audio_tagging_loss=0.008806, over 3040620.14 frames. ], batch size: 64, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:47:11,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2931786.6666666665, ans=0.0 2023-11-24 17:47:33,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2931920.0, ans=0.0 2023-11-24 17:47:39,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2931920.0, ans=0.1 2023-11-24 17:47:42,854 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 17:47:45,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439800 2023-11-24 17:47:49,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2931986.6666666665, ans=0.125 2023-11-24 17:47:51,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-24 17:47:52,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. 
limit=15.0 2023-11-24 17:47:55,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2023-11-24 17:47:55,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2931986.6666666665, ans=0.125 2023-11-24 17:47:58,604 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 6950, loss[loss=0.06154, simple_loss=0.08404, pruned_loss=0.01221, audio_tagging_loss=0.007314, over 15306.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09061, pruned_loss=0.01292, audio_tagging_loss=0.008826, over 3045352.85 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:47:58,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2932053.3333333335, ans=0.0 2023-11-24 17:48:01,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2932053.3333333335, ans=0.0 2023-11-24 17:48:03,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=15.0 2023-11-24 17:48:07,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2932053.3333333335, ans=0.2 2023-11-24 17:48:11,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2932120.0, ans=0.2 2023-11-24 17:48:16,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2932120.0, ans=0.125 2023-11-24 17:48:22,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2932186.6666666665, ans=0.125 2023-11-24 17:48:27,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.692e+01 9.069e+01 1.003e+02 1.264e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 17:48:34,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2932253.3333333335, ans=0.125 2023-11-24 17:48:44,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2932253.3333333335, ans=0.015 2023-11-24 17:48:48,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439850 2023-11-24 17:49:00,500 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7000, loss[loss=0.05193, simple_loss=0.06769, pruned_loss=0.009132, audio_tagging_loss=0.008949, over 14798.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08951, pruned_loss=0.0127, audio_tagging_loss=0.008903, over 3046968.82 frames. 
], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:49:03,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2932386.6666666665, ans=0.2 2023-11-24 17:49:08,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2932386.6666666665, ans=0.0 2023-11-24 17:49:14,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2932453.3333333335, ans=0.1 2023-11-24 17:49:34,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2932520.0, ans=0.125 2023-11-24 17:49:43,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2932586.6666666665, ans=0.0 2023-11-24 17:49:45,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2932586.6666666665, ans=0.02 2023-11-24 17:49:45,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2932586.6666666665, ans=0.0 2023-11-24 17:49:51,105 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439900 2023-11-24 17:50:03,450 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7050, loss[loss=0.06302, simple_loss=0.09175, pruned_loss=0.006576, audio_tagging_loss=0.01057, over 15488.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08953, pruned_loss=0.01251, audio_tagging_loss=0.008949, over 3049485.15 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:50:11,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2932720.0, ans=0.125 2023-11-24 17:50:12,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2932720.0, ans=0.0 2023-11-24 17:50:17,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2932786.6666666665, ans=0.125 2023-11-24 17:50:26,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0 2023-11-24 17:50:31,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.508e+01 9.071e+01 9.763e+01 1.227e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-24 17:50:33,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2932853.3333333335, ans=0.125 2023-11-24 17:50:37,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2932853.3333333335, ans=0.125 2023-11-24 17:50:53,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 439950 2023-11-24 17:51:04,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2933053.3333333335, ans=0.125 2023-11-24 17:51:05,589 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7100, loss[loss=0.06417, simple_loss=0.0807, pruned_loss=0.00932, audio_tagging_loss=0.0145, over 15117.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08914, pruned_loss=0.01234, audio_tagging_loss=0.009023, over 3047566.53 frames. 
], batch size: 58, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:51:12,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2933053.3333333335, ans=0.125 2023-11-24 17:51:39,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2933186.6666666665, ans=0.125 2023-11-24 17:51:55,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440000 2023-11-24 17:51:56,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2933320.0, ans=0.1 2023-11-24 17:52:02,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2933320.0, ans=0.125 2023-11-24 17:52:12,087 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7150, loss[loss=0.06943, simple_loss=0.0925, pruned_loss=0.01417, audio_tagging_loss=0.009011, over 15271.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09065, pruned_loss=0.01258, audio_tagging_loss=0.008931, over 3044920.89 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:52:27,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2933453.3333333335, ans=0.1 2023-11-24 17:52:38,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2933520.0, ans=0.07 2023-11-24 17:52:40,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.675e+01 9.215e+01 9.966e+01 1.271e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-24 17:52:44,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2933520.0, ans=10.0 2023-11-24 17:52:56,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2933586.6666666665, ans=0.0 2023-11-24 17:53:01,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440050 2023-11-24 17:53:03,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0 2023-11-24 17:53:14,526 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7200, loss[loss=0.06866, simple_loss=0.09155, pruned_loss=0.01398, audio_tagging_loss=0.00891, over 14772.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09046, pruned_loss=0.01265, audio_tagging_loss=0.009058, over 3043435.21 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:53:16,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2933720.0, ans=0.0 2023-11-24 17:53:29,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2933786.6666666665, ans=0.0 2023-11-24 17:53:35,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2933786.6666666665, ans=0.125 2023-11-24 17:53:44,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.87 vs. 
limit=15.0 2023-11-24 17:53:46,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2933853.3333333335, ans=0.1 2023-11-24 17:53:52,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2933920.0, ans=0.125 2023-11-24 17:53:56,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2933920.0, ans=0.0 2023-11-24 17:54:04,230 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440100 2023-11-24 17:54:11,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2933986.6666666665, ans=0.125 2023-11-24 17:54:16,751 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7250, loss[loss=0.06888, simple_loss=0.09373, pruned_loss=0.01496, audio_tagging_loss=0.007062, over 13570.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09106, pruned_loss=0.01284, audio_tagging_loss=0.009056, over 3044964.50 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:54:21,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2934053.3333333335, ans=0.125 2023-11-24 17:54:26,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2934053.3333333335, ans=0.1 2023-11-24 17:54:47,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 8.488e+01 9.107e+01 9.916e+01 1.399e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 17:54:47,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2934186.6666666665, ans=0.125 2023-11-24 17:55:06,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440150 2023-11-24 17:55:18,548 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7300, loss[loss=0.06078, simple_loss=0.07983, pruned_loss=0.01447, audio_tagging_loss=0.006399, over 14723.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08941, pruned_loss=0.01247, audio_tagging_loss=0.009077, over 3043515.60 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:55:27,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2934386.6666666665, ans=0.0 2023-11-24 17:55:44,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.23 vs. 
limit=6.0 2023-11-24 17:55:51,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2934520.0, ans=0.125 2023-11-24 17:55:53,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2934520.0, ans=0.1 2023-11-24 17:55:54,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2934520.0, ans=0.0 2023-11-24 17:55:55,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2934586.6666666665, ans=0.0 2023-11-24 17:55:58,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2023-11-24 17:56:04,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2934586.6666666665, ans=0.0 2023-11-24 17:56:09,206 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440200 2023-11-24 17:56:12,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2934653.3333333335, ans=0.125 2023-11-24 17:56:22,508 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7350, loss[loss=0.05857, simple_loss=0.08249, pruned_loss=0.01108, audio_tagging_loss=0.006241, over 14949.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09055, pruned_loss=0.01257, audio_tagging_loss=0.008845, over 3047636.34 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:56:46,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2934853.3333333335, ans=0.125 2023-11-24 17:56:46,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2934853.3333333335, ans=0.125 2023-11-24 17:56:51,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.582e+01 8.969e+01 9.550e+01 1.265e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-24 17:57:11,771 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440250 2023-11-24 17:57:24,027 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7400, loss[loss=0.0776, simple_loss=0.1163, pruned_loss=0.01332, audio_tagging_loss=0.00611, over 14962.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09049, pruned_loss=0.01264, audio_tagging_loss=0.008828, over 3050424.09 frames. 
], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:57:26,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2935053.3333333335, ans=0.1 2023-11-24 17:57:36,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2935120.0, ans=0.0 2023-11-24 17:57:40,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2935120.0, ans=0.125 2023-11-24 17:57:56,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2935186.6666666665, ans=0.5 2023-11-24 17:57:58,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2935186.6666666665, ans=10.0 2023-11-24 17:58:08,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2935253.3333333335, ans=0.0 2023-11-24 17:58:12,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2023-11-24 17:58:14,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440300 2023-11-24 17:58:26,486 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7450, loss[loss=0.07057, simple_loss=0.09866, pruned_loss=0.01372, audio_tagging_loss=0.007512, over 14994.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08993, pruned_loss=0.01257, audio_tagging_loss=0.008907, over 3043493.93 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:58:56,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.534e+01 9.145e+01 9.758e+01 1.407e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-24 17:59:00,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2935520.0, ans=0.1 2023-11-24 17:59:04,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2935586.6666666665, ans=0.1 2023-11-24 17:59:16,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440350 2023-11-24 17:59:28,787 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7500, loss[loss=0.05904, simple_loss=0.07569, pruned_loss=0.01168, audio_tagging_loss=0.009511, over 14854.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09097, pruned_loss=0.01276, audio_tagging_loss=0.008782, over 3045247.12 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 17:59:34,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2935720.0, ans=0.0 2023-11-24 17:59:40,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2935786.6666666665, ans=0.125 2023-11-24 18:00:08,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=22.5 2023-11-24 18:00:18,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440400 2023-11-24 18:00:20,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. 
limit=15.0 2023-11-24 18:00:31,109 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7550, loss[loss=0.08723, simple_loss=0.1166, pruned_loss=0.02207, audio_tagging_loss=0.006843, over 15106.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09, pruned_loss=0.01269, audio_tagging_loss=0.008804, over 3049058.38 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 18:00:36,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2936053.3333333335, ans=0.125 2023-11-24 18:00:40,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2023-11-24 18:00:48,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2023-11-24 18:01:00,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.524e+01 9.162e+01 9.767e+01 1.233e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 18:01:00,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2023-11-24 18:01:17,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-11-24 18:01:21,336 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440450 2023-11-24 18:01:30,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2023-11-24 18:01:30,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2936320.0, ans=0.125 2023-11-24 18:01:32,930 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7600, loss[loss=0.05882, simple_loss=0.08061, pruned_loss=0.01102, audio_tagging_loss=0.007491, over 14309.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09034, pruned_loss=0.01259, audio_tagging_loss=0.008782, over 3042093.41 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:01:35,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2936386.6666666665, ans=0.125 2023-11-24 18:02:00,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0 2023-11-24 18:02:04,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2936520.0, ans=0.125 2023-11-24 18:02:23,304 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440500 2023-11-24 18:02:35,609 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7650, loss[loss=0.05677, simple_loss=0.07569, pruned_loss=0.006532, audio_tagging_loss=0.0124, over 16310.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.0902, pruned_loss=0.01253, audio_tagging_loss=0.008785, over 3048260.03 frames. ], batch size: 62, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:02:37,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.59 vs. 
limit=15.0 2023-11-24 18:02:46,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2936720.0, ans=0.0 2023-11-24 18:03:05,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.945e+01 8.749e+01 9.173e+01 9.808e+01 1.366e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-24 18:03:24,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2936986.6666666665, ans=0.0 2023-11-24 18:03:25,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440550 2023-11-24 18:03:33,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2936986.6666666665, ans=10.0 2023-11-24 18:03:37,739 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7700, loss[loss=0.08244, simple_loss=0.1186, pruned_loss=0.01835, audio_tagging_loss=0.00478, over 15180.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09068, pruned_loss=0.01255, audio_tagging_loss=0.008754, over 3048232.07 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:03:38,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.15 vs. limit=22.5 2023-11-24 18:03:44,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2937053.3333333335, ans=0.125 2023-11-24 18:03:53,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2937120.0, ans=0.125 2023-11-24 18:04:22,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2937253.3333333335, ans=0.0 2023-11-24 18:04:27,256 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440600 2023-11-24 18:04:35,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2937320.0, ans=0.125 2023-11-24 18:04:36,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2023-11-24 18:04:39,840 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7750, loss[loss=0.06031, simple_loss=0.0953, pruned_loss=0.005988, audio_tagging_loss=0.006672, over 14737.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09048, pruned_loss=0.01263, audio_tagging_loss=0.008766, over 3042007.61 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:04:40,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2937386.6666666665, ans=0.125 2023-11-24 18:04:40,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. 
limit=15.0 2023-11-24 18:04:49,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2937386.6666666665, ans=0.1 2023-11-24 18:04:53,206 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 18:05:04,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2937520.0, ans=0.125 2023-11-24 18:05:10,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.710e+01 9.479e+01 9.926e+01 1.240e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-24 18:05:30,274 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440650 2023-11-24 18:05:36,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2937653.3333333335, ans=0.0 2023-11-24 18:05:42,081 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7800, loss[loss=0.06628, simple_loss=0.08964, pruned_loss=0.01087, audio_tagging_loss=0.01059, over 14878.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09012, pruned_loss=0.01277, audio_tagging_loss=0.008795, over 3037180.31 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:05:59,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2937786.6666666665, ans=0.0 2023-11-24 18:06:16,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2937853.3333333335, ans=0.04949747468305833 2023-11-24 18:06:32,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440700 2023-11-24 18:06:39,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2937986.6666666665, ans=0.0 2023-11-24 18:06:42,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-11-24 18:06:43,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2938053.3333333335, ans=0.125 2023-11-24 18:06:45,218 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7850, loss[loss=0.06946, simple_loss=0.09022, pruned_loss=0.01519, audio_tagging_loss=0.009164, over 15852.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09036, pruned_loss=0.0129, audio_tagging_loss=0.008894, over 3039524.54 frames. 
], batch size: 59, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:07:02,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2938120.0, ans=0.2 2023-11-24 18:07:04,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2938120.0, ans=0.125 2023-11-24 18:07:10,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2938186.6666666665, ans=0.125 2023-11-24 18:07:14,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.824e+01 9.632e+01 1.042e+02 1.245e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-24 18:07:29,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2938253.3333333335, ans=0.0 2023-11-24 18:07:29,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2938253.3333333335, ans=0.125 2023-11-24 18:07:34,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440750 2023-11-24 18:07:47,152 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7900, loss[loss=0.06651, simple_loss=0.08507, pruned_loss=0.01548, audio_tagging_loss=0.008491, over 15173.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08997, pruned_loss=0.01269, audio_tagging_loss=0.009075, over 3033624.12 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:07:53,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2938386.6666666665, ans=0.0 2023-11-24 18:08:05,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2938453.3333333335, ans=0.0 2023-11-24 18:08:08,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2023-11-24 18:08:10,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2938520.0, ans=0.125 2023-11-24 18:08:10,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. 
limit=15.0 2023-11-24 18:08:23,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2938586.6666666665, ans=0.1 2023-11-24 18:08:27,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2938586.6666666665, ans=0.2 2023-11-24 18:08:32,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2938586.6666666665, ans=0.125 2023-11-24 18:08:36,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2938653.3333333335, ans=0.0 2023-11-24 18:08:37,027 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440800 2023-11-24 18:08:38,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2938653.3333333335, ans=0.0 2023-11-24 18:08:39,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2938653.3333333335, ans=0.035 2023-11-24 18:08:48,995 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 7950, loss[loss=0.05925, simple_loss=0.08355, pruned_loss=0.008844, audio_tagging_loss=0.00863, over 15521.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09011, pruned_loss=0.01269, audio_tagging_loss=0.009125, over 3038607.72 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:09:04,175 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 18:09:17,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2938853.3333333335, ans=0.125 2023-11-24 18:09:20,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.522e+01 9.138e+01 9.780e+01 1.184e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-24 18:09:28,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2938920.0, ans=0.0 2023-11-24 18:09:30,460 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 18:09:38,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440850 2023-11-24 18:09:50,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=12.0 2023-11-24 18:09:51,422 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8000, loss[loss=0.06347, simple_loss=0.08597, pruned_loss=0.0114, audio_tagging_loss=0.009091, over 15269.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.0889, pruned_loss=0.01252, audio_tagging_loss=0.009194, over 3031926.95 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:10:01,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. 
limit=15.0 2023-11-24 18:10:11,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2939120.0, ans=10.0 2023-11-24 18:10:11,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2939120.0, ans=0.0 2023-11-24 18:10:17,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2939186.6666666665, ans=0.125 2023-11-24 18:10:21,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2023-11-24 18:10:37,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2939253.3333333335, ans=0.125 2023-11-24 18:10:41,056 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440900 2023-11-24 18:10:49,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-11-24 18:10:52,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-24 18:10:53,876 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8050, loss[loss=0.05461, simple_loss=0.06945, pruned_loss=0.00991, audio_tagging_loss=0.009975, over 14760.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08947, pruned_loss=0.01272, audio_tagging_loss=0.009178, over 3026418.85 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 32.0 2023-11-24 18:11:04,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2939453.3333333335, ans=0.0 2023-11-24 18:11:08,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2939453.3333333335, ans=0.125 2023-11-24 18:11:23,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2939520.0, ans=0.0 2023-11-24 18:11:25,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.492e+01 9.262e+01 1.003e+02 1.241e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 18:11:39,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2939586.6666666665, ans=0.0 2023-11-24 18:11:43,206 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 440950 2023-11-24 18:11:45,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2023-11-24 18:11:54,748 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8100, loss[loss=0.06744, simple_loss=0.08347, pruned_loss=0.01561, audio_tagging_loss=0.0101, over 14672.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08955, pruned_loss=0.01282, audio_tagging_loss=0.009092, over 3039324.20 frames. 
], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 18:11:58,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2939720.0, ans=0.1 2023-11-24 18:11:59,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2939720.0, ans=0.015 2023-11-24 18:12:08,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2939786.6666666665, ans=0.125 2023-11-24 18:12:14,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2939786.6666666665, ans=0.2 2023-11-24 18:12:25,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2939853.3333333335, ans=0.125 2023-11-24 18:12:27,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2939853.3333333335, ans=0.125 2023-11-24 18:12:36,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2939920.0, ans=0.09899494936611666 2023-11-24 18:12:38,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2939920.0, ans=0.0 2023-11-24 18:12:41,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-11-24 18:12:44,258 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441000 2023-11-24 18:12:52,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2939986.6666666665, ans=0.125 2023-11-24 18:12:57,403 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8150, loss[loss=0.06998, simple_loss=0.09153, pruned_loss=0.01419, audio_tagging_loss=0.01003, over 15868.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08898, pruned_loss=0.01286, audio_tagging_loss=0.009044, over 3033073.66 frames. ], batch size: 64, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 18:13:00,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2940053.3333333335, ans=0.1 2023-11-24 18:13:24,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2940186.6666666665, ans=0.125 2023-11-24 18:13:29,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.802e+01 9.331e+01 9.892e+01 1.682e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-24 18:13:30,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2940186.6666666665, ans=0.125 2023-11-24 18:13:46,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441050 2023-11-24 18:13:59,370 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8200, loss[loss=0.06399, simple_loss=0.09347, pruned_loss=0.009767, audio_tagging_loss=0.007491, over 14931.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09022, pruned_loss=0.01315, audio_tagging_loss=0.008874, over 3038852.09 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 18:13:59,417 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 18:14:01,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2940386.6666666665, ans=0.125 2023-11-24 18:14:19,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=12.0 2023-11-24 18:14:21,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2940453.3333333335, ans=0.125 2023-11-24 18:14:25,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2940520.0, ans=0.125 2023-11-24 18:14:26,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2940520.0, ans=0.125 2023-11-24 18:14:48,918 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441100 2023-11-24 18:14:55,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2940653.3333333335, ans=0.0 2023-11-24 18:14:59,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2940653.3333333335, ans=0.0 2023-11-24 18:15:01,167 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8250, loss[loss=0.07793, simple_loss=0.1086, pruned_loss=0.01545, audio_tagging_loss=0.008195, over 15524.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09017, pruned_loss=0.01302, audio_tagging_loss=0.008775, over 3037466.82 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 18:15:21,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2940786.6666666665, ans=0.0 2023-11-24 18:15:21,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2940786.6666666665, ans=0.125 2023-11-24 18:15:33,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.417e+01 9.119e+01 9.803e+01 1.778e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-24 18:15:47,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2940920.0, ans=0.125 2023-11-24 18:15:48,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2023-11-24 18:15:50,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441150 2023-11-24 18:16:00,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=22.5 2023-11-24 18:16:03,871 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8300, loss[loss=0.0986, simple_loss=0.1356, pruned_loss=0.02342, audio_tagging_loss=0.007367, over 14879.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09056, pruned_loss=0.01297, audio_tagging_loss=0.008786, over 3045729.19 frames. 
], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2023-11-24 18:16:19,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2023-11-24 18:16:25,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-24 18:16:27,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2941186.6666666665, ans=0.125 2023-11-24 18:16:29,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2941186.6666666665, ans=0.0 2023-11-24 18:16:52,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2941253.3333333335, ans=0.0 2023-11-24 18:16:54,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441200 2023-11-24 18:17:07,491 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8350, loss[loss=0.05562, simple_loss=0.07015, pruned_loss=0.01307, audio_tagging_loss=0.007468, over 14088.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.0908, pruned_loss=0.01312, audio_tagging_loss=0.008738, over 3046327.54 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:17:12,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2941386.6666666665, ans=0.04949747468305833 2023-11-24 18:17:16,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2941386.6666666665, ans=0.05 2023-11-24 18:17:28,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2941453.3333333335, ans=0.125 2023-11-24 18:17:34,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2941520.0, ans=0.125 2023-11-24 18:17:38,840 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 18:17:40,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.581e+01 9.277e+01 1.007e+02 1.908e+02, threshold=1.855e+02, percent-clipped=1.0 2023-11-24 18:17:40,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2941520.0, ans=0.0 2023-11-24 18:17:42,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2941520.0, ans=10.0 2023-11-24 18:17:44,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0 2023-11-24 18:17:46,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-24 18:17:47,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.25 vs. 
limit=15.0 2023-11-24 18:17:51,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2941586.6666666665, ans=0.125 2023-11-24 18:17:53,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2941586.6666666665, ans=0.0 2023-11-24 18:17:56,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2941653.3333333335, ans=0.025 2023-11-24 18:17:57,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441250 2023-11-24 18:17:58,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2941653.3333333335, ans=0.125 2023-11-24 18:18:09,317 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8400, loss[loss=0.06033, simple_loss=0.07667, pruned_loss=0.01047, audio_tagging_loss=0.01153, over 14887.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09037, pruned_loss=0.0129, audio_tagging_loss=0.008763, over 3047627.62 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:18:14,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2023-11-24 18:18:26,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2941786.6666666665, ans=0.0 2023-11-24 18:18:29,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.44 vs. limit=10.0 2023-11-24 18:18:30,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2941786.6666666665, ans=0.125 2023-11-24 18:18:33,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.30 vs. limit=8.0 2023-11-24 18:18:33,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.39 vs. limit=10.0 2023-11-24 18:18:59,404 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441300 2023-11-24 18:19:11,669 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8450, loss[loss=0.06298, simple_loss=0.08551, pruned_loss=0.0114, audio_tagging_loss=0.00883, over 15669.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08919, pruned_loss=0.01278, audio_tagging_loss=0.00881, over 3050329.99 frames. 
], batch size: 58, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:19:39,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2942186.6666666665, ans=0.125 2023-11-24 18:19:39,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2942186.6666666665, ans=0.0 2023-11-24 18:19:43,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.476e+01 8.739e+01 9.324e+01 1.024e+02 1.265e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-24 18:19:47,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2942186.6666666665, ans=0.125 2023-11-24 18:19:48,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2942253.3333333335, ans=0.015 2023-11-24 18:20:02,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441350 2023-11-24 18:20:13,957 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8500, loss[loss=0.05505, simple_loss=0.07517, pruned_loss=0.006441, audio_tagging_loss=0.01102, over 14174.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08952, pruned_loss=0.01292, audio_tagging_loss=0.008869, over 3050980.86 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:20:14,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2942386.6666666665, ans=0.1 2023-11-24 18:20:19,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2942386.6666666665, ans=0.1 2023-11-24 18:20:37,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2942453.3333333335, ans=0.125 2023-11-24 18:21:05,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441400 2023-11-24 18:21:10,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2942653.3333333335, ans=0.1 2023-11-24 18:21:17,643 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8550, loss[loss=0.06017, simple_loss=0.07897, pruned_loss=0.01069, audio_tagging_loss=0.009993, over 16581.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0899, pruned_loss=0.01282, audio_tagging_loss=0.008871, over 3059798.32 frames. 
], batch size: 62, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:21:19,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2942720.0, ans=0.1 2023-11-24 18:21:52,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.580e+01 9.059e+01 9.638e+01 1.247e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-24 18:21:54,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2942920.0, ans=0.125 2023-11-24 18:22:07,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441450 2023-11-24 18:22:09,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2942986.6666666665, ans=0.125 2023-11-24 18:22:21,117 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8600, loss[loss=0.04121, simple_loss=0.04765, pruned_loss=0.005345, audio_tagging_loss=0.01204, over 15526.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08968, pruned_loss=0.01286, audio_tagging_loss=0.008939, over 3053596.96 frames. ], batch size: 63, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:23:00,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2943253.3333333335, ans=0.2 2023-11-24 18:23:12,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441500 2023-11-24 18:23:25,585 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8650, loss[loss=0.08018, simple_loss=0.1008, pruned_loss=0.02221, audio_tagging_loss=0.007547, over 14582.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09025, pruned_loss=0.01281, audio_tagging_loss=0.008999, over 3055834.38 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:23:29,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2943386.6666666665, ans=0.125 2023-11-24 18:23:32,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2943386.6666666665, ans=0.1 2023-11-24 18:23:42,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2943453.3333333335, ans=0.125 2023-11-24 18:23:57,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.42 vs. limit=10.0 2023-11-24 18:23:59,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.635e+01 9.102e+01 9.726e+01 1.317e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-24 18:24:02,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2943586.6666666665, ans=0.1 2023-11-24 18:24:11,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=22.5 2023-11-24 18:24:16,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441550 2023-11-24 18:24:28,902 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8700, loss[loss=0.07003, simple_loss=0.09432, pruned_loss=0.0124, audio_tagging_loss=0.01047, over 14238.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09065, pruned_loss=0.01276, audio_tagging_loss=0.009025, over 3053445.47 frames. 
], batch size: 55, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:24:30,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2943720.0, ans=0.0 2023-11-24 18:25:19,306 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441600 2023-11-24 18:25:23,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2943986.6666666665, ans=0.1 2023-11-24 18:25:24,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2023-11-24 18:25:31,360 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8750, loss[loss=0.06243, simple_loss=0.08833, pruned_loss=0.008976, audio_tagging_loss=0.009285, over 15595.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09179, pruned_loss=0.01298, audio_tagging_loss=0.008919, over 3052911.22 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:25:42,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2944053.3333333335, ans=0.0 2023-11-24 18:26:01,429 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 18:26:05,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.931e+01 9.559e+01 1.034e+02 1.434e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-24 18:26:22,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441650 2023-11-24 18:26:34,971 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8800, loss[loss=0.06719, simple_loss=0.08576, pruned_loss=0.01329, audio_tagging_loss=0.01103, over 17626.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09309, pruned_loss=0.01329, audio_tagging_loss=0.008963, over 3057863.56 frames. ], batch size: 68, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:26:47,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2944453.3333333335, ans=10.0 2023-11-24 18:26:50,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2944453.3333333335, ans=0.09899494936611666 2023-11-24 18:27:03,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2944520.0, ans=0.1 2023-11-24 18:27:07,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2944520.0, ans=0.2 2023-11-24 18:27:11,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2944586.6666666665, ans=0.0 2023-11-24 18:27:25,837 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441700 2023-11-24 18:27:33,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-24 18:27:37,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2944720.0, ans=0.125 2023-11-24 18:27:38,739 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8850, loss[loss=0.04288, simple_loss=0.05265, pruned_loss=0.004898, audio_tagging_loss=0.01166, over 14274.00 frames. 
], tot_loss[loss=0.06846, simple_loss=0.09238, pruned_loss=0.0132, audio_tagging_loss=0.00907, over 3058192.19 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:27:40,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2944720.0, ans=0.125 2023-11-24 18:27:48,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2944720.0, ans=0.035 2023-11-24 18:27:49,678 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 18:27:57,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2944786.6666666665, ans=0.1 2023-11-24 18:28:12,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.596e+01 9.206e+01 9.875e+01 1.365e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-24 18:28:21,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2944920.0, ans=0.125 2023-11-24 18:28:26,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2944920.0, ans=0.0 2023-11-24 18:28:28,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441750 2023-11-24 18:28:30,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-11-24 18:28:31,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2944986.6666666665, ans=0.025 2023-11-24 18:28:35,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2944986.6666666665, ans=0.125 2023-11-24 18:28:40,903 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8900, loss[loss=0.05981, simple_loss=0.08191, pruned_loss=0.01047, audio_tagging_loss=0.008383, over 15331.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09283, pruned_loss=0.01332, audio_tagging_loss=0.008818, over 3056800.96 frames. 
], batch size: 57, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:28:50,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2945053.3333333335, ans=0.1 2023-11-24 18:28:56,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2945120.0, ans=0.2 2023-11-24 18:28:56,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2945120.0, ans=0.125 2023-11-24 18:29:07,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2945186.6666666665, ans=0.2 2023-11-24 18:29:32,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441800 2023-11-24 18:29:38,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2945320.0, ans=0.0 2023-11-24 18:29:39,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2945320.0, ans=0.125 2023-11-24 18:29:41,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-24 18:29:46,775 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 8950, loss[loss=0.05347, simple_loss=0.07075, pruned_loss=0.00965, audio_tagging_loss=0.008447, over 14902.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.09326, pruned_loss=0.01346, audio_tagging_loss=0.008735, over 3053441.05 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:29:53,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2945386.6666666665, ans=0.0 2023-11-24 18:29:58,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2945453.3333333335, ans=0.1 2023-11-24 18:30:07,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2945453.3333333335, ans=0.125 2023-11-24 18:30:20,108 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.707e+01 9.494e+01 1.019e+02 1.254e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-24 18:30:25,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2945586.6666666665, ans=0.1 2023-11-24 18:30:37,479 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441850 2023-11-24 18:30:50,102 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9000, loss[loss=0.05529, simple_loss=0.06744, pruned_loss=0.008395, audio_tagging_loss=0.01318, over 14867.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.09208, pruned_loss=0.01334, audio_tagging_loss=0.008844, over 3048238.84 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:30:50,103 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 18:31:31,474 INFO [train_asr.py:1253] (1/4) Epoch 37, validation: loss=0.05871, simple_loss=0.05072, pruned_loss=0.005135, audio_tagging_loss=0.02821, over 4681554.00 frames. 
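A consistency check on the loss fields logged above: in every record here, the headline loss equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss (for batch 7950, 0.5 * 0.09011 + 0.01269 + 0.009125 = 0.06687, and the validation record just above satisfies the same relation). The sketch below reproduces that combination; the 0.5 and 1.0 scales are inferred from the logged totals rather than quoted from the recipe, and the real training loop may additionally ramp these scales during warm-up.

```python
import torch

def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   audio_tagging_loss: torch.Tensor,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> torch.Tensor:
    # Weighted sum of the three logged components; the scale values
    # are assumptions inferred from the logged numbers, not config.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Example with the batch-7950 values from the log:
# 0.5 * 0.09011 + 0.01269 + 0.009125 = 0.06687, the logged loss.
loss = combine_losses(torch.tensor(0.09011),
                      torch.tensor(0.01269),
                      torch.tensor(0.009125))
```

The tot_loss[...] fields, with their "over N frames" suffixes, are consistent with a frame-weighted running average of the same quantity over recent batches, as opposed to the single-batch loss[...] values.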
2023-11-24 18:31:31,475 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 18:31:55,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2945786.6666666665, ans=0.0 2023-11-24 18:32:09,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2945920.0, ans=0.1 2023-11-24 18:32:10,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2945920.0, ans=0.0 2023-11-24 18:32:11,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2945920.0, ans=0.0 2023-11-24 18:32:22,547 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441900 2023-11-24 18:32:30,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2945986.6666666665, ans=0.125 2023-11-24 18:32:30,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=22.5 2023-11-24 18:32:34,936 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9050, loss[loss=0.06465, simple_loss=0.09851, pruned_loss=0.009762, audio_tagging_loss=0.005628, over 14948.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09181, pruned_loss=0.01329, audio_tagging_loss=0.008773, over 3049885.74 frames. ], batch size: 57, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:32:44,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2946053.3333333335, ans=0.2 2023-11-24 18:32:51,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2946120.0, ans=0.0 2023-11-24 18:32:51,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2946120.0, ans=0.0 2023-11-24 18:33:09,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.396e+01 9.037e+01 9.861e+01 1.283e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 18:33:24,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 441950 2023-11-24 18:33:37,586 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9100, loss[loss=0.08544, simple_loss=0.1188, pruned_loss=0.01914, audio_tagging_loss=0.006923, over 14367.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09165, pruned_loss=0.01316, audio_tagging_loss=0.008739, over 3052818.46 frames. ], batch size: 54, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:33:52,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. 
limit=15.0 2023-11-24 18:33:56,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2946453.3333333335, ans=0.015 2023-11-24 18:34:06,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2946520.0, ans=0.125 2023-11-24 18:34:10,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2946520.0, ans=0.125 2023-11-24 18:34:13,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2946586.6666666665, ans=0.1 2023-11-24 18:34:27,586 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442000 2023-11-24 18:34:39,638 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9150, loss[loss=0.06394, simple_loss=0.08242, pruned_loss=0.01215, audio_tagging_loss=0.01058, over 15290.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09174, pruned_loss=0.01301, audio_tagging_loss=0.008692, over 3054253.12 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:34:47,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2946720.0, ans=0.125 2023-11-24 18:34:48,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2946720.0, ans=0.125 2023-11-24 18:34:56,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2946786.6666666665, ans=0.0 2023-11-24 18:34:56,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2946786.6666666665, ans=0.125 2023-11-24 18:35:08,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2946853.3333333335, ans=0.0 2023-11-24 18:35:15,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 8.510e+01 9.159e+01 9.829e+01 1.251e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 18:35:20,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2946920.0, ans=0.125 2023-11-24 18:35:30,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442050 2023-11-24 18:35:40,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2946986.6666666665, ans=0.125 2023-11-24 18:35:42,490 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9200, loss[loss=0.06512, simple_loss=0.09082, pruned_loss=0.01306, audio_tagging_loss=0.006657, over 14669.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09224, pruned_loss=0.01315, audio_tagging_loss=0.00859, over 3051052.74 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:36:04,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2947120.0, ans=0.125 2023-11-24 18:36:15,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. 
limit=22.5 2023-11-24 18:36:18,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2947186.6666666665, ans=0.1 2023-11-24 18:36:22,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-24 18:36:32,636 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442100 2023-11-24 18:36:40,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0 2023-11-24 18:36:43,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2947320.0, ans=0.125 2023-11-24 18:36:45,628 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9250, loss[loss=0.07646, simple_loss=0.1103, pruned_loss=0.01547, audio_tagging_loss=0.00584, over 15124.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09161, pruned_loss=0.01301, audio_tagging_loss=0.008615, over 3046976.37 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:36:51,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2947386.6666666665, ans=0.0 2023-11-24 18:36:54,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2947386.6666666665, ans=0.125 2023-11-24 18:36:56,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2947453.3333333335, ans=0.2 2023-11-24 18:37:10,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2947520.0, ans=15.0 2023-11-24 18:37:19,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.726e+01 9.340e+01 9.852e+01 1.345e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-24 18:37:35,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442150 2023-11-24 18:37:40,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2023-11-24 18:37:47,422 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9300, loss[loss=0.05492, simple_loss=0.07965, pruned_loss=0.007619, audio_tagging_loss=0.007471, over 14979.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09026, pruned_loss=0.01281, audio_tagging_loss=0.00871, over 3047269.98 frames. 
], batch size: 55, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:37:47,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2947720.0, ans=0.0 2023-11-24 18:37:53,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2947720.0, ans=0.0 2023-11-24 18:38:05,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2947786.6666666665, ans=0.125 2023-11-24 18:38:14,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2947853.3333333335, ans=0.09899494936611666 2023-11-24 18:38:24,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2947920.0, ans=0.125 2023-11-24 18:38:34,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2947920.0, ans=0.1 2023-11-24 18:38:37,555 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442200 2023-11-24 18:38:51,062 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9350, loss[loss=0.07466, simple_loss=0.1062, pruned_loss=0.01444, audio_tagging_loss=0.007103, over 14346.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09067, pruned_loss=0.01288, audio_tagging_loss=0.008706, over 3048208.70 frames. ], batch size: 53, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:39:08,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2948120.0, ans=0.1 2023-11-24 18:39:12,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2948120.0, ans=0.0 2023-11-24 18:39:14,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.09 vs. limit=22.5 2023-11-24 18:39:25,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2948186.6666666665, ans=0.125 2023-11-24 18:39:26,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.625e+01 9.145e+01 9.875e+01 1.374e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-24 18:39:40,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442250 2023-11-24 18:39:46,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=12.0 2023-11-24 18:39:53,302 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9400, loss[loss=0.0422, simple_loss=0.05154, pruned_loss=0.004778, audio_tagging_loss=0.01165, over 14821.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09016, pruned_loss=0.01288, audio_tagging_loss=0.008899, over 3049807.01 frames. 
], batch size: 59, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:39:54,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2948386.6666666665, ans=0.125 2023-11-24 18:39:54,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2948386.6666666665, ans=0.04949747468305833 2023-11-24 18:40:15,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2948453.3333333335, ans=0.125 2023-11-24 18:40:31,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2948586.6666666665, ans=0.2 2023-11-24 18:40:35,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2948586.6666666665, ans=0.0 2023-11-24 18:40:41,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2948586.6666666665, ans=0.0 2023-11-24 18:40:42,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-24 18:40:44,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442300 2023-11-24 18:40:50,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2948653.3333333335, ans=0.09899494936611666 2023-11-24 18:40:54,220 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 18:40:56,574 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9450, loss[loss=0.07352, simple_loss=0.09841, pruned_loss=0.01363, audio_tagging_loss=0.01068, over 14756.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09084, pruned_loss=0.01301, audio_tagging_loss=0.009051, over 3048176.73 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:41:20,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=2948853.3333333335, ans=22.5 2023-11-24 18:41:23,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. 
limit=15.0 2023-11-24 18:41:26,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2948853.3333333335, ans=0.125 2023-11-24 18:41:30,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2948853.3333333335, ans=0.1 2023-11-24 18:41:33,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.857e+01 9.505e+01 1.052e+02 1.346e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-24 18:41:47,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442350 2023-11-24 18:41:52,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2023-11-24 18:41:59,893 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9500, loss[loss=0.05514, simple_loss=0.0708, pruned_loss=0.01286, audio_tagging_loss=0.006873, over 15406.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09112, pruned_loss=0.01305, audio_tagging_loss=0.009029, over 3047482.82 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:42:07,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2949053.3333333335, ans=0.0 2023-11-24 18:42:45,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2949253.3333333335, ans=0.0 2023-11-24 18:42:49,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442400 2023-11-24 18:42:49,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2949320.0, ans=0.1 2023-11-24 18:42:58,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2949320.0, ans=0.2 2023-11-24 18:42:59,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2023-11-24 18:43:01,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2949386.6666666665, ans=0.2 2023-11-24 18:43:02,619 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9550, loss[loss=0.0887, simple_loss=0.1147, pruned_loss=0.0217, audio_tagging_loss=0.009645, over 15219.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09098, pruned_loss=0.01286, audio_tagging_loss=0.009071, over 3049326.61 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:43:20,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=2949453.3333333335, ans=0.1 2023-11-24 18:43:21,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2949453.3333333335, ans=0.0 2023-11-24 18:43:22,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2949453.3333333335, ans=0.0 2023-11-24 18:43:25,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. 
limit=10.0 2023-11-24 18:43:28,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2949520.0, ans=10.0 2023-11-24 18:43:32,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2949520.0, ans=0.125 2023-11-24 18:43:38,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.868e+01 8.675e+01 9.207e+01 9.929e+01 1.581e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-24 18:43:52,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442450 2023-11-24 18:44:04,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2023-11-24 18:44:04,421 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9600, loss[loss=0.08656, simple_loss=0.1213, pruned_loss=0.01761, audio_tagging_loss=0.008287, over 15204.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09103, pruned_loss=0.01275, audio_tagging_loss=0.009089, over 3045571.91 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:44:17,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2949786.6666666665, ans=0.0 2023-11-24 18:44:49,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2949920.0, ans=0.125 2023-11-24 18:44:52,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2949986.6666666665, ans=0.2 2023-11-24 18:44:54,659 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442500 2023-11-24 18:44:59,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2949986.6666666665, ans=0.125 2023-11-24 18:45:02,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2949986.6666666665, ans=0.025 2023-11-24 18:45:07,135 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9650, loss[loss=0.05545, simple_loss=0.07179, pruned_loss=0.009465, audio_tagging_loss=0.01009, over 14836.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09016, pruned_loss=0.01266, audio_tagging_loss=0.009056, over 3048216.49 frames. 
], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:45:10,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2950053.3333333335, ans=0.1 2023-11-24 18:45:15,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2950053.3333333335, ans=0.125 2023-11-24 18:45:22,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2950120.0, ans=0.0 2023-11-24 18:45:27,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2950120.0, ans=0.125 2023-11-24 18:45:35,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2950186.6666666665, ans=0.125 2023-11-24 18:45:37,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2950186.6666666665, ans=0.125 2023-11-24 18:45:40,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2950186.6666666665, ans=0.125 2023-11-24 18:45:43,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.425e+01 9.108e+01 9.681e+01 1.366e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-24 18:45:55,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0 2023-11-24 18:45:57,318 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442550 2023-11-24 18:46:09,222 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9700, loss[loss=0.08063, simple_loss=0.1097, pruned_loss=0.01844, audio_tagging_loss=0.007362, over 15221.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09058, pruned_loss=0.01275, audio_tagging_loss=0.008962, over 3053622.62 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:46:36,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2950520.0, ans=0.0 2023-11-24 18:46:43,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2023-11-24 18:46:49,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2950586.6666666665, ans=0.1 2023-11-24 18:46:53,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2950586.6666666665, ans=0.125 2023-11-24 18:47:00,187 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442600 2023-11-24 18:47:00,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2950653.3333333335, ans=0.04949747468305833 2023-11-24 18:47:12,412 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9750, loss[loss=0.06987, simple_loss=0.102, pruned_loss=0.01323, audio_tagging_loss=0.005633, over 14706.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09059, pruned_loss=0.01284, audio_tagging_loss=0.008894, over 3053249.42 frames. 
], batch size: 53, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:47:41,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2950853.3333333335, ans=0.125 2023-11-24 18:47:42,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.11 vs. limit=10.0 2023-11-24 18:47:48,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.947e+01 8.439e+01 9.144e+01 9.970e+01 1.220e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-24 18:47:54,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2950920.0, ans=0.0 2023-11-24 18:48:01,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2950986.6666666665, ans=0.09899494936611666 2023-11-24 18:48:02,038 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442650 2023-11-24 18:48:14,349 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9800, loss[loss=0.07743, simple_loss=0.112, pruned_loss=0.01627, audio_tagging_loss=0.005156, over 14744.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09089, pruned_loss=0.01297, audio_tagging_loss=0.008769, over 3048383.71 frames. ], batch size: 53, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:48:18,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2951053.3333333335, ans=0.125 2023-11-24 18:48:19,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2951053.3333333335, ans=0.2 2023-11-24 18:48:22,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2951053.3333333335, ans=0.1 2023-11-24 18:48:48,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2951186.6666666665, ans=0.025 2023-11-24 18:49:03,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442700 2023-11-24 18:49:05,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2951320.0, ans=0.125 2023-11-24 18:49:09,070 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 18:49:16,067 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9850, loss[loss=0.07257, simple_loss=0.1039, pruned_loss=0.01438, audio_tagging_loss=0.006247, over 15780.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09048, pruned_loss=0.01308, audio_tagging_loss=0.008688, over 3044030.70 frames. 
], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:49:27,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2951453.3333333335, ans=0.125 2023-11-24 18:49:50,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2023-11-24 18:49:53,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.535e+01 9.041e+01 9.752e+01 1.279e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-24 18:50:01,053 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-24 18:50:02,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2951586.6666666665, ans=0.1 2023-11-24 18:50:05,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442750 2023-11-24 18:50:11,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-24 18:50:13,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2951653.3333333335, ans=0.125 2023-11-24 18:50:16,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2951720.0, ans=0.2 2023-11-24 18:50:17,752 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9900, loss[loss=0.06976, simple_loss=0.09871, pruned_loss=0.0118, audio_tagging_loss=0.008602, over 15668.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09042, pruned_loss=0.01289, audio_tagging_loss=0.008632, over 3046298.21 frames. 
2023-11-24 18:50:32,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2951786.6666666665, ans=0.5 2023-11-24 18:50:41,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2951786.6666666665, ans=0.125 2023-11-24 18:50:47,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2951853.3333333335, ans=0.04949747468305833 2023-11-24 18:50:48,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2951853.3333333335, ans=0.0 2023-11-24 18:51:02,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2951920.0, ans=0.2 2023-11-24 18:51:03,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2951920.0, ans=0.125 2023-11-24 18:51:04,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2951920.0, ans=0.1 2023-11-24 18:51:07,631 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442800 2023-11-24 18:51:12,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2951986.6666666665, ans=0.0 2023-11-24 18:51:12,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2951986.6666666665, ans=0.0 2023-11-24 18:51:19,592 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 9950, loss[loss=0.06353, simple_loss=0.08513, pruned_loss=0.01257, audio_tagging_loss=0.008393, over 16106.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09041, pruned_loss=0.01284, audio_tagging_loss=0.008699, over 3049071.34 frames. ], batch size: 60, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:51:28,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2023-11-24 18:51:57,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.463e+01 9.019e+01 9.662e+01 1.211e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-24 18:51:59,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-11-24 18:52:10,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442850 2023-11-24 18:52:16,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2952320.0, ans=0.2 2023-11-24 18:52:21,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2023-11-24 18:52:22,997 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10000, loss[loss=0.09488, simple_loss=0.1165, pruned_loss=0.02933, audio_tagging_loss=0.007285, over 14642.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09013, pruned_loss=0.01288, audio_tagging_loss=0.008771, over 3045078.67 frames.
], batch size: 52, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 18:52:44,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2952453.3333333335, ans=0.0 2023-11-24 18:52:44,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2952453.3333333335, ans=0.125 2023-11-24 18:52:54,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2952520.0, ans=0.0 2023-11-24 18:53:11,969 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442900 2023-11-24 18:53:13,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-24 18:53:24,187 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10050, loss[loss=0.05502, simple_loss=0.07465, pruned_loss=0.008642, audio_tagging_loss=0.009048, over 14992.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09067, pruned_loss=0.01289, audio_tagging_loss=0.008768, over 3049486.39 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:53:31,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2023-11-24 18:53:33,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2952720.0, ans=0.0 2023-11-24 18:53:36,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2952786.6666666665, ans=0.0 2023-11-24 18:53:40,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2952786.6666666665, ans=0.1 2023-11-24 18:53:43,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2952786.6666666665, ans=0.0 2023-11-24 18:53:50,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2952853.3333333335, ans=0.1 2023-11-24 18:54:02,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.442e+01 9.039e+01 9.729e+01 1.134e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-24 18:54:07,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2952920.0, ans=0.1 2023-11-24 18:54:13,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 442950 2023-11-24 18:54:25,232 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10100, loss[loss=0.08007, simple_loss=0.1064, pruned_loss=0.01872, audio_tagging_loss=0.008152, over 14458.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09064, pruned_loss=0.01284, audio_tagging_loss=0.008871, over 3054339.99 frames. ], batch size: 53, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:54:33,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2953053.3333333335, ans=0.2 2023-11-24 18:54:41,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2953120.0, ans=0.2 2023-11-24 18:55:14,620 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 18:55:14,698 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443000 2023-11-24 18:55:25,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2953320.0, ans=0.0 2023-11-24 18:55:28,591 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10150, loss[loss=0.06798, simple_loss=0.09486, pruned_loss=0.01219, audio_tagging_loss=0.008367, over 15643.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09159, pruned_loss=0.013, audio_tagging_loss=0.008834, over 3059372.58 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:55:34,799 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 18:55:56,461 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 18:56:06,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.827e+01 8.628e+01 9.087e+01 9.762e+01 1.339e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 18:56:18,462 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443050 2023-11-24 18:56:28,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2953653.3333333335, ans=0.125 2023-11-24 18:56:30,710 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10200, loss[loss=0.06671, simple_loss=0.08881, pruned_loss=0.0125, audio_tagging_loss=0.009807, over 15403.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09156, pruned_loss=0.01317, audio_tagging_loss=0.008915, over 3058280.44 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:56:31,083 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 18:56:31,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2023-11-24 18:56:36,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2953720.0, ans=0.015 2023-11-24 18:56:52,738 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
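
Note on the "Exclude cut" warnings above: each one drops an AudioSet clip whose placeholder transcript cannot be aligned, because after front-end subsampling there are fewer encoder frames (23) than BPE tokens (24). The logged arithmetic matches a convolutional subsampler with T_out = ((T_in - 7) // 2 + 1) // 2, which maps 100 input frames to 23. A minimal sketch of such a filter, assuming that formula and a tokens-vs-frames check (function names are illustrative):

    # Sketch of a length filter behind the "Exclude cut" warnings, assuming the
    # front end reduces T_in feature frames to ((T_in - 7) // 2 + 1) // 2.
    def frames_after_subsampling(t_in: int) -> int:
        return ((t_in - 7) // 2 + 1) // 2

    def keep_cut(t_in: int, num_tokens: int) -> bool:
        # A transducer loss cannot align more target tokens than encoder frames.
        return num_tokens <= frames_after_subsampling(t_in)

    assert frames_after_subsampling(100) == 23   # matches the logged values
    assert keep_cut(100, 24) is False            # 24 tokens > 23 frames -> excluded
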
2023-11-24 18:56:54,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2953853.3333333335, ans=0.125 2023-11-24 18:57:00,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-24 18:57:05,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2953853.3333333335, ans=0.1 2023-11-24 18:57:15,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2953920.0, ans=0.2 2023-11-24 18:57:19,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2953986.6666666665, ans=0.125 2023-11-24 18:57:20,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443100 2023-11-24 18:57:32,184 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10250, loss[loss=0.05282, simple_loss=0.07018, pruned_loss=0.00762, audio_tagging_loss=0.01011, over 14215.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09102, pruned_loss=0.01309, audio_tagging_loss=0.008956, over 3047591.02 frames. ], batch size: 54, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:57:52,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.81 vs. limit=22.5 2023-11-24 18:57:54,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2954120.0, ans=0.1 2023-11-24 18:58:11,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.337e+01 8.466e+01 9.132e+01 9.893e+01 1.266e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 18:58:13,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2954253.3333333335, ans=0.1 2023-11-24 18:58:22,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443150 2023-11-24 18:58:22,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2954320.0, ans=0.1 2023-11-24 18:58:28,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2954320.0, ans=0.0 2023-11-24 18:58:35,788 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10300, loss[loss=0.06172, simple_loss=0.08033, pruned_loss=0.01215, audio_tagging_loss=0.0094, over 14571.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09151, pruned_loss=0.01304, audio_tagging_loss=0.008908, over 3054805.38 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:59:25,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443200 2023-11-24 18:59:39,366 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10350, loss[loss=0.06014, simple_loss=0.07689, pruned_loss=0.01109, audio_tagging_loss=0.01061, over 15444.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.0917, pruned_loss=0.01304, audio_tagging_loss=0.008921, over 3050641.32 frames.
], batch size: 57, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 18:59:45,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2954720.0, ans=0.125 2023-11-24 18:59:50,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2954786.6666666665, ans=0.125 2023-11-24 18:59:55,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2954786.6666666665, ans=0.125 2023-11-24 18:59:56,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2954786.6666666665, ans=0.1 2023-11-24 18:59:57,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2954786.6666666665, ans=0.0 2023-11-24 18:59:59,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2954786.6666666665, ans=0.2 2023-11-24 19:00:01,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2954786.6666666665, ans=0.0 2023-11-24 19:00:17,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 8.606e+01 9.110e+01 9.960e+01 1.328e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-24 19:00:28,632 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443250 2023-11-24 19:00:32,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2954986.6666666665, ans=0.125 2023-11-24 19:00:40,312 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10400, loss[loss=0.03215, simple_loss=0.03062, pruned_loss=0.003944, audio_tagging_loss=0.0129, over 15130.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09139, pruned_loss=0.01312, audio_tagging_loss=0.008992, over 3045942.09 frames. ], batch size: 60, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 19:00:46,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2955053.3333333335, ans=0.1 2023-11-24 19:00:48,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2955053.3333333335, ans=0.125 2023-11-24 19:00:50,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2955053.3333333335, ans=0.0 2023-11-24 19:01:02,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2955120.0, ans=0.1 2023-11-24 19:01:03,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2955120.0, ans=0.0 2023-11-24 19:01:16,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2955186.6666666665, ans=0.0 2023-11-24 19:01:18,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2023-11-24 19:01:22,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.22 vs. 
limit=12.0 2023-11-24 19:01:29,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2023-11-24 19:01:30,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443300 2023-11-24 19:01:32,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2955320.0, ans=0.125 2023-11-24 19:01:35,640 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 19:01:39,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2023-11-24 19:01:43,030 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10450, loss[loss=0.06005, simple_loss=0.08199, pruned_loss=0.009074, audio_tagging_loss=0.00998, over 14845.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.0907, pruned_loss=0.01282, audio_tagging_loss=0.008935, over 3048226.63 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:01:46,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2955386.6666666665, ans=0.0 2023-11-24 19:01:52,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-24 19:01:56,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2955453.3333333335, ans=0.0 2023-11-24 19:01:57,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2955453.3333333335, ans=0.0 2023-11-24 19:01:59,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2955453.3333333335, ans=0.025 2023-11-24 19:02:22,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.670e+01 9.376e+01 1.004e+02 1.489e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-24 19:02:23,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2955586.6666666665, ans=0.125 2023-11-24 19:02:24,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2955586.6666666665, ans=0.0 2023-11-24 19:02:28,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2955586.6666666665, ans=0.125 2023-11-24 19:02:30,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2955586.6666666665, ans=0.125 2023-11-24 19:02:33,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443350 2023-11-24 19:02:36,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2955653.3333333335, ans=0.125 2023-11-24 19:02:36,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2955653.3333333335, ans=0.0 2023-11-24 19:02:45,214 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10500, loss[loss=0.0546, simple_loss=0.07542, pruned_loss=0.008486, audio_tagging_loss=0.008408, over 
14738.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09076, pruned_loss=0.01292, audio_tagging_loss=0.00879, over 3053177.72 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 8.0 2023-11-24 19:03:01,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2955786.6666666665, ans=0.0 2023-11-24 19:03:25,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2955920.0, ans=0.125 2023-11-24 19:03:35,024 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443400 2023-11-24 19:03:48,020 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10550, loss[loss=0.05811, simple_loss=0.07665, pruned_loss=0.01095, audio_tagging_loss=0.008836, over 15587.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09072, pruned_loss=0.01288, audio_tagging_loss=0.008705, over 3053537.81 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 8.0 2023-11-24 19:03:48,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2956053.3333333335, ans=0.1 2023-11-24 19:03:59,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2956120.0, ans=0.125 2023-11-24 19:04:00,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2956120.0, ans=15.0 2023-11-24 19:04:06,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2956120.0, ans=0.125 2023-11-24 19:04:07,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2956120.0, ans=0.2 2023-11-24 19:04:13,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2956186.6666666665, ans=0.125 2023-11-24 19:04:16,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2956186.6666666665, ans=0.125 2023-11-24 19:04:21,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2956186.6666666665, ans=0.125 2023-11-24 19:04:29,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.503e+01 9.037e+01 9.861e+01 1.136e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 19:04:37,542 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443450 2023-11-24 19:04:37,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2956320.0, ans=0.1 2023-11-24 19:04:37,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2956320.0, ans=0.05 2023-11-24 19:04:37,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2956320.0, ans=0.125 2023-11-24 19:04:41,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2956320.0, ans=0.1 2023-11-24 19:04:49,851 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10600, loss[loss=0.05709, simple_loss=0.07598, pruned_loss=0.00933, audio_tagging_loss=0.009773, over 15405.00 frames. 
], tot_loss[loss=0.06628, simple_loss=0.08977, pruned_loss=0.01265, audio_tagging_loss=0.008738, over 3052663.20 frames. ], batch size: 60, lr: 1.82e-03, grad_scale: 8.0 2023-11-24 19:05:03,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2956453.3333333335, ans=0.1 2023-11-24 19:05:03,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2023-11-24 19:05:08,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2956453.3333333335, ans=0.07 2023-11-24 19:05:28,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2956586.6666666665, ans=0.125 2023-11-24 19:05:36,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-11-24 19:05:39,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443500 2023-11-24 19:05:51,792 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10650, loss[loss=0.06499, simple_loss=0.08606, pruned_loss=0.01408, audio_tagging_loss=0.007875, over 15021.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09044, pruned_loss=0.01294, audio_tagging_loss=0.008704, over 3051566.52 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 8.0 2023-11-24 19:05:53,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2956720.0, ans=0.125 2023-11-24 19:05:57,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2956720.0, ans=0.1 2023-11-24 19:06:11,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2956786.6666666665, ans=0.1 2023-11-24 19:06:11,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-11-24 19:06:18,590 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 19:06:20,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2956853.3333333335, ans=0.2 2023-11-24 19:06:24,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2956853.3333333335, ans=0.2 2023-11-24 19:06:33,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.793e+01 9.512e+01 1.031e+02 1.281e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-24 19:06:42,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443550 2023-11-24 19:06:55,213 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10700, loss[loss=0.06128, simple_loss=0.08202, pruned_loss=0.01026, audio_tagging_loss=0.01001, over 14352.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09073, pruned_loss=0.01296, audio_tagging_loss=0.008766, over 3044905.40 frames. 
], batch size: 56, lr: 1.82e-03, grad_scale: 8.0 2023-11-24 19:07:19,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2957186.6666666665, ans=0.1 2023-11-24 19:07:20,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2957186.6666666665, ans=0.0 2023-11-24 19:07:22,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2957186.6666666665, ans=0.125 2023-11-24 19:07:44,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2957320.0, ans=0.1 2023-11-24 19:07:45,000 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443600 2023-11-24 19:07:46,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2023-11-24 19:07:57,780 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10750, loss[loss=0.05735, simple_loss=0.07544, pruned_loss=0.01147, audio_tagging_loss=0.00816, over 15181.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09057, pruned_loss=0.01291, audio_tagging_loss=0.008784, over 3059331.10 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 8.0 2023-11-24 19:08:00,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2957386.6666666665, ans=0.0 2023-11-24 19:08:06,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2957386.6666666665, ans=0.1 2023-11-24 19:08:32,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2957520.0, ans=0.0 2023-11-24 19:08:33,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2957586.6666666665, ans=0.1 2023-11-24 19:08:37,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.444e+01 9.135e+01 9.725e+01 3.439e+02, threshold=1.827e+02, percent-clipped=1.0 2023-11-24 19:08:46,970 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443650 2023-11-24 19:08:59,153 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10800, loss[loss=0.07119, simple_loss=0.1057, pruned_loss=0.01284, audio_tagging_loss=0.005485, over 15529.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09033, pruned_loss=0.01276, audio_tagging_loss=0.008777, over 3056705.70 frames. ], batch size: 57, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:09:10,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2957786.6666666665, ans=0.2 2023-11-24 19:09:38,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0 2023-11-24 19:09:48,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443700 2023-11-24 19:09:49,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=10.0 2023-11-24 19:10:01,065 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10850, loss[loss=0.08166, simple_loss=0.1059, pruned_loss=0.02021, audio_tagging_loss=0.00848, over 16429.00 frames. 
], tot_loss[loss=0.06733, simple_loss=0.09121, pruned_loss=0.01294, audio_tagging_loss=0.008783, over 3058418.88 frames. ], batch size: 60, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:10:03,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2958053.3333333335, ans=0.125 2023-11-24 19:10:10,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2958053.3333333335, ans=0.125 2023-11-24 19:10:13,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2958120.0, ans=0.0 2023-11-24 19:10:28,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2958186.6666666665, ans=0.2 2023-11-24 19:10:43,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.404e+01 9.026e+01 9.467e+01 1.240e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-24 19:10:51,557 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443750 2023-11-24 19:10:53,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-24 19:10:59,087 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 19:11:04,353 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10900, loss[loss=0.07222, simple_loss=0.09615, pruned_loss=0.01263, audio_tagging_loss=0.01151, over 15099.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09088, pruned_loss=0.01289, audio_tagging_loss=0.008873, over 3055149.56 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:11:16,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=12.0 2023-11-24 19:11:20,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=2958453.3333333335, ans=0.2 2023-11-24 19:11:46,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2958586.6666666665, ans=0.1 2023-11-24 19:11:50,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2958586.6666666665, ans=0.125 2023-11-24 19:11:54,374 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443800 2023-11-24 19:12:01,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2958653.3333333335, ans=0.1 2023-11-24 19:12:06,961 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 10950, loss[loss=0.06622, simple_loss=0.0938, pruned_loss=0.01206, audio_tagging_loss=0.007259, over 16076.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09026, pruned_loss=0.0127, audio_tagging_loss=0.008894, over 3057968.61 frames. 
], batch size: 59, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:12:48,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.471e+01 9.149e+01 9.860e+01 1.271e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 19:12:54,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-24 19:12:56,862 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443850 2023-11-24 19:13:01,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2958986.6666666665, ans=0.125 2023-11-24 19:13:08,961 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11000, loss[loss=0.06677, simple_loss=0.08648, pruned_loss=0.01298, audio_tagging_loss=0.01055, over 15485.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09015, pruned_loss=0.01281, audio_tagging_loss=0.008896, over 3056476.53 frames. ], batch size: 60, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:13:14,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2959053.3333333335, ans=0.2 2023-11-24 19:13:17,564 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 19:13:48,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2959253.3333333335, ans=0.07 2023-11-24 19:13:57,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2959320.0, ans=0.1 2023-11-24 19:13:58,871 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443900 2023-11-24 19:14:08,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2959320.0, ans=0.0 2023-11-24 19:14:10,678 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11050, loss[loss=0.07253, simple_loss=0.09368, pruned_loss=0.01345, audio_tagging_loss=0.01224, over 15030.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09074, pruned_loss=0.01288, audio_tagging_loss=0.008966, over 3051449.80 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 16.0
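
Note on the optim.py:476 records above: the five numbers after "grad-norm quartiles" read as a min / 25% / 50% / 75% / max summary of recently observed gradient norms, and the clipping threshold tracks Clipping_scale times the median; in the record above, 2.0 * 9.149e+01 = 1.830e+02, exactly the logged threshold, and percent-clipped=0.0 means no recent batch exceeded it. A minimal sketch of that bookkeeping, assuming threshold = clipping_scale * median over a sliding window (the window handling is illustrative, not the optimizer's actual code):

    # Sketch: summarize recent grad norms and derive a median-based clip threshold.
    from statistics import median, quantiles

    def clip_stats(recent_norms, clipping_scale=2.0):
        q1, q2, q3 = quantiles(recent_norms, n=4)
        summary = (min(recent_norms), q1, q2, q3, max(recent_norms))
        threshold = clipping_scale * median(recent_norms)
        pct = 100.0 * sum(n > threshold for n in recent_norms) / len(recent_norms)
        return summary, threshold, pct

    # Toy window built from the summary values logged above:
    _, threshold, pct = clip_stats([70.26, 84.71, 91.49, 98.60, 127.1])
    assert abs(threshold - 183.0) < 0.1   # ~ the logged threshold=1.830e+02
    assert pct == 0.0                     # matches percent-clipped=0.0
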
2023-11-24 19:14:21,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2959386.6666666665, ans=0.1 2023-11-24 19:14:34,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2959453.3333333335, ans=0.125 2023-11-24 19:14:34,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2959453.3333333335, ans=0.125 2023-11-24 19:14:39,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2959520.0, ans=0.125 2023-11-24 19:14:44,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2959520.0, ans=0.0 2023-11-24 19:14:52,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.793e+01 9.326e+01 1.000e+02 1.252e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-24 19:14:52,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2959586.6666666665, ans=0.125 2023-11-24 19:15:01,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 443950 2023-11-24 19:15:04,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2959653.3333333335, ans=0.125 2023-11-24 19:15:09,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2959653.3333333335, ans=0.125 2023-11-24 19:15:14,594 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11100, loss[loss=0.06193, simple_loss=0.07063, pruned_loss=0.01167, audio_tagging_loss=0.01494, over 14830.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09138, pruned_loss=0.01306, audio_tagging_loss=0.009064, over 3046804.46 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:15:29,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2959786.6666666665, ans=0.0 2023-11-24 19:15:34,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2959786.6666666665, ans=0.0 2023-11-24 19:15:51,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2959920.0, ans=0.1 2023-11-24 19:15:54,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2959920.0, ans=0.0 2023-11-24 19:16:04,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444000 2023-11-24 19:16:21,561 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11150, loss[loss=0.08139, simple_loss=0.104, pruned_loss=0.01801, audio_tagging_loss=0.01136, over 15587.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09079, pruned_loss=0.01308, audio_tagging_loss=0.009129, over 3049408.15 frames.
], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:16:25,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2960053.3333333335, ans=0.0 2023-11-24 19:16:26,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=2960053.3333333335, ans=0.02 2023-11-24 19:17:02,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.502e+01 9.253e+01 1.001e+02 1.275e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-24 19:17:08,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2960253.3333333335, ans=0.125 2023-11-24 19:17:11,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444050 2023-11-24 19:17:13,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2960320.0, ans=0.125 2023-11-24 19:17:18,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2960320.0, ans=0.09899494936611666 2023-11-24 19:17:23,206 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11200, loss[loss=0.05889, simple_loss=0.07909, pruned_loss=0.01092, audio_tagging_loss=0.008427, over 15637.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09006, pruned_loss=0.0129, audio_tagging_loss=0.009205, over 3056276.15 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 19:17:54,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2960520.0, ans=0.0 2023-11-24 19:17:59,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2960520.0, ans=0.0 2023-11-24 19:18:06,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2960586.6666666665, ans=0.0 2023-11-24 19:18:06,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2960586.6666666665, ans=0.125 2023-11-24 19:18:13,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444100 2023-11-24 19:18:23,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=15.0 2023-11-24 19:18:26,582 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11250, loss[loss=0.0601, simple_loss=0.08192, pruned_loss=0.01192, audio_tagging_loss=0.007218, over 14281.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09012, pruned_loss=0.01296, audio_tagging_loss=0.009133, over 3048422.28 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 19:18:28,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.38 vs. 
limit=15.0 2023-11-24 19:18:52,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2960853.3333333335, ans=0.125 2023-11-24 19:19:07,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2960920.0, ans=0.1 2023-11-24 19:19:08,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.458e+01 9.183e+01 1.013e+02 1.310e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-24 19:19:16,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444150 2023-11-24 19:19:16,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2960986.6666666665, ans=0.1 2023-11-24 19:19:23,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2023-11-24 19:19:25,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2960986.6666666665, ans=0.1 2023-11-24 19:19:28,392 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11300, loss[loss=0.0624, simple_loss=0.08507, pruned_loss=0.0126, audio_tagging_loss=0.00726, over 15053.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09076, pruned_loss=0.01304, audio_tagging_loss=0.008906, over 3046007.09 frames. ], batch size: 57, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:19:50,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=10.0 2023-11-24 19:20:14,989 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 19:20:18,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444200 2023-11-24 19:20:28,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2961320.0, ans=0.1 2023-11-24 19:20:30,535 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11350, loss[loss=0.03929, simple_loss=0.04084, pruned_loss=0.006011, audio_tagging_loss=0.01286, over 14944.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08956, pruned_loss=0.0129, audio_tagging_loss=0.008949, over 3046602.04 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:20:38,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2961386.6666666665, ans=0.125 2023-11-24 19:20:43,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. 
limit=15.0 2023-11-24 19:20:52,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2961453.3333333335, ans=0.2 2023-11-24 19:21:07,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2961586.6666666665, ans=0.125 2023-11-24 19:21:10,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2961586.6666666665, ans=0.125 2023-11-24 19:21:12,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.844e+01 9.278e+01 1.039e+02 2.071e+02, threshold=1.856e+02, percent-clipped=1.0 2023-11-24 19:21:20,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444250 2023-11-24 19:21:32,750 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11400, loss[loss=0.06797, simple_loss=0.09478, pruned_loss=0.01225, audio_tagging_loss=0.008333, over 14280.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08957, pruned_loss=0.01292, audio_tagging_loss=0.008801, over 3046999.06 frames. ], batch size: 57, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:21:59,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5 2023-11-24 19:22:14,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2961920.0, ans=0.2 2023-11-24 19:22:14,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2961920.0, ans=0.2 2023-11-24 19:22:19,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2961920.0, ans=0.04949747468305833 2023-11-24 19:22:23,243 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444300 2023-11-24 19:22:36,133 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11450, loss[loss=0.06277, simple_loss=0.08171, pruned_loss=0.01244, audio_tagging_loss=0.009476, over 16007.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0892, pruned_loss=0.01276, audio_tagging_loss=0.00877, over 3050665.12 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:22:39,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2962053.3333333335, ans=0.1 2023-11-24 19:22:45,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2023-11-24 19:22:49,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2962120.0, ans=0.1 2023-11-24 19:23:18,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.582e+01 9.161e+01 1.011e+02 1.144e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 19:23:26,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444350 2023-11-24 19:23:29,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2962320.0, ans=0.07 2023-11-24 19:23:38,057 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11500, loss[loss=0.08534, simple_loss=0.1141, pruned_loss=0.0207, audio_tagging_loss=0.007593, over 15663.00 frames. 
], tot_loss[loss=0.06672, simple_loss=0.09015, pruned_loss=0.01292, audio_tagging_loss=0.008725, over 3049051.34 frames. ], batch size: 60, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:23:38,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2962386.6666666665, ans=0.2 2023-11-24 19:23:41,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2962386.6666666665, ans=0.0 2023-11-24 19:23:42,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=22.5 2023-11-24 19:23:46,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-11-24 19:24:18,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2962586.6666666665, ans=0.0 2023-11-24 19:24:28,134 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444400 2023-11-24 19:24:38,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2962653.3333333335, ans=0.1 2023-11-24 19:24:40,761 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11550, loss[loss=0.06746, simple_loss=0.08919, pruned_loss=0.01497, audio_tagging_loss=0.007898, over 14390.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08925, pruned_loss=0.01278, audio_tagging_loss=0.008812, over 3054244.25 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:24:51,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=15.0 2023-11-24 19:24:53,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2962786.6666666665, ans=0.125 2023-11-24 19:24:54,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-24 19:24:59,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2962786.6666666665, ans=0.125 2023-11-24 19:25:17,130 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 19:25:23,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.528e+01 9.073e+01 9.782e+01 1.310e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-24 19:25:28,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2962920.0, ans=15.0 2023-11-24 19:25:31,213 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444450 2023-11-24 19:25:43,431 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11600, loss[loss=0.07571, simple_loss=0.1038, pruned_loss=0.01509, audio_tagging_loss=0.008703, over 15913.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08939, pruned_loss=0.01273, audio_tagging_loss=0.008829, over 3053105.72 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 19:25:46,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2963053.3333333335, ans=0.1 2023-11-24 19:26:12,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2023-11-24 19:26:18,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2963186.6666666665, ans=0.0 2023-11-24 19:26:18,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2023-11-24 19:26:33,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444500 2023-11-24 19:26:33,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=12.0 2023-11-24 19:26:38,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2963320.0, ans=0.2 2023-11-24 19:26:45,180 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11650, loss[loss=0.04767, simple_loss=0.06475, pruned_loss=0.006611, audio_tagging_loss=0.008683, over 16092.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08969, pruned_loss=0.01266, audio_tagging_loss=0.00883, over 3055253.11 frames. ], batch size: 62, lr: 1.82e-03, grad_scale: 32.0 2023-11-24 19:26:53,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2963386.6666666665, ans=0.0 2023-11-24 19:26:55,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.86 vs. 
limit=15.0 2023-11-24 19:27:03,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2963453.3333333335, ans=0.125 2023-11-24 19:27:03,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2963453.3333333335, ans=0.2 2023-11-24 19:27:06,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2963453.3333333335, ans=0.125 2023-11-24 19:27:13,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2963520.0, ans=0.025 2023-11-24 19:27:13,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2963520.0, ans=0.0 2023-11-24 19:27:28,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.230e+01 8.998e+01 9.783e+01 1.223e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-24 19:27:34,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444550 2023-11-24 19:27:46,802 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11700, loss[loss=0.06905, simple_loss=0.1023, pruned_loss=0.01198, audio_tagging_loss=0.005938, over 14779.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09076, pruned_loss=0.0127, audio_tagging_loss=0.008703, over 3049322.81 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 16.0 2023-11-24 19:27:49,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2963720.0, ans=0.025 2023-11-24 19:27:56,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=22.5 2023-11-24 19:28:36,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444600 2023-11-24 19:28:44,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2963986.6666666665, ans=0.1 2023-11-24 19:28:47,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2963986.6666666665, ans=0.125 2023-11-24 19:28:49,222 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11750, loss[loss=0.06153, simple_loss=0.08022, pruned_loss=0.009486, audio_tagging_loss=0.01194, over 14940.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09037, pruned_loss=0.01266, audio_tagging_loss=0.008815, over 3045966.10 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 16.0
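
Note on the scaling.py:213 records above: ScheduledFloat values such as dropout_p, *_skip_rate, scale_min and the balancer prob fields are not fixed hyperparameters but functions of batch_count, and ans= is the value in effect at that point of training; this deep into the run (batch_count around 2.96e6) they have long since settled at their final values (0.0, 0.1, 0.125, 0.2, ...). A minimal sketch of a piecewise-linear schedule of this kind, offered as an illustration rather than the library's implementation (the breakpoints below are made up):

    # Sketch: a float hyperparameter scheduled piecewise-linearly in batch_count.
    class ScheduledFloatSketch:
        def __init__(self, *points):
            # points: (batch_count, value) breakpoints, e.g. (0, 0.3), (20000, 0.1)
            self.points = sorted(points)

        def value_at(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    # Linear interpolation between neighbouring breakpoints.
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A dropout decaying from 0.3 to 0.1 over the first 20k batches, then flat:
    drop = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    assert drop.value_at(2963453.0) == 0.1   # late in training it reports ans=0.1
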
2023-11-24 19:29:11,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2964120.0, ans=0.0
2023-11-24 19:29:14,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2964186.6666666665, ans=0.0
2023-11-24 19:29:15,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2964186.6666666665, ans=0.0
2023-11-24 19:29:32,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.583e+01 9.295e+01 1.006e+02 1.151e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-24 19:29:38,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444650
2023-11-24 19:29:52,052 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11800, loss[loss=0.06685, simple_loss=0.0894, pruned_loss=0.01298, audio_tagging_loss=0.009172, over 14558.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.08998, pruned_loss=0.01284, audio_tagging_loss=0.008971, over 3044828.15 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 16.0
2023-11-24 19:29:54,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2964386.6666666665, ans=0.125
2023-11-24 19:29:59,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2964386.6666666665, ans=0.1
2023-11-24 19:30:01,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2964386.6666666665, ans=0.1
2023-11-24 19:30:03,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0
2023-11-24 19:30:05,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2964453.3333333335, ans=0.0
2023-11-24 19:30:09,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2964453.3333333335, ans=0.125
2023-11-24 19:30:13,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5
2023-11-24 19:30:37,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2964586.6666666665, ans=0.0
2023-11-24 19:30:42,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444700
2023-11-24 19:30:54,557 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11850, loss[loss=0.08541, simple_loss=0.1151, pruned_loss=0.0199, audio_tagging_loss=0.007962, over 15313.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08984, pruned_loss=0.01278, audio_tagging_loss=0.009021, over 3038401.99 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 16.0
2023-11-24 19:30:55,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5
2023-11-24 19:30:56,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0
2023-11-24 19:31:10,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2964786.6666666665, ans=0.125
2023-11-24 19:31:12,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2964786.6666666665, ans=0.0
2023-11-24 19:31:31,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2964920.0, ans=0.0
2023-11-24 19:31:32,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2964920.0, ans=0.2
2023-11-24 19:31:37,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.569e+01 9.195e+01 9.795e+01 1.501e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-24 19:31:44,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444750
2023-11-24 19:31:56,647 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11900, loss[loss=0.0579, simple_loss=0.07791, pruned_loss=0.01086, audio_tagging_loss=0.008087, over 15930.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.08996, pruned_loss=0.01282, audio_tagging_loss=0.009005, over 3043687.47 frames. ], batch size: 62, lr: 1.82e-03, grad_scale: 16.0
2023-11-24 19:32:12,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2965120.0, ans=0.0
2023-11-24 19:32:20,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2965186.6666666665, ans=0.1
2023-11-24 19:32:46,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444800
2023-11-24 19:32:59,207 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 11950, loss[loss=0.06577, simple_loss=0.09026, pruned_loss=0.01232, audio_tagging_loss=0.008318, over 15583.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.08993, pruned_loss=0.01285, audio_tagging_loss=0.009094, over 3046526.99 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 16.0
2023-11-24 19:33:10,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0
2023-11-24 19:33:14,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0
2023-11-24 19:33:18,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2965453.3333333335, ans=0.125
2023-11-24 19:33:23,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2965520.0, ans=0.0
2023-11-24 19:33:34,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2965520.0, ans=0.125
2023-11-24 19:33:41,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.644e+01 9.247e+01 9.996e+01 1.290e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-24 19:33:47,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444850
2023-11-24 19:33:59,351 INFO [train_asr.py:1221] (1/4) Epoch 37, batch 12000, loss[loss=0.09675, simple_loss=0.1417, pruned_loss=0.01929, audio_tagging_loss=0.006608, over 16173.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.08964, pruned_loss=0.01275, audio_tagging_loss=0.009277, over 3046964.02 frames. ], batch size: 60, lr: 1.82e-03, grad_scale: 32.0
2023-11-24 19:33:59,352 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 19:34:20,272 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.4901, 6.3200, 6.1449, 6.1431], device='cuda:1')
2023-11-24 19:34:23,025 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3569, 3.5388, 3.0992, 3.1501], device='cuda:1')
2023-11-24 19:34:29,458 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8012, 5.8412, 5.8977, 5.8679], device='cuda:1')
2023-11-24 19:34:41,831 INFO [train_asr.py:1253] (1/4) Epoch 37, validation: loss=0.058, simple_loss=0.05081, pruned_loss=0.005169, audio_tagging_loss=0.02743, over 4681554.00 frames.
2023-11-24 19:34:41,832 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 19:34:42,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2965720.0, ans=0.125
2023-11-24 19:34:44,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2965720.0, ans=0.1
2023-11-24 19:34:55,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2965786.6666666665, ans=0.1
2023-11-24 19:35:40,407 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 0, loss[loss=0.0736, simple_loss=0.0897, pruned_loss=0.009527, audio_tagging_loss=0.01923, over 15524.00 frames. ], tot_loss[loss=0.0736, simple_loss=0.0897, pruned_loss=0.009527, audio_tagging_loss=0.01923, over 15524.00 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:35:40,408 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 19:36:03,136 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9647, 3.1571, 2.8980, 3.1910, 3.3648, 2.8021, 3.3548, 2.6402], device='cuda:1')
2023-11-24 19:36:08,166 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3409, 5.0181, 4.6371, 5.1919], device='cuda:1')
2023-11-24 19:36:09,537 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1688, 3.9407, 3.7318, 3.2037], device='cuda:1')
2023-11-24 19:36:10,285 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8105, 4.9501, 5.1251, 4.9062], device='cuda:1')
2023-11-24 19:36:12,241 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3347, 5.0151, 4.6384, 5.1534], device='cuda:1')
2023-11-24 19:36:13,034 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8057, 4.9636, 5.0644, 4.9161], device='cuda:1')
2023-11-24 19:36:16,561 INFO [train_asr.py:1253] (1/4) Epoch 38, validation: loss=0.05758, simple_loss=0.05072, pruned_loss=0.005057, audio_tagging_loss=0.02716, over 4681554.00 frames.
2023-11-24 19:36:16,562 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 19:36:20,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2965873.3333333335, ans=0.2
2023-11-24 19:36:32,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2965940.0, ans=0.0
2023-11-24 19:36:37,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444900
2023-11-24 19:37:17,824 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 19:37:18,849 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 50, loss[loss=0.06852, simple_loss=0.08647, pruned_loss=0.008477, audio_tagging_loss=0.0168, over 15773.00 frames. ], tot_loss[loss=0.07463, simple_loss=0.08839, pruned_loss=0.01297, audio_tagging_loss=0.01747, over 686340.95 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:37:19,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2966206.6666666665, ans=0.0
2023-11-24 19:37:20,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2966206.6666666665, ans=0.125
2023-11-24 19:37:21,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2966206.6666666665, ans=0.0
2023-11-24 19:37:28,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2966206.6666666665, ans=0.1
2023-11-24 19:37:35,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.245e+01 9.441e+01 1.025e+02 1.118e+02 1.388e+02, threshold=2.050e+02, percent-clipped=0.0
2023-11-24 19:37:40,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 444950
2023-11-24 19:37:53,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2966340.0, ans=0.0
2023-11-24 19:37:53,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0
2023-11-24 19:37:55,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5
2023-11-24 19:38:00,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2966406.6666666665, ans=0.025
2023-11-24 19:38:18,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2966473.3333333335, ans=0.125
2023-11-24 19:38:18,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2966473.3333333335, ans=0.0
2023-11-24 19:38:21,596 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 100, loss[loss=0.06433, simple_loss=0.07909, pruned_loss=0.007701, audio_tagging_loss=0.01708, over 14756.00 frames. ], tot_loss[loss=0.07513, simple_loss=0.09021, pruned_loss=0.01342, audio_tagging_loss=0.01661, over 1207842.78 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:38:23,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=12.0
2023-11-24 19:38:28,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2966540.0, ans=0.035
2023-11-24 19:38:43,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445000
2023-11-24 19:39:07,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0
2023-11-24 19:39:15,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2966806.6666666665, ans=0.125
2023-11-24 19:39:24,348 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 150, loss[loss=0.07294, simple_loss=0.1001, pruned_loss=0.01397, audio_tagging_loss=0.008906, over 13886.00 frames. ], tot_loss[loss=0.07465, simple_loss=0.09325, pruned_loss=0.01346, audio_tagging_loss=0.01457, over 1618731.16 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:39:39,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 9.070e+01 9.567e+01 1.041e+02 1.259e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-24 19:39:40,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2966940.0, ans=0.0
2023-11-24 19:39:44,642 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445050
2023-11-24 19:39:47,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2967006.6666666665, ans=0.125
2023-11-24 19:40:05,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2967073.3333333335, ans=0.125
2023-11-24 19:40:26,142 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 200, loss[loss=0.04909, simple_loss=0.06212, pruned_loss=0.009475, audio_tagging_loss=0.008558, over 14830.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.09115, pruned_loss=0.01306, audio_tagging_loss=0.01303, over 1933817.11 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:40:29,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.92 vs. limit=22.5
2023-11-24 19:40:32,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5
2023-11-24 19:40:33,651 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 19:40:48,381 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445100
2023-11-24 19:40:49,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2967273.3333333335, ans=0.035
2023-11-24 19:40:54,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2967340.0, ans=0.125
2023-11-24 19:41:28,675 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 250, loss[loss=0.05357, simple_loss=0.06743, pruned_loss=0.009873, audio_tagging_loss=0.009979, over 15304.00 frames. ], tot_loss[loss=0.06998, simple_loss=0.09086, pruned_loss=0.01275, audio_tagging_loss=0.0118, over 2179053.07 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:41:38,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2967540.0, ans=0.1
2023-11-24 19:41:39,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0
2023-11-24 19:41:43,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2967606.6666666665, ans=0.04949747468305833
2023-11-24 19:41:45,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.613e+01 9.234e+01 1.003e+02 1.300e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-24 19:41:50,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445150
2023-11-24 19:41:50,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2967606.6666666665, ans=0.125
2023-11-24 19:42:01,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2967673.3333333335, ans=0.125
2023-11-24 19:42:19,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2967806.6666666665, ans=0.125
2023-11-24 19:42:24,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2967806.6666666665, ans=0.04949747468305833
2023-11-24 19:42:31,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0
2023-11-24 19:42:31,600 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 300, loss[loss=0.06169, simple_loss=0.0826, pruned_loss=0.01064, audio_tagging_loss=0.009745, over 14748.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.09036, pruned_loss=0.01267, audio_tagging_loss=0.0111, over 2370615.45 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:42:42,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2967873.3333333335, ans=0.125
2023-11-24 19:42:52,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445200
2023-11-24 19:42:52,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2023-11-24 19:43:03,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2968006.6666666665, ans=0.0
2023-11-24 19:43:11,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2968073.3333333335, ans=0.0
2023-11-24 19:43:13,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2968073.3333333335, ans=0.2
2023-11-24 19:43:34,418 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 350, loss[loss=0.05424, simple_loss=0.06837, pruned_loss=0.008064, audio_tagging_loss=0.01199, over 15944.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09064, pruned_loss=0.01258, audio_tagging_loss=0.0104, over 2521226.54 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:43:40,828 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 19:43:42,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0
2023-11-24 19:43:45,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0
2023-11-24 19:43:51,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.795e+01 8.774e+01 9.424e+01 1.013e+02 1.303e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-24 19:43:54,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2968273.3333333335, ans=0.125
2023-11-24 19:43:55,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445250
2023-11-24 19:44:09,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0
2023-11-24 19:44:30,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2968473.3333333335, ans=0.1
2023-11-24 19:44:30,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2968473.3333333335, ans=0.2
2023-11-24 19:44:31,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2968473.3333333335, ans=0.1
2023-11-24 19:44:34,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2968473.3333333335, ans=0.0
2023-11-24 19:44:36,724 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 400, loss[loss=0.06291, simple_loss=0.08423, pruned_loss=0.01175, audio_tagging_loss=0.00905, over 15394.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09096, pruned_loss=0.01267, audio_tagging_loss=0.009989, over 2644461.96 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:44:37,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2968540.0, ans=15.0
2023-11-24 19:44:40,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2968540.0, ans=0.2
2023-11-24 19:44:43,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2968540.0, ans=0.2
2023-11-24 19:44:58,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445300
2023-11-24 19:45:03,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2968673.3333333335, ans=15.0
2023-11-24 19:45:09,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2968673.3333333335, ans=0.2
2023-11-24 19:45:16,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0
2023-11-24 19:45:20,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2968740.0, ans=0.125
2023-11-24 19:45:39,741 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 450, loss[loss=0.06311, simple_loss=0.08731, pruned_loss=0.009839, audio_tagging_loss=0.009616, over 14930.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09076, pruned_loss=0.01261, audio_tagging_loss=0.009688, over 2727222.94 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:45:46,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2968873.3333333335, ans=0.0
2023-11-24 19:45:56,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.512e+01 9.246e+01 9.935e+01 1.248e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-24 19:46:00,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445350
2023-11-24 19:46:07,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2969006.6666666665, ans=0.125
2023-11-24 19:46:17,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2969073.3333333335, ans=0.0
2023-11-24 19:46:41,815 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 500, loss[loss=0.05114, simple_loss=0.07105, pruned_loss=0.007086, audio_tagging_loss=0.008535, over 15381.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09079, pruned_loss=0.01281, audio_tagging_loss=0.009455, over 2798152.08 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:46:54,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2969273.3333333335, ans=0.1
2023-11-24 19:47:00,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2969273.3333333335, ans=0.125
2023-11-24 19:47:01,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0
2023-11-24 19:47:03,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445400
2023-11-24 19:47:04,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2969273.3333333335, ans=0.0
2023-11-24 19:47:07,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2969340.0, ans=0.0
2023-11-24 19:47:16,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2969340.0, ans=0.05
2023-11-24 19:47:16,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2969340.0, ans=0.125
2023-11-24 19:47:44,338 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 550, loss[loss=0.07091, simple_loss=0.08869, pruned_loss=0.01887, audio_tagging_loss=0.007698, over 14626.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09076, pruned_loss=0.01286, audio_tagging_loss=0.009434, over 2853125.50 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:47:46,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2969540.0, ans=0.1
2023-11-24 19:47:53,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2969540.0, ans=0.125
2023-11-24 19:47:56,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2969606.6666666665, ans=0.125
2023-11-24 19:48:04,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.415e+01 9.096e+01 9.896e+01 1.420e+02, threshold=1.819e+02, percent-clipped=0.0
2023-11-24 19:48:06,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445450
2023-11-24 19:48:14,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2969673.3333333335, ans=0.0
2023-11-24 19:48:47,446 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 600, loss[loss=0.06043, simple_loss=0.08186, pruned_loss=0.008857, audio_tagging_loss=0.01064, over 15379.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09005, pruned_loss=0.01269, audio_tagging_loss=0.009401, over 2889173.47 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:48:51,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2969873.3333333335, ans=0.0
2023-11-24 19:48:55,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2969873.3333333335, ans=0.125
2023-11-24 19:48:55,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2969873.3333333335, ans=0.1
2023-11-24 19:48:56,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0
2023-11-24 19:49:01,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2969940.0, ans=0.125
2023-11-24 19:49:08,918 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445500
2023-11-24 19:49:12,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2970006.6666666665, ans=0.1
2023-11-24 19:49:13,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2970006.6666666665, ans=0.2
2023-11-24 19:49:16,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2970006.6666666665, ans=0.2
2023-11-24 19:49:21,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2970006.6666666665, ans=0.07
2023-11-24 19:49:25,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2970073.3333333335, ans=0.1
2023-11-24 19:49:25,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2970073.3333333335, ans=0.1
2023-11-24 19:49:34,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2970073.3333333335, ans=0.0
2023-11-24 19:49:49,970 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 650, loss[loss=0.08098, simple_loss=0.1072, pruned_loss=0.01813, audio_tagging_loss=0.009237, over 15540.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09064, pruned_loss=0.01283, audio_tagging_loss=0.009291, over 2927978.66 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:49:52,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2970206.6666666665, ans=0.2
2023-11-24 19:49:58,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2970206.6666666665, ans=0.125
2023-11-24 19:50:07,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.028e+01 8.559e+01 9.252e+01 1.019e+02 1.352e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-24 19:50:08,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2970273.3333333335, ans=0.0
2023-11-24 19:50:10,975 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445550
2023-11-24 19:50:41,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2970473.3333333335, ans=0.5
2023-11-24 19:50:50,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2970540.0, ans=0.125
2023-11-24 19:50:51,217 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 700, loss[loss=0.06707, simple_loss=0.09271, pruned_loss=0.01157, audio_tagging_loss=0.009149, over 15130.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08966, pruned_loss=0.01271, audio_tagging_loss=0.009153, over 2952635.03 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:51:08,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2970606.6666666665, ans=0.125
2023-11-24 19:51:13,019 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445600
2023-11-24 19:51:25,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2970673.3333333335, ans=0.09899494936611666
2023-11-24 19:51:35,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2970740.0, ans=0.04949747468305833
2023-11-24 19:51:43,561 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 19:51:46,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2970806.6666666665, ans=0.125
2023-11-24 19:51:55,293 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 750, loss[loss=0.0753, simple_loss=0.09833, pruned_loss=0.01665, audio_tagging_loss=0.00948, over 15435.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09038, pruned_loss=0.01283, audio_tagging_loss=0.009107, over 2978335.72 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:52:06,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0
2023-11-24 19:52:09,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2970940.0, ans=0.125
2023-11-24 19:52:14,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.682e+01 9.156e+01 1.004e+02 1.177e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-24 19:52:16,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445650
2023-11-24 19:52:22,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0
2023-11-24 19:52:26,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2971006.6666666665, ans=0.0
2023-11-24 19:52:31,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2971073.3333333335, ans=0.125
2023-11-24 19:52:58,210 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 800, loss[loss=0.06792, simple_loss=0.08819, pruned_loss=0.01304, audio_tagging_loss=0.01078, over 14229.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09095, pruned_loss=0.0129, audio_tagging_loss=0.009145, over 2996064.80 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:52:58,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2971206.6666666665, ans=0.1
2023-11-24 19:53:13,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2971273.3333333335, ans=0.125
2023-11-24 19:53:19,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445700
2023-11-24 19:53:31,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2971340.0, ans=0.0
2023-11-24 19:53:48,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2971473.3333333335, ans=0.1
2023-11-24 19:53:56,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2971473.3333333335, ans=10.0
2023-11-24 19:54:00,940 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 850, loss[loss=0.06594, simple_loss=0.09403, pruned_loss=0.0114, audio_tagging_loss=0.007519, over 16216.00 frames. ], tot_loss[loss=0.06836, simple_loss=0.09219, pruned_loss=0.01314, audio_tagging_loss=0.009125, over 3006060.02 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 19:54:08,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2971540.0, ans=0.125
2023-11-24 19:54:19,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.517e+01 9.234e+01 9.613e+01 1.155e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-24 19:54:22,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445750
2023-11-24 19:54:46,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2971740.0, ans=0.2
2023-11-24 19:54:47,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0
2023-11-24 19:54:59,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2971806.6666666665, ans=0.1
2023-11-24 19:54:59,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0
2023-11-24 19:55:03,975 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 900, loss[loss=0.05099, simple_loss=0.06393, pruned_loss=0.008731, audio_tagging_loss=0.0103, over 15418.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09125, pruned_loss=0.01311, audio_tagging_loss=0.009208, over 3021231.68 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:55:12,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2971873.3333333335, ans=0.125
2023-11-24 19:55:12,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2971873.3333333335, ans=0.125
2023-11-24 19:55:25,467 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445800
2023-11-24 19:55:31,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2972006.6666666665, ans=0.125
2023-11-24 19:55:32,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2972006.6666666665, ans=0.2
2023-11-24 19:55:52,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2972073.3333333335, ans=0.2
2023-11-24 19:56:03,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2972140.0, ans=0.2
2023-11-24 19:56:07,359 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 950, loss[loss=0.07017, simple_loss=0.09458, pruned_loss=0.01613, audio_tagging_loss=0.006749, over 14614.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09115, pruned_loss=0.01301, audio_tagging_loss=0.009163, over 3031584.79 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:56:11,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2972206.6666666665, ans=0.0
2023-11-24 19:56:26,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.519e+01 9.243e+01 9.717e+01 1.374e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-24 19:56:28,169 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445850
2023-11-24 19:56:30,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2972340.0, ans=0.2
2023-11-24 19:56:38,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2972340.0, ans=0.125
2023-11-24 19:57:09,556 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1000, loss[loss=0.06323, simple_loss=0.08287, pruned_loss=0.01049, audio_tagging_loss=0.01131, over 14927.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08957, pruned_loss=0.01276, audio_tagging_loss=0.009084, over 3029019.99 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:57:21,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2972606.6666666665, ans=0.125
2023-11-24 19:57:21,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2972606.6666666665, ans=0.125
2023-11-24 19:57:30,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445900
2023-11-24 19:57:30,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2972606.6666666665, ans=0.125
2023-11-24 19:57:34,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2972673.3333333335, ans=0.0
2023-11-24 19:57:35,500 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 19:57:35,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2972673.3333333335, ans=0.2
2023-11-24 19:58:07,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=10.0
2023-11-24 19:58:11,859 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1050, loss[loss=0.05779, simple_loss=0.07915, pruned_loss=0.009462, audio_tagging_loss=0.008755, over 15384.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09028, pruned_loss=0.01287, audio_tagging_loss=0.008956, over 3032990.57 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:58:17,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2972873.3333333335, ans=0.09899494936611666
2023-11-24 19:58:17,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0
2023-11-24 19:58:28,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0
2023-11-24 19:58:32,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 8.733e+01 9.359e+01 1.012e+02 1.444e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-24 19:58:33,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 445950
2023-11-24 19:58:47,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2973006.6666666665, ans=0.0
2023-11-24 19:59:14,420 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1100, loss[loss=0.05356, simple_loss=0.06693, pruned_loss=0.01204, audio_tagging_loss=0.008053, over 14724.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08966, pruned_loss=0.01272, audio_tagging_loss=0.008928, over 3032704.97 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 19:59:17,466 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 19:59:28,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2973273.3333333335, ans=0.0
2023-11-24 19:59:30,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=22.5
2023-11-24 19:59:32,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2973273.3333333335, ans=0.125
2023-11-24 19:59:35,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446000
2023-11-24 19:59:37,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.21 vs. limit=10.0
2023-11-24 19:59:47,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=15.0
2023-11-24 20:00:16,978 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1150, loss[loss=0.08311, simple_loss=0.1112, pruned_loss=0.01874, audio_tagging_loss=0.008789, over 15414.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09122, pruned_loss=0.01299, audio_tagging_loss=0.008754, over 3040434.72 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 20:00:34,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2973606.6666666665, ans=0.035
2023-11-24 20:00:37,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.560e+01 9.055e+01 9.595e+01 1.732e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-24 20:00:38,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446050
2023-11-24 20:00:59,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=22.5
2023-11-24 20:01:06,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0
2023-11-24 20:01:19,751 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1200, loss[loss=0.07915, simple_loss=0.1098, pruned_loss=0.01547, audio_tagging_loss=0.008773, over 15193.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09095, pruned_loss=0.01291, audio_tagging_loss=0.008758, over 3037914.21 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 20:01:41,036 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446100
2023-11-24 20:01:41,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2973940.0, ans=0.0
2023-11-24 20:01:47,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2974006.6666666665, ans=0.125
2023-11-24 20:02:17,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2974140.0, ans=0.2
2023-11-24 20:02:21,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2974206.6666666665, ans=0.125
2023-11-24 20:02:22,288 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1250, loss[loss=0.06619, simple_loss=0.09067, pruned_loss=0.009874, audio_tagging_loss=0.01098, over 14284.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09057, pruned_loss=0.01276, audio_tagging_loss=0.008864, over 3041417.31 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 32.0
2023-11-24 20:02:26,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2974206.6666666665, ans=0.0
2023-11-24 20:02:42,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.510e+01 9.127e+01 9.845e+01 1.206e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-24 20:02:42,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446150
2023-11-24 20:02:45,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2974340.0, ans=0.09899494936611666
2023-11-24 20:03:15,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2974473.3333333335, ans=0.0
2023-11-24 20:03:24,240 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1300, loss[loss=0.09072, simple_loss=0.1269, pruned_loss=0.01856, audio_tagging_loss=0.008701, over 14923.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09049, pruned_loss=0.01268, audio_tagging_loss=0.008865, over 3037729.10 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 20:03:46,011 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446200
2023-11-24 20:03:46,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0
2023-11-24 20:04:04,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2974740.0, ans=0.5
2023-11-24 20:04:20,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0
2023-11-24 20:04:26,369 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1350, loss[loss=0.08087, simple_loss=0.1155, pruned_loss=0.01437, audio_tagging_loss=0.008729, over 15392.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09059, pruned_loss=0.01265, audio_tagging_loss=0.008817, over 3035088.32 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 20:04:41,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2974940.0, ans=0.0
2023-11-24 20:04:48,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.554e+01 9.036e+01 9.639e+01 2.297e+02, threshold=1.807e+02, percent-clipped=1.0
2023-11-24 20:04:48,205 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446250
2023-11-24 20:05:06,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2975073.3333333335, ans=0.2
2023-11-24 20:05:10,665 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 20:05:12,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2975073.3333333335, ans=0.125
2023-11-24 20:05:22,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0
2023-11-24 20:05:29,108 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1400, loss[loss=0.05839, simple_loss=0.08218, pruned_loss=0.00927, audio_tagging_loss=0.00803, over 15051.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09022, pruned_loss=0.01253, audio_tagging_loss=0.008962, over 3036090.16 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 20:05:30,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2975206.6666666665, ans=0.1
2023-11-24 20:05:34,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2975206.6666666665, ans=0.09899494936611666
2023-11-24 20:05:36,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2975206.6666666665, ans=0.0
2023-11-24 20:05:46,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2975273.3333333335, ans=0.1
2023-11-24 20:05:49,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446300
2023-11-24 20:06:31,099 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1450, loss[loss=0.05496, simple_loss=0.07224, pruned_loss=0.00921, audio_tagging_loss=0.009629, over 15948.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09091, pruned_loss=0.01275, audio_tagging_loss=0.009027, over 3037524.74 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 16.0
2023-11-24 20:06:38,465 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 20:06:45,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2975606.6666666665, ans=0.125
2023-11-24 20:06:45,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2975606.6666666665, ans=10.0
2023-11-24 20:06:51,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.634e+01 9.284e+01 1.040e+02 1.664e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-24 20:06:51,968 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446350
2023-11-24 20:07:03,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2975673.3333333335, ans=0.0
2023-11-24 20:07:11,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2975740.0, ans=0.2
2023-11-24 20:07:33,034 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1500, loss[loss=0.07336, simple_loss=0.08823, pruned_loss=0.01742, audio_tagging_loss=0.01183, over 14115.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09155, pruned_loss=0.01295, audio_tagging_loss=0.008975, over 3042739.69 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 8.0
2023-11-24 20:07:46,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2975940.0, ans=0.125
2023-11-24 20:07:47,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2975940.0, ans=0.0
2023-11-24 20:07:54,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0
2023-11-24 20:07:54,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446400
2023-11-24 20:08:00,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2976006.6666666665, ans=0.2
2023-11-24 20:08:04,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2976006.6666666665, ans=0.05
2023-11-24 20:08:15,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2976073.3333333335, ans=0.1
2023-11-24 20:08:15,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2976073.3333333335, ans=0.05
2023-11-24 20:08:23,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2976140.0, ans=0.2
2023-11-24 20:08:30,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2976140.0, ans=0.0
2023-11-24 20:08:34,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2976140.0, ans=0.0
2023-11-24 20:08:36,373 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1550, loss[loss=0.06067, simple_loss=0.07776, pruned_loss=0.0123, audio_tagging_loss=0.00949, over 15189.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09185, pruned_loss=0.01296, audio_tagging_loss=0.008967, over 3042763.52 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 8.0
], tot_loss[loss=0.06786, simple_loss=0.09185, pruned_loss=0.01296, audio_tagging_loss=0.008967, over 3042763.52 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:08:40,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2976206.6666666665, ans=0.125 2023-11-24 20:08:44,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2976206.6666666665, ans=0.125 2023-11-24 20:08:56,985 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446450 2023-11-24 20:08:58,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.670e+01 9.483e+01 1.014e+02 1.933e+02, threshold=1.897e+02, percent-clipped=2.0 2023-11-24 20:09:06,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5 2023-11-24 20:09:09,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2976340.0, ans=0.2 2023-11-24 20:09:35,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2976473.3333333335, ans=0.0 2023-11-24 20:09:37,979 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1600, loss[loss=0.06787, simple_loss=0.08893, pruned_loss=0.0145, audio_tagging_loss=0.008905, over 14928.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09247, pruned_loss=0.01307, audio_tagging_loss=0.008981, over 3041825.75 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:09:44,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2976540.0, ans=0.125 2023-11-24 20:09:54,821 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 20:09:57,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-24 20:09:58,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446500 2023-11-24 20:10:01,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2976673.3333333335, ans=0.125 2023-11-24 20:10:12,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2976673.3333333335, ans=0.1 2023-11-24 20:10:17,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2976740.0, ans=0.2 2023-11-24 20:10:30,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2976806.6666666665, ans=0.0 2023-11-24 20:10:39,340 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1650, loss[loss=0.05158, simple_loss=0.06824, pruned_loss=0.008807, audio_tagging_loss=0.00865, over 15078.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09257, pruned_loss=0.01323, audio_tagging_loss=0.009009, over 3042722.95 frames. 
], batch size: 56, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:10:39,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2976873.3333333335, ans=0.125 2023-11-24 20:10:52,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2976940.0, ans=0.1 2023-11-24 20:11:01,277 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446550 2023-11-24 20:11:02,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.733e+01 9.209e+01 9.885e+01 1.194e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-24 20:11:14,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2977006.6666666665, ans=0.0 2023-11-24 20:11:34,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2977140.0, ans=0.1 2023-11-24 20:11:38,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2977140.0, ans=0.2 2023-11-24 20:11:41,954 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1700, loss[loss=0.08529, simple_loss=0.1245, pruned_loss=0.01626, audio_tagging_loss=0.006764, over 14959.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09267, pruned_loss=0.01306, audio_tagging_loss=0.009047, over 3053547.11 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:11:49,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2977206.6666666665, ans=0.0 2023-11-24 20:11:54,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2977273.3333333335, ans=0.125 2023-11-24 20:11:56,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5 2023-11-24 20:12:00,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2977273.3333333335, ans=0.125 2023-11-24 20:12:03,404 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446600 2023-11-24 20:12:11,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2977340.0, ans=0.04949747468305833 2023-11-24 20:12:22,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2977406.6666666665, ans=0.125 2023-11-24 20:12:29,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2977406.6666666665, ans=0.0 2023-11-24 20:12:45,062 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1750, loss[loss=0.08551, simple_loss=0.1196, pruned_loss=0.01869, audio_tagging_loss=0.007001, over 14059.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09218, pruned_loss=0.01293, audio_tagging_loss=0.009042, over 3053670.22 frames. 
], batch size: 55, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:12:55,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2977606.6666666665, ans=0.0 2023-11-24 20:13:05,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446650 2023-11-24 20:13:08,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.455e+01 9.155e+01 9.735e+01 1.313e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-24 20:13:08,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2977673.3333333335, ans=15.0 2023-11-24 20:13:16,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2023-11-24 20:13:17,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2977673.3333333335, ans=0.1 2023-11-24 20:13:21,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=12.0 2023-11-24 20:13:23,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2977740.0, ans=0.0 2023-11-24 20:13:24,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2977740.0, ans=0.0 2023-11-24 20:13:39,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2977806.6666666665, ans=0.0 2023-11-24 20:13:39,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2977806.6666666665, ans=0.125 2023-11-24 20:13:46,499 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1800, loss[loss=0.07277, simple_loss=0.102, pruned_loss=0.01565, audio_tagging_loss=0.006133, over 15590.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09179, pruned_loss=0.01296, audio_tagging_loss=0.008929, over 3051922.72 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:13:47,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2977873.3333333335, ans=0.125 2023-11-24 20:13:59,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2977940.0, ans=0.0 2023-11-24 20:14:07,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446700 2023-11-24 20:14:49,176 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1850, loss[loss=0.06523, simple_loss=0.08175, pruned_loss=0.01614, audio_tagging_loss=0.008214, over 15321.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09156, pruned_loss=0.01298, audio_tagging_loss=0.008892, over 3037016.17 frames. 
], batch size: 56, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:14:56,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2978206.6666666665, ans=0.125 2023-11-24 20:15:10,411 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446750 2023-11-24 20:15:12,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.609e+01 9.248e+01 1.012e+02 1.189e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 20:15:15,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2978340.0, ans=0.0 2023-11-24 20:15:22,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2978340.0, ans=0.1 2023-11-24 20:15:33,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2978406.6666666665, ans=0.125 2023-11-24 20:15:36,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2978406.6666666665, ans=0.125 2023-11-24 20:15:39,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2023-11-24 20:15:43,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2978473.3333333335, ans=0.125 2023-11-24 20:15:51,158 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1900, loss[loss=0.0579, simple_loss=0.07679, pruned_loss=0.01102, audio_tagging_loss=0.008475, over 16150.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09177, pruned_loss=0.01284, audio_tagging_loss=0.008777, over 3037180.14 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:16:02,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2978606.6666666665, ans=0.0 2023-11-24 20:16:11,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446800 2023-11-24 20:16:14,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2978673.3333333335, ans=0.5 2023-11-24 20:16:52,745 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 1950, loss[loss=0.05582, simple_loss=0.07787, pruned_loss=0.009543, audio_tagging_loss=0.007341, over 14943.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09074, pruned_loss=0.01276, audio_tagging_loss=0.008744, over 3039546.54 frames. 
], batch size: 57, lr: 1.79e-03, grad_scale: 4.0 2023-11-24 20:17:09,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2978940.0, ans=0.125 2023-11-24 20:17:10,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2978940.0, ans=0.0 2023-11-24 20:17:13,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446850 2023-11-24 20:17:17,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.609e+01 9.375e+01 1.004e+02 1.642e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-24 20:17:47,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2979140.0, ans=0.035 2023-11-24 20:17:55,541 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2000, loss[loss=0.0843, simple_loss=0.1133, pruned_loss=0.02045, audio_tagging_loss=0.007213, over 15366.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09061, pruned_loss=0.01289, audio_tagging_loss=0.008799, over 3039779.03 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:18:15,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2979273.3333333335, ans=0.125 2023-11-24 20:18:17,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446900 2023-11-24 20:18:42,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2979406.6666666665, ans=0.125 2023-11-24 20:18:44,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2023-11-24 20:18:51,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2979473.3333333335, ans=0.0 2023-11-24 20:18:56,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2979540.0, ans=0.04949747468305833 2023-11-24 20:18:57,594 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2050, loss[loss=0.07133, simple_loss=0.108, pruned_loss=0.01183, audio_tagging_loss=0.005528, over 14982.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09077, pruned_loss=0.01294, audio_tagging_loss=0.008749, over 3043827.15 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:19:00,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2979540.0, ans=0.125 2023-11-24 20:19:19,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 446950 2023-11-24 20:19:22,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.521e+01 9.047e+01 9.602e+01 1.413e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-24 20:19:36,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2979740.0, ans=0.1 2023-11-24 20:19:44,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2979740.0, ans=0.2 2023-11-24 20:20:00,616 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2100, loss[loss=0.05858, simple_loss=0.08292, pruned_loss=0.009964, audio_tagging_loss=0.007152, over 14792.00 frames. 
], tot_loss[loss=0.06623, simple_loss=0.08951, pruned_loss=0.01269, audio_tagging_loss=0.008793, over 3044960.84 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:20:22,399 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447000 2023-11-24 20:20:50,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2980140.0, ans=0.0 2023-11-24 20:21:03,517 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2150, loss[loss=0.08149, simple_loss=0.1108, pruned_loss=0.01867, audio_tagging_loss=0.00741, over 15155.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09037, pruned_loss=0.01263, audio_tagging_loss=0.008727, over 3049614.53 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:21:04,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.12 vs. limit=6.0 2023-11-24 20:21:24,848 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447050 2023-11-24 20:21:28,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.688e+01 9.386e+01 1.033e+02 1.815e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-24 20:21:40,953 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 20:21:47,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=12.0 2023-11-24 20:21:51,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2980406.6666666665, ans=0.1 2023-11-24 20:22:05,666 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2200, loss[loss=0.07398, simple_loss=0.1035, pruned_loss=0.01762, audio_tagging_loss=0.004589, over 15159.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09109, pruned_loss=0.01298, audio_tagging_loss=0.008648, over 3045762.06 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:22:07,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2980540.0, ans=0.1 2023-11-24 20:22:15,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2980540.0, ans=0.5 2023-11-24 20:22:26,947 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447100 2023-11-24 20:22:35,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-24 20:22:43,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2980740.0, ans=0.125 2023-11-24 20:22:52,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.38 vs. 
limit=10.0 2023-11-24 20:22:54,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2980806.6666666665, ans=0.125 2023-11-24 20:23:07,826 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2250, loss[loss=0.06543, simple_loss=0.08357, pruned_loss=0.0148, audio_tagging_loss=0.008839, over 15057.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09103, pruned_loss=0.01296, audio_tagging_loss=0.008578, over 3045621.01 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:23:10,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2980873.3333333335, ans=0.125 2023-11-24 20:23:29,845 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447150 2023-11-24 20:23:33,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.657e+01 9.252e+01 1.018e+02 2.312e+02, threshold=1.850e+02, percent-clipped=2.0 2023-11-24 20:23:47,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2981073.3333333335, ans=0.04949747468305833 2023-11-24 20:23:56,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2981140.0, ans=0.125 2023-11-24 20:24:11,232 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2300, loss[loss=0.07355, simple_loss=0.09847, pruned_loss=0.01692, audio_tagging_loss=0.00739, over 14646.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08994, pruned_loss=0.01277, audio_tagging_loss=0.008693, over 3040771.35 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:24:12,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2981206.6666666665, ans=0.0 2023-11-24 20:24:12,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2981206.6666666665, ans=0.125 2023-11-24 20:24:15,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2981206.6666666665, ans=0.125 2023-11-24 20:24:25,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2981273.3333333335, ans=0.125 2023-11-24 20:24:32,679 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447200 2023-11-24 20:24:32,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2981273.3333333335, ans=0.0 2023-11-24 20:25:03,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2981473.3333333335, ans=0.07 2023-11-24 20:25:06,740 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 20:25:13,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=10.0 2023-11-24 20:25:14,006 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2350, loss[loss=0.05832, simple_loss=0.07969, pruned_loss=0.00797, audio_tagging_loss=0.01051, over 14211.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08927, pruned_loss=0.01262, audio_tagging_loss=0.008777, over 3031970.05 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 8.0 2023-11-24 20:25:17,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2981540.0, ans=0.125 2023-11-24 20:25:20,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2981540.0, ans=0.125 2023-11-24 20:25:32,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2981606.6666666665, ans=0.2 2023-11-24 20:25:34,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447250 2023-11-24 20:25:35,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2981606.6666666665, ans=0.2 2023-11-24 20:25:39,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.625e+01 9.128e+01 9.764e+01 1.329e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 20:25:51,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2981740.0, ans=0.0 2023-11-24 20:26:09,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2981806.6666666665, ans=0.1 2023-11-24 20:26:10,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2981806.6666666665, ans=0.125 2023-11-24 20:26:16,226 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2400, loss[loss=0.05843, simple_loss=0.07745, pruned_loss=0.01121, audio_tagging_loss=0.008507, over 16355.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09067, pruned_loss=0.01279, audio_tagging_loss=0.008919, over 3041463.68 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:26:38,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447300 2023-11-24 20:26:39,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2981940.0, ans=0.125 2023-11-24 20:26:53,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2982073.3333333335, ans=0.0 2023-11-24 20:27:10,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2982140.0, ans=0.125 2023-11-24 20:27:18,565 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2450, loss[loss=0.05886, simple_loss=0.08062, pruned_loss=0.008152, audio_tagging_loss=0.0104, over 15259.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09117, pruned_loss=0.01292, audio_tagging_loss=0.009037, over 3042227.54 frames. 
], batch size: 57, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:27:40,043 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447350 2023-11-24 20:27:44,104 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.597e+01 9.419e+01 1.015e+02 1.292e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-24 20:27:48,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2982340.0, ans=0.2 2023-11-24 20:27:59,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=12.0 2023-11-24 20:28:09,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2023-11-24 20:28:13,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2982473.3333333335, ans=0.125 2023-11-24 20:28:21,317 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2500, loss[loss=0.07645, simple_loss=0.1114, pruned_loss=0.01336, audio_tagging_loss=0.007414, over 14825.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09071, pruned_loss=0.01277, audio_tagging_loss=0.009067, over 3044418.26 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:28:35,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2982606.6666666665, ans=0.2 2023-11-24 20:28:42,237 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447400 2023-11-24 20:28:49,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0 2023-11-24 20:28:55,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2023-11-24 20:28:58,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2982740.0, ans=0.125 2023-11-24 20:28:59,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2982740.0, ans=0.125 2023-11-24 20:29:14,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2982806.6666666665, ans=0.1 2023-11-24 20:29:23,915 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2550, loss[loss=0.07101, simple_loss=0.09304, pruned_loss=0.01512, audio_tagging_loss=0.009377, over 15391.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09072, pruned_loss=0.0129, audio_tagging_loss=0.008982, over 3037397.07 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:29:25,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2982873.3333333335, ans=0.125 2023-11-24 20:29:26,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2982873.3333333335, ans=0.125 2023-11-24 20:29:29,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. 
limit=22.5 2023-11-24 20:29:39,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-24 20:29:42,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-24 20:29:44,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447450 2023-11-24 20:29:48,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.553e+01 9.176e+01 1.003e+02 1.865e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-24 20:30:09,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2983073.3333333335, ans=0.125 2023-11-24 20:30:19,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2023-11-24 20:30:20,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.10 vs. limit=22.5 2023-11-24 20:30:23,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2983140.0, ans=0.0 2023-11-24 20:30:25,778 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2600, loss[loss=0.0643, simple_loss=0.08956, pruned_loss=0.01238, audio_tagging_loss=0.007142, over 15414.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09099, pruned_loss=0.01306, audio_tagging_loss=0.008913, over 3041039.61 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:30:48,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447500 2023-11-24 20:30:55,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2983340.0, ans=0.035 2023-11-24 20:31:19,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2983473.3333333335, ans=0.125 2023-11-24 20:31:29,455 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2650, loss[loss=0.05477, simple_loss=0.07766, pruned_loss=0.00792, audio_tagging_loss=0.008022, over 14483.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09139, pruned_loss=0.01302, audio_tagging_loss=0.008847, over 3047592.27 frames. 
], batch size: 55, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:31:30,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2983540.0, ans=0.0 2023-11-24 20:31:50,175 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447550 2023-11-24 20:31:53,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.637e+01 9.375e+01 1.000e+02 1.273e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-24 20:32:12,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2983740.0, ans=0.125 2023-11-24 20:32:12,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2983740.0, ans=0.1 2023-11-24 20:32:29,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2983873.3333333335, ans=0.125 2023-11-24 20:32:30,763 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2700, loss[loss=0.07244, simple_loss=0.0975, pruned_loss=0.01418, audio_tagging_loss=0.009506, over 15322.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09162, pruned_loss=0.01312, audio_tagging_loss=0.008739, over 3046038.15 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:32:32,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-24 20:32:35,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2983873.3333333335, ans=0.1 2023-11-24 20:32:52,234 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447600 2023-11-24 20:32:55,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2984006.6666666665, ans=0.125 2023-11-24 20:33:01,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2984006.6666666665, ans=0.125 2023-11-24 20:33:13,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2984073.3333333335, ans=0.125 2023-11-24 20:33:14,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2984073.3333333335, ans=0.0 2023-11-24 20:33:32,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. limit=10.0 2023-11-24 20:33:33,444 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2750, loss[loss=0.07504, simple_loss=0.09468, pruned_loss=0.01962, audio_tagging_loss=0.008089, over 14273.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09082, pruned_loss=0.01294, audio_tagging_loss=0.00881, over 3046061.37 frames. 
], batch size: 54, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:33:47,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2984273.3333333335, ans=0.125 2023-11-24 20:33:49,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2984273.3333333335, ans=0.0 2023-11-24 20:33:55,321 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447650 2023-11-24 20:33:59,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.580e+01 9.104e+01 1.006e+02 1.298e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 20:34:20,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2984406.6666666665, ans=0.1 2023-11-24 20:34:22,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-24 20:34:25,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2984473.3333333335, ans=0.0 2023-11-24 20:34:26,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=22.5 2023-11-24 20:34:27,042 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 20:34:28,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2023-11-24 20:34:35,938 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2800, loss[loss=0.07891, simple_loss=0.09541, pruned_loss=0.01954, audio_tagging_loss=0.01166, over 15159.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09125, pruned_loss=0.01315, audio_tagging_loss=0.008723, over 3045009.42 frames. 
], batch size: 57, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:34:41,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2984540.0, ans=0.0 2023-11-24 20:34:41,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2984540.0, ans=0.125 2023-11-24 20:34:42,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2984540.0, ans=0.0 2023-11-24 20:34:51,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2984606.6666666665, ans=0.2 2023-11-24 20:34:56,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2984606.6666666665, ans=0.1 2023-11-24 20:34:57,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447700 2023-11-24 20:35:01,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2984673.3333333335, ans=0.125 2023-11-24 20:35:06,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2984673.3333333335, ans=0.125 2023-11-24 20:35:09,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2984673.3333333335, ans=0.125 2023-11-24 20:35:09,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2984673.3333333335, ans=0.2 2023-11-24 20:35:16,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2984740.0, ans=0.0 2023-11-24 20:35:16,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2984740.0, ans=0.2 2023-11-24 20:35:31,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2984806.6666666665, ans=0.125 2023-11-24 20:35:38,990 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2850, loss[loss=0.0576, simple_loss=0.07833, pruned_loss=0.009929, audio_tagging_loss=0.00851, over 15254.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09082, pruned_loss=0.01299, audio_tagging_loss=0.008671, over 3040735.71 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:35:51,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.64 vs. 
limit=22.5 2023-11-24 20:35:57,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2984940.0, ans=0.125 2023-11-24 20:35:59,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447750 2023-11-24 20:36:03,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2985006.6666666665, ans=0.0 2023-11-24 20:36:05,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.522e+01 9.152e+01 9.707e+01 1.376e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 20:36:16,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2985073.3333333335, ans=0.0 2023-11-24 20:36:21,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2985073.3333333335, ans=0.2 2023-11-24 20:36:29,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-24 20:36:41,871 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2900, loss[loss=0.07507, simple_loss=0.1051, pruned_loss=0.01592, audio_tagging_loss=0.006579, over 14385.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09078, pruned_loss=0.01301, audio_tagging_loss=0.008689, over 3039266.10 frames. ], batch size: 52, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:37:03,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447800 2023-11-24 20:37:10,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2985340.0, ans=0.125 2023-11-24 20:37:11,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2985340.0, ans=0.125 2023-11-24 20:37:34,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2985473.3333333335, ans=0.1 2023-11-24 20:37:45,066 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 2950, loss[loss=0.06145, simple_loss=0.08534, pruned_loss=0.009687, audio_tagging_loss=0.009091, over 14050.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09107, pruned_loss=0.013, audio_tagging_loss=0.008719, over 3049582.83 frames. 
], batch size: 54, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:37:53,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2985540.0, ans=0.125 2023-11-24 20:37:55,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2985540.0, ans=0.015 2023-11-24 20:37:59,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2985606.6666666665, ans=0.125 2023-11-24 20:37:59,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2985606.6666666665, ans=0.125 2023-11-24 20:38:06,270 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447850 2023-11-24 20:38:10,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.599e+01 9.301e+01 9.933e+01 1.313e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-24 20:38:29,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2985740.0, ans=0.0 2023-11-24 20:38:41,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2985806.6666666665, ans=0.125 2023-11-24 20:38:47,534 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3000, loss[loss=0.06275, simple_loss=0.08633, pruned_loss=0.01082, audio_tagging_loss=0.008771, over 16925.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09019, pruned_loss=0.01281, audio_tagging_loss=0.008863, over 3048788.49 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:38:47,535 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 20:39:09,286 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9711, 2.5999, 4.4852, 2.0725], device='cuda:1') 2023-11-24 20:39:31,501 INFO [train_asr.py:1253] (1/4) Epoch 38, validation: loss=0.05738, simple_loss=0.0507, pruned_loss=0.005077, audio_tagging_loss=0.02696, over 4681554.00 frames. 
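The recurring record types in this stretch of the log fit simple closed forms that can be checked directly against the printed numbers: every tot_loss and validation record is consistent with loss ≈ 0.5·simple_loss + pruned_loss + audio_tagging_loss; every optim.py record has threshold equal to Clipping_scale times the median grad-norm quartile; and every "Exclude cut" warning drops a cut whose frame count after subsampling falls below its token count. The sketch below is a reading aid under those inferred relations, not the training code itself: the helper names are hypothetical, and the constants (the 0.5 and 1.0 loss weights, the 2.0 clipping scale, the 4x subsampling arithmetic) are deduced from the values printed above rather than taken from the icefall sources.

```python
# Reading aid only: a minimal sketch of the relations visible in the records
# above. Helper names are hypothetical; constants are inferred from this log,
# not taken from the icefall sources.

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, audio_tagging_scale=1.0):
    # tot_loss records: loss ~= 0.5*simple_loss + pruned_loss + audio_tagging_loss.
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

def clip_threshold(median_grad_norm, clipping_scale=2.0):
    # optim.py records: threshold == Clipping_scale * median grad-norm quartile
    # (e.g. 2.0 * 9.036e+01 = 1.807e+02).
    return clipping_scale * median_grad_norm

def frames_after_subsampling(num_frames):
    # 4x convolutional subsampling consistent with the warnings: 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    # 'Exclude cut' rule as it appears here: a cut needs at least as many
    # subsampled frames as BPE tokens for the transducer to align it.
    return frames_after_subsampling(num_frames) >= num_tokens

# Spot checks against records printed in this log.
assert abs(combined_loss(0.0507, 0.005077, 0.02696) - 0.05738) < 1e-4
assert abs(clip_threshold(90.36) - 180.7) < 0.1
assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the dummy-text AudioSet placeholder cuts
```

For instance, the validation record just above gives 0.5·0.0507 + 0.005077 + 0.02696 ≈ 0.05738, and the excluded placeholder cuts each have 100 input frames, which reduce to ((100−7)//2+1)//2 = 23 frames, one short of their 24 tokens.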
2023-11-24 20:39:31,503 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 20:39:39,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2985873.3333333335, ans=0.0 2023-11-24 20:39:43,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2985940.0, ans=22.5 2023-11-24 20:39:46,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2985940.0, ans=0.04949747468305833 2023-11-24 20:39:46,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2985940.0, ans=0.0 2023-11-24 20:39:52,642 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447900 2023-11-24 20:40:05,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2986006.6666666665, ans=0.0 2023-11-24 20:40:13,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2986073.3333333335, ans=0.0 2023-11-24 20:40:18,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2986073.3333333335, ans=0.1 2023-11-24 20:40:23,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2986140.0, ans=0.0 2023-11-24 20:40:24,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2986140.0, ans=0.125 2023-11-24 20:40:34,329 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3050, loss[loss=0.08473, simple_loss=0.1257, pruned_loss=0.01573, audio_tagging_loss=0.006149, over 15196.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09004, pruned_loss=0.01273, audio_tagging_loss=0.009036, over 3047486.05 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:40:38,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2986206.6666666665, ans=0.035 2023-11-24 20:40:55,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 447950 2023-11-24 20:40:59,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2986340.0, ans=15.0 2023-11-24 20:41:00,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 8.702e+01 9.218e+01 9.877e+01 1.257e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-24 20:41:02,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2986340.0, ans=0.0 2023-11-24 20:41:04,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2986340.0, ans=0.0 2023-11-24 20:41:05,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-24 20:41:10,481 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 20:41:30,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2986473.3333333335, ans=0.1 2023-11-24 20:41:37,182 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3100, loss[loss=0.0638, simple_loss=0.08432, pruned_loss=0.009861, audio_tagging_loss=0.01178, over 16723.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.08994, pruned_loss=0.01274, audio_tagging_loss=0.009043, over 3049499.30 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:41:38,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2986540.0, ans=0.125 2023-11-24 20:41:54,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2986606.6666666665, ans=0.0 2023-11-24 20:41:58,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448000 2023-11-24 20:42:04,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2986673.3333333335, ans=0.125 2023-11-24 20:42:11,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2986673.3333333335, ans=0.125 2023-11-24 20:42:16,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2986740.0, ans=0.0 2023-11-24 20:42:42,696 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3150, loss[loss=0.06222, simple_loss=0.08781, pruned_loss=0.01115, audio_tagging_loss=0.007173, over 14754.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09173, pruned_loss=0.01286, audio_tagging_loss=0.009032, over 3055433.99 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:42:48,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2986873.3333333335, ans=0.125 2023-11-24 20:42:48,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2986873.3333333335, ans=0.125 2023-11-24 20:42:52,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0 2023-11-24 20:43:04,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448050 2023-11-24 20:43:08,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.654e+01 9.120e+01 9.857e+01 1.344e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-24 20:43:21,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2987073.3333333335, ans=0.125 2023-11-24 20:43:38,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.19 vs. 
limit=15.0 2023-11-24 20:43:40,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2987140.0, ans=0.0 2023-11-24 20:43:45,949 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3200, loss[loss=0.07286, simple_loss=0.1017, pruned_loss=0.01453, audio_tagging_loss=0.007495, over 15559.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.0916, pruned_loss=0.01295, audio_tagging_loss=0.009107, over 3050612.48 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:43:48,521 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 20:43:59,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2023-11-24 20:44:07,212 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448100 2023-11-24 20:44:36,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=2987473.3333333335, ans=0.1 2023-11-24 20:44:42,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2987473.3333333335, ans=0.015 2023-11-24 20:44:42,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2987473.3333333335, ans=0.0 2023-11-24 20:44:45,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2987473.3333333335, ans=0.1 2023-11-24 20:44:47,845 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3250, loss[loss=0.07065, simple_loss=0.09779, pruned_loss=0.01201, audio_tagging_loss=0.009739, over 14869.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09118, pruned_loss=0.01284, audio_tagging_loss=0.009204, over 3048446.92 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:45:08,970 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448150 2023-11-24 20:45:15,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.565e+01 9.106e+01 9.950e+01 1.221e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 20:45:18,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2987673.3333333335, ans=0.125 2023-11-24 20:45:34,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2987740.0, ans=0.2 2023-11-24 20:45:35,880 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 20:45:50,291 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3300, loss[loss=0.04994, simple_loss=0.07323, pruned_loss=0.005219, audio_tagging_loss=0.008107, over 15205.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09026, pruned_loss=0.01268, audio_tagging_loss=0.009265, over 3043656.46 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:45:59,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2987873.3333333335, ans=0.0 2023-11-24 20:46:03,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. 
limit=15.0 2023-11-24 20:46:05,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2987940.0, ans=0.125 2023-11-24 20:46:06,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2987940.0, ans=0.015 2023-11-24 20:46:11,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448200 2023-11-24 20:46:28,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2023-11-24 20:46:41,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2988140.0, ans=0.125 2023-11-24 20:46:53,177 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3350, loss[loss=0.05479, simple_loss=0.06865, pruned_loss=0.01025, audio_tagging_loss=0.01022, over 13703.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09024, pruned_loss=0.01272, audio_tagging_loss=0.009163, over 3054530.89 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:46:57,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2988206.6666666665, ans=0.1 2023-11-24 20:47:09,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2988273.3333333335, ans=0.1 2023-11-24 20:47:13,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448250 2023-11-24 20:47:17,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5 2023-11-24 20:47:19,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.779e+01 9.397e+01 1.017e+02 1.316e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-24 20:47:34,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2988406.6666666665, ans=0.125 2023-11-24 20:47:41,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2988473.3333333335, ans=0.125 2023-11-24 20:47:47,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2988473.3333333335, ans=0.125 2023-11-24 20:47:54,741 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3400, loss[loss=0.04842, simple_loss=0.06482, pruned_loss=0.007527, audio_tagging_loss=0.008486, over 17032.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.0915, pruned_loss=0.01287, audio_tagging_loss=0.008996, over 3055997.30 frames. 
], batch size: 67, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:47:59,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2988540.0, ans=0.2 2023-11-24 20:48:01,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2988540.0, ans=0.1 2023-11-24 20:48:07,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2988606.6666666665, ans=0.1 2023-11-24 20:48:16,204 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448300 2023-11-24 20:48:33,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2988740.0, ans=0.125 2023-11-24 20:48:40,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-11-24 20:48:42,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0 2023-11-24 20:48:57,290 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3450, loss[loss=0.06686, simple_loss=0.09435, pruned_loss=0.01126, audio_tagging_loss=0.008426, over 15205.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09188, pruned_loss=0.01285, audio_tagging_loss=0.008862, over 3058110.47 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:49:19,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448350 2023-11-24 20:49:25,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.675e+01 9.257e+01 9.906e+01 1.659e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-24 20:49:28,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2989006.6666666665, ans=0.0 2023-11-24 20:49:38,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2989073.3333333335, ans=0.125 2023-11-24 20:49:43,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2989073.3333333335, ans=0.125 2023-11-24 20:50:01,079 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3500, loss[loss=0.06094, simple_loss=0.0821, pruned_loss=0.01188, audio_tagging_loss=0.008018, over 15660.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09127, pruned_loss=0.0129, audio_tagging_loss=0.008842, over 3053521.29 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:50:11,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2989206.6666666665, ans=0.2 2023-11-24 20:50:22,481 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448400 2023-11-24 20:50:25,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2989340.0, ans=0.125 2023-11-24 20:50:32,098 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 20:50:44,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2989406.6666666665, ans=0.125 2023-11-24 20:50:54,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2989473.3333333335, ans=0.125 2023-11-24 20:51:03,939 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3550, loss[loss=0.07702, simple_loss=0.1066, pruned_loss=0.01569, audio_tagging_loss=0.008047, over 15712.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09105, pruned_loss=0.01292, audio_tagging_loss=0.008777, over 3055003.95 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 16.0 2023-11-24 20:51:06,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-11-24 20:51:10,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2989540.0, ans=0.0 2023-11-24 20:51:25,106 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448450 2023-11-24 20:51:32,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.760e+01 8.602e+01 9.062e+01 9.844e+01 1.252e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-24 20:51:40,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-24 20:51:42,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2989740.0, ans=0.125 2023-11-24 20:52:00,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2989806.6666666665, ans=0.1 2023-11-24 20:52:03,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2989806.6666666665, ans=0.125 2023-11-24 20:52:06,924 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3600, loss[loss=0.0819, simple_loss=0.1114, pruned_loss=0.01887, audio_tagging_loss=0.007312, over 15854.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09079, pruned_loss=0.01288, audio_tagging_loss=0.008789, over 3058536.06 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:52:22,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2989940.0, ans=0.0 2023-11-24 20:52:29,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448500 2023-11-24 20:52:43,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2990073.3333333335, ans=0.0 2023-11-24 20:52:54,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2990073.3333333335, ans=0.1 2023-11-24 20:53:09,701 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3650, loss[loss=0.05686, simple_loss=0.07661, pruned_loss=0.008828, audio_tagging_loss=0.009727, over 15179.00 frames. 
], tot_loss[loss=0.06701, simple_loss=0.09075, pruned_loss=0.01284, audio_tagging_loss=0.008797, over 3051675.31 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:53:12,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2990206.6666666665, ans=0.0 2023-11-24 20:53:14,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2990206.6666666665, ans=0.1 2023-11-24 20:53:27,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2990273.3333333335, ans=0.09899494936611666 2023-11-24 20:53:27,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2990273.3333333335, ans=0.2 2023-11-24 20:53:30,720 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448550 2023-11-24 20:53:36,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.766e+01 9.382e+01 1.003e+02 1.165e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-24 20:54:11,705 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3700, loss[loss=0.05741, simple_loss=0.07399, pruned_loss=0.01044, audio_tagging_loss=0.009968, over 14123.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09093, pruned_loss=0.0128, audio_tagging_loss=0.008788, over 3057734.19 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:54:32,530 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448600 2023-11-24 20:54:56,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2990740.0, ans=0.2 2023-11-24 20:55:14,087 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3750, loss[loss=0.07463, simple_loss=0.1063, pruned_loss=0.01297, audio_tagging_loss=0.008518, over 16652.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09091, pruned_loss=0.01277, audio_tagging_loss=0.008802, over 3060930.18 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:55:35,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448650 2023-11-24 20:55:41,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.492e+01 9.238e+01 9.913e+01 1.162e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-24 20:55:41,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2991006.6666666665, ans=0.0 2023-11-24 20:55:41,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2991006.6666666665, ans=0.1 2023-11-24 20:55:44,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2023-11-24 20:55:51,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2991073.3333333335, ans=0.2 2023-11-24 20:55:54,927 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 20:55:56,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2991073.3333333335, ans=0.0 2023-11-24 20:56:15,250 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3800, loss[loss=0.05631, simple_loss=0.07235, pruned_loss=0.0109, audio_tagging_loss=0.009235, over 15094.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09122, pruned_loss=0.01287, audio_tagging_loss=0.008783, over 3060351.09 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:56:31,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=12.0 2023-11-24 20:56:36,670 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448700 2023-11-24 20:57:00,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2991406.6666666665, ans=0.0 2023-11-24 20:57:17,964 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3850, loss[loss=0.0962, simple_loss=0.1319, pruned_loss=0.02354, audio_tagging_loss=0.006703, over 15655.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09091, pruned_loss=0.01272, audio_tagging_loss=0.008812, over 3060791.66 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:57:20,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2991540.0, ans=0.125 2023-11-24 20:57:20,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2991540.0, ans=0.125 2023-11-24 20:57:25,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-24 20:57:32,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2991606.6666666665, ans=0.125 2023-11-24 20:57:38,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448750 2023-11-24 20:57:44,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.712e+01 9.172e+01 9.948e+01 1.223e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-24 20:57:48,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2991673.3333333335, ans=0.1 2023-11-24 20:57:50,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2991673.3333333335, ans=0.1 2023-11-24 20:58:02,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2991740.0, ans=0.0 2023-11-24 20:58:08,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2991806.6666666665, ans=0.0 2023-11-24 20:58:19,598 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3900, loss[loss=0.07732, simple_loss=0.1063, pruned_loss=0.01421, audio_tagging_loss=0.009965, over 15100.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09071, pruned_loss=0.01257, audio_tagging_loss=0.008851, over 3047581.75 frames. 
], batch size: 57, lr: 1.79e-03, grad_scale: 32.0 2023-11-24 20:58:34,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2991940.0, ans=0.125 2023-11-24 20:58:40,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448800 2023-11-24 20:58:55,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2023-11-24 20:58:58,579 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 20:58:59,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2992073.3333333335, ans=0.125 2023-11-24 20:59:04,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-24 20:59:05,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.00 vs. limit=22.5 2023-11-24 20:59:05,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2992073.3333333335, ans=0.035 2023-11-24 20:59:16,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2992140.0, ans=0.95 2023-11-24 20:59:20,991 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 3950, loss[loss=0.06718, simple_loss=0.08931, pruned_loss=0.01197, audio_tagging_loss=0.01057, over 15947.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09063, pruned_loss=0.01268, audio_tagging_loss=0.009054, over 3052010.58 frames. ], batch size: 60, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 20:59:27,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2992206.6666666665, ans=0.125 2023-11-24 20:59:29,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2992206.6666666665, ans=0.0 2023-11-24 20:59:35,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2992273.3333333335, ans=0.125 2023-11-24 20:59:40,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2992273.3333333335, ans=0.125 2023-11-24 20:59:41,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0 2023-11-24 20:59:42,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448850 2023-11-24 20:59:48,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.553e+01 9.326e+01 9.949e+01 1.288e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-24 20:59:52,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. 
limit=15.0 2023-11-24 20:59:53,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2992340.0, ans=0.125 2023-11-24 20:59:56,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2992340.0, ans=0.0 2023-11-24 21:00:16,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2992473.3333333335, ans=0.2 2023-11-24 21:00:23,985 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4000, loss[loss=0.04397, simple_loss=0.05946, pruned_loss=0.005743, audio_tagging_loss=0.008497, over 14708.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09132, pruned_loss=0.01304, audio_tagging_loss=0.009063, over 3046732.47 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 21:00:32,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2992540.0, ans=0.2 2023-11-24 21:00:34,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2023-11-24 21:00:36,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2992606.6666666665, ans=0.125 2023-11-24 21:00:44,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448900 2023-11-24 21:00:54,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2992673.3333333335, ans=0.0 2023-11-24 21:01:15,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2992806.6666666665, ans=0.1 2023-11-24 21:01:25,957 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4050, loss[loss=0.07436, simple_loss=0.1062, pruned_loss=0.01343, audio_tagging_loss=0.007836, over 15319.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09198, pruned_loss=0.01317, audio_tagging_loss=0.009031, over 3047702.73 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 21:01:27,217 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 21:01:29,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2992873.3333333335, ans=0.125 2023-11-24 21:01:47,013 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 448950 2023-11-24 21:01:54,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.711e+01 9.505e+01 1.006e+02 1.290e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-24 21:01:58,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2993006.6666666665, ans=0.04949747468305833 2023-11-24 21:02:14,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. 
limit=15.0 2023-11-24 21:02:27,587 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4100, loss[loss=0.06734, simple_loss=0.09102, pruned_loss=0.01029, audio_tagging_loss=0.01154, over 14394.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09149, pruned_loss=0.01311, audio_tagging_loss=0.00915, over 3046964.03 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:02:28,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2993206.6666666665, ans=0.125 2023-11-24 21:02:35,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2993206.6666666665, ans=0.125 2023-11-24 21:02:50,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449000 2023-11-24 21:03:00,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2993340.0, ans=0.125 2023-11-24 21:03:31,079 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4150, loss[loss=0.06248, simple_loss=0.07987, pruned_loss=0.01194, audio_tagging_loss=0.01061, over 15713.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09104, pruned_loss=0.01298, audio_tagging_loss=0.009033, over 3051985.99 frames. ], batch size: 60, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:03:45,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2993606.6666666665, ans=0.0 2023-11-24 21:03:52,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449050 2023-11-24 21:03:54,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=12.0 2023-11-24 21:03:59,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.629e+01 9.149e+01 9.710e+01 1.194e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 21:04:00,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.16 vs. limit=10.0 2023-11-24 21:04:00,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2993673.3333333335, ans=0.0 2023-11-24 21:04:03,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2993673.3333333335, ans=0.125 2023-11-24 21:04:04,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=12.0 2023-11-24 21:04:15,005 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 21:04:22,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2993806.6666666665, ans=0.0 2023-11-24 21:04:29,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2993806.6666666665, ans=0.0 2023-11-24 21:04:31,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2993806.6666666665, ans=0.125 2023-11-24 21:04:33,285 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4200, loss[loss=0.07046, simple_loss=0.09601, pruned_loss=0.01287, audio_tagging_loss=0.009587, over 16761.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09055, pruned_loss=0.01283, audio_tagging_loss=0.008971, over 3049401.75 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:04:36,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2993873.3333333335, ans=0.125 2023-11-24 21:04:53,869 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449100 2023-11-24 21:04:58,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2023-11-24 21:05:02,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2994006.6666666665, ans=0.125 2023-11-24 21:05:15,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2994073.3333333335, ans=0.2 2023-11-24 21:05:17,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0 2023-11-24 21:05:18,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-24 21:05:23,805 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 21:05:35,510 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4250, loss[loss=0.06395, simple_loss=0.08592, pruned_loss=0.01192, audio_tagging_loss=0.009075, over 16260.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09055, pruned_loss=0.01272, audio_tagging_loss=0.008939, over 3057090.95 frames. 
], batch size: 60, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:05:43,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2994206.6666666665, ans=0.125 2023-11-24 21:05:53,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2994273.3333333335, ans=0.125 2023-11-24 21:05:56,958 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449150 2023-11-24 21:06:05,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.581e+01 9.086e+01 9.848e+01 1.238e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-24 21:06:19,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2994406.6666666665, ans=0.125 2023-11-24 21:06:28,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2994473.3333333335, ans=0.1 2023-11-24 21:06:37,932 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4300, loss[loss=0.05168, simple_loss=0.06794, pruned_loss=0.008919, audio_tagging_loss=0.008797, over 14375.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09153, pruned_loss=0.01285, audio_tagging_loss=0.008869, over 3054267.01 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:06:38,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2994540.0, ans=0.125 2023-11-24 21:06:38,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0 2023-11-24 21:06:41,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2023-11-24 21:06:41,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2994540.0, ans=0.0 2023-11-24 21:06:43,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2994540.0, ans=0.0 2023-11-24 21:06:59,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449200 2023-11-24 21:07:05,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-24 21:07:05,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2994673.3333333335, ans=0.125 2023-11-24 21:07:14,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=15.0 2023-11-24 21:07:22,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. 
limit=15.0 2023-11-24 21:07:25,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2994740.0, ans=0.125 2023-11-24 21:07:30,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2994806.6666666665, ans=0.125 2023-11-24 21:07:39,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2994873.3333333335, ans=0.2 2023-11-24 21:07:40,216 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4350, loss[loss=0.05929, simple_loss=0.07568, pruned_loss=0.01139, audio_tagging_loss=0.01006, over 15082.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09267, pruned_loss=0.01299, audio_tagging_loss=0.008809, over 3051924.87 frames. ], batch size: 60, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:07:56,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2994940.0, ans=0.1 2023-11-24 21:08:01,530 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449250 2023-11-24 21:08:10,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.676e+01 9.413e+01 1.023e+02 1.276e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-24 21:08:11,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2023-11-24 21:08:12,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2995006.6666666665, ans=0.0 2023-11-24 21:08:23,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-24 21:08:26,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2995073.3333333335, ans=0.125 2023-11-24 21:08:38,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2995140.0, ans=0.0 2023-11-24 21:08:42,975 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4400, loss[loss=0.07688, simple_loss=0.106, pruned_loss=0.01567, audio_tagging_loss=0.008221, over 13956.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09189, pruned_loss=0.01283, audio_tagging_loss=0.008815, over 3052303.23 frames. 
], batch size: 54, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:08:54,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2995273.3333333335, ans=0.07 2023-11-24 21:08:59,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2995273.3333333335, ans=0.025 2023-11-24 21:09:04,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449300 2023-11-24 21:09:24,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2995406.6666666665, ans=15.0 2023-11-24 21:09:24,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2995406.6666666665, ans=0.125 2023-11-24 21:09:30,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2995406.6666666665, ans=0.015 2023-11-24 21:09:42,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0 2023-11-24 21:09:45,346 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4450, loss[loss=0.06261, simple_loss=0.09441, pruned_loss=0.009397, audio_tagging_loss=0.006012, over 17273.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09182, pruned_loss=0.01297, audio_tagging_loss=0.008749, over 3055766.61 frames. ], batch size: 64, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:09:57,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2023-11-24 21:09:59,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. 
limit=15.0 2023-11-24 21:10:02,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2995606.6666666665, ans=0.0 2023-11-24 21:10:06,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449350 2023-11-24 21:10:15,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2995673.3333333335, ans=0.125 2023-11-24 21:10:16,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.708e+01 9.388e+01 1.016e+02 1.957e+02, threshold=1.878e+02, percent-clipped=1.0 2023-11-24 21:10:22,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2995740.0, ans=0.125 2023-11-24 21:10:29,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2995740.0, ans=0.1 2023-11-24 21:10:36,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2995806.6666666665, ans=0.1 2023-11-24 21:10:36,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2995806.6666666665, ans=0.125 2023-11-24 21:10:37,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2995806.6666666665, ans=0.0 2023-11-24 21:10:47,836 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4500, loss[loss=0.08826, simple_loss=0.123, pruned_loss=0.02108, audio_tagging_loss=0.00569, over 14485.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09217, pruned_loss=0.01301, audio_tagging_loss=0.008763, over 3060277.19 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:10:59,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2995940.0, ans=0.2 2023-11-24 21:11:09,499 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449400 2023-11-24 21:11:18,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2023-11-24 21:11:18,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2023-11-24 21:11:20,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2996006.6666666665, ans=0.125 2023-11-24 21:11:36,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2023-11-24 21:11:45,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2996140.0, ans=0.125 2023-11-24 21:11:51,465 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4550, loss[loss=0.07278, simple_loss=0.1001, pruned_loss=0.01373, audio_tagging_loss=0.008984, over 15848.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09162, pruned_loss=0.01289, audio_tagging_loss=0.008814, over 3064411.60 frames. 
], batch size: 56, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:11:55,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2996206.6666666665, ans=0.09899494936611666 2023-11-24 21:12:04,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2996273.3333333335, ans=0.0 2023-11-24 21:12:12,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449450 2023-11-24 21:12:16,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2996340.0, ans=0.1 2023-11-24 21:12:17,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2996340.0, ans=0.125 2023-11-24 21:12:22,244 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.617e+01 8.939e+01 9.828e+01 1.230e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-24 21:12:32,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2996406.6666666665, ans=0.125 2023-11-24 21:12:35,855 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 21:12:39,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2996473.3333333335, ans=0.1 2023-11-24 21:12:43,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2996473.3333333335, ans=0.2 2023-11-24 21:12:53,660 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4600, loss[loss=0.09011, simple_loss=0.12, pruned_loss=0.0205, audio_tagging_loss=0.009618, over 14994.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09155, pruned_loss=0.01302, audio_tagging_loss=0.008901, over 3051382.70 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:13:04,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2996606.6666666665, ans=0.0 2023-11-24 21:13:06,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2996606.6666666665, ans=0.0 2023-11-24 21:13:09,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2996606.6666666665, ans=0.1 2023-11-24 21:13:14,388 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449500 2023-11-24 21:13:26,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2023-11-24 21:13:34,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2023-11-24 21:13:55,714 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4650, loss[loss=0.09137, simple_loss=0.1257, pruned_loss=0.01848, audio_tagging_loss=0.01003, over 16035.00 frames. 
], tot_loss[loss=0.0679, simple_loss=0.09173, pruned_loss=0.01308, audio_tagging_loss=0.008957, over 3049210.69 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:14:00,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2996873.3333333335, ans=0.0 2023-11-24 21:14:02,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2996873.3333333335, ans=0.0 2023-11-24 21:14:16,911 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449550 2023-11-24 21:14:17,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2996940.0, ans=0.125 2023-11-24 21:14:18,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2996940.0, ans=0.2 2023-11-24 21:14:27,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.430e+01 9.300e+01 1.028e+02 1.616e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-24 21:14:39,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2997073.3333333335, ans=0.0 2023-11-24 21:14:58,450 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4700, loss[loss=0.0547, simple_loss=0.07547, pruned_loss=0.008867, audio_tagging_loss=0.008101, over 14989.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09198, pruned_loss=0.01309, audio_tagging_loss=0.008939, over 3053239.26 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:15:15,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2997273.3333333335, ans=0.125 2023-11-24 21:15:20,736 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449600 2023-11-24 21:15:27,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2997340.0, ans=0.0 2023-11-24 21:15:51,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2997473.3333333335, ans=0.125 2023-11-24 21:15:56,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2997473.3333333335, ans=0.125 2023-11-24 21:15:59,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2997473.3333333335, ans=0.125 2023-11-24 21:16:02,245 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4750, loss[loss=0.0776, simple_loss=0.1004, pruned_loss=0.01801, audio_tagging_loss=0.0094, over 15555.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09222, pruned_loss=0.01309, audio_tagging_loss=0.008914, over 3054399.48 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 21:16:23,101 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449650 2023-11-24 21:16:33,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.730e+01 9.403e+01 9.994e+01 1.287e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-24 21:16:47,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. 
limit=22.5 2023-11-24 21:16:48,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2997740.0, ans=0.0 2023-11-24 21:17:04,665 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4800, loss[loss=0.08142, simple_loss=0.1141, pruned_loss=0.01672, audio_tagging_loss=0.007668, over 15604.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09114, pruned_loss=0.01299, audio_tagging_loss=0.009021, over 3059260.64 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:17:09,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2997873.3333333335, ans=0.5 2023-11-24 21:17:25,540 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449700 2023-11-24 21:17:30,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2998006.6666666665, ans=0.1 2023-11-24 21:17:34,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2998006.6666666665, ans=0.125 2023-11-24 21:18:06,862 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4850, loss[loss=0.06635, simple_loss=0.08796, pruned_loss=0.01329, audio_tagging_loss=0.009085, over 14495.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09083, pruned_loss=0.01292, audio_tagging_loss=0.009151, over 3046949.68 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:18:09,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2998206.6666666665, ans=0.125 2023-11-24 21:18:11,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2998206.6666666665, ans=0.04949747468305833 2023-11-24 21:18:26,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0 2023-11-24 21:18:28,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449750 2023-11-24 21:18:38,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.567e+01 9.194e+01 1.005e+02 1.604e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-24 21:18:44,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2998406.6666666665, ans=0.125 2023-11-24 21:18:46,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2998406.6666666665, ans=0.2 2023-11-24 21:18:49,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2998406.6666666665, ans=0.125 2023-11-24 21:19:09,498 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4900, loss[loss=0.05636, simple_loss=0.08105, pruned_loss=0.006849, audio_tagging_loss=0.00898, over 15612.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.08992, pruned_loss=0.01268, audio_tagging_loss=0.00914, over 3039045.84 frames. 
], batch size: 58, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:19:31,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449800 2023-11-24 21:19:35,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2998673.3333333335, ans=0.2 2023-11-24 21:19:39,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2998673.3333333335, ans=0.0 2023-11-24 21:19:43,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2023-11-24 21:20:00,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2998806.6666666665, ans=0.125 2023-11-24 21:20:09,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=2998806.6666666665, ans=8.0 2023-11-24 21:20:11,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0 2023-11-24 21:20:13,065 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 4950, loss[loss=0.07267, simple_loss=0.1038, pruned_loss=0.01376, audio_tagging_loss=0.00703, over 14640.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09034, pruned_loss=0.0126, audio_tagging_loss=0.008957, over 3044790.12 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:20:14,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2998873.3333333335, ans=10.0 2023-11-24 21:20:15,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2998873.3333333335, ans=0.125 2023-11-24 21:20:33,959 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449850 2023-11-24 21:20:43,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 8.508e+01 9.344e+01 9.867e+01 1.192e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-24 21:21:04,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2999140.0, ans=0.0 2023-11-24 21:21:06,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2999140.0, ans=0.0 2023-11-24 21:21:15,131 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5000, loss[loss=0.0648, simple_loss=0.0779, pruned_loss=0.01369, audio_tagging_loss=0.01216, over 14957.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09057, pruned_loss=0.01273, audio_tagging_loss=0.00888, over 3041490.32 frames. 
], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:21:22,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2999206.6666666665, ans=0.0 2023-11-24 21:21:36,329 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449900 2023-11-24 21:21:38,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2999340.0, ans=0.0 2023-11-24 21:21:48,360 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 21:21:55,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2999406.6666666665, ans=0.125 2023-11-24 21:22:05,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2999473.3333333335, ans=0.0 2023-11-24 21:22:11,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2999473.3333333335, ans=0.0 2023-11-24 21:22:16,950 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5050, loss[loss=0.07687, simple_loss=0.1102, pruned_loss=0.01579, audio_tagging_loss=0.00601, over 14780.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09155, pruned_loss=0.01289, audio_tagging_loss=0.008739, over 3047013.98 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:22:36,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2999606.6666666665, ans=0.0 2023-11-24 21:22:38,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 449950 2023-11-24 21:22:42,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0 2023-11-24 21:22:48,382 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.754e+01 9.208e+01 9.764e+01 1.294e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-24 21:22:49,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2023-11-24 21:22:49,761 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 21:23:19,847 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5100, loss[loss=0.07455, simple_loss=0.09285, pruned_loss=0.01863, audio_tagging_loss=0.009502, over 15988.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09019, pruned_loss=0.01264, audio_tagging_loss=0.008822, over 3042583.64 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:23:26,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. 
limit=10.0 2023-11-24 21:23:39,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2999940.0, ans=0.125 2023-11-24 21:23:40,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450000 2023-11-24 21:24:00,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3000073.3333333335, ans=0.125 2023-11-24 21:24:18,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-24 21:24:22,051 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5150, loss[loss=0.05303, simple_loss=0.07209, pruned_loss=0.008604, audio_tagging_loss=0.008386, over 14615.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09007, pruned_loss=0.01259, audio_tagging_loss=0.008759, over 3037168.94 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:24:25,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3000206.6666666665, ans=0.5 2023-11-24 21:24:37,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-24 21:24:42,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450050 2023-11-24 21:24:48,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3000340.0, ans=0.0 2023-11-24 21:24:53,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.721e+01 9.361e+01 9.842e+01 1.528e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-24 21:25:11,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3000473.3333333335, ans=0.125 2023-11-24 21:25:12,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3000473.3333333335, ans=0.0 2023-11-24 21:25:15,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3000473.3333333335, ans=0.0 2023-11-24 21:25:17,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2023-11-24 21:25:22,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3000473.3333333335, ans=0.0 2023-11-24 21:25:24,408 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5200, loss[loss=0.0624, simple_loss=0.09008, pruned_loss=0.009503, audio_tagging_loss=0.007856, over 15608.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09026, pruned_loss=0.01264, audio_tagging_loss=0.008737, over 3039400.00 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 21:25:41,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3000606.6666666665, ans=0.0 2023-11-24 21:25:45,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. 
limit=15.0 2023-11-24 21:25:45,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450100 2023-11-24 21:25:51,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3000673.3333333335, ans=0.125 2023-11-24 21:26:08,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=22.5 2023-11-24 21:26:26,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3000873.3333333335, ans=0.0 2023-11-24 21:26:27,389 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5250, loss[loss=0.05449, simple_loss=0.07429, pruned_loss=0.01057, audio_tagging_loss=0.00677, over 14960.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08995, pruned_loss=0.01268, audio_tagging_loss=0.008697, over 3038907.87 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:26:31,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3000873.3333333335, ans=0.1 2023-11-24 21:26:32,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3000873.3333333335, ans=0.0 2023-11-24 21:26:38,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2023-11-24 21:26:45,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2023-11-24 21:26:48,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450150 2023-11-24 21:26:59,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.390e+01 9.326e+01 9.807e+01 1.099e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-24 21:27:08,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3001073.3333333335, ans=0.0 2023-11-24 21:27:25,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3001140.0, ans=0.2 2023-11-24 21:27:29,178 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5300, loss[loss=0.06733, simple_loss=0.08774, pruned_loss=0.01534, audio_tagging_loss=0.008124, over 14168.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08946, pruned_loss=0.01256, audio_tagging_loss=0.008826, over 3043982.22 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 21:27:42,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3001273.3333333335, ans=0.125 2023-11-24 21:27:50,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450200 2023-11-24 21:27:50,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. 
2023-11-24 21:27:59,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3001340.0, ans=0.05
2023-11-24 21:28:06,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3001406.6666666665, ans=0.0
2023-11-24 21:28:22,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3001473.3333333335, ans=0.125
2023-11-24 21:28:24,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3001473.3333333335, ans=0.2
2023-11-24 21:28:25,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3001473.3333333335, ans=0.125
2023-11-24 21:28:28,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3001473.3333333335, ans=0.025
2023-11-24 21:28:31,199 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5350, loss[loss=0.08016, simple_loss=0.1172, pruned_loss=0.01512, audio_tagging_loss=0.006433, over 16185.00 frames. ], tot_loss[loss=0.067, simple_loss=0.0909, pruned_loss=0.01279, audio_tagging_loss=0.008761, over 3045977.29 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:28:37,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=22.5
2023-11-24 21:28:50,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3001606.6666666665, ans=0.0
2023-11-24 21:28:52,497 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450250
2023-11-24 21:28:54,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0
2023-11-24 21:28:59,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0
2023-11-24 21:29:03,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.563e+01 9.107e+01 9.848e+01 1.290e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-24 21:29:04,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=22.5
2023-11-24 21:29:13,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3001740.0, ans=0.125
2023-11-24 21:29:28,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3001806.6666666665, ans=0.2
2023-11-24 21:29:33,373 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5400, loss[loss=0.05621, simple_loss=0.07849, pruned_loss=0.008761, audio_tagging_loss=0.008197, over 14621.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.0906, pruned_loss=0.01277, audio_tagging_loss=0.008805, over 3044871.20 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:29:33,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3001873.3333333335, ans=0.0
2023-11-24 21:29:39,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3001873.3333333335, ans=0.125
2023-11-24 21:29:47,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3001940.0, ans=0.1
2023-11-24 21:29:55,222 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450300
2023-11-24 21:29:56,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0
2023-11-24 21:30:00,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0
2023-11-24 21:30:34,888 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5450, loss[loss=0.05758, simple_loss=0.07286, pruned_loss=0.009891, audio_tagging_loss=0.01126, over 16278.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09062, pruned_loss=0.01284, audio_tagging_loss=0.008815, over 3049632.77 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:30:38,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=22.5
2023-11-24 21:30:49,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3002273.3333333335, ans=0.0
2023-11-24 21:30:56,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450350
2023-11-24 21:30:58,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3002340.0, ans=0.0
2023-11-24 21:31:07,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.563e+01 9.171e+01 9.760e+01 1.153e+02, threshold=1.834e+02, percent-clipped=0.0
2023-11-24 21:31:09,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=22.5
2023-11-24 21:31:11,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5
2023-11-24 21:31:36,962 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5500, loss[loss=0.06481, simple_loss=0.09251, pruned_loss=0.01197, audio_tagging_loss=0.006587, over 14653.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09083, pruned_loss=0.01293, audio_tagging_loss=0.008797, over 3049053.27 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:31:38,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3002540.0, ans=0.0
2023-11-24 21:31:55,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3002606.6666666665, ans=0.125
2023-11-24 21:31:57,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450400
2023-11-24 21:32:21,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0
2023-11-24 21:32:24,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3002740.0, ans=0.0
2023-11-24 21:32:27,774 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 21:32:38,613 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5550, loss[loss=0.05495, simple_loss=0.07238, pruned_loss=0.01064, audio_tagging_loss=0.008122, over 14869.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09074, pruned_loss=0.01302, audio_tagging_loss=0.0089, over 3048147.34 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 8.0
2023-11-24 21:32:48,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3002873.3333333335, ans=0.0
2023-11-24 21:32:58,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3002940.0, ans=0.125
2023-11-24 21:32:59,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450450
2023-11-24 21:33:12,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.457e+01 9.165e+01 9.990e+01 1.185e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-24 21:33:15,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.53 vs. limit=22.5
2023-11-24 21:33:30,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3003140.0, ans=0.125
2023-11-24 21:33:35,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0
2023-11-24 21:33:40,968 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5600, loss[loss=0.06914, simple_loss=0.09848, pruned_loss=0.01153, audio_tagging_loss=0.008373, over 16220.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09123, pruned_loss=0.01311, audio_tagging_loss=0.009014, over 3050474.31 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 16.0
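The grad_scale field in the batch summaries drifts between 32.0, 16.0 and 8.0 (it drops to 8.0 by batch 5550 above and is back to 16.0 by batch 5600). That pattern is characteristic of dynamic fp16 loss scaling: halve the scale when inf/nan gradients appear, grow it back after a run of clean steps. A toy version of that policy, offered as an assumption about the trainer rather than its actual code; the growth interval is arbitrary:

    # Toy dynamic loss scaler mirroring the grad_scale behaviour seen above.
    class DynamicScaler:
        def __init__(self, scale=32.0, growth_interval=50):
            self.scale = scale
            self.growth_interval = growth_interval
            self.clean_steps = 0

        def update(self, found_inf: bool):
            if found_inf:
                self.scale /= 2.0      # e.g. the drop to 8.0 before batch 5550
                self.clean_steps = 0
            else:
                self.clean_steps += 1
                if self.clean_steps % self.growth_interval == 0:
                    self.scale *= 2.0  # e.g. back to 16.0 by batch 5600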
2023-11-24 21:33:51,416 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 21:34:02,678 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450500
2023-11-24 21:34:16,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3003340.0, ans=0.125
2023-11-24 21:34:22,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3003406.6666666665, ans=0.1
2023-11-24 21:34:23,712 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 21:34:25,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3003406.6666666665, ans=0.0
2023-11-24 21:34:43,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3003540.0, ans=0.125
2023-11-24 21:34:44,286 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5650, loss[loss=0.08396, simple_loss=0.1093, pruned_loss=0.02101, audio_tagging_loss=0.008296, over 15805.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09195, pruned_loss=0.01338, audio_tagging_loss=0.009024, over 3059032.69 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:34:50,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5
2023-11-24 21:34:57,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3003606.6666666665, ans=0.125
2023-11-24 21:35:05,661 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450550
2023-11-24 21:35:10,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3003673.3333333335, ans=0.0
2023-11-24 21:35:17,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.565e+01 9.205e+01 1.008e+02 1.241e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-24 21:35:34,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0
2023-11-24 21:35:36,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3003806.6666666665, ans=0.125
2023-11-24 21:35:46,681 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5700, loss[loss=0.05708, simple_loss=0.08082, pruned_loss=0.008756, audio_tagging_loss=0.007914, over 14472.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09139, pruned_loss=0.01313, audio_tagging_loss=0.009033, over 3061839.85 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0
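The WARNING above shows the cut-filtering rule at work: the excluded AudioSet cut has 100 feature frames, only 23 after 4x subsampling, but 24 BPE tokens, and a transducer loss cannot align fewer encoder frames than output tokens. A sketch of such a validity check; the function name is illustrative, not the exact train_asr.py code:

    def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        # A pruned-transducer alignment needs T >= U: at least one encoder
        # frame per output token, otherwise no monotonic alignment exists.
        return frames_after_subsampling >= num_tokens

    # The excluded cut above: 23 frames after subsampling vs. 24 tokens -> dropped.
    assert keep_cut(23, 24) is False

The "Dummy text added as a place holder" transcript is literal data: audio-tagging cuts carry no real transcription, so a placeholder string is attached, and only cuts too short for it are rejected.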
2023-11-24 21:36:08,126 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450600
2023-11-24 21:36:29,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3004073.3333333335, ans=0.125
2023-11-24 21:36:35,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3004073.3333333335, ans=0.125
2023-11-24 21:36:36,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3004140.0, ans=0.0
2023-11-24 21:36:49,716 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5750, loss[loss=0.05586, simple_loss=0.07802, pruned_loss=0.009464, audio_tagging_loss=0.007382, over 15793.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09109, pruned_loss=0.013, audio_tagging_loss=0.008882, over 3063648.16 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:37:11,199 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450650
2023-11-24 21:37:17,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3004340.0, ans=0.125
2023-11-24 21:37:23,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.366e+01 9.124e+01 1.023e+02 1.214e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-24 21:37:30,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3004406.6666666665, ans=0.0
2023-11-24 21:37:52,009 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5800, loss[loss=0.07701, simple_loss=0.1081, pruned_loss=0.01527, audio_tagging_loss=0.007672, over 14911.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0907, pruned_loss=0.01295, audio_tagging_loss=0.008819, over 3057353.59 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:37:54,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3004540.0, ans=0.125
2023-11-24 21:38:05,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3004606.6666666665, ans=0.125
2023-11-24 21:38:12,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2023-11-24 21:38:13,106 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450700
2023-11-24 21:38:14,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3004606.6666666665, ans=0.035
2023-11-24 21:38:37,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5
2023-11-24 21:38:50,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3004806.6666666665, ans=0.0
2023-11-24 21:38:54,022 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5850, loss[loss=0.0539, simple_loss=0.07229, pruned_loss=0.01049, audio_tagging_loss=0.007264, over 15174.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.0908, pruned_loss=0.01299, audio_tagging_loss=0.008841, over 3056212.04 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:39:05,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3004940.0, ans=0.1
2023-11-24 21:39:07,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3004940.0, ans=0.0
2023-11-24 21:39:12,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3004940.0, ans=0.125
2023-11-24 21:39:14,674 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450750
2023-11-24 21:39:26,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.737e+01 9.217e+01 9.924e+01 1.196e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-24 21:39:27,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0
2023-11-24 21:39:38,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3005073.3333333335, ans=0.0
2023-11-24 21:39:48,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3005140.0, ans=0.1
2023-11-24 21:39:52,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3005140.0, ans=0.2
2023-11-24 21:39:55,820 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5900, loss[loss=0.07223, simple_loss=0.09497, pruned_loss=0.01768, audio_tagging_loss=0.00706, over 15340.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09121, pruned_loss=0.01296, audio_tagging_loss=0.008781, over 3062276.52 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:40:16,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450800
2023-11-24 21:40:31,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3005340.0, ans=0.125
2023-11-24 21:40:57,896 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 5950, loss[loss=0.05332, simple_loss=0.06932, pruned_loss=0.008847, audio_tagging_loss=0.00981, over 14999.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09091, pruned_loss=0.01282, audio_tagging_loss=0.008795, over 3060907.95 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:41:01,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3005540.0, ans=0.2
2023-11-24 21:41:09,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3005606.6666666665, ans=0.125
2023-11-24 21:41:16,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3005606.6666666665, ans=0.125
2023-11-24 21:41:19,142 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450850
2023-11-24 21:41:31,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.441e+01 9.152e+01 9.875e+01 1.330e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-24 21:41:42,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0
2023-11-24 21:41:54,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0
2023-11-24 21:41:59,102 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6000, loss[loss=0.07814, simple_loss=0.1118, pruned_loss=0.01548, audio_tagging_loss=0.006748, over 16214.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09207, pruned_loss=0.01304, audio_tagging_loss=0.008662, over 3062586.72 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:41:59,103 INFO [train_asr.py:1244] (1/4) Computing validation loss
2023-11-24 21:42:40,941 INFO [train_asr.py:1253] (1/4) Epoch 38, validation: loss=0.05788, simple_loss=0.05074, pruned_loss=0.005119, audio_tagging_loss=0.02739, over 4681554.00 frames.
2023-11-24 21:42:40,942 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB
2023-11-24 21:42:43,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0
2023-11-24 21:42:46,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=22.5
2023-11-24 21:42:53,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3005940.0, ans=0.04949747468305833
2023-11-24 21:42:58,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3005940.0, ans=0.125
2023-11-24 21:43:01,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450900
2023-11-24 21:43:01,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3005940.0, ans=0.2
2023-11-24 21:43:02,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3005940.0, ans=0.2
2023-11-24 21:43:22,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3006073.3333333335, ans=0.1
2023-11-24 21:43:23,363 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 21:43:42,799 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6050, loss[loss=0.06154, simple_loss=0.07641, pruned_loss=0.01386, audio_tagging_loss=0.00947, over 14641.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.0916, pruned_loss=0.01292, audio_tagging_loss=0.008679, over 3060068.79 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0
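The loss fields in these summaries are internally consistent: every reported loss equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss to rounding. The validation line above checks out the same way: 0.5 * 0.05074 + 0.005119 + 0.02739 = 0.05788. In other words the simple (linear) transducer loss enters at half weight while the pruned transducer loss and the audio-tagging loss enter at full weight:

    # Worked check of the loss composition visible in this log.
    def combined_loss(simple, pruned, audio_tagging,
                      simple_scale=0.5, audio_tagging_scale=1.0):
        return simple_scale * simple + pruned + audio_tagging_scale * audio_tagging

    # Validation entry above reports loss=0.05788:
    print(combined_loss(0.05074, 0.005119, 0.02739))  # ~0.057879

Note also that the validation audio_tagging_loss (0.02739) is roughly three times its training value, while the validation pruned loss (0.005119) is well below the training one.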
2023-11-24 21:43:49,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3006206.6666666665, ans=0.125
2023-11-24 21:44:04,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 450950
2023-11-24 21:44:15,329 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 21:44:17,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.516e+01 8.966e+01 9.859e+01 1.258e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-24 21:44:40,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0
2023-11-24 21:44:44,319 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6100, loss[loss=0.07266, simple_loss=0.09783, pruned_loss=0.01529, audio_tagging_loss=0.008462, over 14704.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09069, pruned_loss=0.01281, audio_tagging_loss=0.008729, over 3052474.23 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:44:53,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3006540.0, ans=0.0
2023-11-24 21:44:55,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3006540.0, ans=0.125
2023-11-24 21:45:06,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451000
2023-11-24 21:45:47,795 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6150, loss[loss=0.07571, simple_loss=0.09478, pruned_loss=0.0198, audio_tagging_loss=0.008522, over 14904.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09021, pruned_loss=0.0128, audio_tagging_loss=0.008793, over 3048347.09 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:46:04,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=22.5
2023-11-24 21:46:08,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451050
2023-11-24 21:46:17,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3007006.6666666665, ans=0.125
2023-11-24 21:46:21,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 8.834e+01 9.340e+01 1.011e+02 1.357e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-24 21:46:33,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3007073.3333333335, ans=0.0
2023-11-24 21:46:34,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3007073.3333333335, ans=0.0
2023-11-24 21:46:37,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3007140.0, ans=0.1
2023-11-24 21:46:45,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0
2023-11-24 21:46:49,584 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6200, loss[loss=0.06047, simple_loss=0.07301, pruned_loss=0.01155, audio_tagging_loss=0.01241, over 14689.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09013, pruned_loss=0.01268, audio_tagging_loss=0.008824, over 3043738.03 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:46:53,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5
2023-11-24 21:46:57,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0
2023-11-24 21:47:01,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3007273.3333333335, ans=0.0
2023-11-24 21:47:10,332 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451100
2023-11-24 21:47:11,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3007273.3333333335, ans=0.2
2023-11-24 21:47:30,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3007406.6666666665, ans=0.2
2023-11-24 21:47:31,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3007406.6666666665, ans=0.125
2023-11-24 21:47:42,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3007473.3333333335, ans=0.125
2023-11-24 21:47:50,823 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6250, loss[loss=0.08934, simple_loss=0.1167, pruned_loss=0.02121, audio_tagging_loss=0.009769, over 15377.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.0901, pruned_loss=0.01268, audio_tagging_loss=0.008927, over 3048026.78 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:47:51,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3007540.0, ans=0.0
2023-11-24 21:47:56,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3007540.0, ans=0.2
2023-11-24 21:48:02,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3007606.6666666665, ans=0.1
2023-11-24 21:48:02,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.77 vs. limit=12.0
2023-11-24 21:48:07,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3007606.6666666665, ans=0.1
2023-11-24 21:48:12,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451150
2023-11-24 21:48:23,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3007673.3333333335, ans=0.1
2023-11-24 21:48:26,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.790e+01 8.537e+01 9.157e+01 9.825e+01 1.470e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-24 21:48:47,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3007806.6666666665, ans=0.125
2023-11-24 21:48:51,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3007806.6666666665, ans=0.1
2023-11-24 21:48:53,252 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6300, loss[loss=0.05852, simple_loss=0.07758, pruned_loss=0.01182, audio_tagging_loss=0.007914, over 15064.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09166, pruned_loss=0.01293, audio_tagging_loss=0.008868, over 3044649.60 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:49:14,667 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451200
2023-11-24 21:49:24,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3008006.6666666665, ans=0.125
2023-11-24 21:49:25,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3008006.6666666665, ans=0.125
2023-11-24 21:49:44,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3008140.0, ans=0.125
2023-11-24 21:49:56,227 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6350, loss[loss=0.06132, simple_loss=0.08152, pruned_loss=0.01243, audio_tagging_loss=0.008135, over 15282.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09115, pruned_loss=0.01295, audio_tagging_loss=0.008933, over 3042087.09 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 21:50:15,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3008273.3333333335, ans=0.125
2023-11-24 21:50:17,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451250
2023-11-24 21:50:28,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3008340.0, ans=0.125
2023-11-24 21:50:30,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 8.435e+01 8.876e+01 9.772e+01 1.151e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-24 21:50:57,803 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6400, loss[loss=0.05458, simple_loss=0.06665, pruned_loss=0.008716, audio_tagging_loss=0.01254, over 15076.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08946, pruned_loss=0.01268, audio_tagging_loss=0.009174, over 3047716.03 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:51:19,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451300
2023-11-24 21:51:43,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3008740.0, ans=0.95
2023-11-24 21:51:46,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3008806.6666666665, ans=0.2
2023-11-24 21:51:52,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3008806.6666666665, ans=10.0
2023-11-24 21:51:55,989 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-24 21:51:59,916 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6450, loss[loss=0.06748, simple_loss=0.0867, pruned_loss=0.01414, audio_tagging_loss=0.009993, over 15836.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.08979, pruned_loss=0.0127, audio_tagging_loss=0.009179, over 3045630.88 frames. ], batch size: 60, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:52:15,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=15.0
2023-11-24 21:52:21,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451350
2023-11-24 21:52:34,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.612e+01 9.313e+01 1.021e+02 1.233e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-24 21:52:39,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3009073.3333333335, ans=0.2
2023-11-24 21:52:39,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3009073.3333333335, ans=0.125
2023-11-24 21:52:47,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3009073.3333333335, ans=0.025
2023-11-24 21:53:01,735 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6500, loss[loss=0.05993, simple_loss=0.08548, pruned_loss=0.008691, audio_tagging_loss=0.008498, over 16560.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08969, pruned_loss=0.01252, audio_tagging_loss=0.009182, over 3048615.37 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:53:07,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3009206.6666666665, ans=0.2
2023-11-24 21:53:22,646 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451400
2023-11-24 21:53:39,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0
2023-11-24 21:53:44,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3009406.6666666665, ans=0.2
2023-11-24 21:54:04,587 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6550, loss[loss=0.0651, simple_loss=0.09362, pruned_loss=0.01077, audio_tagging_loss=0.00752, over 14694.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08973, pruned_loss=0.01265, audio_tagging_loss=0.009101, over 3049098.93 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:54:08,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3009540.0, ans=0.125
2023-11-24 21:54:08,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0
2023-11-24 21:54:11,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=12.0
2023-11-24 21:54:25,935 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451450
2023-11-24 21:54:35,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3009673.3333333335, ans=0.125
2023-11-24 21:54:38,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3009673.3333333335, ans=0.125
2023-11-24 21:54:39,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.549e+01 9.293e+01 1.004e+02 1.709e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-24 21:55:06,533 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6600, loss[loss=0.08573, simple_loss=0.1092, pruned_loss=0.02233, audio_tagging_loss=0.008811, over 14026.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08968, pruned_loss=0.01259, audio_tagging_loss=0.008919, over 3044930.19 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:55:14,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3009873.3333333335, ans=0.125
2023-11-24 21:55:28,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451500
2023-11-24 21:55:31,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3010006.6666666665, ans=0.125
2023-11-24 21:55:36,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3010006.6666666665, ans=0.125
2023-11-24 21:55:58,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3010140.0, ans=0.125
2023-11-24 21:56:08,550 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6650, loss[loss=0.04854, simple_loss=0.06721, pruned_loss=0.007702, audio_tagging_loss=0.007237, over 15792.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09021, pruned_loss=0.01276, audio_tagging_loss=0.008832, over 3048692.52 frames. ], batch size: 63, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:56:30,297 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451550
2023-11-24 21:56:30,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3010273.3333333335, ans=0.125
2023-11-24 21:56:33,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.02 vs. limit=10.0
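The Whitening lines compare a covariance-uniformity metric for a module's activations against a limit (6.0, 10.0, 12.0, 15.0 or 22.5 above, depending on the module). One plausible form of such a metric, sketched under the assumption that it measures how far the activation covariance is from a multiple of the identity; this is a guess at the shape of the computation, not the exact scaling.py definition:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations.
        # With C the covariance, n * trace(C @ C) / trace(C)**2 equals 1.0 when
        # C is a multiple of the identity (perfectly "white") and grows as the
        # eigenvalue spectrum becomes less uniform.
        x = x - x.mean(dim=0)
        c = (x.t() @ x) / x.shape[0]
        n = c.shape[0]
        return (n * torch.trace(c @ c) / torch.trace(c) ** 2).item()

On that reading, entries like metric=10.02 vs. limit=12.0 are within bounds, and the module presumably only intervenes in the backward pass when the limit is exceeded.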
2023-11-24 21:56:42,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3010340.0, ans=0.125
2023-11-24 21:56:44,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.506e+01 9.140e+01 9.928e+01 1.246e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-24 21:56:44,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3010340.0, ans=0.0
2023-11-24 21:56:47,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0
2023-11-24 21:57:04,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3010473.3333333335, ans=0.0
2023-11-24 21:57:11,177 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6700, loss[loss=0.06528, simple_loss=0.09489, pruned_loss=0.01141, audio_tagging_loss=0.006416, over 15939.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09019, pruned_loss=0.01265, audio_tagging_loss=0.008793, over 3040443.20 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:57:15,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3010540.0, ans=0.125
2023-11-24 21:57:16,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3010540.0, ans=0.125
2023-11-24 21:57:20,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3010540.0, ans=0.09899494936611666
2023-11-24 21:57:20,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3010540.0, ans=0.2
2023-11-24 21:57:32,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451600
2023-11-24 21:58:13,672 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6750, loss[loss=0.07751, simple_loss=0.1031, pruned_loss=0.0188, audio_tagging_loss=0.007179, over 17244.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09048, pruned_loss=0.01279, audio_tagging_loss=0.008737, over 3040063.95 frames. ], batch size: 64, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:58:28,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3010940.0, ans=0.125
2023-11-24 21:58:34,514 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451650
2023-11-24 21:58:36,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3010940.0, ans=0.0
2023-11-24 21:58:36,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3010940.0, ans=0.2
2023-11-24 21:58:49,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.416e+01 8.949e+01 9.766e+01 1.528e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-24 21:59:13,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3011140.0, ans=0.0
2023-11-24 21:59:15,541 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6800, loss[loss=0.05707, simple_loss=0.07226, pruned_loss=0.01047, audio_tagging_loss=0.01047, over 15745.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09031, pruned_loss=0.01272, audio_tagging_loss=0.008779, over 3044422.72 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 21:59:25,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3011206.6666666665, ans=0.125
2023-11-24 21:59:36,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451700
2023-11-24 21:59:48,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3011340.0, ans=0.0
2023-11-24 22:00:18,166 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6850, loss[loss=0.07818, simple_loss=0.1054, pruned_loss=0.01468, audio_tagging_loss=0.01083, over 14815.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09062, pruned_loss=0.0128, audio_tagging_loss=0.00877, over 3045272.95 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 22:00:31,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3011606.6666666665, ans=0.125
2023-11-24 22:00:39,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451750
2023-11-24 22:00:43,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3011673.3333333335, ans=0.125
2023-11-24 22:00:47,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3011673.3333333335, ans=0.0
2023-11-24 22:00:50,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3011673.3333333335, ans=0.125
2023-11-24 22:00:51,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3011673.3333333335, ans=0.0
2023-11-24 22:00:54,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.792e+01 9.209e+01 9.855e+01 1.264e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-24 22:00:56,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3011740.0, ans=0.0
2023-11-24 22:01:11,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3011806.6666666665, ans=0.125
2023-11-24 22:01:19,653 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6900, loss[loss=0.04867, simple_loss=0.0617, pruned_loss=0.008904, audio_tagging_loss=0.008918, over 16313.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0908, pruned_loss=0.01254, audio_tagging_loss=0.008698, over 3045437.00 frames. ], batch size: 64, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 22:01:25,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3011873.3333333335, ans=0.125
2023-11-24 22:01:41,160 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451800
2023-11-24 22:01:47,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3012006.6666666665, ans=0.125
2023-11-24 22:01:49,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3012006.6666666665, ans=0.0
2023-11-24 22:02:06,147 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-24 22:02:22,847 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 6950, loss[loss=0.05358, simple_loss=0.07422, pruned_loss=0.008678, audio_tagging_loss=0.007795, over 15017.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09036, pruned_loss=0.01251, audio_tagging_loss=0.008789, over 3042968.42 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 22:02:23,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3012206.6666666665, ans=0.125
2023-11-24 22:02:44,288 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451850
2023-11-24 22:02:52,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3012340.0, ans=0.125
2023-11-24 22:02:54,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3012340.0, ans=0.0
2023-11-24 22:03:00,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.404e+01 9.015e+01 9.725e+01 1.456e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-24 22:03:02,227 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-24 22:03:18,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3012473.3333333335, ans=0.0
2023-11-24 22:03:24,965 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7000, loss[loss=0.05448, simple_loss=0.07151, pruned_loss=0.007899, audio_tagging_loss=0.01083, over 15460.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09006, pruned_loss=0.01258, audio_tagging_loss=0.008818, over 3037385.31 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 22:03:32,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3012540.0, ans=0.125
2023-11-24 22:03:35,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3012540.0, ans=0.1
2023-11-24 22:03:42,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3012606.6666666665, ans=0.0
2023-11-24 22:03:46,408 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451900
2023-11-24 22:03:46,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3012606.6666666665, ans=0.1
2023-11-24 22:03:55,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.48 vs. limit=22.5
2023-11-24 22:03:55,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0
2023-11-24 22:04:26,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3012873.3333333335, ans=0.125
2023-11-24 22:04:27,278 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7050, loss[loss=0.06491, simple_loss=0.09323, pruned_loss=0.01001, audio_tagging_loss=0.008282, over 15819.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08987, pruned_loss=0.01265, audio_tagging_loss=0.008845, over 3033924.01 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 22:04:48,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 451950
2023-11-24 22:04:51,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3013006.6666666665, ans=0.125
2023-11-24 22:04:58,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0
2023-11-24 22:05:04,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.687e+01 9.290e+01 1.031e+02 1.279e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-24 22:05:04,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3013073.3333333335, ans=0.2
2023-11-24 22:05:09,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3013073.3333333335, ans=0.125
2023-11-24 22:05:16,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0
2023-11-24 22:05:29,449 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7100, loss[loss=0.06652, simple_loss=0.08895, pruned_loss=0.01378, audio_tagging_loss=0.008271, over 15326.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08961, pruned_loss=0.01248, audio_tagging_loss=0.008898, over 3046601.95 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0
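With a few hundred entries of this shape per epoch it is easier to pull the tot_loss trajectory out programmatically. A small parser matching exactly the per-batch summary format shown above; the log file name is hypothetical:

    import re

    # Matches the per-batch summary lines emitted by train_asr.py:1221 above.
    PAT = re.compile(
        r"Epoch (\d+), batch (\d+), .*?"
        r"tot_loss\[loss=([\d.]+), simple_loss=([\d.]+), "
        r"pruned_loss=([\d.]+), audio_tagging_loss=([\d.]+)"
    )

    def tot_loss_curve(log_path="train-log.txt"):  # hypothetical file name
        points = []
        with open(log_path) as f:
            for line in f:
                m = PAT.search(line)
                if m:
                    epoch, batch = int(m.group(1)), int(m.group(2))
                    points.append((epoch, batch, float(m.group(3))))
        return points

Applied to this stretch of the log it would yield points such as (38, 7050, 0.06643) and (38, 7100, 0.06619).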
2023-11-24 22:05:33,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3013206.6666666665, ans=0.2
2023-11-24 22:05:42,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3013273.3333333335, ans=0.2
2023-11-24 22:05:43,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3013273.3333333335, ans=0.125
2023-11-24 22:05:44,332 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-24 22:05:50,299 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452000
2023-11-24 22:06:00,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3013340.0, ans=0.125
2023-11-24 22:06:00,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3013340.0, ans=0.0
2023-11-24 22:06:02,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3013340.0, ans=0.0
2023-11-24 22:06:12,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3013406.6666666665, ans=0.1
2023-11-24 22:06:35,702 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7150, loss[loss=0.06338, simple_loss=0.08467, pruned_loss=0.01512, audio_tagging_loss=0.005922, over 15005.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0896, pruned_loss=0.01242, audio_tagging_loss=0.008911, over 3045680.80 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 22:06:56,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452050
2023-11-24 22:07:07,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0
2023-11-24 22:07:12,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.770e+01 9.367e+01 1.013e+02 1.404e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-24 22:07:23,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3013740.0, ans=0.125
2023-11-24 22:07:28,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3013806.6666666665, ans=0.125
2023-11-24 22:07:32,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0
2023-11-24 22:07:37,804 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7200, loss[loss=0.04953, simple_loss=0.06852, pruned_loss=0.007767, audio_tagging_loss=0.007505, over 15823.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08913, pruned_loss=0.01237, audio_tagging_loss=0.009054, over 3039849.83 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 22:07:51,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=22.5
2023-11-24 22:07:58,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452100
2023-11-24 22:07:59,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3013940.0, ans=0.125
2023-11-24 22:08:02,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3014006.6666666665, ans=0.1
2023-11-24 22:08:25,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0
2023-11-24 22:08:40,397 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7250, loss[loss=0.07872, simple_loss=0.1068, pruned_loss=0.01771, audio_tagging_loss=0.007616, over 15779.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08977, pruned_loss=0.01244, audio_tagging_loss=0.009054, over 3040851.14 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 32.0
2023-11-24 22:08:40,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3014206.6666666665, ans=0.2
2023-11-24 22:08:55,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3014273.3333333335, ans=0.125
2023-11-24 22:09:01,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452150
2023-11-24 22:09:18,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.553e+01 9.150e+01 9.601e+01 1.170e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-24 22:09:41,857 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7300, loss[loss=0.07014, simple_loss=0.09365, pruned_loss=0.01688, audio_tagging_loss=0.006428, over 14691.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09021, pruned_loss=0.01269, audio_tagging_loss=0.008948, over 3037058.64 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0
2023-11-24 22:09:58,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3014606.6666666665, ans=0.125
2023-11-24 22:10:02,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452200
2023-11-24 22:10:11,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3014673.3333333335, ans=0.1
2023-11-24 22:10:16,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.04 vs. limit=10.0
2023-11-24 22:10:39,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3014806.6666666665, ans=0.0
2023-11-24 22:10:43,898 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7350, loss[loss=0.07087, simple_loss=0.1026, pruned_loss=0.01049, audio_tagging_loss=0.009081, over 16042.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08997, pruned_loss=0.01266, audio_tagging_loss=0.008816, over 3040267.51 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0
], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:10:54,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3014873.3333333335, ans=0.125 2023-11-24 22:11:02,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3014940.0, ans=0.1 2023-11-24 22:11:05,590 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452250 2023-11-24 22:11:11,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3015006.6666666665, ans=0.125 2023-11-24 22:11:22,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.406e+01 8.894e+01 9.896e+01 1.273e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-24 22:11:27,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3015073.3333333335, ans=0.07 2023-11-24 22:11:46,719 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7400, loss[loss=0.04484, simple_loss=0.06323, pruned_loss=0.004757, audio_tagging_loss=0.008474, over 14926.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09053, pruned_loss=0.01269, audio_tagging_loss=0.008671, over 3046054.78 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:12:07,433 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452300 2023-11-24 22:12:25,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3015406.6666666665, ans=0.1 2023-11-24 22:12:46,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3015473.3333333335, ans=0.125 2023-11-24 22:12:46,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-24 22:12:48,013 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7450, loss[loss=0.09251, simple_loss=0.1343, pruned_loss=0.01717, audio_tagging_loss=0.008193, over 16338.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09036, pruned_loss=0.01275, audio_tagging_loss=0.008573, over 3047353.59 frames. ], batch size: 60, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:12:48,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3015540.0, ans=0.2 2023-11-24 22:13:08,739 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452350 2023-11-24 22:13:22,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3015673.3333333335, ans=0.125 2023-11-24 22:13:28,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.690e+01 9.275e+01 1.001e+02 1.240e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-24 22:13:44,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3015806.6666666665, ans=0.125 2023-11-24 22:13:49,770 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7500, loss[loss=0.06606, simple_loss=0.08908, pruned_loss=0.0112, audio_tagging_loss=0.01032, over 14999.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09069, pruned_loss=0.01285, audio_tagging_loss=0.008649, over 3040914.99 frames. 
], batch size: 56, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:14:08,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0 2023-11-24 22:14:11,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452400 2023-11-24 22:14:17,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-11-24 22:14:52,039 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7550, loss[loss=0.06634, simple_loss=0.09559, pruned_loss=0.01042, audio_tagging_loss=0.008129, over 15741.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0906, pruned_loss=0.01285, audio_tagging_loss=0.008658, over 3044111.25 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:14:56,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3016206.6666666665, ans=0.0 2023-11-24 22:15:03,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=22.5 2023-11-24 22:15:14,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452450 2023-11-24 22:15:18,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3016340.0, ans=0.0 2023-11-24 22:15:25,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3016340.0, ans=0.2 2023-11-24 22:15:28,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3016406.6666666665, ans=0.1 2023-11-24 22:15:32,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.333e+01 9.038e+01 9.761e+01 1.363e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-24 22:15:32,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3016406.6666666665, ans=0.1 2023-11-24 22:15:33,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3016406.6666666665, ans=0.0 2023-11-24 22:15:33,734 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:15:43,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3016473.3333333335, ans=0.0 2023-11-24 22:15:45,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3016473.3333333335, ans=0.125 2023-11-24 22:15:46,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3016473.3333333335, ans=0.1 2023-11-24 22:15:55,614 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7600, loss[loss=0.05527, simple_loss=0.07924, pruned_loss=0.007166, audio_tagging_loss=0.008485, over 14808.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08986, pruned_loss=0.0128, audio_tagging_loss=0.008727, over 3039796.24 frames. 
], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:16:03,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3016540.0, ans=0.0 2023-11-24 22:16:04,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3016540.0, ans=0.125 2023-11-24 22:16:07,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3016606.6666666665, ans=0.0 2023-11-24 22:16:15,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3016606.6666666665, ans=0.2 2023-11-24 22:16:16,319 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452500 2023-11-24 22:16:20,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=22.5 2023-11-24 22:16:26,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3016673.3333333335, ans=0.0 2023-11-24 22:16:49,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0 2023-11-24 22:16:57,860 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7650, loss[loss=0.07705, simple_loss=0.1047, pruned_loss=0.01843, audio_tagging_loss=0.006296, over 15946.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08944, pruned_loss=0.01271, audio_tagging_loss=0.008626, over 3030956.24 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:17:19,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452550 2023-11-24 22:17:24,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3017006.6666666665, ans=0.1 2023-11-24 22:17:34,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3017073.3333333335, ans=0.0 2023-11-24 22:17:37,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.417e+01 9.176e+01 1.003e+02 1.926e+02, threshold=1.835e+02, percent-clipped=1.0 2023-11-24 22:17:45,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=22.5 2023-11-24 22:17:55,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:18:00,070 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7700, loss[loss=0.06592, simple_loss=0.08692, pruned_loss=0.01383, audio_tagging_loss=0.008627, over 15033.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09014, pruned_loss=0.01278, audio_tagging_loss=0.008586, over 3041155.87 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:18:00,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2023-11-24 22:18:02,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3017206.6666666665, ans=0.2 2023-11-24 22:18:07,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-24 22:18:10,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3017206.6666666665, ans=0.2 2023-11-24 22:18:11,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3017273.3333333335, ans=0.0 2023-11-24 22:18:18,112 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:18:21,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452600 2023-11-24 22:18:21,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3017273.3333333335, ans=0.2 2023-11-24 22:18:28,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3017340.0, ans=0.0 2023-11-24 22:18:34,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3017340.0, ans=0.1 2023-11-24 22:18:46,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-24 22:19:01,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3017540.0, ans=0.0 2023-11-24 22:19:02,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2023-11-24 22:19:02,800 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7750, loss[loss=0.05764, simple_loss=0.07795, pruned_loss=0.007691, audio_tagging_loss=0.01098, over 15096.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08983, pruned_loss=0.01264, audio_tagging_loss=0.008733, over 3046528.02 frames. 
], batch size: 58, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:19:17,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3017606.6666666665, ans=0.1 2023-11-24 22:19:22,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3017606.6666666665, ans=0.0 2023-11-24 22:19:23,985 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452650 2023-11-24 22:19:34,275 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:19:40,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3017740.0, ans=0.2 2023-11-24 22:19:42,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3017740.0, ans=0.0 2023-11-24 22:19:42,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.572e+01 9.238e+01 9.882e+01 1.332e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-24 22:19:48,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3017740.0, ans=0.0 2023-11-24 22:20:02,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3017806.6666666665, ans=0.125 2023-11-24 22:20:05,360 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7800, loss[loss=0.06603, simple_loss=0.08189, pruned_loss=0.01395, audio_tagging_loss=0.01113, over 15576.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09003, pruned_loss=0.01268, audio_tagging_loss=0.008764, over 3049808.70 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:20:26,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452700 2023-11-24 22:21:06,943 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7850, loss[loss=0.06716, simple_loss=0.09323, pruned_loss=0.01313, audio_tagging_loss=0.007411, over 15396.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09032, pruned_loss=0.01282, audio_tagging_loss=0.008825, over 3051206.14 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:21:27,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3018273.3333333335, ans=0.2 2023-11-24 22:21:28,275 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452750 2023-11-24 22:21:32,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0 2023-11-24 22:21:46,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.800e+01 9.383e+01 9.995e+01 1.679e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-24 22:21:47,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3018406.6666666665, ans=0.125 2023-11-24 22:21:51,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.77 vs. limit=22.5 2023-11-24 22:21:51,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. 
limit=6.0 2023-11-24 22:21:57,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3018473.3333333335, ans=0.125 2023-11-24 22:22:03,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3018473.3333333335, ans=0.125 2023-11-24 22:22:06,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3018473.3333333335, ans=0.0 2023-11-24 22:22:09,982 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7900, loss[loss=0.08069, simple_loss=0.1145, pruned_loss=0.01382, audio_tagging_loss=0.009627, over 15401.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09072, pruned_loss=0.01291, audio_tagging_loss=0.008889, over 3050980.50 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:22:12,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3018540.0, ans=0.1 2023-11-24 22:22:12,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3018540.0, ans=0.5 2023-11-24 22:22:18,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3018540.0, ans=0.125 2023-11-24 22:22:21,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3018606.6666666665, ans=0.125 2023-11-24 22:22:22,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3018606.6666666665, ans=0.025 2023-11-24 22:22:26,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3018606.6666666665, ans=0.0 2023-11-24 22:22:30,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452800 2023-11-24 22:22:42,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-11-24 22:22:51,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3018740.0, ans=0.0 2023-11-24 22:23:02,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3018806.6666666665, ans=0.2 2023-11-24 22:23:03,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3018806.6666666665, ans=0.1 2023-11-24 22:23:12,333 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 7950, loss[loss=0.0841, simple_loss=0.1219, pruned_loss=0.017, audio_tagging_loss=0.006148, over 16302.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.0904, pruned_loss=0.01283, audio_tagging_loss=0.008989, over 3051649.76 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:23:15,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3018873.3333333335, ans=0.125 2023-11-24 22:23:25,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.89 vs. 
limit=12.0 2023-11-24 22:23:25,700 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 22:23:32,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3018940.0, ans=0.2 2023-11-24 22:23:32,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-24 22:23:33,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452850 2023-11-24 22:23:50,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3019073.3333333335, ans=0.0 2023-11-24 22:23:52,670 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.496e+01 9.034e+01 9.610e+01 1.973e+02, threshold=1.807e+02, percent-clipped=1.0 2023-11-24 22:24:07,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3019140.0, ans=0.125 2023-11-24 22:24:13,974 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8000, loss[loss=0.0524, simple_loss=0.06087, pruned_loss=0.01028, audio_tagging_loss=0.01169, over 14708.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08935, pruned_loss=0.01263, audio_tagging_loss=0.009112, over 3050115.23 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:24:20,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3019206.6666666665, ans=0.125 2023-11-24 22:24:21,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2023-11-24 22:24:22,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-24 22:24:35,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452900 2023-11-24 22:24:35,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3019273.3333333335, ans=0.125 2023-11-24 22:24:49,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3019340.0, ans=0.0 2023-11-24 22:24:57,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3019406.6666666665, ans=0.125 2023-11-24 22:25:01,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2023-11-24 22:25:13,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3019473.3333333335, ans=0.07 2023-11-24 22:25:16,637 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8050, loss[loss=0.0797, simple_loss=0.1126, pruned_loss=0.01797, audio_tagging_loss=0.005421, over 15099.00 frames. 
], tot_loss[loss=0.06675, simple_loss=0.09007, pruned_loss=0.01263, audio_tagging_loss=0.009084, over 3040449.62 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:25:21,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3019540.0, ans=0.1 2023-11-24 22:25:36,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3019606.6666666665, ans=0.0 2023-11-24 22:25:38,415 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 452950 2023-11-24 22:25:49,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3019673.3333333335, ans=0.125 2023-11-24 22:25:50,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3019673.3333333335, ans=0.0 2023-11-24 22:25:59,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.598e+01 8.520e+01 9.232e+01 9.839e+01 1.227e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-24 22:26:18,638 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8100, loss[loss=0.07534, simple_loss=0.1036, pruned_loss=0.01252, audio_tagging_loss=0.01103, over 16603.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09069, pruned_loss=0.01276, audio_tagging_loss=0.008995, over 3042122.37 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:26:32,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3019940.0, ans=0.0 2023-11-24 22:26:40,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453000 2023-11-24 22:26:47,634 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:26:47,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3020006.6666666665, ans=0.125 2023-11-24 22:26:55,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3020073.3333333335, ans=0.125 2023-11-24 22:27:21,684 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8150, loss[loss=0.06038, simple_loss=0.08083, pruned_loss=0.01042, audio_tagging_loss=0.009547, over 15084.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09066, pruned_loss=0.01281, audio_tagging_loss=0.008867, over 3047327.94 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:27:32,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3020206.6666666665, ans=0.5 2023-11-24 22:27:43,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453050 2023-11-24 22:27:51,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3020340.0, ans=0.95 2023-11-24 22:28:04,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.627e+01 9.105e+01 9.732e+01 1.211e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-24 22:28:06,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3020406.6666666665, ans=0.125 2023-11-24 22:28:23,175 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 22:28:24,323 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8200, loss[loss=0.07029, simple_loss=0.1, pruned_loss=0.01422, audio_tagging_loss=0.006058, over 15545.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09106, pruned_loss=0.01279, audio_tagging_loss=0.008715, over 3049102.20 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:28:29,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-11-24 22:28:31,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3020540.0, ans=0.125 2023-11-24 22:28:33,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3020540.0, ans=0.125 2023-11-24 22:28:37,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3020606.6666666665, ans=0.125 2023-11-24 22:28:45,371 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453100 2023-11-24 22:29:00,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3020740.0, ans=0.125 2023-11-24 22:29:00,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2023-11-24 22:29:20,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3020806.6666666665, ans=0.2 2023-11-24 22:29:26,346 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8250, loss[loss=0.0581, simple_loss=0.08183, pruned_loss=0.009481, audio_tagging_loss=0.007699, over 15363.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09094, pruned_loss=0.01277, audio_tagging_loss=0.008672, over 3047972.03 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:29:44,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3020940.0, ans=0.125 2023-11-24 22:29:47,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453150 2023-11-24 22:30:08,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.522e+01 9.040e+01 9.801e+01 1.361e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-24 22:30:11,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-24 22:30:20,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3021140.0, ans=0.125 2023-11-24 22:30:28,675 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8300, loss[loss=0.05011, simple_loss=0.06284, pruned_loss=0.009611, audio_tagging_loss=0.009076, over 15095.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09146, pruned_loss=0.01285, audio_tagging_loss=0.008626, over 3051546.18 frames. 
], batch size: 59, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:30:35,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3021206.6666666665, ans=0.0 2023-11-24 22:30:36,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3021206.6666666665, ans=0.125 2023-11-24 22:30:50,904 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453200 2023-11-24 22:30:58,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.46 vs. limit=10.0 2023-11-24 22:31:24,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3021473.3333333335, ans=0.125 2023-11-24 22:31:32,467 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8350, loss[loss=0.06255, simple_loss=0.07711, pruned_loss=0.01172, audio_tagging_loss=0.01228, over 16302.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09233, pruned_loss=0.01309, audio_tagging_loss=0.008605, over 3050050.19 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 8.0 2023-11-24 22:31:48,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-11-24 22:31:50,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3021606.6666666665, ans=0.1 2023-11-24 22:31:52,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453250 2023-11-24 22:31:54,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3021606.6666666665, ans=0.125 2023-11-24 22:32:00,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3021673.3333333335, ans=0.125 2023-11-24 22:32:14,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.493e+01 9.150e+01 9.803e+01 1.531e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 22:32:34,010 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8400, loss[loss=0.04644, simple_loss=0.06755, pruned_loss=0.004788, audio_tagging_loss=0.007876, over 15480.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09237, pruned_loss=0.0132, audio_tagging_loss=0.008641, over 3053127.38 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:32:50,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3021940.0, ans=0.0 2023-11-24 22:32:50,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3021940.0, ans=0.2 2023-11-24 22:32:53,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3021940.0, ans=0.125 2023-11-24 22:32:54,745 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453300 2023-11-24 22:32:56,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3021940.0, ans=0.0 2023-11-24 22:33:07,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.62 vs. 
limit=15.0 2023-11-24 22:33:12,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3022073.3333333335, ans=0.2 2023-11-24 22:33:22,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3022140.0, ans=0.1 2023-11-24 22:33:24,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3022140.0, ans=0.0 2023-11-24 22:33:36,193 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8450, loss[loss=0.06646, simple_loss=0.08656, pruned_loss=0.01188, audio_tagging_loss=0.01131, over 15552.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09131, pruned_loss=0.01298, audio_tagging_loss=0.008707, over 3047104.53 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:33:46,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3022206.6666666665, ans=0.2 2023-11-24 22:33:49,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3022273.3333333335, ans=0.1 2023-11-24 22:33:58,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453350 2023-11-24 22:34:18,675 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.708e+01 9.231e+01 1.022e+02 1.410e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-24 22:34:18,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3022406.6666666665, ans=0.2 2023-11-24 22:34:23,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3022406.6666666665, ans=0.2 2023-11-24 22:34:38,765 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8500, loss[loss=0.09169, simple_loss=0.1277, pruned_loss=0.02234, audio_tagging_loss=0.005486, over 14729.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09083, pruned_loss=0.01286, audio_tagging_loss=0.008796, over 3047450.30 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:34:55,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2023-11-24 22:34:58,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3022606.6666666665, ans=0.125 2023-11-24 22:34:59,560 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453400 2023-11-24 22:35:36,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3022806.6666666665, ans=0.125 2023-11-24 22:35:39,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3022806.6666666665, ans=0.125 2023-11-24 22:35:41,408 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8550, loss[loss=0.0767, simple_loss=0.1105, pruned_loss=0.01307, audio_tagging_loss=0.008387, over 15738.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09127, pruned_loss=0.01285, audio_tagging_loss=0.008771, over 3044349.33 frames. 
], batch size: 56, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:35:44,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3022873.3333333335, ans=0.125 2023-11-24 22:35:50,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3022873.3333333335, ans=0.125 2023-11-24 22:36:02,314 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453450 2023-11-24 22:36:05,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-24 22:36:23,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.608e+01 9.249e+01 9.944e+01 1.269e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 22:36:31,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3023140.0, ans=0.1 2023-11-24 22:36:43,369 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8600, loss[loss=0.04901, simple_loss=0.06488, pruned_loss=0.007661, audio_tagging_loss=0.008915, over 15230.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09061, pruned_loss=0.01275, audio_tagging_loss=0.00884, over 3042739.45 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:36:50,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3023206.6666666665, ans=0.0 2023-11-24 22:37:05,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453500 2023-11-24 22:37:33,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3023473.3333333335, ans=0.125 2023-11-24 22:37:45,783 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8650, loss[loss=0.06826, simple_loss=0.09055, pruned_loss=0.01476, audio_tagging_loss=0.00822, over 15769.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09143, pruned_loss=0.01297, audio_tagging_loss=0.008894, over 3037034.16 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:38:07,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453550 2023-11-24 22:38:19,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3023673.3333333335, ans=0.2 2023-11-24 22:38:27,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.604e+01 9.306e+01 1.022e+02 1.267e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-24 22:38:45,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3023806.6666666665, ans=0.125 2023-11-24 22:38:48,882 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8700, loss[loss=0.04832, simple_loss=0.05076, pruned_loss=0.008539, audio_tagging_loss=0.0144, over 15426.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09112, pruned_loss=0.01304, audio_tagging_loss=0.008961, over 3037268.50 frames. 
], batch size: 59, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:39:09,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453600 2023-11-24 22:39:12,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3024006.6666666665, ans=0.125 2023-11-24 22:39:15,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3024006.6666666665, ans=0.125 2023-11-24 22:39:15,140 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:39:51,304 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8750, loss[loss=0.05868, simple_loss=0.0722, pruned_loss=0.009492, audio_tagging_loss=0.01309, over 15804.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09131, pruned_loss=0.01301, audio_tagging_loss=0.008968, over 3039591.94 frames. ], batch size: 60, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:39:54,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3024206.6666666665, ans=0.5 2023-11-24 22:40:12,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453650 2023-11-24 22:40:32,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3024406.6666666665, ans=0.0 2023-11-24 22:40:33,323 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.979e+01 9.554e+01 1.051e+02 1.529e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-24 22:40:49,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3024473.3333333335, ans=0.125 2023-11-24 22:40:51,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3024540.0, ans=0.09899494936611666 2023-11-24 22:40:52,681 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8800, loss[loss=0.06941, simple_loss=0.08616, pruned_loss=0.0174, audio_tagging_loss=0.008927, over 14944.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09122, pruned_loss=0.01293, audio_tagging_loss=0.008997, over 3042381.83 frames. ], batch size: 59, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 22:40:58,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3024540.0, ans=0.125 2023-11-24 22:41:01,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=22.5 2023-11-24 22:41:09,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3024606.6666666665, ans=0.0 2023-11-24 22:41:14,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453700 2023-11-24 22:41:28,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3024673.3333333335, ans=0.1 2023-11-24 22:41:29,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3024740.0, ans=0.1 2023-11-24 22:41:56,233 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8850, loss[loss=0.07627, simple_loss=0.1116, pruned_loss=0.01278, audio_tagging_loss=0.007694, over 15428.00 frames. 
], tot_loss[loss=0.06776, simple_loss=0.09175, pruned_loss=0.01283, audio_tagging_loss=0.009053, over 3048284.98 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 22:42:05,733 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 22:42:14,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3024940.0, ans=0.1 2023-11-24 22:42:16,874 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453750 2023-11-24 22:42:36,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3025073.3333333335, ans=0.125 2023-11-24 22:42:37,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.570e+01 9.101e+01 9.826e+01 1.259e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-24 22:42:57,356 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8900, loss[loss=0.07577, simple_loss=0.1055, pruned_loss=0.01691, audio_tagging_loss=0.006121, over 15286.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09188, pruned_loss=0.01289, audio_tagging_loss=0.008918, over 3048468.01 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 22:42:59,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3025206.6666666665, ans=0.125 2023-11-24 22:43:02,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0 2023-11-24 22:43:18,691 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453800 2023-11-24 22:43:23,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3025340.0, ans=0.125 2023-11-24 22:43:26,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3025340.0, ans=0.125 2023-11-24 22:43:43,826 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:43:59,727 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 8950, loss[loss=0.07852, simple_loss=0.1148, pruned_loss=0.0149, audio_tagging_loss=0.006223, over 15120.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09237, pruned_loss=0.01301, audio_tagging_loss=0.008781, over 3046240.16 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 32.0 2023-11-24 22:44:01,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. 
limit=15.0 2023-11-24 22:44:20,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3025606.6666666665, ans=0.125 2023-11-24 22:44:21,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453850 2023-11-24 22:44:23,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-24 22:44:25,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3025673.3333333335, ans=0.125 2023-11-24 22:44:43,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 8.565e+01 9.194e+01 9.956e+01 1.408e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-24 22:44:58,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3025806.6666666665, ans=0.125 2023-11-24 22:45:00,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2023-11-24 22:45:02,682 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9000, loss[loss=0.0563, simple_loss=0.07322, pruned_loss=0.008877, audio_tagging_loss=0.01081, over 16085.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09144, pruned_loss=0.01281, audio_tagging_loss=0.008635, over 3045320.39 frames. ], batch size: 63, lr: 1.78e-03, grad_scale: 16.0 2023-11-24 22:45:02,683 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 22:45:46,034 INFO [train_asr.py:1253] (1/4) Epoch 38, validation: loss=0.05855, simple_loss=0.05069, pruned_loss=0.005085, audio_tagging_loss=0.02812, over 4681554.00 frames. 2023-11-24 22:45:46,035 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 22:46:07,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453900 2023-11-24 22:46:27,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2023-11-24 22:46:27,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2023-11-24 22:46:47,361 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9050, loss[loss=0.06215, simple_loss=0.08139, pruned_loss=0.01322, audio_tagging_loss=0.008233, over 13660.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09034, pruned_loss=0.01263, audio_tagging_loss=0.008634, over 3045896.73 frames. ], batch size: 52, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:46:52,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3026206.6666666665, ans=0.125 2023-11-24 22:47:07,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3026273.3333333335, ans=0.025 2023-11-24 22:47:08,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 453950 2023-11-24 22:47:22,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. 
limit=15.0 2023-11-24 22:47:25,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3026406.6666666665, ans=0.125 2023-11-24 22:47:30,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.549e+01 9.034e+01 9.764e+01 1.451e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 22:47:33,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3026406.6666666665, ans=0.125 2023-11-24 22:47:36,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3026473.3333333335, ans=0.0 2023-11-24 22:47:50,109 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9100, loss[loss=0.06779, simple_loss=0.09387, pruned_loss=0.01406, audio_tagging_loss=0.006791, over 14755.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08967, pruned_loss=0.01264, audio_tagging_loss=0.008614, over 3045751.36 frames. ], batch size: 54, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:48:02,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3026606.6666666665, ans=0.125 2023-11-24 22:48:06,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2023-11-24 22:48:11,705 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454000 2023-11-24 22:48:23,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5 2023-11-24 22:48:46,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3026806.6666666665, ans=0.125 2023-11-24 22:48:49,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2023-11-24 22:48:53,403 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9150, loss[loss=0.06353, simple_loss=0.08728, pruned_loss=0.009163, audio_tagging_loss=0.01072, over 14142.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09049, pruned_loss=0.0129, audio_tagging_loss=0.008678, over 3045532.66 frames. 
], batch size: 54, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:48:56,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3026873.3333333335, ans=0.1 2023-11-24 22:49:14,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454050 2023-11-24 22:49:36,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.421e+01 9.061e+01 9.734e+01 1.251e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-24 22:49:47,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3027140.0, ans=0.125 2023-11-24 22:49:50,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3027140.0, ans=0.2 2023-11-24 22:49:54,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3027206.6666666665, ans=0.0 2023-11-24 22:49:55,224 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9200, loss[loss=0.06678, simple_loss=0.08574, pruned_loss=0.01214, audio_tagging_loss=0.01177, over 15569.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08907, pruned_loss=0.0125, audio_tagging_loss=0.008749, over 3046228.78 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 22:50:10,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3027273.3333333335, ans=0.2 2023-11-24 22:50:16,191 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454100 2023-11-24 22:50:17,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3027273.3333333335, ans=0.125 2023-11-24 22:50:21,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3027340.0, ans=0.0 2023-11-24 22:50:22,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3027340.0, ans=0.1 2023-11-24 22:50:30,578 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:50:30,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3027340.0, ans=0.0 2023-11-24 22:50:36,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-24 22:50:42,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3027406.6666666665, ans=0.125 2023-11-24 22:50:57,383 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9250, loss[loss=0.09563, simple_loss=0.1345, pruned_loss=0.02244, audio_tagging_loss=0.005956, over 15831.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.0908, pruned_loss=0.01288, audio_tagging_loss=0.00868, over 3054288.94 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:51:08,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-11-24 22:51:12,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.37 vs. 
limit=15.0 2023-11-24 22:51:18,292 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454150 2023-11-24 22:51:22,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3027673.3333333335, ans=0.04949747468305833 2023-11-24 22:51:24,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3027673.3333333335, ans=0.0 2023-11-24 22:51:37,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3027740.0, ans=0.125 2023-11-24 22:51:41,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.619e+01 9.253e+01 1.002e+02 1.218e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-24 22:51:46,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3027806.6666666665, ans=0.125 2023-11-24 22:51:54,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2023-11-24 22:51:58,226 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9300, loss[loss=0.049, simple_loss=0.0639, pruned_loss=0.006772, audio_tagging_loss=0.01027, over 15651.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08977, pruned_loss=0.01279, audio_tagging_loss=0.008761, over 3050354.30 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:52:06,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3027873.3333333335, ans=0.125 2023-11-24 22:52:15,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3027940.0, ans=0.1 2023-11-24 22:52:20,077 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454200 2023-11-24 22:52:52,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3028140.0, ans=0.125 2023-11-24 22:52:55,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3028140.0, ans=0.125 2023-11-24 22:53:00,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.42 vs. limit=10.0 2023-11-24 22:53:00,879 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9350, loss[loss=0.077, simple_loss=0.09026, pruned_loss=0.01869, audio_tagging_loss=0.01318, over 15057.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08973, pruned_loss=0.01287, audio_tagging_loss=0.008803, over 3053251.40 frames. 
], batch size: 56, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:53:02,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3028206.6666666665, ans=0.2 2023-11-24 22:53:22,139 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454250 2023-11-24 22:53:27,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3028340.0, ans=0.125 2023-11-24 22:53:29,416 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-24 22:53:37,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3028406.6666666665, ans=0.125 2023-11-24 22:53:38,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3028406.6666666665, ans=0.125 2023-11-24 22:53:43,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3028406.6666666665, ans=0.0 2023-11-24 22:53:45,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.676e+01 8.804e+01 9.416e+01 1.006e+02 1.917e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-24 22:53:51,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3028473.3333333335, ans=0.1 2023-11-24 22:53:54,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3028473.3333333335, ans=0.1 2023-11-24 22:54:03,455 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9400, loss[loss=0.0766, simple_loss=0.1003, pruned_loss=0.01779, audio_tagging_loss=0.008671, over 15370.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09025, pruned_loss=0.01293, audio_tagging_loss=0.008777, over 3053009.37 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:54:24,258 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454300 2023-11-24 22:54:27,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3028673.3333333335, ans=0.125 2023-11-24 22:54:31,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0 2023-11-24 22:54:43,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3028740.0, ans=0.0 2023-11-24 22:54:46,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3028740.0, ans=0.125 2023-11-24 22:55:01,316 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 22:55:05,007 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9450, loss[loss=0.06787, simple_loss=0.08976, pruned_loss=0.01327, audio_tagging_loss=0.009723, over 14723.00 frames. 
], tot_loss[loss=0.06668, simple_loss=0.08998, pruned_loss=0.01275, audio_tagging_loss=0.008943, over 3047276.45 frames. ], batch size: 55, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:55:06,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3028873.3333333335, ans=0.2 2023-11-24 22:55:12,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3028873.3333333335, ans=0.1 2023-11-24 22:55:25,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3028940.0, ans=0.0 2023-11-24 22:55:26,441 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454350 2023-11-24 22:55:49,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.713e+01 9.194e+01 9.823e+01 1.241e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-24 22:55:55,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-24 22:55:59,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-24 22:56:02,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2023-11-24 22:56:07,692 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9500, loss[loss=0.04743, simple_loss=0.05261, pruned_loss=0.00796, audio_tagging_loss=0.01317, over 15406.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08885, pruned_loss=0.01257, audio_tagging_loss=0.009077, over 3049180.27 frames. ], batch size: 60, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:56:25,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3029273.3333333335, ans=0.0 2023-11-24 22:56:29,262 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454400 2023-11-24 22:56:33,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2023-11-24 22:56:48,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-24 22:56:49,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3029406.6666666665, ans=0.125 2023-11-24 22:57:04,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3029473.3333333335, ans=0.1 2023-11-24 22:57:08,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3029473.3333333335, ans=0.0 2023-11-24 22:57:10,760 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9550, loss[loss=0.09328, simple_loss=0.1268, pruned_loss=0.02163, audio_tagging_loss=0.008229, over 16032.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08939, pruned_loss=0.01261, audio_tagging_loss=0.009116, over 3050275.19 frames. 
], batch size: 57, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 22:57:16,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3029540.0, ans=0.05 2023-11-24 22:57:23,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2023-11-24 22:57:26,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3029606.6666666665, ans=0.125 2023-11-24 22:57:31,106 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454450 2023-11-24 22:57:56,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.429e+01 9.044e+01 9.550e+01 1.234e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-24 22:58:00,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=12.0 2023-11-24 22:58:00,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2023-11-24 22:58:08,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3029806.6666666665, ans=0.125 2023-11-24 22:58:09,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3029806.6666666665, ans=0.125 2023-11-24 22:58:12,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-11-24 22:58:12,994 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9600, loss[loss=0.06433, simple_loss=0.08121, pruned_loss=0.0112, audio_tagging_loss=0.01253, over 14947.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09026, pruned_loss=0.01275, audio_tagging_loss=0.009196, over 3059145.78 frames. ], batch size: 58, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 22:58:13,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3029873.3333333335, ans=0.125 2023-11-24 22:58:33,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454500 2023-11-24 22:58:52,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3030073.3333333335, ans=0.1 2023-11-24 22:58:54,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3030073.3333333335, ans=0.1 2023-11-24 22:58:56,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3030073.3333333335, ans=0.0 2023-11-24 22:59:14,823 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9650, loss[loss=0.08899, simple_loss=0.1289, pruned_loss=0.01901, audio_tagging_loss=0.005526, over 16581.00 frames. ], tot_loss[loss=0.06831, simple_loss=0.09244, pruned_loss=0.01307, audio_tagging_loss=0.009016, over 3059244.51 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 22:59:19,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.78 vs. 
limit=10.0 2023-11-24 22:59:19,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3030206.6666666665, ans=0.2 2023-11-24 22:59:20,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3030206.6666666665, ans=0.0 2023-11-24 22:59:24,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-24 22:59:30,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2023-11-24 22:59:36,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454550 2023-11-24 22:59:43,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2023-11-24 22:59:46,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3030340.0, ans=0.125 2023-11-24 22:59:53,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3030406.6666666665, ans=0.125 2023-11-24 23:00:01,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.723e+01 9.168e+01 1.005e+02 1.303e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-24 23:00:03,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3030473.3333333335, ans=0.0 2023-11-24 23:00:09,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3030473.3333333335, ans=0.125 2023-11-24 23:00:15,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3030473.3333333335, ans=0.1 2023-11-24 23:00:18,269 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9700, loss[loss=0.07051, simple_loss=0.09823, pruned_loss=0.01127, audio_tagging_loss=0.01013, over 14659.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09296, pruned_loss=0.01305, audio_tagging_loss=0.008845, over 3060448.06 frames. 
], batch size: 57, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:00:24,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3030540.0, ans=0.1 2023-11-24 23:00:26,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3030540.0, ans=0.0 2023-11-24 23:00:38,624 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454600 2023-11-24 23:00:51,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3030673.3333333335, ans=0.125 2023-11-24 23:00:52,829 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:00:54,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3030740.0, ans=0.125 2023-11-24 23:01:00,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3030740.0, ans=0.0 2023-11-24 23:01:05,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3030740.0, ans=0.2 2023-11-24 23:01:17,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3030806.6666666665, ans=0.2 2023-11-24 23:01:20,983 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9750, loss[loss=0.06382, simple_loss=0.09467, pruned_loss=0.008792, audio_tagging_loss=0.00769, over 16068.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09272, pruned_loss=0.01289, audio_tagging_loss=0.008831, over 3050266.52 frames. ], batch size: 58, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:01:24,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3030873.3333333335, ans=0.2 2023-11-24 23:01:25,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3030873.3333333335, ans=0.2 2023-11-24 23:01:29,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3030873.3333333335, ans=0.125 2023-11-24 23:01:41,968 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454650 2023-11-24 23:01:44,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3031006.6666666665, ans=0.125 2023-11-24 23:01:46,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.37 vs. limit=10.0 2023-11-24 23:02:06,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.357e+01 8.880e+01 9.692e+01 1.346e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-24 23:02:11,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3031140.0, ans=0.0 2023-11-24 23:02:17,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. 
limit=15.0 2023-11-24 23:02:19,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3031140.0, ans=0.0 2023-11-24 23:02:22,673 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9800, loss[loss=0.05899, simple_loss=0.08358, pruned_loss=0.01016, audio_tagging_loss=0.007036, over 14312.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09141, pruned_loss=0.01275, audio_tagging_loss=0.008681, over 3051336.94 frames. ], batch size: 53, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:02:28,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3031206.6666666665, ans=0.0 2023-11-24 23:02:39,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.28 vs. limit=22.5 2023-11-24 23:02:42,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3031273.3333333335, ans=0.1 2023-11-24 23:02:44,563 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454700 2023-11-24 23:02:59,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3031406.6666666665, ans=0.125 2023-11-24 23:03:06,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3031406.6666666665, ans=0.125 2023-11-24 23:03:10,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3031406.6666666665, ans=15.0 2023-11-24 23:03:16,390 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 23:03:19,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3031473.3333333335, ans=0.0 2023-11-24 23:03:23,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3031473.3333333335, ans=0.125 2023-11-24 23:03:25,427 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9850, loss[loss=0.09334, simple_loss=0.1274, pruned_loss=0.02112, audio_tagging_loss=0.008539, over 16170.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09107, pruned_loss=0.01276, audio_tagging_loss=0.008601, over 3046616.49 frames. 
], batch size: 60, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:03:40,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3031606.6666666665, ans=0.125 2023-11-24 23:03:47,044 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454750 2023-11-24 23:04:10,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3031740.0, ans=0.125 2023-11-24 23:04:13,179 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.592e+01 9.130e+01 1.016e+02 1.240e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-24 23:04:17,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3031806.6666666665, ans=0.1 2023-11-24 23:04:28,571 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9900, loss[loss=0.07102, simple_loss=0.09447, pruned_loss=0.01451, audio_tagging_loss=0.009273, over 16461.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09123, pruned_loss=0.01275, audio_tagging_loss=0.008584, over 3048671.47 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:04:28,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3031873.3333333335, ans=0.125 2023-11-24 23:04:28,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3031873.3333333335, ans=0.2 2023-11-24 23:04:47,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=22.5 2023-11-24 23:04:49,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454800 2023-11-24 23:05:02,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=22.5 2023-11-24 23:05:31,862 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 9950, loss[loss=0.05482, simple_loss=0.07614, pruned_loss=0.008622, audio_tagging_loss=0.008127, over 14155.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09136, pruned_loss=0.01273, audio_tagging_loss=0.008591, over 3047705.50 frames. ], batch size: 56, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:05:47,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. 
limit=15.0 2023-11-24 23:05:53,011 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454850 2023-11-24 23:05:55,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3032340.0, ans=0.0 2023-11-24 23:06:05,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3032340.0, ans=0.125 2023-11-24 23:06:12,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3032406.6666666665, ans=0.0 2023-11-24 23:06:18,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.463e+01 8.965e+01 9.807e+01 1.516e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-24 23:06:24,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2023-11-24 23:06:33,795 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10000, loss[loss=0.07417, simple_loss=0.09281, pruned_loss=0.0174, audio_tagging_loss=0.01036, over 15345.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08984, pruned_loss=0.01253, audio_tagging_loss=0.008688, over 3049403.70 frames. ], batch size: 55, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:06:38,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-11-24 23:06:46,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3032606.6666666665, ans=0.1 2023-11-24 23:06:54,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454900 2023-11-24 23:07:35,228 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10050, loss[loss=0.04854, simple_loss=0.07123, pruned_loss=0.004364, audio_tagging_loss=0.00856, over 15225.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08926, pruned_loss=0.01233, audio_tagging_loss=0.00873, over 3044520.95 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:07:41,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2023-11-24 23:07:56,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 454950 2023-11-24 23:08:02,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3033006.6666666665, ans=0.2 2023-11-24 23:08:05,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3033006.6666666665, ans=0.0 2023-11-24 23:08:23,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.450e+01 9.036e+01 9.785e+01 1.299e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-24 23:08:27,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3033140.0, ans=0.2 2023-11-24 23:08:37,230 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10100, loss[loss=0.06291, simple_loss=0.0811, pruned_loss=0.01428, audio_tagging_loss=0.008074, over 16424.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08956, pruned_loss=0.0124, audio_tagging_loss=0.008741, over 3048575.81 frames. 
], batch size: 62, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:08:37,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3033206.6666666665, ans=0.02 2023-11-24 23:08:56,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3033273.3333333335, ans=0.0 2023-11-24 23:08:59,052 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455000 2023-11-24 23:09:09,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3033340.0, ans=0.125 2023-11-24 23:09:15,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=12.0 2023-11-24 23:09:26,546 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 23:09:40,104 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10150, loss[loss=0.05548, simple_loss=0.07797, pruned_loss=0.01046, audio_tagging_loss=0.006034, over 15537.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09021, pruned_loss=0.01252, audio_tagging_loss=0.00868, over 3052556.22 frames. ], batch size: 58, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:09:47,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3033540.0, ans=0.1 2023-11-24 23:10:01,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455050 2023-11-24 23:10:01,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3033606.6666666665, ans=0.2 2023-11-24 23:10:07,820 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 23:10:18,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3033740.0, ans=0.125 2023-11-24 23:10:29,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.639e+01 9.334e+01 9.887e+01 1.234e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-24 23:10:34,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3033806.6666666665, ans=0.0 2023-11-24 23:10:43,034 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10200, loss[loss=0.07945, simple_loss=0.1106, pruned_loss=0.01603, audio_tagging_loss=0.008116, over 15170.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09039, pruned_loss=0.01253, audio_tagging_loss=0.008757, over 3050257.10 frames. 
], batch size: 55, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:11:04,513 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 23:11:04,563 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455100 2023-11-24 23:11:04,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3033940.0, ans=0.125 2023-11-24 23:11:14,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3034006.6666666665, ans=0.125 2023-11-24 23:11:28,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3034073.3333333335, ans=0.125 2023-11-24 23:11:30,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=22.5 2023-11-24 23:11:41,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3034140.0, ans=0.125 2023-11-24 23:11:45,347 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10250, loss[loss=0.04867, simple_loss=0.06552, pruned_loss=0.007469, audio_tagging_loss=0.008443, over 15311.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08995, pruned_loss=0.01244, audio_tagging_loss=0.008785, over 3058201.78 frames. ], batch size: 58, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:12:06,498 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455150 2023-11-24 23:12:07,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=15.0 2023-11-24 23:12:07,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3034273.3333333335, ans=0.0 2023-11-24 23:12:25,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3034406.6666666665, ans=0.09899494936611666 2023-11-24 23:12:29,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3034406.6666666665, ans=0.125 2023-11-24 23:12:33,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.646e+01 9.167e+01 9.835e+01 1.179e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-24 23:12:42,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3034473.3333333335, ans=0.07 2023-11-24 23:12:47,004 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10300, loss[loss=0.06982, simple_loss=0.09406, pruned_loss=0.01483, audio_tagging_loss=0.007957, over 15648.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09026, pruned_loss=0.01256, audio_tagging_loss=0.00889, over 3056785.01 frames. 
], batch size: 59, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:12:47,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-11-24 23:12:48,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3034540.0, ans=0.0 2023-11-24 23:12:51,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3034540.0, ans=0.1 2023-11-24 23:12:53,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3034540.0, ans=0.025 2023-11-24 23:12:53,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3034540.0, ans=0.0 2023-11-24 23:12:58,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3034540.0, ans=0.125 2023-11-24 23:13:05,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-24 23:13:08,688 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455200 2023-11-24 23:13:10,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3034606.6666666665, ans=0.2 2023-11-24 23:13:10,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2023-11-24 23:13:27,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=12.0 2023-11-24 23:13:32,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3034740.0, ans=0.0 2023-11-24 23:13:35,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3034740.0, ans=0.0 2023-11-24 23:13:39,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3034806.6666666665, ans=0.0 2023-11-24 23:13:50,468 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10350, loss[loss=0.05726, simple_loss=0.07054, pruned_loss=0.01146, audio_tagging_loss=0.01052, over 15472.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09094, pruned_loss=0.01276, audio_tagging_loss=0.008919, over 3055026.34 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:14:11,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455250 2023-11-24 23:14:13,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3034940.0, ans=0.0 2023-11-24 23:14:28,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.97 vs. 
limit=12.0 2023-11-24 23:14:29,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3035073.3333333335, ans=0.0 2023-11-24 23:14:38,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.693e+01 9.178e+01 1.025e+02 1.318e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-24 23:14:40,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3035140.0, ans=0.125 2023-11-24 23:14:51,328 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10400, loss[loss=0.06482, simple_loss=0.08918, pruned_loss=0.01183, audio_tagging_loss=0.008403, over 16030.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09063, pruned_loss=0.01279, audio_tagging_loss=0.009096, over 3040332.83 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:15:02,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3035206.6666666665, ans=0.125 2023-11-24 23:15:12,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455300 2023-11-24 23:15:17,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3035340.0, ans=0.0 2023-11-24 23:15:51,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3035473.3333333335, ans=0.125 2023-11-24 23:15:53,551 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10450, loss[loss=0.05649, simple_loss=0.07017, pruned_loss=0.01251, audio_tagging_loss=0.008887, over 15103.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.0901, pruned_loss=0.0128, audio_tagging_loss=0.009042, over 3042152.86 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:16:05,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3035606.6666666665, ans=0.125 2023-11-24 23:16:14,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455350 2023-11-24 23:16:28,117 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:16:36,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3035740.0, ans=0.2 2023-11-24 23:16:42,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.886e+01 9.517e+01 1.018e+02 1.385e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-24 23:16:43,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3035806.6666666665, ans=0.0 2023-11-24 23:16:53,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3035806.6666666665, ans=0.5 2023-11-24 23:16:56,067 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10500, loss[loss=0.04829, simple_loss=0.06607, pruned_loss=0.007017, audio_tagging_loss=0.008244, over 15320.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08998, pruned_loss=0.01278, audio_tagging_loss=0.008878, over 3041520.28 frames. 
], batch size: 59, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:16:56,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3035873.3333333335, ans=0.125 2023-11-24 23:17:16,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455400 2023-11-24 23:17:41,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3036073.3333333335, ans=0.125 2023-11-24 23:17:56,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3036140.0, ans=0.0 2023-11-24 23:17:58,251 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10550, loss[loss=0.07786, simple_loss=0.108, pruned_loss=0.01568, audio_tagging_loss=0.008162, over 15661.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09077, pruned_loss=0.0128, audio_tagging_loss=0.008691, over 3041878.66 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:18:07,420 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:18:19,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455450 2023-11-24 23:18:34,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=15.0 2023-11-24 23:18:42,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3036406.6666666665, ans=0.125 2023-11-24 23:18:46,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 8.547e+01 9.162e+01 9.903e+01 1.156e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-24 23:18:59,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.44 vs. limit=15.0 2023-11-24 23:19:00,138 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10600, loss[loss=0.0544, simple_loss=0.07269, pruned_loss=0.0089, audio_tagging_loss=0.009154, over 14437.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09083, pruned_loss=0.01281, audio_tagging_loss=0.008673, over 3047006.66 frames. ], batch size: 54, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:19:07,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3036540.0, ans=0.125 2023-11-24 23:19:22,433 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455500 2023-11-24 23:19:22,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2023-11-24 23:19:26,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3036673.3333333335, ans=0.5 2023-11-24 23:19:36,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3036740.0, ans=0.0 2023-11-24 23:19:39,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3036740.0, ans=0.0 2023-11-24 23:19:48,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3036740.0, ans=0.125 2023-11-24 23:19:52,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3036806.6666666665, ans=0.125 2023-11-24 23:19:53,531 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:19:53,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3036806.6666666665, ans=0.125 2023-11-24 23:20:03,968 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10650, loss[loss=0.07033, simple_loss=0.1054, pruned_loss=0.008065, audio_tagging_loss=0.009554, over 16035.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09069, pruned_loss=0.01294, audio_tagging_loss=0.008694, over 3045421.00 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:20:04,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3036873.3333333335, ans=0.125 2023-11-24 23:20:10,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3036873.3333333335, ans=0.125 2023-11-24 23:20:13,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3036873.3333333335, ans=0.0 2023-11-24 23:20:19,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3036940.0, ans=0.1 2023-11-24 23:20:22,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3036940.0, ans=0.125 2023-11-24 23:20:23,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3036940.0, ans=0.125 2023-11-24 23:20:24,048 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455550 2023-11-24 23:20:25,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3036940.0, ans=0.125 2023-11-24 23:20:52,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.714e+01 9.529e+01 1.031e+02 1.468e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-24 23:20:57,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3037140.0, ans=0.1 2023-11-24 23:21:05,485 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10700, loss[loss=0.046, simple_loss=0.05176, pruned_loss=0.008348, audio_tagging_loss=0.01177, over 14424.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09061, pruned_loss=0.01286, audio_tagging_loss=0.008718, over 3035815.37 frames. 
], batch size: 56, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:21:23,968 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:21:25,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2023-11-24 23:21:26,155 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455600 2023-11-24 23:22:07,790 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10750, loss[loss=0.06756, simple_loss=0.08475, pruned_loss=0.0142, audio_tagging_loss=0.01099, over 15663.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09024, pruned_loss=0.01276, audio_tagging_loss=0.008725, over 3037279.09 frames. ], batch size: 55, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:22:11,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3037540.0, ans=0.1 2023-11-24 23:22:29,841 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455650 2023-11-24 23:22:36,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3037673.3333333335, ans=0.0 2023-11-24 23:22:38,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-11-24 23:22:51,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3037740.0, ans=0.125 2023-11-24 23:22:52,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3037740.0, ans=0.0 2023-11-24 23:22:56,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.544e+01 9.121e+01 9.679e+01 1.132e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-24 23:23:10,652 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10800, loss[loss=0.08044, simple_loss=0.09912, pruned_loss=0.01918, audio_tagging_loss=0.0117, over 15844.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09025, pruned_loss=0.01279, audio_tagging_loss=0.008705, over 3038855.95 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:23:23,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.43 vs. limit=22.5 2023-11-24 23:23:30,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455700 2023-11-24 23:23:49,404 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:23:57,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3038073.3333333335, ans=0.0 2023-11-24 23:24:09,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3038140.0, ans=0.125 2023-11-24 23:24:11,711 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10850, loss[loss=0.06953, simple_loss=0.09161, pruned_loss=0.01523, audio_tagging_loss=0.008496, over 15271.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09025, pruned_loss=0.01275, audio_tagging_loss=0.008668, over 3043940.46 frames. 
], batch size: 58, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:24:14,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3038206.6666666665, ans=0.05 2023-11-24 23:24:17,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3038206.6666666665, ans=0.125 2023-11-24 23:24:32,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455750 2023-11-24 23:24:59,976 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.552e+01 9.146e+01 9.923e+01 1.188e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-24 23:25:07,131 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 23:25:13,537 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10900, loss[loss=0.08175, simple_loss=0.103, pruned_loss=0.01902, audio_tagging_loss=0.01121, over 15041.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09032, pruned_loss=0.01279, audio_tagging_loss=0.00875, over 3038678.45 frames. ], batch size: 55, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:25:18,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3038540.0, ans=0.125 2023-11-24 23:25:28,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3038606.6666666665, ans=0.0 2023-11-24 23:25:35,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455800 2023-11-24 23:25:38,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3038673.3333333335, ans=0.125 2023-11-24 23:25:59,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.71 vs. limit=6.0 2023-11-24 23:26:07,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2023-11-24 23:26:09,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3038806.6666666665, ans=0.2 2023-11-24 23:26:11,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3038806.6666666665, ans=0.2 2023-11-24 23:26:16,211 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 10950, loss[loss=0.05569, simple_loss=0.07135, pruned_loss=0.01158, audio_tagging_loss=0.008432, over 13592.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09001, pruned_loss=0.01264, audio_tagging_loss=0.008767, over 3040537.66 frames. 
], batch size: 53, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:26:22,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3038873.3333333335, ans=0.1 2023-11-24 23:26:27,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.41 vs. limit=10.0 2023-11-24 23:26:37,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455850 2023-11-24 23:26:37,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3038940.0, ans=0.05 2023-11-24 23:26:43,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.76 vs. limit=15.0 2023-11-24 23:26:53,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3039073.3333333335, ans=0.0 2023-11-24 23:26:59,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3039073.3333333335, ans=0.125 2023-11-24 23:27:05,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.762e+01 9.251e+01 9.897e+01 1.242e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 23:27:08,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3039140.0, ans=0.125 2023-11-24 23:27:18,791 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11000, loss[loss=0.06688, simple_loss=0.08487, pruned_loss=0.01467, audio_tagging_loss=0.009774, over 15231.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08954, pruned_loss=0.01248, audio_tagging_loss=0.008805, over 3041005.51 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:27:19,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3039206.6666666665, ans=0.1 2023-11-24 23:27:26,053 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-24 23:27:38,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3039273.3333333335, ans=0.125 2023-11-24 23:27:39,748 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455900 2023-11-24 23:27:59,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3039406.6666666665, ans=0.07 2023-11-24 23:28:01,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3039406.6666666665, ans=0.125 2023-11-24 23:28:11,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3039473.3333333335, ans=0.125 2023-11-24 23:28:20,378 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11050, loss[loss=0.05242, simple_loss=0.06564, pruned_loss=0.005807, audio_tagging_loss=0.0138, over 15909.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08954, pruned_loss=0.01268, audio_tagging_loss=0.008966, over 3037139.25 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:28:29,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3039540.0, ans=0.125 2023-11-24 23:28:35,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3039606.6666666665, ans=0.1 2023-11-24 23:28:42,227 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 455950 2023-11-24 23:28:47,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.84 vs. limit=15.0 2023-11-24 23:29:08,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.642e+01 9.315e+01 1.003e+02 1.536e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-24 23:29:22,458 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11100, loss[loss=0.07596, simple_loss=0.09871, pruned_loss=0.0165, audio_tagging_loss=0.01011, over 14681.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.08993, pruned_loss=0.01286, audio_tagging_loss=0.00911, over 3042744.02 frames. ], batch size: 55, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:29:31,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-24 23:29:34,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-24 23:29:44,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456000 2023-11-24 23:29:45,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3039940.0, ans=0.0 2023-11-24 23:29:57,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3040006.6666666665, ans=0.125 2023-11-24 23:30:00,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.54 vs. limit=22.5 2023-11-24 23:30:01,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.68 vs. 
limit=10.0 2023-11-24 23:30:11,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3040073.3333333335, ans=0.2 2023-11-24 23:30:30,062 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11150, loss[loss=0.05728, simple_loss=0.07631, pruned_loss=0.008858, audio_tagging_loss=0.01026, over 15403.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09053, pruned_loss=0.01298, audio_tagging_loss=0.009136, over 3044246.46 frames. ], batch size: 58, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:30:31,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3040206.6666666665, ans=0.0 2023-11-24 23:30:41,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3040273.3333333335, ans=0.09899494936611666 2023-11-24 23:30:50,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456050 2023-11-24 23:30:52,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3040273.3333333335, ans=0.1 2023-11-24 23:30:54,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3040340.0, ans=0.0 2023-11-24 23:31:00,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=12.0 2023-11-24 23:31:16,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2023-11-24 23:31:18,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.591e+01 9.320e+01 9.957e+01 1.253e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-24 23:31:18,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3040473.3333333335, ans=0.125 2023-11-24 23:31:25,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3040473.3333333335, ans=0.2 2023-11-24 23:31:31,378 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11200, loss[loss=0.06716, simple_loss=0.0851, pruned_loss=0.01395, audio_tagging_loss=0.01065, over 15209.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09126, pruned_loss=0.01304, audio_tagging_loss=0.009126, over 3050254.18 frames. 
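
The per-batch loss lines above report four components, and the logged totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for batch 11200: 0.5 * 0.09126 + 0.01304 + 0.009126 ≈ 0.0678). A minimal sketch of that combination under those assumed scales; the function name and signature are illustrative, not the actual train_asr.py code:

def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Weighted sum matching the logged totals; the scale values are assumptions
    # inferred from the printed numbers, not read from the training script.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(combine_losses(0.09126, 0.01304, 0.009126))  # ~0.0678, as in batch 11200
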
], batch size: 57, lr: 1.77e-03, grad_scale: 32.0 2023-11-24 23:31:32,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3040540.0, ans=0.125 2023-11-24 23:31:38,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3040540.0, ans=0.125 2023-11-24 23:31:53,229 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456100 2023-11-24 23:32:04,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3040673.3333333335, ans=0.2 2023-11-24 23:32:19,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3040740.0, ans=0.125 2023-11-24 23:32:30,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3040806.6666666665, ans=0.2 2023-11-24 23:32:33,943 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11250, loss[loss=0.07194, simple_loss=0.1032, pruned_loss=0.0135, audio_tagging_loss=0.006836, over 15171.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.0911, pruned_loss=0.01293, audio_tagging_loss=0.009111, over 3051619.61 frames. ], batch size: 55, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:32:41,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3040873.3333333335, ans=0.2 2023-11-24 23:32:54,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3040940.0, ans=0.125 2023-11-24 23:32:55,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456150 2023-11-24 23:33:04,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0 2023-11-24 23:33:07,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3041006.6666666665, ans=0.2 2023-11-24 23:33:14,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3041073.3333333335, ans=0.0 2023-11-24 23:33:23,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.456e+01 9.058e+01 9.747e+01 1.230e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-24 23:33:34,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3041206.6666666665, ans=0.015 2023-11-24 23:33:35,952 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11300, loss[loss=0.08472, simple_loss=0.1275, pruned_loss=0.01478, audio_tagging_loss=0.006185, over 15242.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09073, pruned_loss=0.01284, audio_tagging_loss=0.009031, over 3058644.45 frames. ], batch size: 56, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:33:49,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3041273.3333333335, ans=0.025 2023-11-24 23:33:57,276 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456200 2023-11-24 23:34:11,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. 
limit=15.0 2023-11-24 23:34:12,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3041406.6666666665, ans=0.125 2023-11-24 23:34:37,453 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11350, loss[loss=0.05495, simple_loss=0.07635, pruned_loss=0.006809, audio_tagging_loss=0.009971, over 15866.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09066, pruned_loss=0.01281, audio_tagging_loss=0.008922, over 3058513.55 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:34:58,553 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456250 2023-11-24 23:34:59,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3041606.6666666665, ans=0.5 2023-11-24 23:35:03,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3041673.3333333335, ans=0.1 2023-11-24 23:35:06,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3041673.3333333335, ans=0.125 2023-11-24 23:35:21,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3041740.0, ans=0.2 2023-11-24 23:35:22,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3041740.0, ans=0.0 2023-11-24 23:35:23,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3041740.0, ans=0.125 2023-11-24 23:35:25,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=12.0 2023-11-24 23:35:27,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3041806.6666666665, ans=0.1 2023-11-24 23:35:28,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.600e+01 9.248e+01 9.972e+01 1.236e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 23:35:29,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.15 vs. limit=15.0 2023-11-24 23:35:34,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3041806.6666666665, ans=0.0 2023-11-24 23:35:35,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3041806.6666666665, ans=0.0 2023-11-24 23:35:39,395 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11400, loss[loss=0.05226, simple_loss=0.0688, pruned_loss=0.007541, audio_tagging_loss=0.01032, over 14455.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09021, pruned_loss=0.01266, audio_tagging_loss=0.008783, over 3049020.55 frames. ], batch size: 53, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:35:43,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3041873.3333333335, ans=0.2 2023-11-24 23:35:54,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.71 vs. 
limit=15.0 2023-11-24 23:36:00,240 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456300 2023-11-24 23:36:12,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3042006.6666666665, ans=0.125 2023-11-24 23:36:14,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3042006.6666666665, ans=0.1 2023-11-24 23:36:25,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3042073.3333333335, ans=0.125 2023-11-24 23:36:30,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3042140.0, ans=0.0 2023-11-24 23:36:31,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3042140.0, ans=0.0 2023-11-24 23:36:41,451 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11450, loss[loss=0.06237, simple_loss=0.07865, pruned_loss=0.01394, audio_tagging_loss=0.009104, over 15242.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.0903, pruned_loss=0.0127, audio_tagging_loss=0.008807, over 3053102.63 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:36:42,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3042206.6666666665, ans=0.125 2023-11-24 23:37:01,419 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:37:02,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456350 2023-11-24 23:37:08,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3042340.0, ans=0.0 2023-11-24 23:37:08,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3042340.0, ans=0.125 2023-11-24 23:37:28,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3042406.6666666665, ans=0.125 2023-11-24 23:37:31,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.517e+01 9.262e+01 1.018e+02 1.314e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 23:37:40,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3042473.3333333335, ans=0.125 2023-11-24 23:37:42,264 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11500, loss[loss=0.06183, simple_loss=0.08414, pruned_loss=0.009726, audio_tagging_loss=0.01004, over 15528.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09064, pruned_loss=0.01288, audio_tagging_loss=0.008862, over 3047880.05 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:37:58,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3042606.6666666665, ans=0.02 2023-11-24 23:38:03,720 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456400 2023-11-24 23:38:16,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3042673.3333333335, ans=0.0 2023-11-24 23:38:28,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.89 vs. 
limit=22.5 2023-11-24 23:38:44,369 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11550, loss[loss=0.05565, simple_loss=0.0796, pruned_loss=0.007026, audio_tagging_loss=0.008822, over 16032.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08998, pruned_loss=0.01277, audio_tagging_loss=0.008902, over 3059539.43 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:38:53,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-24 23:39:01,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3042940.0, ans=0.125 2023-11-24 23:39:02,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=15.0 2023-11-24 23:39:04,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3042940.0, ans=0.125 2023-11-24 23:39:05,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456450 2023-11-24 23:39:17,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-24 23:39:18,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3043006.6666666665, ans=0.0 2023-11-24 23:39:19,199 WARNING [train_asr.py:1462] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-24 23:39:23,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3043073.3333333335, ans=0.125 2023-11-24 23:39:27,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3043073.3333333335, ans=0.125 2023-11-24 23:39:34,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.770e+01 9.475e+01 1.009e+02 1.231e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-24 23:39:34,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3043140.0, ans=0.1 2023-11-24 23:39:45,844 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11600, loss[loss=0.08845, simple_loss=0.1208, pruned_loss=0.02054, audio_tagging_loss=0.007497, over 16312.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09053, pruned_loss=0.01287, audio_tagging_loss=0.00892, over 3062986.69 frames. 
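
The two "Exclude cut" warnings in this section show the length filter at work: the dummy AudioSet placeholder transcript encodes to 24 BPE tokens, but the 1-second cut keeps only 23 frames after subsampling, and the recipe drops any cut with fewer subsampled frames than tokens (a modified-transducer alignment can emit at most one symbol per frame). A sketch of such a check; the exact subsampling arithmetic is an assumption chosen to map 100 input frames to 23:

def keep_cut(num_frames_before: int, num_tokens: int) -> bool:
    # Assumed Conv2dSubsampling-style arithmetic giving 100 -> 23 frames.
    num_frames_after = ((num_frames_before - 7) // 2 + 1) // 2
    return num_frames_after >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, so the cut is excluded
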
], batch size: 60, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:40:06,292 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456500 2023-11-24 23:40:39,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3043473.3333333335, ans=0.125 2023-11-24 23:40:45,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3043540.0, ans=0.0 2023-11-24 23:40:46,562 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11650, loss[loss=0.07477, simple_loss=0.1012, pruned_loss=0.01599, audio_tagging_loss=0.008178, over 15147.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09109, pruned_loss=0.01299, audio_tagging_loss=0.008899, over 3060770.08 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:40:47,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3043540.0, ans=0.0 2023-11-24 23:41:03,539 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:41:05,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3043606.6666666665, ans=0.0 2023-11-24 23:41:08,020 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456550 2023-11-24 23:41:09,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3043606.6666666665, ans=0.2 2023-11-24 23:41:20,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3043673.3333333335, ans=0.125 2023-11-24 23:41:20,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3043673.3333333335, ans=0.125 2023-11-24 23:41:24,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3043740.0, ans=0.125 2023-11-24 23:41:36,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.725e+01 9.251e+01 1.014e+02 1.338e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-24 23:41:45,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3043806.6666666665, ans=0.0 2023-11-24 23:41:48,074 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11700, loss[loss=0.08336, simple_loss=0.1159, pruned_loss=0.01656, audio_tagging_loss=0.008872, over 15544.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09163, pruned_loss=0.01308, audio_tagging_loss=0.008907, over 3050403.55 frames. ], batch size: 57, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:41:55,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.80 vs. 
limit=12.0 2023-11-24 23:42:09,636 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456600 2023-11-24 23:42:12,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3044006.6666666665, ans=0.1 2023-11-24 23:42:29,037 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:42:34,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3044073.3333333335, ans=0.125 2023-11-24 23:42:39,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3044140.0, ans=0.05 2023-11-24 23:42:42,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3044140.0, ans=0.1 2023-11-24 23:42:50,839 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11750, loss[loss=0.06604, simple_loss=0.09153, pruned_loss=0.01107, audio_tagging_loss=0.009203, over 15025.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09124, pruned_loss=0.01301, audio_tagging_loss=0.00891, over 3050559.00 frames. ], batch size: 58, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:42:52,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3044206.6666666665, ans=0.125 2023-11-24 23:42:59,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3044206.6666666665, ans=0.125 2023-11-24 23:43:11,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456650 2023-11-24 23:43:23,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3044340.0, ans=0.125 2023-11-24 23:43:30,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3044406.6666666665, ans=0.1 2023-11-24 23:43:31,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3044406.6666666665, ans=0.125 2023-11-24 23:43:40,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3044473.3333333335, ans=0.125 2023-11-24 23:43:42,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.689e+01 9.260e+01 1.011e+02 1.469e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-24 23:43:48,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3044473.3333333335, ans=0.125 2023-11-24 23:43:51,907 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11800, loss[loss=0.0709, simple_loss=0.0947, pruned_loss=0.01538, audio_tagging_loss=0.008177, over 15461.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09075, pruned_loss=0.0129, audio_tagging_loss=0.008939, over 3051176.98 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:44:12,663 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456700 2023-11-24 23:44:14,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.22 vs. 
limit=15.0 2023-11-24 23:44:26,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3044673.3333333335, ans=0.125 2023-11-24 23:44:36,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3044740.0, ans=0.0 2023-11-24 23:44:49,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3044806.6666666665, ans=0.5 2023-11-24 23:44:52,994 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11850, loss[loss=0.05547, simple_loss=0.0827, pruned_loss=0.007309, audio_tagging_loss=0.006814, over 15525.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09014, pruned_loss=0.01296, audio_tagging_loss=0.009071, over 3048681.97 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:44:53,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3044873.3333333335, ans=0.125 2023-11-24 23:45:14,822 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456750 2023-11-24 23:45:23,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3045006.6666666665, ans=0.125 2023-11-24 23:45:29,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5 2023-11-24 23:45:41,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3045140.0, ans=0.125 2023-11-24 23:45:44,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.737e+01 9.408e+01 9.939e+01 1.281e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-24 23:45:54,843 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11900, loss[loss=0.07073, simple_loss=0.09248, pruned_loss=0.01398, audio_tagging_loss=0.01052, over 16772.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09047, pruned_loss=0.0129, audio_tagging_loss=0.00916, over 3059654.45 frames. ], batch size: 61, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:46:05,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3045206.6666666665, ans=0.0 2023-11-24 23:46:12,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2023-11-24 23:46:14,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3045273.3333333335, ans=0.2 2023-11-24 23:46:15,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-24 23:46:15,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456800 2023-11-24 23:46:21,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. 
limit=15.0 2023-11-24 23:46:23,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3045340.0, ans=0.1 2023-11-24 23:46:24,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. limit=10.0 2023-11-24 23:46:28,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3045340.0, ans=0.125 2023-11-24 23:46:56,501 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 11950, loss[loss=0.06498, simple_loss=0.0785, pruned_loss=0.01493, audio_tagging_loss=0.01081, over 15149.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09022, pruned_loss=0.01288, audio_tagging_loss=0.009232, over 3053905.60 frames. ], batch size: 59, lr: 1.77e-03, grad_scale: 8.0 2023-11-24 23:47:15,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3045606.6666666665, ans=0.125 2023-11-24 23:47:17,215 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456850 2023-11-24 23:47:22,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3045673.3333333335, ans=0.1 2023-11-24 23:47:39,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3045740.0, ans=0.0 2023-11-24 23:47:40,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3045740.0, ans=0.1 2023-11-24 23:47:43,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3045740.0, ans=0.2 2023-11-24 23:47:47,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.455e+01 9.152e+01 9.873e+01 1.157e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-24 23:47:52,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3045806.6666666665, ans=0.0 2023-11-24 23:47:56,417 INFO [train_asr.py:1221] (1/4) Epoch 38, batch 12000, loss[loss=0.06666, simple_loss=0.08826, pruned_loss=0.01422, audio_tagging_loss=0.008309, over 14625.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09095, pruned_loss=0.0129, audio_tagging_loss=0.009152, over 3056521.11 frames. ], batch size: 55, lr: 1.77e-03, grad_scale: 16.0 2023-11-24 23:47:56,418 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 23:48:21,744 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8992, 2.3449, 2.7167, 2.5501], device='cuda:1') 2023-11-24 23:48:40,307 INFO [train_asr.py:1253] (1/4) Epoch 38, validation: loss=0.05738, simple_loss=0.0508, pruned_loss=0.005195, audio_tagging_loss=0.02678, over 4681554.00 frames. 2023-11-24 23:48:40,308 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 23:48:41,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-24 23:48:46,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. 
limit=15.0 2023-11-24 23:48:59,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456900 2023-11-24 23:49:37,956 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 0, loss[loss=0.08627, simple_loss=0.1088, pruned_loss=0.01357, audio_tagging_loss=0.01828, over 15635.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1088, pruned_loss=0.01357, audio_tagging_loss=0.01828, over 15635.00 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:49:37,956 INFO [train_asr.py:1244] (1/4) Computing validation loss 2023-11-24 23:50:14,445 INFO [train_asr.py:1253] (1/4) Epoch 39, validation: loss=0.0578, simple_loss=0.05083, pruned_loss=0.005244, audio_tagging_loss=0.02714, over 4681554.00 frames. 2023-11-24 23:50:14,446 INFO [train_asr.py:1254] (1/4) Maximum memory allocated so far is 25607MB 2023-11-24 23:50:14,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046020.0, ans=0.1 2023-11-24 23:50:15,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3046020.0, ans=0.0 2023-11-24 23:50:32,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3046086.6666666665, ans=0.125 2023-11-24 23:50:36,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3046086.6666666665, ans=0.2 2023-11-24 23:50:38,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3046153.3333333335, ans=0.125 2023-11-24 23:50:41,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3046153.3333333335, ans=0.0 2023-11-24 23:50:50,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2023-11-24 23:51:09,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 456950 2023-11-24 23:51:13,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=22.5 2023-11-24 23:51:15,699 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 50, loss[loss=0.05854, simple_loss=0.06949, pruned_loss=0.008617, audio_tagging_loss=0.01517, over 15517.00 frames. ], tot_loss[loss=0.07263, simple_loss=0.08562, pruned_loss=0.01227, audio_tagging_loss=0.01755, over 682278.77 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:51:21,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-24 23:51:39,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.557e+01 1.022e+02 1.093e+02 1.810e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-24 23:51:52,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-24 23:52:11,438 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457000 2023-11-24 23:52:18,481 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 100, loss[loss=0.0629, simple_loss=0.06519, pruned_loss=0.0115, audio_tagging_loss=0.0188, over 15076.00 frames. 
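
The optim.py lines report the min/25%/50%/75%/max of recently observed gradient norms, and in every instance here the clipping threshold equals Clipping_scale times the median (for the entry above: 2.0 * 1.022e+02 = 2.044e+02). A sketch of that scheme, assuming the threshold really is derived from a running median; the actual optimizer code is not shown in this log:

import torch

def clip_with_scaled_median(parameters, recent_norms, clipping_scale=2.0):
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale * median of recent grad norms
    total = torch.nn.utils.clip_grad_norm_(parameters, max_norm=float(threshold))
    clipped = bool(total > threshold)  # feeds the percent-clipped statistic
    return q, threshold, clipped
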
], tot_loss[loss=0.07313, simple_loss=0.08825, pruned_loss=0.01244, audio_tagging_loss=0.01656, over 1208240.38 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:52:18,868 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:52:20,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-24 23:52:31,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046753.3333333335, ans=0.1 2023-11-24 23:52:34,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=22.5 2023-11-24 23:52:38,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3046753.3333333335, ans=0.0 2023-11-24 23:52:40,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3046753.3333333335, ans=0.125 2023-11-24 23:53:13,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3046953.3333333335, ans=0.125 2023-11-24 23:53:14,573 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457050 2023-11-24 23:53:14,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3046953.3333333335, ans=0.125 2023-11-24 23:53:17,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3046953.3333333335, ans=0.0 2023-11-24 23:53:21,496 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 150, loss[loss=0.08143, simple_loss=0.1111, pruned_loss=0.01644, audio_tagging_loss=0.009435, over 15227.00 frames. ], tot_loss[loss=0.07247, simple_loss=0.08966, pruned_loss=0.01267, audio_tagging_loss=0.01497, over 1618943.79 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:53:39,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.40 vs. limit=22.5 2023-11-24 23:53:45,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.772e+01 8.928e+01 9.394e+01 1.010e+02 1.309e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-24 23:54:18,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457100 2023-11-24 23:54:24,057 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 200, loss[loss=0.07414, simple_loss=0.1042, pruned_loss=0.01453, audio_tagging_loss=0.007518, over 13762.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09276, pruned_loss=0.01308, audio_tagging_loss=0.0131, over 1937467.52 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:55:04,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.75 vs. 
limit=10.0 2023-11-24 23:55:19,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457150 2023-11-24 23:55:22,770 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-24 23:55:26,003 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 250, loss[loss=0.07736, simple_loss=0.1069, pruned_loss=0.01512, audio_tagging_loss=0.00877, over 15077.00 frames. ], tot_loss[loss=0.07175, simple_loss=0.09328, pruned_loss=0.01326, audio_tagging_loss=0.01186, over 2183155.01 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:55:34,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3047686.6666666665, ans=0.2 2023-11-24 23:55:42,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3047753.3333333335, ans=0.2 2023-11-24 23:55:50,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.824e+01 9.475e+01 1.039e+02 1.300e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-24 23:55:53,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3047820.0, ans=0.0 2023-11-24 23:56:13,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3047886.6666666665, ans=0.2 2023-11-24 23:56:19,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2023-11-24 23:56:21,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457200 2023-11-24 23:56:21,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3047953.3333333335, ans=0.02 2023-11-24 23:56:28,290 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 300, loss[loss=0.07268, simple_loss=0.09924, pruned_loss=0.01484, audio_tagging_loss=0.008216, over 15846.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.09262, pruned_loss=0.01311, audio_tagging_loss=0.01098, over 2375918.68 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:56:33,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3048020.0, ans=0.2 2023-11-24 23:57:13,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3048220.0, ans=0.125 2023-11-24 23:57:20,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3048286.6666666665, ans=0.0 2023-11-24 23:57:24,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457250 2023-11-24 23:57:27,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3048286.6666666665, ans=0.04949747468305833 2023-11-24 23:57:30,464 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 350, loss[loss=0.07952, simple_loss=0.1088, pruned_loss=0.01785, audio_tagging_loss=0.007263, over 15340.00 frames. ], tot_loss[loss=0.07014, simple_loss=0.09339, pruned_loss=0.01316, audio_tagging_loss=0.01028, over 2523160.65 frames. 
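
The Whitening lines compare a per-module statistic against a limit, presumably penalizing activations only when the metric exceeds it. One metric consistent with the logged values is the eigenvalue spread of the feature covariance, which is 1.0 for perfectly white (decorrelated, equal-variance) features and grows as the covariance degenerates; whether scaling.py computes exactly this is an assumption:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations of one module/group.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2  # 1.0 when perfectly white

x = torch.randn(512, 192)   # white-noise stand-in for 192-channel activations
print(whitening_metric(x))  # close to 1, far below logged limits like 15.0
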
], batch size: 58, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:57:30,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3048353.3333333335, ans=0.2 2023-11-24 23:57:32,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048353.3333333335, ans=0.125 2023-11-24 23:57:35,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2023-11-24 23:57:43,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-11-24 23:57:56,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.715e+01 9.321e+01 9.899e+01 1.393e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-24 23:58:08,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3048553.3333333335, ans=0.04949747468305833 2023-11-24 23:58:25,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457300 2023-11-24 23:58:27,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3048620.0, ans=0.0 2023-11-24 23:58:29,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3048620.0, ans=0.125 2023-11-24 23:58:32,352 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 400, loss[loss=0.05069, simple_loss=0.06896, pruned_loss=0.008187, audio_tagging_loss=0.008021, over 15973.00 frames. ], tot_loss[loss=0.06894, simple_loss=0.09218, pruned_loss=0.01291, audio_tagging_loss=0.009934, over 2641171.11 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:58:40,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3048686.6666666665, ans=0.0 2023-11-24 23:58:50,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2023-11-24 23:59:01,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-24 23:59:07,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3048886.6666666665, ans=0.125 2023-11-24 23:59:22,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2023-11-24 23:59:26,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2023-11-24 23:59:27,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457350 2023-11-24 23:59:33,734 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 450, loss[loss=0.06953, simple_loss=0.0947, pruned_loss=0.01183, audio_tagging_loss=0.01035, over 14756.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09194, pruned_loss=0.01287, audio_tagging_loss=0.009677, over 2729127.75 frames. 
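
The scaling.py:213 lines throughout this log print the current value (ans) of named ScheduledFloat hyperparameters, dropout rates, skip rates, balancer probabilities, at the given batch_count. Such schedules are typically piecewise-linear in the batch count; a minimal sketch, with invented breakpoints purely for illustration:

class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) breakpoints, e.g. (0, 0.3), (20000, 0.1)
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                # linear interpolation between neighbouring breakpoints
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3046086.0))  # long past the last breakpoint -> 0.1
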
], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2023-11-24 23:59:35,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3049020.0, ans=0.2 2023-11-25 00:00:00,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 8.552e+01 9.288e+01 9.905e+01 1.638e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-25 00:00:00,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3049153.3333333335, ans=0.125 2023-11-25 00:00:02,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=3049153.3333333335, ans=0.2 2023-11-25 00:00:16,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3049220.0, ans=0.1 2023-11-25 00:00:30,633 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457400 2023-11-25 00:00:36,778 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 500, loss[loss=0.06859, simple_loss=0.09786, pruned_loss=0.01228, audio_tagging_loss=0.007376, over 16455.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09202, pruned_loss=0.0129, audio_tagging_loss=0.009367, over 2804365.24 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 00:00:55,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3049420.0, ans=0.1 2023-11-25 00:00:57,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3049420.0, ans=0.125 2023-11-25 00:00:57,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-25 00:01:30,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3049620.0, ans=0.0 2023-11-25 00:01:32,579 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457450 2023-11-25 00:01:39,078 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 550, loss[loss=0.05964, simple_loss=0.0817, pruned_loss=0.01055, audio_tagging_loss=0.008242, over 14690.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09018, pruned_loss=0.01248, audio_tagging_loss=0.009342, over 2856755.40 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 00:01:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3049686.6666666665, ans=0.0 2023-11-25 00:01:41,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2023-11-25 00:02:05,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.951e+01 8.616e+01 9.440e+01 1.005e+02 1.281e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-25 00:02:13,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3049820.0, ans=0.2 2023-11-25 00:02:17,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. 
limit=15.0 2023-11-25 00:02:35,465 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 457500 2023-11-25 00:02:41,278 INFO [train_asr.py:1221] (1/4) Epoch 39, batch 600, loss[loss=0.04701, simple_loss=0.06012, pruned_loss=0.006858, audio_tagging_loss=0.01009, over 16490.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09124, pruned_loss=0.01268, audio_tagging_loss=0.009123, over 2905622.83 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 00:02:42,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3050020.0, ans=0.5 2023-11-25 00:02:43,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=22.5
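
One pattern worth noting across this section: grad_scale steps down from 32.0 to 16.0 to 8.0 and later recovers to 16.0 and 32.0 around the epoch boundary. That is the signature of dynamic loss scaling in mixed-precision training, where the scale is halved whenever scaled gradients overflow and regrown after a run of finite steps. A sketch using PyTorch's stock GradScaler; the constructor arguments are PyTorch defaults plus an illustrative init_scale, not settings read from this run:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # scale up so fp16 gradients stay representable
    scaler.step(optimizer)         # skips the update and halves the scale on overflow
    scaler.update()                # doubles the scale after growth_interval clean steps
    return loss.detach()
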